I'm trying to write a web crawler, but I don't know how to handle the case where I have to log in to access a site. How can I fill in the login form from my Java program? I've searched the net, but the results point to JSP. I haven't worked with JSP before; isn't there a simpler way to fill in the user and password HTML fields?

Search Google for 'curl'; that should get you started...

You are talking about JavacURL, right? The cURL library for Java? Well, I tried to use that, but the documentation at http://curl.haxx.se/libcurl/ is written for C. In the downloaded JavacURL package I found only some instructions for installing the library in the Eclipse editor, and a Javadoc with few classes and methods but lots of fields. I don't know which fields I should use in my program... Isn't there another way to solve my problem besides this stupid JavacURL? I lost hours tonight trying to find a way to use it. If there isn't another solution, I'll try again in the morning, but I hope you guys can give me some useful tips about JavacURL.

You can call the .exe or shell script with some arguments...

That's the way I do it... curl saves the results in text files, and then I analyse those files... followed by a new call to curl...
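
A minimal sketch of that approach from Java, assuming a `curl` executable is on the PATH. The class name `CurlRunner`, the file names and the example URL are made up for illustration. It only uses curl's `--cookie`, `--cookie-jar`, `--data`, `--output` and `--silent` options, and starts the process with `ProcessBuilder`:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CurlRunner {

    // Builds the argument list for one curl call. The same cookie file is read
    // (--cookie) and written (--cookie-jar), so cookies from one call reach the next.
    static List<String> buildCommand(String url, String postData, Path cookieFile, Path outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("curl");
        cmd.add("--silent");
        cmd.add("--cookie");
        cmd.add(cookieFile.toString());
        cmd.add("--cookie-jar");
        cmd.add(cookieFile.toString());
        if (postData != null) {
            // --data makes curl send a POST with a form-encoded body
            cmd.add("--data");
            cmd.add(postData);
        }
        cmd.add("--output");
        cmd.add(outputFile.toString());
        cmd.add(url);
        return cmd;
    }

    // Runs the command and waits for it; returns curl's exit code (0 means success).
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical URL and form data; nothing is executed here, the command is only printed.
        List<String> cmd = buildCommand("http://example.com/login", "user=me&pass=secret",
                Paths.get("cookies.txt"), Paths.get("page.html"));
        System.out.println(String.join(" ", cmd));
    }
}
```

After `run(...)` returns, the program would read `page.html` back in, analyse it, and build the next command in the same way.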

Use HttpURLConnection with a POST request. See http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2006-10/msg01212.html for an example (found in 20 seconds using Yahoo). It is for HttpsURLConnection rather than HttpURLConnection, so strip out the SSL stuff; otherwise, it should be the same.

In summary: read the form and find the different parameters (including the hidden ones); then, using those parameter names, create a POST request and connect again.

Hopefully, you are familiar with the forms already, as I would not count on being able to create some generic login form manipulator.

Edit: And, yes, the code is not that good, and he is complaining that it is not working over a proxy, but it is at least a starting point for you.
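
The summary above (read the form, collect the parameter names including the hidden ones, then POST them) could be sketched roughly as follows. This is not the code from the linked post. The class name `FormPost` and the sample HTML are made up, and the regex-based extraction is a simplification that only handles quoted attributes on plain `<input>` tags; a real HTML parser would be more robust.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormPost {

    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Reads one quoted attribute out of a single tag; returns null if it is absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\b" + name + "\\s*=\\s*[\"']([^\"']*)[\"']",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Collects name -> value for every named <input>, hidden ones included, in page order.
    static Map<String, String> extractFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String name = attr(m.group(), "name");
            if (name != null) {
                String value = attr(m.group(), "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds an application/x-www-form-urlencoded body from the fields.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) body.append('&');
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    // Sends the body as a POST and returns the HTTP status code.
    static int post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8)); // exact bytes, no trailing newline
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        // Made-up form; in practice the HTML comes from a first GET of the login page.
        String html = "<form><input type=\"hidden\" name=\"token\" value=\"abc\">"
                + "<input type=\"text\" name=\"user\"><input type=\"password\" name=\"pass\"></form>";
        Map<String, String> fields = extractFields(html);
        fields.put("user", "some user");
        fields.put("pass", "p&ss");
        System.out.println(encode(fields)); // the body that post(...) would send
    }
}
```

Keeping the hidden fields in the map matters because many login forms reject a POST that omits them.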

A nice handy help for doing network thingies from Java is Apache Commons Net (and/or Apache Commons HttpClient).

I've just found out that I need session authentication (this is the type of authentication that trackers use). Does that mean I have to work with cookies? Uhhh, cookies are new to me :(

Yes indeed.

With curl, for example, you log in. Then you get a session cookie. You write that one to a file and pass it on to all your later curl calls...
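
For plain Java without curl, the same idea can be sketched with `java.net.CookieManager`, which stores cookies from responses and hands them back for later requests. The class name `SessionCookies`, the cookie value and the example.com URLs below are made up. The helper feeds the manager a `Set-Cookie` header by hand so the mechanism is visible without a real server:

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SessionCookies {

    // Stores one Set-Cookie header as if it came from loginUri, then returns the
    // Cookie header values the manager would attach to a request for nextUri.
    static List<String> cookiesSentAfter(String setCookie, URI loginUri, URI nextUri) {
        CookieManager manager = new CookieManager();
        Map<String, List<String>> responseHeaders = new HashMap<>();
        responseHeaders.put("Set-Cookie", Collections.singletonList(setCookie));
        try {
            manager.put(loginUri, responseHeaders);
            Map<String, List<String>> requestHeaders =
                    manager.get(nextUri, new HashMap<String, List<String>>());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Installing a CookieManager as the default handler makes every later
        // URLConnection in this JVM store and resend cookies automatically.
        CookieHandler.setDefault(new CookieManager());

        // Hypothetical session cookie, shown only to illustrate the round trip.
        List<String> sent = cookiesSentAfter("JSESSIONID=abc123; Path=/",
                URI.create("http://example.com/login"),
                URI.create("http://example.com/members"));
        System.out.println(sent);
    }
}
```

In a crawler, the single `CookieHandler.setDefault(...)` call at startup is usually the only part needed. The session cookie set by the login response is then sent with the following requests to the same site.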

I am using URLConnection to log in to the site http://sms.vn/send/login.jsp, but it just returns the same login page again.
My program works like this: it opens a URL connection to get the login form from the site, fills in the user name and password, asks the user to type the captcha, and sends a POST request to the site with that information. My code works for another site, such as:
http://www.1your.com/drupal/sample_login.php
I am sure that my user name and password are right; you can try them, the account is free and it will not cause me much harm.

import java.io.IOException;
import java.io.PrintStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 *
 * @author Duy Khang
 */
public class NewClass2 {

    private static String LOGIN_INPUT = "login";
    private static String LOGIN_VALUE = "LOGIN";
    private static String NICK_INPUT = "nick";
    private static String NICK_VALUE = "vodkhang";
    private static String PASS_INPUT = "pass";
    private static String PASS_VALUE = "vhnkozou";
    private static String CHECK_CODE_INPUT = "checkcode";
    private static String CHECK_CODE_VALUE = "";

    public static void main(String[] args) {
        try {
            URL url = new URL("http://sms.vn/send/login.jsp");
            URLConnection connection = url.openConnection();
            connection.setDoOutput(true);

            Scanner sc = new Scanner(connection.getInputStream());
            while (sc.hasNextLine()) {
                System.out.println(sc.nextLine());
            }
            Scanner sc2 = new Scanner(System.in);
            System.out.println("enter the check code value: ");
            CHECK_CODE_VALUE = sc2.nextLine();

            String encodedLoginUserName = URLEncoder.encode(NICK_VALUE, "UTF-8");
            String encodedLoginPassword = URLEncoder.encode(PASS_VALUE, "UTF-8");


            URLConnection connection1 = url.openConnection();
            connection1.setDoOutput(true);
            PrintStream output = new PrintStream(connection1.getOutputStream());

            StringBuilder request = new StringBuilder();
            request.append(LOGIN_INPUT);
            request.append("=");
            request.append(LOGIN_VALUE);
            request.append("&");

            request.append(NICK_INPUT);
            request.append("=");
            request.append(encodedLoginUserName);
            request.append("&");

            request.append(PASS_INPUT);
            request.append("=");
            request.append(encodedLoginPassword);
            request.append("&");

            request.append(CHECK_CODE_INPUT);
            request.append("=");
            request.append(CHECK_CODE_VALUE);
            System.out.println("request: " + request);

            output.println(request);
            System.out.println("----------------------------------------------------");
            Scanner sc3 = new Scanner(connection1.getInputStream());
            while (sc3.hasNextLine()) {
                System.out.println(sc3.nextLine());
            }
        } catch (MalformedURLException ex) {
            Logger.getLogger(NewClass2.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(NewClass2.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
