0

Hello everyone, Daniwebians! Just a quick question: is it possible to focus on a textbox on an externally loaded website?

My goal is to load an external website, focus on a text field, auto-fill it with the current date, and submit the form so that I can run my newly created scraper tool on the resulting page.

Any ideas? I've done a bit of research and will continue to do so...

Edited by PsychicTide

1

You can use a Selenium script to get the text field you want.
Selenium can find the element you want by ID.
You can then insert a value there, and Selenium can "click" the button in the same way, by finding it by ID.
The Internet is full of Selenium tutorials; this is the main site: http://www.seleniumhq.org/

Edited by milil

Votes + Comments
Thanks for the IDE!
0

I've downloaded the Firefox plugin; it's a very interesting IDE. I can make it open a website, fill in the current date, and hit submit. User recording works very well, and the source tab makes tweaking easy.

Thank you very much! saved me hours of research!

Edited by PsychicTide

0

Seems to be very well documented... Would it be possible to run PHP in the source tab on the finally returned page via this framework?

Edited by PsychicTide

0

Hmm, I haven't worked that much with Selenium, but I think it would be possible in some way. If you want to start a PHP script, you can have Selenium open your localhost server and start it once the whole process of inserting data into the website has finished.
Just make a Selenium script that opens the link http://localhost:8080/something/script.php, and the script will start when you have finished doing what you need.
Maybe I didn't understand exactly what you need to do, but I hope Selenium can do that. :)

0

OK, so I figured I would just pass the URL parameters to my own file/URL on my host server at godaddy.com for scraping once I've acquired the resulting source URL.

This works fine if I manually type the parameters into my PHP file. However, in order to scrape the website URL I want using Selenium, I have to copy the hidden '__VIEWSTATE' and '__EVENTVALIDATION' variables (asp.net) as URL input parameters. The problem (I think) is that this URL variable string becomes too large to process and throws a 500 error, so the page does not load.

Would it be possible to increase the maximum length I can input? Could I use some other means of passing this variable (an iframe to pass it in the background instead of the URL)? Cookies? Could I perhaps host the file on my localhost and avoid this problem? Or is this even the problem?
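One way around the URL length limit is to send the hidden fields in a POST body rather than in the query string, which is also how a browser normally submits an ASP.NET form. The following is only a minimal sketch under stated assumptions: `buildPostOptions()` and the `txtDate` field name are hypothetical, the two field values are placeholders, and the actual request is left commented out.

```php
<?php
// Sketch: put the ASP.NET hidden fields in a POST body instead of the URL.
// buildPostOptions() and the 'txtDate' field name are hypothetical.

function buildPostOptions(array $fields): array
{
    // http_build_query() percent-encodes every value for us.
    $body = http_build_query($fields);

    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . 'Content-Length: ' . strlen($body) . "\r\n",
            'content' => $body,
        ],
    ];
}

$options = buildPostOptions([
    '__VIEWSTATE'       => '/wEPDwUK+example=', // placeholder value
    '__EVENTVALIDATION' => '/wEWAgK+example=',  // placeholder value
    'txtDate'           => date('Y-m-d'),        // assumed field name and date format
]);

// To send the request (not executed in this sketch):
// $context = stream_context_create($options);
// $html = file_get_contents('http://something.com/search.aspx', false, $context);
```

Because the values travel in the request body, the length of `__VIEWSTATE` no longer affects the URL at all.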

UPDATE: I attempted to manually type in this URL

Edited by PsychicTide

0

Now that I think about it, I will have to manually open Firefox, then the IDE, then load and run this script every time. Is anyone familiar with how to use Selenium and its WebDriver or RC form (whichever I need), and then use a PHP formatter output (phpunit?) and set it up as a task in a .bat file?

Maybe I could skip this browser URL passing thing in general using localhost stuff.

Edited by PsychicTide

0

So I've attempted to convert this all to a single php script (using simple html dom parser functions here... http://simplehtmldom.sourceforge.net/).

I try to get the '__VIEWSTATE' and '__EVENTVALIDATION' via the hidden dom inputs...

//get viewstate and validation variables
    $seed = 'http://something.com/search.aspx';
    $arrayNodes = web_scrape($seed);

    // return every hidden <input> node on the page
    function web_scrape($url)
    {
        $data = file_get_html($url);
        $nodes = $data->find("input[type=hidden]");

        return $nodes;
    }
    // strip the opening markup (single-quoted so the inner double quotes are literal) ...
    $viewTemp = str_replace('<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="', '', $arrayNodes[0]);
    // ... then drop the trailing 4 characters of the tag
    $viewState = substr($viewTemp, 0, -4);
    $evalTemp = str_replace('<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="', '', $arrayNodes[1]);
    $validation = substr($evalTemp, 0, -4);

    ...

    $html = file_get_html('http://something.com/index.php/?PARAMETERS');

Call to a member function find() on a non-object in *.php on line 34

Line 34, which works if I have the correct viewstate/eventval...

foreach($html->find('.class') as $itemContainer)

This seems to be the problem. I do get a resulting variable, but it doesn't work. If I physically open my browser, then navigate to this website, then manually set the PHP variables I've found in my source, then run my script, it works. Is there a native PHP way to create a new 'user' viewstate/eventval as if I were a user?

... still exploring selenium and free automation software like hudson continuous integration, but any ideas are much appreciated! As always, will share my findings.
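As far as I know, ASP.NET issues the '__VIEWSTATE' and '__EVENTVALIDATION' values with each rendered page, so a script usually has to fetch the form first and send back the values it just received. Reading the `value` attribute directly is less fragile than stripping markup with string functions. Here is a minimal sketch using PHP's built-in DOM extension; `extractHiddenFields()` is a hypothetical helper, and `$sample` is made-up markup standing in for the fetched page.

```php
<?php
// Sketch: collect every hidden <input> as name => value with the DOM extension.
// extractHiddenFields() is a hypothetical helper; $sample stands in for the fetched page.

function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid markup, so keep parser warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }

    return $fields;
}

$sample = '<html><body><form>'
    . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUK+abc=" />'
    . '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgK+xyz=" />'
    . '<input type="text" name="q" value="" />'
    . '</form></body></html>';

$fields = extractHiddenFields($sample);
```

The resulting array is keyed by field name, so it does not depend on the order in which the hidden inputs appear on the page.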

Edited by PsychicTide

0

Update: I echoed the viewstate variable just before my call to file_get_html(), and it contains the entire string. But when I run the script, that call raises a warning whose message cuts the URL off about halfway (is this just the maximum length of a warning string?). I can type the string directly into the call with no problem, but when I put it in a variable and use that variable in the call, it breaks: the warning prints half of my URL, and a fatal error follows because the call doesn't return the object I requested. Very confusing.

file_get_html('http://something.com/search.aspx?__VIEWSTATE=VERYLONGSTRING');
works, but

file_get_html('http://something.com/search.aspx?__VIEWSTATE='.$viewState);
does not

echo $viewState
Shows the entire string I need

The warning points to simple_html_dom on line 76
$use_include_path = false, $context=null, $offset = -1, $url is my sent var from file_get_html...

Line 76:   $contents = file_get_contents($url, $use_include_path, $context, $offset);
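One thing worth ruling out here is encoding. This is an assumption, not a confirmed diagnosis: viewstate strings are typically Base64 and can contain `+`, `/`, and `=`, which have special meaning in a query string, and a scraped value may carry stray whitespace. A minimal sketch of encoding the value before concatenating it follows; the sample string is a made-up placeholder, not a real viewstate.

```php
<?php
// Sketch: percent-encode the value before appending it to the query string.
// The sample string is a made-up placeholder, not a real __VIEWSTATE.

$viewState = " /wEPDwUK+abc/def==\n"; // includes stray whitespace on purpose

// trim() drops surrounding whitespace; rawurlencode() escapes +, / and =.
$encoded = rawurlencode(trim($viewState));

$url = 'http://something.com/search.aspx?__VIEWSTATE=' . $encoded;
```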

Edited by PsychicTide
