
Hi guys,

I want to create a crawler to extract some information from a page.
The problem is that the site is built with the Java Wicket framework, and I don't know how to scrape it because I don't know how to submit the POST parameters it expects.

Is this possible? :)
Thank you.

Last Post by Szabi Zsoldos

You can POST to a page with cURL.
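As a rough illustration of that suggestion, here is a minimal sketch of a form-encoded POST with PHP's cURL extension. The helper names `curlPostOptions` and `postForm` are made up for this example, and the URL and fields you pass in are whatever the target form expects.

```php
<?php
// Hypothetical helper: builds the cURL options for a form-encoded POST.
function curlPostOptions(string $url, array $fields): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // A string body is sent as application/x-www-form-urlencoded.
        CURLOPT_POSTFIELDS     => http_build_query($fields, '', '&'),
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects after the POST
    ];
}

// Hypothetical helper: sends the POST and returns the response body,
// or false if the transfer fails.
function postForm(string $url, array $fields)
{
    $ch = curl_init();
    curl_setopt_array($ch, curlPostOptions($url, $fields));
    return curl_exec($ch);
}
```

Calling `postForm($url, ['name' => 'value'])` would then send the fields in the request body. Keeping the options in a separate function makes it easy to inspect what will be sent before touching the network.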

pritaeas, thank you, but how do I set the POST fields for this link, for example?

The input field name is wmcCif:cif

https://portal.onrc.ro/ONRCPortalWeb/appmanager/myONRC/wicket/?wicket:interface=:9:1:::

These are the POST parameters as seen in Firebug:

Parameters (application/x-www-form-urlencoded):
cautare       x
criteriu      filtru.cif
wmcCif:cif    1757980

Source:
cautare=x&criteriu=filtru.cif&wmcCif%3Acif=1757980
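The request body shown by Firebug can be reproduced in PHP with `http_build_query()`. This is only a sketch of the encoding step: the field names and values are the ones captured above, and the variable names are illustrative.

```php
<?php
// Field names and values as captured in Firebug.
$fields = [
    'cautare'    => 'x',
    'criteriu'   => 'filtru.cif',
    'wmcCif:cif' => '1757980',
];

// http_build_query() percent-encodes keys and values, so the colon in
// "wmcCif:cif" becomes %3A, matching the captured source string.
$body = http_build_query($fields, '', '&');

echo $body, PHP_EOL; // cautare=x&criteriu=filtru.cif&wmcCif%3Acif=1757980
```

The resulting string is what would go into `CURLOPT_POSTFIELDS`, so the field name with the colon needs no manual escaping.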

And this is the form from the page:

<form id="idTestWicket03__1__2" method="post" action="https://portal.onrc.ro:443/ONRCPortalWeb/appmanager/myONRC/wicket?_nfpb=true&_windowLabel=TestWicket03_1&_urlType=action&wlpTestWicket03_1__wu=%2FONRCPortalWeb%2Fappmanager%2FmyONRC%2Fwicket%2F%3Fwicket%3Ainterface%3D%3A4%3AformCautare%3A1%3AIFormSubmitListener%3A%3A"><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input type="hidden" name="idTestWicket03__1__2_hf_0" id="idTestWicket03__1__2_hf_0" /></div>
                            <table cellpadding="5">
                                <tr>
                                    <td width="150">
                                        Cautare dupa: 
                                    </td>
                                    <td>
                                        <select class="select_357" onchange="document.getElementById('idTestWicket03__1__2_hf_0').value='/ONRCPortalWeb/appmanager/myONRC/wicket/?wicket:interface=:4:formCautare:criteriu:1:IOnChangeListener::';document.getElementById('idTestWicket03__1__2').submit();" name="criteriu">
                                        <option value="filtru.buletin">Nr. de buletin</option>
                                        <option value="filtru.persoana">Persoana publicata în BPI</option>
                                        <option selected="selected" value="filtru.cif">CIF</option>
                                        <option value="filtru.reg">Nr. de ordine în Registru</option>
                                        <option value="filtru.dosar">Nr. dosar</option>
                                        <option value="filtru.interval">Interval de publicare</option>
                                        </select>
                                    </td>
                                </tr>
                            </table>
                            <div>
                                <table cellpadding="5">
                                    <tr>
                                        <td width="150">
                                            CIF: 
                                        </td>
                                        <td>
                                            <input class="input_142" type="text" value="1757980" name="wmcCif:cif"/>
                                        </td>
                                    </tr>
                                </table>                        
                            </div>
                            <div>
                            <button class="submit" type="submit" onclick="var e=document.getElementById('idTestWicket03__1__2_hf_0'); e.name='cautare'; e.value='x';var f=document.getElementById('idTestWicket03__1__2');var ff=f;if (ff.onsubmit != undefined) { if (ff.onsubmit()==false) return false; }f.submit();e.value='';e.name='';return false;"><span>Cauta</span></button>
                            </div>
                        </form>
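Note that the `wicket:interface` value in the form's action (`:4:formCautare:...`) differs from the one in the link above (`:9:1:::`), which suggests the action URL is generated per page instance. Assuming that is the case, a crawler would need to fetch the page first and read the action from the HTML it gets back instead of hard-coding it. A minimal sketch with `DOMDocument` follows; `extractFormAction` is a hypothetical helper name, and it simply takes the first form on the page.

```php
<?php
// Hypothetical helper: returns the action URL of the first <form> in $html,
// or null when the input is empty or contains no form with an action.
function extractFormAction(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $forms = $doc->getElementsByTagName('form');
    if ($forms->length === 0) {
        return null;
    }

    // getAttribute() returns the value with HTML entities already decoded.
    $action = $forms->item(0)->getAttribute('action');
    return $action === '' ? null : $action;
}
```

The returned URL would then be the target of the POST, with `cautare`, `criteriu` and `wmcCif:cif` as the fields.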

Edited by Szabi Zsoldos


The problem is way harder than this. I'm familiar with cURL, but this one is giving me a hard time :(
I can log in successfully, but when it comes to extracting certain data, it doesn't work.


I've tried different methods to log in; I succeeded with DOM but not with cURL for this particular page.

// $data must be an array holding the 'j_username' and 'j_password' keys.
function Login(array $data) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, "https://portal.onrc.ro/ONRCPortalWeb/appmanager/myONRC/public?_nfpb=true&_pageLabel=login");
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_POST, true);
    // Send the credentials as an application/x-www-form-urlencoded body.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        "j_username"  => $data['j_username'],
        "j_password"  => $data['j_password']
    )));
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Certificate checks are switched off here, which is only acceptable for testing.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $server_output = curl_exec($ch);

    return $server_output;
}
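One thing worth checking, as an assumption not confirmed anywhere in this thread, is whether the session cookie set at login is carried over to the later requests. The function above creates a fresh handle and stores no cookies, so any follow-up request would start a new session. A minimal sketch of sharing one cookie file across requests could look like this; `sessionOptions` and `sessionRequest` are hypothetical helpers and the cookie file path is up to you.

```php
<?php
// Hypothetical helper: options that make a handle read and write one cookie file.
function sessionOptions(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here; also enables the cookie engine
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ];
}

// Hypothetical helper: sends a GET, or a POST when $fields is given,
// on an existing handle so that its cookies are reused.
function sessionRequest($ch, string $url, ?array $fields = null)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($fields === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true);
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields, '', '&'));
    }
    return curl_exec($ch);
}
```

The idea would be to create one handle with `curl_init()`, apply `sessionOptions()` through `curl_setopt_array()`, call `sessionRequest()` once with the login fields, and then reuse the same handle for the search page and the search POST. If the data still does not come back, comparing the requests against what Firebug shows in the browser is the next step.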

I don't know whether it relies on some Ajax calls that are encrypted...

Edited by Szabi Zsoldos
