Hi guys,

I want to create a crawler to extract some information from a page.
The problem is that the page is built with the Java Wicket framework, and I don't know how to scrape it because I don't know how to submit the POST parameters.

Is this possible to do? :)
Thank you.

You can post to a page with cURL.

pritaeas, thank you, but how do I set the POST parameters for a link like this one, for example?

The input field name is wmcCif:cif

https://portal.onrc.ro/ONRCPortalWeb/appmanager/myONRC/wicket/?wicket:interface=:9:1:::

These are the POST parameters as shown in Firebug.

Parameters (application/x-www-form-urlencoded)
cautare       x
criteriu      filtru.cif
wmcCif:cif    1757980

Source
cautare=x&criteriu=filtru.cif&wmcCif%3Acif=1757980

And this is the form from the page

<form id="idTestWicket03__1__2" method="post" action="https://portal.onrc.ro:443/ONRCPortalWeb/appmanager/myONRC/wicket?_nfpb=true&_windowLabel=TestWicket03_1&_urlType=action&wlpTestWicket03_1__wu=%2FONRCPortalWeb%2Fappmanager%2FmyONRC%2Fwicket%2F%3Fwicket%3Ainterface%3D%3A4%3AformCautare%3A1%3AIFormSubmitListener%3A%3A"><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input type="hidden" name="idTestWicket03__1__2_hf_0" id="idTestWicket03__1__2_hf_0" /></div>
                            <table cellpadding="5">
                                <tr>
                                    <td width="150">
                                        Cautare dupa: 
                                    </td>
                                    <td>
                                        <select class="select_357" onchange="document.getElementById('idTestWicket03__1__2_hf_0').value='/ONRCPortalWeb/appmanager/myONRC/wicket/?wicket:interface=:4:formCautare:criteriu:1:IOnChangeListener::';document.getElementById('idTestWicket03__1__2').submit();" name="criteriu">
                                        <option value="filtru.buletin">Nr. de buletin</option>
                                        <option value="filtru.persoana">Persoana publicata în BPI</option>
                                        <option selected="selected" value="filtru.cif">CIF</option>
                                        <option value="filtru.reg">Nr. de ordine în Registru</option>
                                        <option value="filtru.dosar">Nr. dosar</option>
                                        <option value="filtru.interval">Interval de publicare</option>
                                        </select>
                                    </td>
                                </tr>
                            </table>
                            <div>
                                <table cellpadding="5">
                                    <tr>
                                        <td width="150">
                                            CIF: 
                                        </td>
                                        <td>
                                            <input class="input_142" type="text" value="1757980" name="wmcCif:cif"/>
                                        </td>
                                    </tr>
                                </table>                        
                            </div>
                            <div>
                            <button class="submit" type="submit" onclick="var e=document.getElementById('idTestWicket03__1__2_hf_0'); e.name='cautare'; e.value='x';var f=document.getElementById('idTestWicket03__1__2');var ff=f;if (ff.onsubmit != undefined) { if (ff.onsubmit()==false) return false; }f.submit();e.value='';e.name='';return false;"><span>Cauta</span></button>
                            </div>
                        </form>

First user comment on the page I linked.
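
For the form quoted above, a minimal sketch of such a POST might look like the following. The field names and values are the ones Firebug reported. The function names `buildSearchBody` and `postSearch` are made up for illustration, and the action URL is assumed to be read from the live page rather than hard-coded.

```php
<?php
// Sketch only: function names are hypothetical; field names come from the Firebug dump.

// Build the urlencoded body the browser sent for a CIF search.
function buildSearchBody($cif) {
    return http_build_query(array(
        'cautare'    => 'x',           // set by the submit button's onclick handler
        'criteriu'   => 'filtru.cif',  // selected option of the "criteriu" select
        'wmcCif:cif' => $cif,          // the text input; ":" is encoded as %3A
    ), '', '&');
}

// POST the body to the form's action URL, reusing cookies from an earlier login.
function postSearch($actionUrl, $cif, $cookieFile) {
    $ch = curl_init($actionUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildSearchBody($cif));
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send stored session cookies
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save any new cookies
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch); // response body, or false on failure
}
```

The body produced by `buildSearchBody('1757980')` should match the "Source" line from Firebug, which is a quick way to check the encoding before sending anything.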

The problem is way harder than that. I'm familiar with cURL, but this one is giving me a hard time :(
I am logging in successfully, but when it comes to extracting certain data, it is not working.

it is not working

Can you be more specific?

I've tried different methods to log in. I succeeded with DOM but not with cURL for this particular page.

function Login(array $data) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, "https://portal.onrc.ro/ONRCPortalWeb/appmanager/myONRC/public?_nfpb=true&_pageLabel=login");
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_POST, true);
    // Send the credentials as an application/x-www-form-urlencoded body.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        "j_username" => $data['j_username'],
        "j_password" => $data['j_password']
    )));
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Certificate checks are disabled here; that is insecure outside of testing.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // curl_exec() returns the response body, or false on failure.
    $server_output = curl_exec($ch);

    return $server_output;
}

I don't know whether it's based on some Ajax calls that are encrypted...
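
Encrypted Ajax calls may not be the cause. One thing worth checking, offered as an assumption rather than a diagnosis: the `Login` function above sets neither `CURLOPT_COOKIEJAR` nor `CURLOPT_COOKIEFILE`, so the session cookie from the login response is not sent with later requests. The `wicket:interface=...` form URLs also appear to be tied to the session, so the action is probably best read from a freshly fetched page. A minimal sketch, with hypothetical helper names `fetchWithSession` and `extractFormAction`:

```php
<?php
// Sketch only: helper names are hypothetical, not part of any library.

// GET (or POST, when $postBody is given) a URL while sharing one cookie file.
function fetchWithSession($url, $cookieFile, $postBody = null) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send stored cookies
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // write cookies when the handle is released
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if ($postBody !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
    }
    return curl_exec($ch); // response body, or false on failure
}

// Return the action URL of the first form that contains an input
// with the given name, or null if no such form exists.
function extractFormAction($html, $inputName) {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('form') as $form) {
        foreach ($form->getElementsByTagName('input') as $input) {
            if ($input->getAttribute('name') === $inputName) {
                return $form->getAttribute('action'); // entities such as &amp; are already decoded
            }
        }
    }
    return null;
}
```

Usage would be roughly: log in through `fetchWithSession`, fetch the search page with the same cookie file, pass its HTML to `extractFormAction` with `wmcCif:cif` as the input name, then POST the search fields to the returned URL.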
