Jan 2nd, 2010

How to read/extract text data from a web page?

I am trying to build a VB.NET 2005 Windows app that gets car information from a web page (a web application) that requires a username and password.
I was able to log in to this page programmatically (by automatically populating the input boxes using the WebBrowser control). After logging in, I could view the car data in the browser, but when I did "View Source", the car data (such as model, brand, color, etc.) was not in the page source. So how can I read this data from my application?
I hope my question is clear; I would really appreciate some help.
Thanks
jad2010
Jan 4th, 2010
Re: How to read/extract text data from a web page?
I think you are looking for an HTML parser: Html Agility Pack.
adatapost
Jan 5th, 2010
Re: How to read/extract text data from a web page?
The data did not appear in the page source at all, so I installed a Firefox add-on called "Web Developer", clicked "View Generated Source", and could see the data there.
In my case the car data is shown in a dynamic table; one row is added as soon as a new car comes in.
Here is one row of data:

<td class="x-grid3-col x-grid3-cell x-grid3-td-dealer x-grid3-cell-first " style="width: 78px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-dealer" unselectable="on">Privat</div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-manufacturer " style="width: 98px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-manufacturer" unselectable="on">AUDI</div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-modelDescription " style="width: 157px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-modelDescription" unselectable="on">A2 1.4 </div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-price " style="width: 88px; text-align: right;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-price" unselectable="on"><span class="format-right">7.000</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-firstRegistration " style="width: 58px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-firstRegistration" unselectable="on">6/2000</div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-mileage " style="width: 73px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-mileage" unselectable="on"><span class="format-right">122.000</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-powerInKw " style="width: 48px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-powerInKw" unselectable="on"><span class="format-right">55</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-modificationDate " style="width: 98px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-modificationDate" unselectable="on"><span class="format-right">Heute - 21:29</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-location " style="width: 98px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-location" unselectable="on">85256 Vierkirchen</div></td>

A few of the values I want to grab are 'AUDI', 7.000, and "85256 Vierkirchen".
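For anyone following along, here is a minimal sketch of pulling those cell values out of the generated source. It is written in Python with the standard-library html.parser as a stand-in for an HTML parser such as Html Agility Pack, so treat it as an illustration of the idea rather than VB.NET code. The names GridRowParser and parse_row are made up for this example, and the sketch assumes every cell keeps the "x-grid3-cell-inner x-grid3-col-<name>" class pattern seen in the row above.

```python
from html.parser import HTMLParser


class GridRowParser(HTMLParser):
    """Collects {column name: cell text} from one grid row (hypothetical helper)."""

    PREFIX = "x-grid3-col-"

    def __init__(self):
        super().__init__()
        self.values = {}
        self._column = None  # column currently being read, if any
        self._depth = 0      # open tags inside the current cell <div>

    def handle_starttag(self, tag, attrs):
        if self._column is not None:
            # Nested tag inside a cell, e.g. <span class="format-right">.
            # Assumes nested tags are properly closed (no bare <br>).
            self._depth += 1
            return
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "x-grid3-cell-inner" not in classes:
            return
        for name in classes:
            if name.startswith(self.PREFIX):
                self._column = name[len(self.PREFIX):]
                self._depth = 1
                self.values[self._column] = ""
                break

    def handle_endtag(self, tag):
        if self._column is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Closing </div> of the cell: tidy the text and stop collecting.
            self.values[self._column] = self.values[self._column].strip()
            self._column = None

    def handle_data(self, data):
        if self._column is not None:
            self.values[self._column] += data


def parse_row(row_html):
    """Return a dict of column name -> text for one row of generated HTML."""
    parser = GridRowParser()
    parser.feed(row_html)
    parser.close()
    return parser.values
```

Called on the row above, parse_row would give a dictionary keyed by column name (manufacturer, price, location and so on). Note that this only works on the generated source, not on the raw page source, because the table is built by script after the page loads.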
So far I couldn't grab these values with my code, so I thought of a new approach: I installed another Firefox add-on called "Firebug", checked how the data arrives, and found out that it comes through reverse Ajax DWR requests and responses.
I monitored these requests and noticed that every minute a request/response pair goes out via POST, and all of these responses are 90 bytes in size. But then I saw a larger response (about 648 bytes). I quickly checked the table on the website, and yes, a new car was there.
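The observation above suggests a simple client-side loop: poll once a minute and only look closer at responses that are bigger than the routine 90-byte ones. Below is a rough Python sketch of that idea, not a working client for this site. The fetch argument is a hypothetical callable that performs the POST (with whatever session cookies and DWR parameters the site expects) and returns the raw body; how to build that request is not covered here. The size threshold comes straight from the numbers reported above and may well differ in practice.

```python
import time

HEARTBEAT_SIZE = 90  # bytes; size of the routine responses reported above


def has_new_data(body, heartbeat_size=HEARTBEAT_SIZE):
    """Treat any response larger than the routine ones as carrying new data."""
    return len(body) > heartbeat_size


def poll(fetch, on_new_data, interval_seconds=60.0, max_polls=None, sleep=time.sleep):
    """Call fetch() repeatedly and hand larger-than-usual bodies to on_new_data.

    fetch       -- hypothetical callable returning the response body as bytes
    on_new_data -- callable invoked with each body that looks like new data
    max_polls   -- stop after this many polls (None means run forever)
    sleep       -- injectable so the loop can be exercised without waiting
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch()
        if has_new_data(body):
            on_new_data(body)
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)
```

Checking the size is only a cheap first filter; looking at the content of the body is the more reliable test once its format is known.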

Since I had no idea what reverse Ajax DWR is, I googled a bit and found out that this website uses the polling method to push data from the server to the browser every minute.
So, to make this short, the data comes from the website to my browser every minute. Here is the response body I saw in Firebug:

//#DWR-START#

adQueueListener.addNewAd({adId:124725772,category:'Limousine',changeInfo:{adId:124725772,adVersion:24,changeEventId:457859860,changeSubtype:['DESCRIPTION'],changeType:'Update',comment:"DESCRIPTION",creationTime:1262659849452,modificationTime:new Date(1262659849000),objectType:"AD"},city:"hanau",climatisation:'AUTOMATIC_CLIMATISATION',color:'BEIGE',commercial:false,consumerGrossPrice:4200.00,creationDate:new Date(1260208837000),dealer:false,dealerGrossPrice:null,doorCount:'FOUR_OR_FIVE',firstRegistration:new Date(788914800000),fuel:'PETROL',machtedFilterNames:["Mercedes"],makeName:"MERCEDES-BENZ",manufacturerColorName:"",metallic:false,mileage:149000,modelDescription:"C 180 Elegance",modificationDate:new Date(1262659849000),numSeats:5,powerInKw:90,powerInPs:122,priceDecrease:false,sellerId:2467966,transmission:'AUTOMATIC_GEAR',vehicleCategory:null,version:24,zipcode:"63450"});
//#DWR-END#
//#DWR-START#
dwr.engine.remote.handleCallback("47","0",0);
//#DWR-END#
The response comes from http://gb-ticker.mobile.de/ticker/dw...everseAjax.dwr (the Ajax script location). The data I want is in the argument of the adQueueListener.addNewAd call. So how can I get this data out of the Ajax response?
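One possible way to get at that data, sketched in Python for illustration (the same regular expressions should carry over to other regex engines with little change). The argument of addNewAd is a JavaScript object literal, not JSON: keys are unquoted, some strings use single quotes, and dates appear as new Date(milliseconds), so a JSON parser will not accept it as-is. The sketch below therefore pulls out simple key:value pairs with regular expressions. The function name parse_new_ads is made up for this example. It deliberately drops nested objects such as changeInfo, skips array values such as machtedFilterNames, and assumes string values contain no quotes or braces, so it is a starting point rather than a full parser.

```python
import re
from datetime import datetime, timezone

# The object literal passed to adQueueListener.addNewAd(...)
AD_CALL = re.compile(r"adQueueListener\.addNewAd\((\{.*?\})\);", re.DOTALL)

# A nested object such as changeInfo:{...}; removed before scanning fields.
NESTED = re.compile(r"\w+:\{[^{}]*\},?")

# key:value where value is a string, a Date, a number, or a literal.
FIELD = re.compile(
    r'(\w+):(?:"([^"]*)"'      # double-quoted string
    r"|'([^']*)'"              # single-quoted string
    r"|new Date\((\d+)\)"      # JavaScript Date from epoch milliseconds
    r"|(-?\d+(?:\.\d+)?)"      # integer or decimal number
    r"|(true|false|null))"     # literal
)

LITERALS = {"true": True, "false": False, "null": None}


def _convert(double_quoted, single_quoted, date_ms, number, literal):
    """Turn the regex groups for one value into a Python value."""
    if double_quoted is not None:
        return double_quoted
    if single_quoted is not None:
        return single_quoted
    if date_ms is not None:
        return datetime.fromtimestamp(int(date_ms) / 1000, tz=timezone.utc)
    if number is not None:
        return float(number) if "." in number else int(number)
    return LITERALS[literal]


def parse_new_ads(response_text):
    """Return one dict of top-level fields per addNewAd call in the response."""
    ads = []
    for call in AD_CALL.finditer(response_text):
        body = call.group(1)[1:-1]   # strip the outer { }
        body = NESTED.sub("", body)  # drop nested objects (e.g. changeInfo)
        ad = {}
        for match in FIELD.finditer(body):
            ad[match.group(1)] = _convert(*match.groups()[1:])
        ads.append(ad)
    return ads
```

Run against the response body above, this should give a single dictionary with entries such as makeName, modelDescription, consumerGrossPrice and zipcode, while a response that contains only the handleCallback line gives an empty list.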

Please let me know whether this can be done by parsing the HTML or by parsing the Ajax responses.
I would be grateful if you could shed some light on this.
Many thanks
Last edited by adatapost; Jan 6th, 2010 at 2:14 am. Reason: Added [code] tags. Encase your code in: [code] and [/code] tags.
jad2010

© 2011 DaniWeb® LLC