Get all links from a webpage


Join Date: Oct 2008
Posts: 89
sfrider0
Junior Poster in Training

Get all links from a webpage
  #1
May 28th, 2009
I am having a problem getting all the links from a webpage. I have already done this successfully in a plain C# Windows Forms program, using:
  foreach (HtmlElement link in webBrowser1.Document.Links)
  {
      // GetAttribute already returns a string
      string linkItem = link.GetAttribute("HREF");
      // add linkItem to the list of links here
  }
Apparently C# in ASP.NET doesn't have System.Windows.Forms, so I can't use HtmlElement. I've tried a couple of different approaches, but I never get all the links: it seems that links located in JavaScript sections of the page don't show up. Here are a couple of the ways I've tried:
  string webpage = TextBoxUrl.Text;
  string source = getPageSource(webpage);

  string getPageSource(string URL)
  {
      // "using" disposes the WebClient even if DownloadString throws
      using (System.Net.WebClient webClient = new System.Net.WebClient())
      {
          return webClient.DownloadString(URL);
      }
  }
And this, which I found at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx:
  using System;
  using System.IO;
  using System.Net;

  class WebFetch
  {
      static void Main(string[] args)
      {
          // prepare the web page we will be asking for
          HttpWebRequest request = (HttpWebRequest)
              WebRequest.Create("http://www.mayosoftware.com");

          // execute the request; the using blocks close the response,
          // its stream and the reader even if an exception is thrown.
          // StreamReader decodes as UTF-8 by default (honouring a byte-order
          // mark), whereas Encoding.ASCII turns every non-ASCII byte into '?'
          using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
          using (Stream resStream = response.GetResponseStream())
          using (StreamReader reader = new StreamReader(resStream))
          {
              // read the whole response and print out the page source
              Console.WriteLine(reader.ReadToEnd());
          }
      }
  }
Is this even possible to do?
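Since either snippet above already returns the page source as a string, one way to replace the Windows Forms Document.Links loop is to search that string directly. The following is a minimal sketch, assuming a regular expression is good enough for the pages in question: it is not a real HTML parser, it skips unquoted attribute values, and LinkExtractor/ExtractHrefs are hypothetical names. It returns the value of every quoted href attribute, so stylesheet <link> URLs come back along with <a> links.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (sketch): pulls href values out of an HTML string with a
// regular expression instead of System.Windows.Forms.HtmlElement.
// Not a real HTML parser: unquoted attributes are skipped.
public static class LinkExtractor
{
    // href="..." or href='...', case-insensitive; group 1 is the opening quote,
    // \1 requires the same quote character to close the value.
    private static readonly Regex HrefRegex = new Regex(
        @"href\s*=\s*([""'])(?<url>.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractHrefs(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefRegex.Matches(html))
        {
            // the raw attribute value, still relative if the page wrote it that way
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }
}
```

For example, LinkExtractor.ExtractHrefs(getPageSource(webpage)) would give the raw href values in the order they appear in the page.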
Join Date: Jan 2007
Posts: 130
sedgey
Junior Poster

Re: Get all links from a webpage
  #2
May 28th, 2009
Hi sfRider,
Sadly, dynamic content is dynamic content: if the links are rendered on the fly by JavaScript, they are not in the HTML the server sends, so you will not be able to scrape them from the downloaded source.
David Ridgway: so little daylight, too much caffeine
MCSD MCAD MCSE
http://web2asp.net
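That holds for links that script assembles at runtime. Links written into a <script> block as literal strings, however, are part of the downloaded HTML; a search that only looks at href attributes may miss them simply because they are not attributes. The following is a minimal sketch, assuming such links appear as quoted absolute http:// or https:// strings (ScriptLinkScanner and FindQuotedUrls are hypothetical names). A URL built by concatenation, such as 'http://site/' + id, will be found only partially or not at all.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (sketch): finds absolute http/https URLs that appear in
// quotes anywhere in the page source, including inside <script> blocks.
public static class ScriptLinkScanner
{
    // an opening quote, http:// or https://, then non-quote, non-space
    // characters up to the same closing quote
    private static readonly Regex QuotedUrlRegex = new Regex(
        @"([""'])(?<url>https?://[^""'\s]+)\1",
        RegexOptions.IgnoreCase);

    public static List<string> FindQuotedUrls(string source)
    {
        var urls = new List<string>();
        foreach (Match m in QuotedUrlRegex.Matches(source))
        {
            string url = m.Groups["url"].Value;
            // keep each URL once, in order of first appearance
            if (!urls.Contains(url))
            {
                urls.Add(url);
            }
        }
        return urls;
    }
}
```

Because it also matches quoted URLs in ordinary href attributes, this can be run over the whole page source on its own, at the cost of missing relative links.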
Join Date: Oct 2008
Posts: 89
sfrider0
Junior Poster in Training

Re: Get all links from a webpage
  #3
May 28th, 2009
Dang. Is there any way to use the View Source feature in Internet Explorer, copy and paste that source into a file, and have my program read it from there? The source IE gives me has all the links in it.
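Reading saved source back from a file should work; the program mainly needs the page's own URL to turn relative links into absolute ones. The following is a minimal sketch, assuming the file path and the original page URL are passed in (SavedPageLinks and FromFile are hypothetical names). It reuses the same kind of href regex and resolves each value with System.Uri.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper (sketch): reads page source saved to disk (for example
// from IE's View Source window) and returns every href as an absolute URL.
public static class SavedPageLinks
{
    private static readonly Regex HrefRegex = new Regex(
        @"href\s*=\s*([""'])(?<url>.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> FromFile(string path, string pageUrl)
    {
        // decodes as UTF-8 unless the file starts with a byte-order mark
        string source = File.ReadAllText(path);
        Uri baseUri = new Uri(pageUrl);
        var links = new List<string>();
        foreach (Match m in HrefRegex.Matches(source))
        {
            Uri absolute;
            // resolves "/a.html", "b.html", "../c.html" etc. against the page URL
            if (Uri.TryCreate(baseUri, m.Groups["url"].Value, out absolute))
            {
                links.Add(absolute.AbsoluteUri);
            }
        }
        return links;
    }
}
```

The downloaded string from getPageSource is usually the same text IE shows under View Source, unless the server varies its response by browser or cookies, so the same loop can also run on that string instead of a file.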

©2003 - 2009 DaniWeb® LLC