May 28th, 2009

Get all links from a webpage

I am having a problem getting all the links from a webpage. I have done this successfully in a plain C# (Windows Forms) application, where I used:
    foreach (HtmlElement link in webBrowser1.Document.Links)
    {
        // GetAttribute already returns a string, so ToString() is not needed
        string linkItem = link.GetAttribute("href");
    }
Apparently C# in ASP.NET doesn't have System.Windows.Forms, so I can't use HtmlElement. I've tried a couple of different ways, but I never get all the links. It seems that links located in JavaScript sections of the page won't show up. Here are a couple of the ways I've tried:
    string webpage = TextBoxUrl.Text;
    string source = getPageSource(webpage);

    string getPageSource(string URL)
    {
        // 'using' disposes the WebClient even if the download throws
        using (System.Net.WebClient webClient = new System.Net.WebClient())
        {
            return webClient.DownloadString(URL);
        }
    }
And this one, which I found at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx:
    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class WebFetch
    {
        static void Main(string[] args)
        {
            // used to build entire input
            StringBuilder sb = new StringBuilder();

            // used on each read operation
            byte[] buf = new byte[8192];

            // prepare the web page we will be asking for
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.mayosoftware.com");

            // execute the request; 'using' closes the response and its stream
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream resStream = response.GetResponseStream())
            {
                int count;
                do
                {
                    // fill the buffer with data
                    count = resStream.Read(buf, 0, buf.Length);

                    // make sure we read some data
                    if (count != 0)
                    {
                        // translate from bytes to ASCII text (non-ASCII characters are lost)
                        sb.Append(Encoding.ASCII.GetString(buf, 0, count));
                    }
                }
                while (count > 0); // any more data to read?
            }

            // print out page source
            Console.WriteLine(sb.ToString());
        }
    }
Is this even possible to do?
Posted by: sfrider0
May 28th, 2009

Re: Get all links from a webpage

Hi sfRider,
Sadly, dynamic content is dynamic content. If the links are being rendered on the fly by JavaScript, they are not part of the HTML the server sends, so you will not be able to scrape them from the downloaded source.
Posted by: sedgey
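
To make the reply above concrete, here is a minimal sketch of what can be scraped without a browser: anchors that are literally present in the downloaded markup. The class and method names (`StaticLinkExtractor`, `ExtractLinks`) are hypothetical, and a regular expression is only a rough stand-in for a real HTML parser. It assumes the HTML string has already been fetched, for example with `WebClient.DownloadString`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class StaticLinkExtractor
{
    // Matches href="..." or href='...' inside an <a ...> tag.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns absolute URLs for anchors that exist in the markup itself.
    // Links that a script builds at run time are not seen here.
    public static List<string> ExtractLinks(string html, Uri baseUri)
    {
        var links = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            Uri absolute;
            // Resolve relative hrefs such as "/about" against the page URL.
            if (Uri.TryCreate(baseUri, m.Groups["url"].Value, out absolute))
            {
                links.Add(absolute.AbsoluteUri);
            }
        }
        return links;
    }
}
```

Calling `StaticLinkExtractor.ExtractLinks(source, new Uri(webpage))` on the downloaded string would return only the anchors the server sent. Anything inserted into the page later by script never reaches this code, which is the limitation described above.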
May 28th, 2009

Re: Get all links from a webpage

Dang. Is there any way to use the View Source feature in Internet Explorer, copy and paste that into a file in my program, and then read it that way? The source code it gives me in IE has all the links in it.
Posted by: sfrider0
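
On the question above: if the links are visible as text in the saved page source, for instance as string literals inside script blocks, one possible approach is to scan the whole source for URL-shaped text instead of looking only at anchor tags. The sketch below is an assumption-laden illustration, not a confirmed solution from the thread. The names `SourceUrlScanner`, `FindUrls`, and `FindUrlsInFile` are hypothetical, and the pattern only recognises absolute `http://` or `https://` URLs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class SourceUrlScanner
{
    // Absolute http/https URLs, ending at whitespace, a quote, an angle bracket, or ')'.
    private static readonly Regex UrlLiteral = new Regex(
        @"https?://[^\s""'<>)]+",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Scans raw page source (markup and script text alike) for URL literals,
    // returning each distinct URL once, in order of first appearance.
    public static List<string> FindUrls(string source)
    {
        var seen = new HashSet<string>();
        var urls = new List<string>();
        foreach (Match m in UrlLiteral.Matches(source))
        {
            if (seen.Add(m.Value))
            {
                urls.Add(m.Value);
            }
        }
        return urls;
    }

    // Reads a saved copy of the page source from disk and scans it.
    public static List<string> FindUrlsInFile(string path)
    {
        return FindUrls(File.ReadAllText(path));
    }
}
```

The same `FindUrls` call should also work on the string returned by `getPageSource`, so saving the source by hand may not be necessary. URLs that a script assembles from fragments at run time, or relative paths, would still be missed by this kind of scan.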


© 2011 DaniWeb® LLC