I am having a problem getting all the links from a webpage. I have done this successfully in plain C# (WinForms). I used

foreach (HtmlElement link in webBrowser1.Document.Links)
{
    // GetAttribute already returns a string, so no ToString() call is needed
    string linkItem = link.GetAttribute("href");
}

Apparently C# in ASP.NET doesn't have System.Windows.Forms, so I can't use HtmlElement. I've tried a couple of different ways, but I never get all the links. It seems that links located in the JavaScript sections of the page won't show up. Here are a couple of ways that I've tried:

string webpage = TextBoxUrl.Text;
string source = getPageSource(webpage);

string getPageSource(string URL)
{
    // Download the raw HTML; the using block ensures the client is disposed
    using (System.Net.WebClient webClient = new System.Net.WebClient())
    {
        return webClient.DownloadString(URL);
    }
}
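One way to pull the href values out of that downloaded source is a regular expression; here is a rough sketch (regex over HTML is fragile, and it still only finds links that appear literally in the static markup):

using System.Collections.Generic;
using System.Text.RegularExpressions;

static List<string> ExtractLinks(string source)
{
    var links = new List<string>();

    // Grab the value of every href="..." or href='...' attribute
    foreach (Match m in Regex.Matches(source,
        "href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase))
    {
        links.Add(m.Groups[1].Value);
    }

    return links;
}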

And this, which I found at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx:

using System;
using System.IO;
using System.Net;
using System.Text;

class WebFetch
{
    static void Main(string[] args)
    {
        // used to build entire input
        StringBuilder sb = new StringBuilder();

        // used on each read operation
        byte[] buf = new byte[8192];

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)
            WebRequest.Create("http://www.mayosoftware.com");

        // execute the request and read data via the response stream;
        // the using blocks dispose the response and stream when done
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        {
            int count;

            do
            {
                // fill the buffer with data
                count = resStream.Read(buf, 0, buf.Length);

                // make sure we read some data
                if (count != 0)
                {
                    // translate from bytes to text (note: ASCII will mangle
                    // any non-ASCII characters in the page)
                    sb.Append(Encoding.ASCII.GetString(buf, 0, count));
                }
            }
            while (count > 0); // any more data to read?
        }

        // print out page source
        Console.WriteLine(sb.ToString());
    }
}

Is this even possible to do?

All 2 Replies

Hi sfRider,
Sadly, dynamic content is dynamic content: if the links are being rendered on the fly by JavaScript, they will never appear in a plain HTTP download, so you will not be able to scrape them that way.
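For the links that are in the static HTML, though, a proper parser is more reliable than a regex. Here is a minimal sketch using the HtmlAgilityPack library (assuming you can add that package to your ASP.NET project):

using System;
using HtmlAgilityPack;

class StaticLinkScraper
{
    static void Main()
    {
        // Load and parse the page; only static markup is seen,
        // nothing that JavaScript would add after the fact
        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.example.com");

        // SelectNodes returns null when nothing matches
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode a in anchors)
            {
                Console.WriteLine(a.GetAttributeValue("href", string.Empty));
            }
        }
    }
}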

Dang. Is there any way to use the View Source feature in Internet Explorer, copy/paste that into a file for my program, and then read it that way? The source code IE gives me has all the links in it.
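Something along these lines is what I had in mind, just as a sketch (the file path below is a placeholder for wherever the source from IE gets saved):

using System;
using System.IO;
using System.Text.RegularExpressions;

class SavedSourceLinks
{
    static void Main()
    {
        // Placeholder path: the page source saved by hand from IE's View Source
        string source = File.ReadAllText(@"C:\temp\page.html");

        // Pull out every href value; same regex caveats as before
        foreach (Match m in Regex.Matches(source,
            "href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase))
        {
            Console.WriteLine(m.Groups[1].Value);
        }
    }
}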
