Get all links from a webpage
I am having a problem getting all the links from a webpage. I have successfully done this using just C#, with the first snippet below.
Apparently C# in ASP.NET doesn't have System.Windows.Forms, so I can't use HtmlElement. I've tried a couple of different ways, but I never get all the links. It seems that links located in JavaScript sections of the page don't show up. Here are a couple of ways that I've tried: the second snippet below,
and the third snippet, which I found at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx
Is this even possible to do?
```csharp
// Windows Forms only: enumerate the links the WebBrowser control has parsed.
foreach (HtmlElement link in webBrowser1.Document.Links)
{
    // GetAttribute already returns a string
    string linkItem = link.GetAttribute("HREF");
}
```
```csharp
string webpage = TextBoxUrl.Text;
string source = getPageSource(webpage);

// Download the raw HTML of the page as a string.
string getPageSource(string URL)
{
    // using disposes the WebClient even if the download throws
    using (System.Net.WebClient webClient = new System.Net.WebClient())
    {
        return webClient.DownloadString(URL);
    }
}
```
```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class WebFetch
{
    static void Main(string[] args)
    {
        // used to build entire input
        StringBuilder sb = new StringBuilder();

        // used on each read operation
        byte[] buf = new byte[8192];

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.mayosoftware.com");

        // execute the request
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        // we will read data via the response stream
        Stream resStream = response.GetResponseStream();

        string tempString = null;
        int count = 0;

        do
        {
            // fill the buffer with data
            count = resStream.Read(buf, 0, buf.Length);

            // make sure we read some data
            if (count != 0)
            {
                // translate from bytes to ASCII text
                tempString = Encoding.ASCII.GetString(buf, 0, count);

                // continue building the string
                sb.Append(tempString);
            }
        }
        while (count > 0); // any more data to read?

        // print out page source
        Console.WriteLine(sb.ToString());
    }
}
```
Hi sfRider,
Sadly, dynamic content is dynamic content: if the links are being rendered on the fly by JavaScript, they are not in the HTML the server sends back, so downloading the page source will not let you scrape them.
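That said, any links that are present in the HTML the server returns can still be pulled out of the string you already download. Here is a rough sketch, assuming a regular expression is good enough for your pages. It is not a real HTML parser and will miss unquoted or unusually formatted attributes. `LinkExtractor` and `ExtractLinks` are names I made up for illustration; they are not from your code or from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original thread or of .NET).
static class LinkExtractor
{
    // Rough pattern for <a ... href="..."> or <a ... href='...'>.
    // Assumption: quoted href values only; this is not a full HTML parser.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase);

    // Returns absolute http/https URLs found in anchor tags of the given HTML.
    public static List<string> ExtractLinks(string html, Uri baseUri)
    {
        List<string> links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            string raw = m.Groups["url"].Value.Trim();

            // Resolve relative links such as "/about" against the page URL.
            Uri absolute;
            if (!Uri.TryCreate(baseUri, raw, out absolute))
                continue;

            // Keep only web links; skips mailto:, javascript: and similar.
            if (absolute.Scheme == Uri.UriSchemeHttp ||
                absolute.Scheme == Uri.UriSchemeHttps)
            {
                links.Add(absolute.AbsoluteUri);
            }
        }
        return links;
    }
}
```

You would call it with the string returned by your getPageSource method, for example `LinkExtractor.ExtractLinks(source, new Uri(webpage))`. Links that JavaScript adds after the page loads will still be missing, because they never appear in the downloaded source.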