Hi,
I am new to C# but have experience in Java and HTML, and I am having problems using the StreamReader class.

I am trying to run a C# script on my web server that reads a target website and extracts particular links, then either saves them to a local XML file or temporarily displays them as clickable links on my generated page. The problem is that the link I want to extract is not conveniently on a separate line in the HTML code.

//xhtml

<h3><a href="http://tech-reviews.co.uk/reviews/prolimatech-megahalems-cpu-cooler/" rel="bookmark" title="Permanent Link to Prolimatech Megahalems CPU Cooler">Prolimatech Megahalems CPU Cooler</a></h3>

//xhtml

I want the reader class to extract the following from the above:

'http://tech-reviews.co.uk/reviews/prolimatech-megahalems-cpu-cooler/'

Below is the code I am using so far:

C#

<%@ Page language="c#"%>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<script runat="server" language="c#">

private void Page_Load(object sender, System.EventArgs e)
{
    // Retrieve the URL from the user input box and show the scraped HTML
    if (Page.IsPostBack)
        litHTMLfromScrapedPage.Text = GetHtmlPage(tbURL.Text);
}

public String GetHtmlPage(string strURL)
{
    // Request the page at the given URL
    WebRequest objRequest = WebRequest.Create(strURL);

    // The using statements dispose the response and the reader
    // once the block completes, so no explicit Close() is needed
    using (WebResponse objResponse = objRequest.GetResponse())
    using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
    {
        // The HTML retrieved from the page
        return sr.ReadToEnd();
    }
}
</script>

//c#

Any help would be greatly appreciated, and apologies if I have not adhered to the posting rules (this is my first post).

All 2 Replies

Please use code tags when posting on DaniWeb:

[code=c#] ...code here...

[/code]

Lastly, this is a widely discussed topic called "scraping". If you search for "C# scraping" or "C# scrapers", you will find a lot of example projects that do exactly this.

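// Requires: using System; using System.IO; using System.Net; using System.Text.RegularExpressions;
using (WebClient wc = new WebClient())
using (Stream stream = wc.OpenRead("http://www.google.co.in/search?hl=en&q=regular+expression+in+.net&meta=&aq=7&oq=regular+expression+in+.ne"))
using (StreamReader reader = new StreamReader(stream))
{
    string s = reader.ReadToEnd();

    // Match href="..." (quoted) or href=value (unquoted); group 1 holds the URL
    Regex r = new Regex(@"href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    Match m = r.Match(s);
    while (m.Success)
    {
        // Print each link followed by its character offset in the page
        Console.WriteLine(m.Groups[1].Value + " " + m.Groups[1].Index);
        m = m.NextMatch();
    }
}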
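
The loop above prints every href on the page. If only the links inside the <h3> headings from the original post are wanted, the pattern could be narrowed along these lines. This is only a rough sketch: the class name LinkExtractor and the method ExtractHeadingLinks are made up for illustration, and it assumes the href is double-quoted and the <a> tag immediately follows the <h3>, as in the sample markup. A regex like this is fragile against real-world HTML, so treat it as a starting point.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical helper (not from the thread): returns the href of every
    // <a> tag that directly follows an opening <h3> tag.
    public static List<string> ExtractHeadingLinks(string html)
    {
        // <h3 ...>, optional whitespace, <a ... href="URL"; group "url" holds the link
        Regex r = new Regex(
            @"<h3[^>]*>\s*<a\b[^>]*?\bhref\s*=\s*""(?<url>[^""]*)""",
            RegexOptions.IgnoreCase);

        List<string> links = new List<string>();
        foreach (Match m in r.Matches(html))
        {
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }

    public static void Main()
    {
        // The sample line from the original post
        string html = "<h3><a href=\"http://tech-reviews.co.uk/reviews/prolimatech-megahalems-cpu-cooler/\" rel=\"bookmark\" title=\"Permanent Link to Prolimatech Megahalems CPU Cooler\">Prolimatech Megahalems CPU Cooler</a></h3>";

        foreach (string link in ExtractHeadingLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

In the ASP.NET page from the first post, the string returned by GetHtmlPage could be passed to ExtractHeadingLinks in place of the hard-coded sample, since the pattern does not depend on where line breaks fall in the HTML.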