Program Isn't Reading All The Webpage's HTML

 
0
 

Okay, so a little while ago I decided to write a program that reads data from a website (the HTML) and works with it from there. Long story short: I want to read in a list of URLs, then visit each one and gather data from it.

So I took some code from the web and tweaked it:

// Requires: using System.IO; using System.Net; using System.Text;
public void gatherWebsiteDate(string input)
{
    string urlAddress = input;

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);

    // Dispose the response even if reading throws
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        if (response.StatusCode != HttpStatusCode.OK)
        {
            return;
        }

        // Fall back to UTF-8 when the server does not declare a character set
        Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
            ? Encoding.UTF8
            : Encoding.GetEncoding(response.CharacterSet);

        using (Stream receiveStream = response.GetResponseStream())
        using (StreamReader readStream = new StreamReader(receiveStream, encoding))
        {
            // The text read here is the HTML exactly as the server sent it
            richTextBox1.Text = readStream.ReadToEnd();
        }
    }
}

Then I ran my code. It does read the page's HTML, but it refuses to read the data I want.

Here's a page I want to read data from:
http://uc.worldoftanks.com/uc/clans/1000000954-SAC/

On this page there is a list of members. Clicking a member's name takes you to their profile page, which shows data about that player. In this case I want to gather a clan's data and which tanks each member has used, so I would use the collection of URLs to visit each profile page and collect that data.

The data I am looking for just won't show up.

The HTML code looks like this

<tbody id="member_table_container">
                      <tr class="js-template js-hidden">
                            <td class="number t-number"></td>
                            <td class="name t-name b-user"></td>
                            <td class="role js-role"></td>
                            <td class="member_since"></td>
                            
                      </tr>
                <tr class="odd clan-role-recruit">
                            <td class="number t-number">1</td>
                            <td class="name t-name b-user js-rendered-template"><a href="/uc/accounts/1001157039-afbrad/">afbrad</a></td>
                            <td class="role js-role js-rendered-template">Recruit</td>
                            <td class="member_since js-rendered-template">16.08.2011</td>

but all I get is

<tbody id="member_table_container">
                      <tr class="js-template js-hidden">
                            <td class="number t-number"></td>
                            <td class="name t-name b-user"></td>
                            <td class="role js-role"></td>
                            <td class="member_since"></td>

For some reason it doesn't copy over any of the users' data; it just doesn't see it. I have no clue why, but this is the data I need the program to read. And no, it's not hidden: if I log out I can still view all the clan data fine, so it's not restricted to logged-in users.

Hopefully this makes sense. I'm having trouble finding my words today, but hopefully someone can help me out here.

 
0
 

It appears that the markup you are looking for is generated by JavaScript in the page, so it isn't part of the HTML source the server sends. When I use View Source in Firefox, it only shows the second snippet you posted (the empty template row), nothing about each of the players.
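One way to confirm this from code is to count the table rows that are not the empty template row. The sketch below is only an illustration: the class and method names are made up, and it assumes the template row can be recognised by the "js-template" class seen in the posted HTML.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticHtmlCheck
{
    // Counts <tr> opening tags that do not carry the "js-template" class.
    // Assumption: the template row is the only row marked "js-template".
    public static int CountRenderedRows(string html)
    {
        int count = 0;
        foreach (Match row in Regex.Matches(html, @"<tr\b[^>]*>", RegexOptions.IgnoreCase))
        {
            if (row.Value.IndexOf("js-template", StringComparison.OrdinalIgnoreCase) < 0)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // What the downloaded source contains: only the template row
        string asDownloaded =
            "<tbody id=\"member_table_container\">" +
            "<tr class=\"js-template js-hidden\"><td class=\"number t-number\"></td></tr>" +
            "</tbody>";

        // What the browser shows after the script has run: template row plus a member row
        string asRendered =
            "<tbody id=\"member_table_container\">" +
            "<tr class=\"js-template js-hidden\"><td class=\"number t-number\"></td></tr>" +
            "<tr class=\"odd clan-role-recruit\"><td class=\"number t-number\">1</td></tr>" +
            "</tbody>";

        Console.WriteLine(CountRenderedRows(asDownloaded)); // prints 0
        Console.WriteLine(CountRenderedRows(asRendered));   // prints 1
    }
}
```

If the count is 0 for the downloaded text, the member rows were never in the server's response, and no change to the stream-reading code will make them appear.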

 
0
 

See, when I view it in Firefox I see all the code (well, I view it with Firebug). If this is indeed the case, is there a way I can still acquire the data?

 
0
 

Edit: I was also able to view all the data in my IE browser (when I hit F12).

 
0
 

Anyone? I really can't seem to figure this one out and would really like to get this program to work.

 
0
 

Still not solved; I could really use some help.

 
0
 

If it's any help, I usually use the WebBrowser control for these things, and use the "DocumentCompleted" event handler to read the "Document" property of the WebBrowser.
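A minimal sketch of that approach, assuming a Windows Forms project. The ScraperForm class and the IsTopLevelDocument helper are made-up names; the element id and URL come from the question. Rows that a script adds after DocumentCompleted has fired may still be missing at this point, so treat it as a starting point rather than a finished scraper.

```csharp
using System;
using System.Windows.Forms;

public class ScraperForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public ScraperForm(string url)
    {
        browser.Dock = DockStyle.Fill;
        browser.ScriptErrorsSuppressed = true;
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);
        browser.Navigate(url);
    }

    // DocumentCompleted also fires for frames; only the event whose URL
    // matches the browser's own URL belongs to the top-level page.
    public static bool IsTopLevelDocument(Uri eventUrl, Uri browserUrl)
    {
        return eventUrl == browserUrl;
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (!IsTopLevelDocument(e.Url, browser.Url))
        {
            return;
        }

        // The Document property exposes the DOM as the control currently holds it
        HtmlElement table = browser.Document.GetElementById("member_table_container");
        if (table == null)
        {
            return;
        }

        // Each member name in the posted HTML is a link to that member's profile page
        foreach (HtmlElement link in table.GetElementsByTagName("a"))
        {
            Console.WriteLine(link.InnerText + " -> " + link.GetAttribute("href"));
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new ScraperForm("http://uc.worldoftanks.com/uc/clans/1000000954-SAC/"));
    }
}
```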

 
0
 

Well, good news and bad news ...

Good news: this does work, or at least I believe it does. The bad news is the reason I can only say "I believe" ...

I got this lovely message: "Sorry, this browser does not support this site."
(http://uc.worldoftanks.com/uc/clans/1000000954-SAC/)

Anyone got any idea? Any way to trick the system? Anything I am overlooking? I could really use some help. I thought I was making some progress, but I guess not.

 
0
 

Yes. I made a website scraping app in C++ once. The "browser" is supposed to send a string identifying itself, so you can likely fool the website by altering your "user-agent" string.

Oh luckily there is a UserAgent property for the class (woo!)

http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.useragent.aspx

I don't know much more about it, but MSDN also says that class is obsoleted.
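For reference, setting that property might look like the sketch below. The UserAgentRequest class is a made-up name and the user-agent value is a placeholder, not a string known to be accepted by the site. Note that this only changes what the server is told; the response is still the plain source, with no scripts run.

```csharp
using System;
using System.Net;

public static class UserAgentRequest
{
    // Builds a request that identifies itself with the given user-agent string.
    // Nothing is sent over the network until GetResponse() is called on the result.
    public static HttpWebRequest Create(string url, string userAgent)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        return request;
    }

    public static void Main()
    {
        // Placeholder value: substitute the string a real browser sends
        HttpWebRequest request = Create(
            "http://uc.worldoftanks.com/uc/clans/1000000954-SAC/",
            "Mozilla/5.0 (placeholder user agent)");

        Console.WriteLine(request.UserAgent);
    }
}
```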

 
0
 

Okay, so the user agent could be a possibility ... but how do I set it on the WebBrowser control? Remember, using HttpWebRequest doesn't let me read the generated code (it only reads the source ... unless I am overlooking something).

 
0
 

Taken from:
http://stackoverflow.com/questions/937573/changing-the-useragent-of-the-webbrowser-control-winforms-c

You could try:

webBrowser.Navigate("http://localhost/run.php", null, null, "User-Agent: Here Put The User Agent");

Or another of the solutions there.

I just tested that on "http://www.whatbrowseramiusing.co/" and it reported "unknown user-agent", but the idea is to use one that is known, like Chrome's user-agent or Firefox's.
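Wrapped up as a helper, the Navigate call might look like the sketch below. BrowserNavigation, BuildUserAgentHeader and NavigateAs are made-up names, and the user-agent value is a placeholder. As far as I know the extra header applies only to this one navigation request, so requests the page makes afterwards may still carry the control's default user-agent.

```csharp
using System;
using System.Windows.Forms;

public static class BrowserNavigation
{
    // Formats the additional-headers argument; header lines end with CRLF.
    public static string BuildUserAgentHeader(string userAgent)
    {
        return "User-Agent: " + userAgent + "\r\n";
    }

    // Navigates the given WebBrowser control while sending a custom user-agent.
    // The two null arguments are the target frame name and the POST data.
    public static void NavigateAs(WebBrowser browser, string url, string userAgent)
    {
        browser.Navigate(url, null, null, BuildUserAgentHeader(userAgent));
    }

    public static void Main()
    {
        // Placeholder value: substitute the string a real browser sends
        string header = BuildUserAgentHeader("Mozilla/5.0 (placeholder user agent)");
        Console.Write(header);
    }
}
```

From a form, the call would be something like BrowserNavigation.NavigateAs(webBrowser1, url, userAgent), where webBrowser1 is whatever the control is named in your project.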

 
1
 

You can download a page's HTML source easily using the following method:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
//Add using System.Net;
using System.Net;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            String htmlCode;
            //Creates the WebClient and disposes it once the download has finished
            using (WebClient client = new WebClient())
            {
                //Downloads the HTML
                htmlCode = client.DownloadString("http://www.daniweb.com/");
            }
            //Makes the TextBox Multiline and makes the Vertical Scroll Bar active
            textBox1.ScrollBars = ScrollBars.Vertical;
            textBox1.Multiline = true;
            //Makes the TextBox Dock to the Form
            textBox1.Dock = DockStyle.Fill;
            //Displays the Text in a TextBox
            textBox1.Text = htmlCode;
        }
    }
}

Hope this helps and happy coding!

 
0
 

I think there are plenty of solutions here. I hope the thread owner closes this thread soon or tells us about any remaining problem.

 
0
 

I will close it for now, actually, as I do believe I have enough to work with.

Pseudorandom21, I do think your UserAgent might be the magical piece of code I needed (lol, bad choice of words I know; I am tired, so my humor is dry).

I need to contact someone about writing the actual string for the User Agent part.

Thanks for all the help too.

Question Answered as of 2 Years Ago by pseudorandom21, Momerath and NetJunkie