Program Isn't Reading All The Webpage's HTML

Ange1ofD4rkness
Posting Whiz
307 posts since May 2010

Okay, so a little while ago I decided I wanted to write a program that would read in data from a website (the HTML data) and then work with it from there. Long story short, I want to read in a list of URLs, then access each of those and gather data from them.

So I used some code I copied from the web and tweaked:

// Requires: using System.IO; using System.Net; using System.Text;
public void gatherWebsiteDate(string input)
{
    string urlAddress = input;

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);

    // "using" closes the response even if reading the body throws
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        if (response.StatusCode == HttpStatusCode.OK)
        {
            // Use the charset the server reports; fall back to UTF-8 when it reports none
            Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
                ? Encoding.UTF8
                : Encoding.GetEncoding(response.CharacterSet);

            using (Stream receiveStream = response.GetResponseStream())
            using (StreamReader readStream = new StreamReader(receiveStream, encoding))
            {
                // This is the raw page source exactly as the server sent it
                string data = readStream.ReadToEnd();

                richTextBox1.Text = data;
            }
        }
    }
}

Then I ran my code, and while it did read the HTML data of the page it refuses to read the data I want.

Here's a page I would read the data from
http://uc.worldoftanks.com/uc/clans/1000000954-SAC/

Once on this page you see there is a list of members. By clicking on a name you can then go to that player's profile page and see data about the player. In this case I wanted to gather a clan's data and what tanks each member has used; I would use the collection of URLs to view each profile page and collect this data.

The data I am looking for just won't show up.

The HTML code looks like this

<tbody id="member_table_container">
                      <tr class="js-template js-hidden">
                            <td class="number t-number"></td>
                            <td class="name t-name b-user"></td>
                            <td class="role js-role"></td>
                            <td class="member_since"></td>
                            
                      </tr>
                <tr class="odd clan-role-recruit">
                            <td class="number t-number">1</td>
                            <td class="name t-name b-user js-rendered-template"><a href="/uc/accounts/1001157039-afbrad/">afbrad</a></td>
                            <td class="role js-role js-rendered-template">Recruit</td>
                            <td class="member_since js-rendered-template">16.08.2011</td>

but all I get is

<tbody id="member_table_container">
                      <tr class="js-template js-hidden">
                            <td class="number t-number"></td>
                            <td class="name t-name b-user"></td>
                            <td class="role js-role"></td>
                            <td class="member_since"></td>

For some reason it doesn't copy over any of the user data; it just doesn't see it. I have no clue why it's doing this, but this is what I need the program to do. And no, it's not hidden: if I log out I can still view all the clan data fine, so it's not limited by login credentials.

Hopefully this makes sense, I kind of am having trouble finding my words today, but hopefully someone can help me out here

Momerath
Senior Poster
3,831 posts since Aug 2010

It appears that the rows you are looking for are generated by the JavaScript in the page, so they aren't part of the HTML the server sends. When I use View Source in Firefox, it only shows the second HTML snippet you posted (the empty template row), nothing about each of the players.
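
One quick way to confirm this from code is to check whether the downloaded text contains any script-filled cells. The following is only a minimal sketch: it assumes the "js-rendered-template" class seen in the Firebug output above is what the page's script adds to cells it fills in, and RenderedRowCheck / HasRenderedRows are made-up names for illustration.

```csharp
using System;

public static class RenderedRowCheck
{
    // Assumption: the page's script adds this class to every cell it fills in
    // (taken from the Firebug view quoted earlier in the thread).
    private const string RenderedMarker = "js-rendered-template";

    // Returns true when the HTML contains at least one script-filled cell.
    public static bool HasRenderedRows(string html)
    {
        return html != null
            && html.IndexOf(RenderedMarker, StringComparison.Ordinal) >= 0;
    }
}
```

If HasRenderedRows returns false for the string read from the response stream, the member rows were never in the downloaded source, which matches what View Source shows.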

Ange1ofD4rkness

See, when I view it in Firefox I see all the code (well, I view it with Firebug). If this is indeed the case, is there a way I can still acquire the data?

Ange1ofD4rkness

Edit: I was also able to view all the data in my IE browser (when I hit F12).

Ange1ofD4rkness

Anyone? I really can't seem to figure this one out and would really like to get this program to work

Ange1ofD4rkness

Still not solved; I could really use some help.

pseudorandom21
Practically a Posting Shark
888 posts since Jan 2011

If it's any help, I usually use the WebBrowser control for these things, and use the "DocumentCompleted" event handler to work with the "Document" property of the WebBrowser.
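
A rough sketch of that approach, for a Windows Forms project. ClanPageForm and MemberUrls are hypothetical names, and the element id "member_table_container" is taken from the HTML posted above. One caveat that is an assumption about this particular site: if its script fills the table after the document has finished loading, the rows may not be there yet when the event fires, and a short delay or retry would be needed.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

public class ClanPageForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    // Profile links found in the rendered member table.
    public readonly List<string> MemberUrls = new List<string>();

    public ClanPageForm(string url)
    {
        browser.Dock = DockStyle.Fill;
        browser.ScriptErrorsSuppressed = true;
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);
        browser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event also fires for frames; only handle the top-level page.
        if (e.Url != browser.Url) return;

        // Document is the live DOM, so it includes nodes added by script.
        HtmlElement table = browser.Document.GetElementById("member_table_container");
        if (table == null) return;

        foreach (HtmlElement link in table.GetElementsByTagName("a"))
        {
            MemberUrls.Add(link.GetAttribute("href"));
        }

        // Show the count in the title bar as a quick sanity check.
        Text = MemberUrls.Count + " member links found";
    }
}
```

To try it, call Application.Run(new ClanPageForm(url)) from a Main method marked [STAThread]; the WebBrowser control needs a single-threaded apartment and a running message loop.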

Ange1ofD4rkness

Well, good news and bad news ...

Good news: this does work, or at least I believe it did. The bad news is the reason I say "I believe" ...

I got this lovely message: "Sorry, this browser does not support this site."
(http://uc.worldoftanks.com/uc/clans/1000000954-SAC/)

Anyone got any idea? Any way to trick the system? Anything I am overlooking? I could really use some help. I thought I was making some progress, but I guess not.

pseudorandom21

Yes. I made a website scraping app in C++ once. The "browser" is supposed to send a header identifying itself, so you can likely fool the website by altering your "user-agent" string.

Oh, luckily there is a UserAgent property on the HttpWebRequest class (woo!)

http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.useragent.aspx

I don't know much more about it, but MSDN also says that class is obsoleted.
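
For what it's worth, setting that property looks roughly like the sketch below. BrowserLikeRequest is a made-up helper name, and the user-agent string is only an example value in Firefox's format, not one known to be accepted by the site. Note that this only changes how the request identifies itself; HttpWebRequest still returns the raw source and does not run the page's JavaScript. In recent .NET versions the compiler also flags WebRequest.Create as obsolete in favour of HttpClient.

```csharp
using System;
using System.IO;
using System.Net;

public static class BrowserLikeRequest
{
    // Example value only: a Firefox-style user-agent string.
    // Substitute one copied from a browser the site accepts.
    public const string ExampleUserAgent =
        "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0";

    // Builds the request without sending it, so the header can be inspected first.
    public static HttpWebRequest Create(string url, string userAgent)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        return request;
    }

    // Sends the request and returns the raw page source.
    public static string Download(string url, string userAgent)
    {
        HttpWebRequest request = Create(url, userAgent);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be something like: string html = BrowserLikeRequest.Download(url, BrowserLikeRequest.ExampleUserAgent);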

Ange1ofD4rkness

Okay, so the user agent could be a possibility ... but how do I implement that with the WebBrowser control? Remember, using HttpWebRequest doesn't allow me to read the generated code (it only reads the source ... unless I am overlooking something).

pseudorandom21

Taken from:
http://stackoverflow.com/questions/937573/changing-the-useragent-of-the-webbrowser-control-winforms-c

You could try: webBrowser.Navigate("http://localhost/run.php", null, null, "User-Agent: Here Put The User Agent"); Or another of the solutions there.

I just tested that on "http://www.whatbrowseramiusing.co/". It reported "unknown user-agent", but the idea is to use one that is known, like Chrome's user-agent or Firefox's.
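
Wrapped up as a small helper, that call might look like the sketch below. UserAgentNavigation, BuildHeader and NavigateAs are hypothetical names. As far as I know the extra header is only sent with this one navigation request, so whether it gets past the site's browser check is an assumption to test rather than a given.

```csharp
using System;
using System.Windows.Forms;

public static class UserAgentNavigation
{
    // Formats the extra header passed to Navigate.
    // The line is terminated with CRLF, as is conventional for HTTP header lines.
    public static string BuildHeader(string userAgent)
    {
        return "User-Agent: " + userAgent + "\r\n";
    }

    // Navigates the given WebBrowser control, sending the supplied
    // user-agent string along with this request.
    public static void NavigateAs(WebBrowser browser, string url, string userAgent)
    {
        // Arguments: URL, target frame name, POST data, additional headers.
        browser.Navigate(url, null, null, BuildHeader(userAgent));
    }
}
```

Usage would be something like UserAgentNavigation.NavigateAs(webBrowser1, url, userAgentString), where userAgentString is copied from a browser the site is known to accept.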

NetJunkie
Junior Poster
162 posts since Aug 2011

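You can download a page's HTML source easily using the following method (note that, like HttpWebRequest, this returns the source as the server sends it, not content generated afterwards by JavaScript):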

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
//Add using System.Net;
using System.Net;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            String htmlCode;
            //Creates the WebClient and disposes it when the download is done
            using (WebClient client = new WebClient())
            {
                //Downloads the HTML
                htmlCode = client.DownloadString("http://www.daniweb.com/");
            }
            //Makes the TextBox Multiline and Make the Vertical Scroll Bar Active
            textBox1.ScrollBars = ScrollBars.Vertical;
            textBox1.Multiline = true;
            //Makes the TextBox Dock to the Form
            textBox1.Dock = DockStyle.Fill;
            //Displays the Text in a TextBox
            textBox1.Text = htmlCode;
        }
    }
}

Hope this helps and happy coding!

pseudorandom21

I think there are plenty of solutions here, I hope the thread owner closes this thread soon or tells us about a problem.

Ange1ofD4rkness

I will close it for now, actually, as I do believe I have enough to work with.

Pseudorandom21, I do think your UserAgent might be the magical piece of code I needed (lol, bad use of words I know; I am tired so I have a dry humor).

I need to contact someone about writing the actual string for the User Agent part.

Thanks for all the help too.

Question Answered as of 2 Years Ago by pseudorandom21, Momerath and NetJunkie