Hello DaniWebbers,

So I have been hard at work on a new program that reads in a webpage every X seconds. Each time, it checks whether the page has changed; if it has, it updates a form and lets me know about the change.

Well, I have run into a snag. I recently got a webpage reader class that works perfectly for me ... or so I thought. I have since learned that WebBrowser keeps a cache of recently visited sites, and if it is asked for the same site again it will serve it from the cache (if the entry hasn't expired).

One of the webpages I have been testing this code on is causing a problem. The webpage updates with new data, but my WebBrowser keeps reading the old data from its cache.

Here's my code

using System;
using System.Windows.Forms;

namespace ScoreTableDetector_2v2
{
//===================================================================================================================
    class readInWebpage_v3 : IDisposable
    {
        WebBrowser wb;
        bool timerTriggered;
//-------------------------------------------------------------------------------------------------------------------
        public readInWebpage_v3 ()
        {
            wb = new WebBrowser();
            wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);

            timerTriggered = false;
        }
//-------------------------------------------------------------------------------------------------------------------
        public string downloadedData
        {
            get;
            private set;
        }
//-------------------------------------------------------------------------------------------------------------------
        public void readIn (Uri webLink, int secs)
        {
            DateTime timeNow = DateTime.Now;

            wb.Navigate(webLink);
            //wb.Refresh(WebBrowserRefreshOption.Normal); //DOESN'T fix problem

            TimeSpan elapsedTime;
            while (wb.ReadyState != WebBrowserReadyState.Complete)
            {
                elapsedTime = DateTime.Now - timeNow;
                if (elapsedTime.TotalSeconds > secs) //Timeout check; TotalSeconds is the whole elapsed time (Seconds is only the 0-59 component). The Application.DoEvents() call below seems to be needed for wb_DocumentCompleted to fire
                {
                    timerTriggered = true;
                    break;
                }
                Application.DoEvents();
            }

            if (timerTriggered == false)
            {
                //downloadedData = wb.Document.Body.InnerHtml; //Added this line, because the final HTML takes a while to show up
                    //!!!! This seems redundant so hold off on it

                if (downloadedData != null && downloadedData.Contains("Navigation to the webpage was canceled")) //The URL led to an invalid webpage
                {
                    downloadedData = "Invalid Webpage";
                }
            }
            else //timed out
            {
                wb.Stop();
                downloadedData = "Timed Out";
            }

            //wb.Dispose();
        }
//-------------------------------------------------------------------------------------------------------------------
        void wb_DocumentCompleted (object sender, WebBrowserDocumentCompletedEventArgs e) //when the webpage has finished loading (read it)
        {
            WebBrowser webBrows = (WebBrowser) sender;
            downloadedData = webBrows.Document.Body.InnerHtml;
        }
//-------------------------------------------------------------------------------------------------------------------
        public void Dispose () //used for disposing items
        {
            if (wb != null)
            {
                wb.Dispose();
                wb = null;
            }
            if (downloadedData != null)
            {
                downloadedData = "";
            }
        }
//-------------------------------------------------------------------------------------------------------------------
    }
//===================================================================================================================
}
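
A side note on the timeout check in readIn: TimeSpan.Seconds holds only the seconds component of the interval (0 to 59), while TimeSpan.TotalSeconds holds the whole interval expressed in seconds. The sketch below shows the difference. The class and method names (TimeoutCheck, HasTimedOut) are made up for illustration and are not part of the original class.

```csharp
using System;

static class TimeoutCheck
{
    // Hypothetical helper: true once the whole elapsed interval exceeds the limit.
    public static bool HasTimedOut(TimeSpan elapsed, int limitSeconds)
    {
        // TotalSeconds keeps counting past one minute; Seconds wraps back to 0.
        return elapsed.TotalSeconds > limitSeconds;
    }

    // Same check written against the Seconds component, kept only for comparison.
    public static bool HasTimedOutBySecondsComponent(TimeSpan elapsed, int limitSeconds)
    {
        return elapsed.Seconds > limitSeconds;
    }
}
```

With an elapsed time of 90 seconds and a 60 second limit, the first method reports a timeout and the second does not, because the Seconds component of 1 minute 30 seconds is 30.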

Now, I read online about the cache WebBrowser has (to be honest, at first I just guessed it did that; what do you know, a lucky guess), and that the Refresh() method is supposed to force the control to re-read the webpage in its current state.

Well, I tried this, and no matter what I do I can't get it to work. At one point I added a cold-start if() statement: the first time the class was called, wb.Navigate() would run, and every time after that wb.Refresh() would run instead. I tried different Refresh options too, not just the commented-out one above. But no matter what I tried, calling Refresh on its own never raised the DocumentCompleted event (which I kind of rely on).

So what I am trying to figure out is: how can I force my WebBrowser to always fetch fresh data from the webpage and stop relying on the cache?

Oh yeah this is how I call the class

readInData = new readInWebpage_v3();
readInData.readIn(webpageURI, refreshTimer);

//does a bunch of stuff with the data

readInData.Dispose();

That's all tucked into a backgroundWorker_ProgressChanged handler (I was hoping that disposing and recreating the WebBrowser would do the trick, but it doesn't). I also know there are sites this works fine on, but it doesn't work on the site I need it for.

Thanks in advance for any help

All 4 Replies

You might want to consider using WebClient or WebRequest/WebResponse instead of WebBrowser. Not sure if they will meet your needs, depends on what you are looking at :)
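
For reference, a minimal sketch of what the WebClient route could look like with caching turned off through the CachePolicy property. The class and method names (RawPageDownloader, CreateNoCacheClient, Download) are invented for this example. Note the assumption built into it: WebClient only fetches the raw HTML the server sends and does not run any scripts on the page, so content that is generated in the browser after load will not be in the result.

```csharp
using System;
using System.Net;
using System.Net.Cache;

static class RawPageDownloader
{
    // Hypothetical helper: a WebClient that is asked not to read from or write to the cache.
    public static WebClient CreateNoCacheClient()
    {
        var client = new WebClient();
        client.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
        return client;
    }

    // Downloads the raw HTML of the page as sent by the server (no scripts are executed).
    public static string Download(Uri url)
    {
        using (WebClient client = CreateNoCacheClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

Calling RawPageDownloader.Download(webpageURI) would return the page source as a string, which can then be compared against the previous download.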

Okay, so I knew I had seen WebClient before; I used it in the first code I built to retrieve data from a webpage. I stopped using it because WebClient downloads the data from the page right away, without giving the page time to load (I think the page runs something like jQuery ... either way, data is loaded in afterwards). And for the love of god, I can't seem to find a way to make WebClient wait like I did with the WebBrowser above.

I looked into WebRequest and WebResponse a bit (not much), but from what I have found they have the same issue WebClient has.

I need to find a way to let the page finish loading before I retrieve its data (this program isn't meant to work with just one webpage, but with any page). I tested WebClient with this page:

http://worldoftanks.com/uc/clans/1000000954-SAC/

If you use WebClient on its own, you'll get some data, but if you look at the page there is a clan roster that seems to be generated after the page loads. WebBrowser does read this. So while WebClient should fix the issue I have with the cache, it doesn't let the page finish loading ... at least from what I have found.

So where I am right now is either finding a way to make WebClient wait (like the code above does for WebBrowser), or finding a way to clear the cache for WebBrowser.

This will be of help with clearing a WebBrowser's cache.

Hey Mikey, I actually saw that the other day when searching the web, and I have it noted. However, from what I read it clears the entire IE cache, and I'm not sure that's what I want to happen.

However I did find 2 solutions so far

The first was to use the following line

wb.Navigate(webLink + "?refreshToken=" + Guid.NewGuid().ToString());

From what I read, it appends a random value to the URL so that the cache never sees a matching entry. I tried it multiple times last night and it works ... however, I don't entirely understand how it works (I'd welcome some help there).
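
On how the refreshToken trick works: a cache looks pages up by their full URL, query string included. A fresh GUID makes every request URL different from all earlier ones, so no cached entry matches and the page has to be fetched again, while most servers simply ignore a query parameter they don't recognise (that last part is an assumption about the target site). One catch with plain string concatenation is that it produces a second "?" if the link already has a query string. Below is a hedged sketch that handles that case with UriBuilder; the names CacheBuster and WithRefreshToken are invented for this example.

```csharp
using System;

static class CacheBuster
{
    // Hypothetical helper: returns a copy of the URL with a unique throwaway
    // query parameter added, so the request never matches a cached entry.
    public static Uri WithRefreshToken(Uri url)
    {
        string token = "refreshToken=" + Guid.NewGuid().ToString("N");

        var builder = new UriBuilder(url);

        // UriBuilder.Query returns the existing query with its leading '?'.
        string existing = builder.Query.TrimStart('?');

        // Append with '&' when a query already exists, otherwise start a new one.
        builder.Query = existing.Length == 0 ? token : existing + "&" + token;

        return builder.Uri;
    }
}
```

In the reader class this would be used as wb.Navigate(CacheBuster.WithRefreshToken(webLink)); in place of the concatenated string.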

The other solution, which I might implement in the future, is to use a .NET library for cURL. One exists, and it might work out better in the long run (especially since I also need to find a way to log into webpages programmatically).

I guess I can mark this solved for now, but if someone sees something off or wrong, please let me know.
