well kind of...

I want to check for updates on a webpage/website, but only refresh if it finds a change from the page currently loaded in the WebBrowser control on the form.

More or less: refresh in the background and only actually display the refresh if it finds a change, or if you manually refresh it using F5.

I can get the HTML source of the page saved to a text file. Could I just re-download the HTML source to another file and compare it with the initial file: if they're the same, loop; if not, continue? Seems there must be an easier way though.

Any other ideas?

Thanks

This is really just a client-server model issue. With the web you must contact the server to check for updates, rather than the server alerting you of updates. So, for this to work with a 'standard' webpage you must get the page from the server and do the comparison.

If you were only checking one webpage that you had control of, a little server-side JavaScript could tell the client when there are updates, as opposed to the client checking repeatedly.
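
For what it's worth, here is a minimal sketch of the fetch-and-compare approach, assuming C# and assuming it is fine to keep a hash of the previous page in memory rather than writing text files. `PageSnapshot`, `Hash` and `HasChanged` are hypothetical names made up for this example, and downloading the page source is left to whatever code you already have.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: remembers a hash of the last page seen and reports
// whether a newly downloaded copy differs from it.
public sealed class PageSnapshot
{
    private string _lastHash; // null until the first page has been recorded

    // Returns a SHA-256 digest of the page source as a hex string.
    public static string Hash(string html)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(html));
            return BitConverter.ToString(digest);
        }
    }

    // True when html differs from the page passed to the previous call.
    // The first call only records a baseline and returns false.
    public bool HasChanged(string html)
    {
        string hash = Hash(html);
        bool changed = _lastHash != null && _lastHash != hash;
        _lastHash = hash;
        return changed;
    }
}
```

You would call `HasChanged` on each freshly downloaded copy and only call the WebBrowser control's `Refresh()` when it returns true. Comparing the two strings directly would work just as well; the hash only saves you from keeping the whole previous page around.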

Hey, thanks for the info. I have no control of the website, so I'll start looking into comparing the text files or strings.

Just an extra little tip: dynamic web pages often include information that isn't actually an 'update' to the page but merely a statistic, e.g. page generation time.

So you would need to start thinking about how to tell non-important content like that apart from content that matters to the user.

Depending on whether you want to be able to use this on one website or any website, you could be setting off on an extremely complex task.
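
One way to approach this, sketched in C# under some loud assumptions: strip the parts you know are noise before comparing. The two patterns below (HTML comments and a "Page generated in N seconds" footer) are invented examples, not something taken from any particular site, and `PageNormalizer` is a hypothetical name. For a single known site a short list like this may be enough; for arbitrary sites it will not be.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes content that changes on every request
// so that only meaningful differences survive a comparison.
public static class PageNormalizer
{
    // Assumed examples of "noise"; the real patterns depend on the site.
    private static readonly Regex[] NoisePatterns =
    {
        new Regex(@"<!--.*?-->", RegexOptions.Singleline),
        new Regex(@"Page generated in [\d.]+ seconds", RegexOptions.IgnoreCase),
    };

    public static string Normalize(string html)
    {
        foreach (Regex pattern in NoisePatterns)
        {
            html = pattern.Replace(html, string.Empty);
        }

        // Collapse runs of whitespace so that reformatting alone
        // does not register as a change.
        return Regex.Replace(html, @"\s+", " ").Trim();
    }
}
```

Compare the normalized strings rather than the raw source, so two downloads that differ only in the generation-time footer count as the same page.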

As mentioned, this depends on the client and server. HTTP has a "Last-Modified" header field, and if the HTTP server in question uses that field, you can check for a modification date.

You can set HttpWebRequest.Method = "HEAD"; which will request only the HTTP headers for a document instead of the entire page (a "GET" request). You can also check the Content-Length header, but use this with caution: normally when a page changes the content length does too, but a page's content could be updated without the length changing. Here is an example of HTTP headers with a Last-Modified field:

HTTP/1.1 206 Partial content
       Date: Wed, 15 Nov 1995 06:25:24 GMT
       Last-Modified: Wed, 15 Nov 1995 04:58:08 GMT
       Content-Range: bytes 21010-47021/47022
       Content-Length: 26012
       Content-Type: image/gif
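
A rough sketch of the HEAD request in C#, assuming the server actually sends Last-Modified. `HeadCheck` and its method names are made up for this example. It reads the raw header instead of `HttpWebResponse.LastModified` because, as far as I know, that property falls back to the current time when the header is missing, which would look like a change on every poll. Note also that newer .NET versions flag `WebRequest.Create` as obsolete in favour of `HttpClient`.

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper for checking a page's modification date via HEAD.
public static class HeadCheck
{
    // Parses an HTTP date such as "Wed, 15 Nov 1995 04:58:08 GMT".
    // Returns null when the header is missing or cannot be parsed.
    public static DateTimeOffset? ParseHttpDate(string headerValue)
    {
        DateTimeOffset parsed;
        if (headerValue != null &&
            DateTimeOffset.TryParse(headerValue, CultureInfo.InvariantCulture,
                                    DateTimeStyles.AssumeUniversal, out parsed))
        {
            return parsed;
        }
        return null;
    }

    // Sends a HEAD request and returns the Last-Modified value,
    // or null if the server does not send one.
    public static DateTimeOffset? GetLastModified(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD"; // headers only, no body
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return ParseHttpDate(response.Headers["Last-Modified"]);
        }
    }
}
```

Store the value from the previous poll and treat the page as changed only when the new value is later. A null result means the server does not expose the field and you are back to comparing content.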

You can also send a client-side "If-Modified-Since" header in your request. See section 14.25 of RFC 2616:

A GET method with an If-Modified-Since header and no Range header requests that the identified entity be transferred only if it has been modified since the date given by the If-Modified-Since header. The algorithm for determining this includes the following cases:

a) If the request would normally result in anything other than a
200 (OK) status, or if the passed If-Modified-Since date is
invalid, the response is exactly the same as for a normal GET.
A date which is later than the server's current time is
invalid.
b) If the variant has been modified since the If-Modified-Since
date, the response is exactly the same as for a normal GET.
c) If the variant has not been modified since a valid If-
Modified-Since date, the server SHOULD return a 304 (Not
Modified) response.
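
A sketch of such a conditional GET in C#, again assuming the server honours the header. `ConditionalFetch` is a hypothetical name. With `HttpWebRequest`, a 304 normally surfaces as a `WebException` rather than as a normal response, so the sketch checks the status code on the exception's response.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: downloads a page only if it changed since a given time.
public static class ConditionalFetch
{
    // Returns the page body if it was modified after 'since',
    // or null when the server answers 304 (Not Modified).
    public static string GetIfModifiedSince(string url, DateTime since)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.IfModifiedSince = since; // sent as the If-Modified-Since header
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            using (ex.Response) // may be null; 'using' tolerates that
            {
                if (IsNotModified(ex))
                {
                    return null;
                }
            }
            throw; // any other failure is a real error
        }
    }

    // True when the exception wraps a 304 response.
    public static bool IsNotModified(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        return response != null && response.StatusCode == HttpStatusCode.NotModified;
    }
}
```

Pass in the time of your last successful download. A null return means there is nothing new to show; anything else is the fresh page source.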

Yeah, I looked into that, but the site didn't use those fields.


I did get it doing what I needed now.

Using a do/while loop:
-- Get the HTML source and write it to a file.
-- Check the file for the keyword I want it to look for.
-- If the keyword is not found, loop.
-- If the keyword is found, refresh the page and do so and so.

At first the loop ran so fast that the app just locked up and crashed. So I added a timer to repeat the loop every second, plus an integer counter for the loops, and displayed the loop count in a textbox.
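
For anyone finding this later, the loop described above could be sketched roughly like this in C#. It is not the poster's actual code: `KeywordPoller` is a hypothetical name, the download is passed in as a delegate, the keyword match is case-insensitive by my own choice, and `Task.Delay` stands in for the timer so the form stays responsive between checks.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: polls a page until a keyword shows up.
public static class KeywordPoller
{
    // Returns the attempt number on which the keyword was found,
    // or -1 if it never appeared within maxAttempts.
    public static async Task<int> WaitForKeywordAsync(
        Func<Task<string>> fetchHtmlAsync,
        string keyword,
        TimeSpan interval,
        int maxAttempts,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string html = await fetchHtmlAsync();
            if (html != null &&
                html.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return attempt; // the caller refreshes the page at this point
            }

            // Wait before the next check instead of spinning in a tight loop.
            if (attempt < maxAttempts)
            {
                await Task.Delay(interval, cancellationToken);
            }
        }
        return -1;
    }
}
```

The returned attempt number plays the role of the loop counter shown in the textbox, and a one-second interval matches the timer setting mentioned above.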


By the way, thanks for the quick replies, everyone. I thought posting here was a long shot, but it was the help I needed to confirm I was at least on the right track. Much appreciated!!!!

Well, you're welcome, and welcome to DaniWeb! :) So do you have everything working now?

Please mark this thread as solved if you have found a suitable answer to your question and good luck!

Yep, everything is working the way I need it to, thanks again.
