Correct me if I am wrong, but ALL web sites viewed have their htm/html/aspx/jsp pages downloaded into the Temporary Internet Files, right? I am trying to access the Temporary Internet Files to collect and copy information from these web sites. For example, if I view a page on Wikipedia, I want to pull the HTML file out of my Temporary Internet Files and then extract the content of the Wikipedia article from it.

So I am doing an experiment to see if I can copy files out of my Temporary Internet Files.

I am trying to access my Temporary Internet Files and copy out the files that were accessed at or after the moment the web page finished loading (this is to ensure that I only copy the files from the web site I am currently viewing), but it is not working.

On top of that, even if I manually open my Temporary Internet Files, I do not see any htm/html/aspx/jsp files; all I see are images and scripts. I am unsure if I am even heading in the right direction to start with. Please direct me.

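// Requires: using System; using System.IO; using System.Windows.Forms;
private DateTime currentDateTime;

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Remember when the page finished loading.
    currentDateTime = DateTime.Now;
}

private void toolStripButton1_Click(object sender, EventArgs e)
{
    string temporaryInternetFilesPath = Environment.GetFolderPath(Environment.SpecialFolder.InternetCache);
    DirectoryInfo directoryInfo = new DirectoryInfo(temporaryInternetFilesPath);
    string targetFolder = @"C:\Users\Justin\Documents\Visual Studio 2010\Projects\WindowsFormsApplication1\WindowsFormsApplication1\bin\Debug\Test";
    Directory.CreateDirectory(targetFolder); // no-op if the folder already exists
    int x = 0;

    // GetFiles() without arguments only returns files directly inside the folder;
    // AllDirectories also walks the subfolders (this may throw if one is not accessible).
    foreach (FileInfo fileInfo in directoryInfo.GetFiles("*", SearchOption.AllDirectories))
    {
        // Caveat: NTFS on Vista and later does not update LastAccessTime by default,
        // so this comparison may not match the files of the current page.
        if (fileInfo.LastAccessTime >= currentDateTime)
        {
            // Overwrite copies left behind by a previous click.
            fileInfo.CopyTo(Path.Combine(targetFolder, "fileCopy" + x + ".txt"), true);
            x = x + 1;
        }
    }
}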

All 2 Replies

If you are using Firefox, you can find the location of the cache by typing about:cache into the address bar. It will show you the folder location of the cache. The other browsers may have similar functions.
Basically, you should get steered towards the AppData folder of your PC (on Vista at least). I had a look at the Firefox cache on my machine and it isn't a straightforward list of html/aspx/jsp/etc. pages, so I think you'll have some work to do.

Hmm... the problem is I can't see all the files and folders under Temporary Internet Files from my Explorer. It seems that there is more than meets the eye. Which is why I can't even tell if I am really looping through anything or not.
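To check whether a loop really reaches anything, one option is to walk the folder tree by hand instead of relying on what Explorer shows. The following is only a rough sketch: `CacheWalker` and `ListAllFiles` are names made up for illustration, and it assumes the cached content sits in subfolders that Explorer hides but that `Directory.GetFiles` and `Directory.GetDirectories` still return.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original program.
public static class CacheWalker
{
    // Returns the path of every file under root, descending into all
    // subfolders (hidden and system ones included) and skipping any
    // folder that cannot be opened.
    public static List<string> ListAllFiles(string root)
    {
        var files = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            try
            {
                files.AddRange(Directory.GetFiles(current));
                foreach (string sub in Directory.GetDirectories(current))
                {
                    pending.Push(sub);
                }
            }
            catch (UnauthorizedAccessException)
            {
                // No permission for this folder: skip it and carry on.
            }
            catch (IOException)
            {
                // Folder vanished or is otherwise unreadable: skip it.
            }
        }

        return files;
    }
}
```

Calling `CacheWalker.ListAllFiles(Environment.GetFolderPath(Environment.SpecialFolder.InternetCache))` and printing the count, or each path together with its `FileInfo` timestamps, should show whether there is more under the folder than Explorer displays.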

Secondly, I noticed that the Last Accessed time of a file does not really work. I opened a random file, closed it, and then checked its Last Accessed time: it is not the current time, it is the time when the file was first created???

You see, what I want to make is a program that can copy and paste contents from the web page I am currently viewing. I played around with frames, but frames are dynamic, so I can't just extract HTML files based on the frame links inside the HTML source code. Not to mention that many web sites use other scripts for navigation, which is why I am resorting to digging stuff out of the cache.

My second problem is: how do I retrieve only the files that come from the current web page, since comparing the Last Accessed time isn't accurate at all? Gee... I wonder what that property is for...
