I need to read in a webpage. Any places to start?


Could you be a little more specific?

Well, there's an intranet page that I want to grab. Basically it holds a list of names. I am going to read all those names and perform commands on them. I just need to know how to start streaming from the page, or which route would be best to read a webpage's contents. Almost as if I wanted to read its contents and paste them into Notepad, so I can start indexing stuff out of it to leave only what I want remaining (the names) to perform other stuff on.

Have you tried posting your question in Web Development instead of Software Development?

Nope, because I am not web designing.
lol..
Hmmm... maybe I do need to look into some ole HTML, LOL.
All I need to do is get everything from the website into a normal view-source format and voila, I can start manipulating stuff.

BTW, I'm the same person, if you don't already know, lol.
Any suggestions would be great, but I am still searching.

I guess maybe web designers will have more experience with intranets and web pages... I mean, I get the idea, but beyond that I'm blindfolded.

The easiest way would be to fork off a call to wget and pipe back its output.
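
For example, something along these lines (just a rough sketch: it assumes wget is on the PATH, and on Windows you would use _popen/_pclose instead):

#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    // -q keeps wget quiet, and -O - makes it write the page to stdout,
    // which popen() lets us read back through a pipe.
    FILE *pipe = popen("wget -q -O - http://www.example.com", "r");
    if (!pipe)
    {
        cerr << "could not start wget\n";
        return 1;
    }

    string page;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0)
        page.append(buf, n);
    pclose(pipe);

    cout << page << "\n";   // the whole page, never written to disk
    return 0;
}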

It's easy! (on Linux) First invoke the command:

wget www.yourstuff.com

It will create a file called index.html; then use ifstream to read the file.

wget www.mystuff.com

??????

OK, a few questions. First off, I am programming on Windows; does that make a difference with wget?

Second, if Windows is not going to be a problem and I can still use wget, then how do I get the information from it?

Like, once I use wget www.myIntraNetPage.com, how do I get it to save the contents to the hard drive, or anywhere for that matter, so I can begin manipulating the data?

Please, if you could post a small example of this!
Like: get this webpage, save the whole source to this destination,
end of story.

I am very anxious to use this ability.

You have to use Linux. Maybe you could use a virtual machine? Maybe it's available for Windows, but I do not know. wget <whatever> will take that webpage and store it as a file called index.html by default. Then you can read the file with ifstream.

Here's a better mirror:
http://gnuwin32.sourceforge.net/packages/wget.htm
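
Once you've installed it, the -O switch tells wget where to save the page instead of the default index.html, e.g. (the path here is just an example):

wget -O C:\page.html www.myIntraNetPage.com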

Here is the code to get a webpage and read it:

#include <cstdlib>   // for system()
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string line;

    // Fetch the page; wget saves it as index.html by default.
    system("wget www.google.com");

    // Read the saved file back in, line by line.
    ifstream file("index.html");
    while (getline(file, line))
    {
        cout << line << "\n";
    }

    file.close();
    return 0;
}

Somehow, I don't like the solution of using system() calls. Maybe it's because:

  • system() calls are way slower than using libraries or hard-coding the socket calls yourself
  • Any client machine would be required to have wget installed
  • It leaves files all over the hard drive
  • It requires read/write access to the hard disk when in fact it should only need read access
  • The code required to load a webpage using a library like libcurl is really quite trivial (see the sketch just below this list)
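
For instance, here is roughly all it takes with libcurl's easy interface (a sketch, with error checking trimmed for brevity):

#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl calls this for each chunk of the page it receives.
static size_t write_cb(char *data, size_t size, size_t nmemb, void *userp)
{
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main()
{
    std::string page;

    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();

    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &page);
    curl_easy_perform(curl);      // fetch the page into 'page'

    curl_easy_cleanup(curl);
    curl_global_cleanup();

    std::cout << page << "\n";    // nothing ever touched the disk
    return 0;
}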

I dunno, Salem's suggestion just appeals a lot more to me.



1. True, you have a point.
2. Any client would need libcurl.
3. It's a trivial problem that requires one line of code.
4. It's irrelevant.
5. True, my suggestion is a quick hack (they have their purposes); libcurl would be better. (It's hard to get some libraries working with some compilers on Windows.)

>Any client would need libcurl
Not if you statically link libcurl.

>It's a trivial problem that requires one line of code
Something that you neglected to implement in your last example.

>It's irrelevant
How so? If, when the program closes, nothing is changed on the hard drive, why make it need read/write permissions on the disk? Here's a better example: the program is installed on a Linux system in a directory such as /usr/bin. Now what happens when your program tries to save that file?

On a Linux system libcurl would be the better option, since it is easy to use libraries there; on Windows it's near impossible! That is why I recommended the use of wget.

Well, yes, wget does work on Windows after all. The only problem is it doesn't work on the intranet page I am trying to read from. It's not reading into wget, but every other external internet webpage does!

http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all

That is the kind of link I am trying to get to, lol. I can view that site's source via a right click of the mouse. I cannot, however, view it with wget, unless there's something I am missing.

I understand that the index.html file saves to the set path, so if there's anything else that could help do this same procedure, that would be great.

I don't want to have to port wget.exe all over my network to use it, or when I let a fellow technician run the same software. I am trying to stay as hard-coded as possible.

**Once more, this is for Windows XP SP2**, not Linux or Unix.


Probably something to do with the ampersands in the web address. On Linux you would put quotes around the URL or escape it; I'm not sure about similar techniques for Windows.
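
For example, on Linux:

wget "http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all"

and from inside a C++ program the inner quotes need escaping:

system("wget \"http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all\"");

Without the quotes the shell treats everything after the first & as a separate command, so wget only ever sees http://infoget/caav.asp?site=AGT13.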

I'm not exactly sure I understand.

Here is a link which may assist you:

www.webmonkey.com

They have a site pretty dedicated to the web, web-based interfaces, etc.
