How To Read A Webpage ?

Question

generalGOTCHA 0 Newbie Poster

18 Years Ago

I need to read in a webpage. Any places to start ?

c++

8 Contributors
21 Replies
783 Views
4 Days Discussion Span
Latest Post 18 Years Ago Latest Post by kxh29

All 21 Replies

Rashakil Fol 978 Super Senior Demiposter

18 Years Ago

The easiest way would be to fork off a call to wget and pipe back its output.

Salem 5,265 Posting Sage

18 Years Ago

libcurl

John A 1,896 Vampirical Lurker

18 Years Ago

Somehow, I don't like the solution of using system() calls. Maybe it's because

system() calls are way slower than using libraries or hard-coding the socket calls yourself
Any client machine would be required to have wget installed
It leaves files all over the hard drive
It requires read/write access to the hard disk when in fact it should only need read access
The code that is required to load a webpage using a library like libcurl is really, quite trivial

I dunno, Salem's suggestion just appeals a lot more to me.

John A 1,896 Vampirical Lurker

18 Years Ago

>any client would need libcurl
Not if you statically link libcurl.

>its a trivial problem that requires one line of code
Something that you neglected to implement in your last example.

>its irrelevent
How so? If, when the program closes, nothing is changed on the hard drive, why make it so it does in fact need r/w permissions on the disk? Here's a better example: the program's installed on a Linux system in a directory such as /usr/bin . Now what happens when your program trys to save that file?

John A 1,896 Vampirical Lurker

18 Years Ago

Well yes wget does work for windows after all. The only problem is it doesnt work on the IntraNet page I am trying to read from. It's not reading into wget, but every other external internet webpage does !
http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all

Probably something to do with the ampersands in the web address. On Linux you would put quotes around the URL or escape it, I'm not sure about similar techniques for Windows.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

ndeniche 402 Posting Virtuoso Featured Poster · Answer 1 · 2007-05-04T09:02:32+00:00

ndeniche 402 Posting Virtuoso

18 Years Ago

could you be a little more specific?

generalGOTCHA 0 Newbie Poster · Answer 2 · 2007-05-04T09:07:28+00:00

well theres an intranet page that I want to grab. Basically it holds a list of names. I am going to read all those names and perform commands on them. I just need to know how to start streaming from the page. Or which route would be the best to basically read a webpages contents. Almost if I wanted to read its contents and then paste it to a notepad so I can start indexing stuff out of it to leave only what I want remaining (names) to perform other stuff on.

ndeniche 402 Posting Virtuoso Featured Poster · Answer 3 · 2007-05-04T09:12:19+00:00

have you tried posting your question @ Web Development instead of @ software development?

CodyOebel -2 Junior Poster in Training · Answer 4 · 2007-05-04T09:32:26+00:00

Nope because I am not web designing.
lol..
hmmmm ... maybe I do need to look into some ole html LOL.
Alll I need to do is get everything from the website into a normal view source format and voila.. I can start manipulating stuff.

CodyOebel -2 Junior Poster in Training · Answer 5 · 2007-05-04T09:40:27+00:00

btw im the same person .. if you dont already know lo..
Any suggestions would be great but I am still searching.

ndeniche 402 Posting Virtuoso Featured Poster · Answer 6 · 2007-05-04T09:54:55+00:00

i guess maybe web designers will have more experience in intranet and web pages... i mean... i get the idea... but beyond that, i'm blindfolded

Sturm 270 Veteran Poster · Answer 7 · 2007-05-05T06:53:25+00:00

Its easy! (on linux) First invoke the command:

wget www.yourstuff.com

it will create a file called index.html then use ifstream to read the file.

CodyOebel -2 Junior Poster in Training · Answer 8 · 2007-05-07T14:32:26+00:00

Its easy! (on linux) First invoke the command:
wget www.yourstuff.com
it will create a file called index.html then use ifstream to read the file.

wget www.mystuff.com

??????

Ok a few questions. First off, I am windows programming does that make a difference with wget ???

Second if windows is not going to be a problem and I can still use wget , then how do I get the information from it.

Like once I use wget www.myIntraNetPage.com how do I get it to save the contents in the hard drive or anywhere for that matter so I can begin manipulating the data.

Please if you could post a small example of this !
Like get this webpage it saves the whole source to this destination .
end of story.

I am very anxious to use this ability.

Sturm 270 Veteran Poster · Answer 9 · 2007-05-08T00:26:41+00:00

you have to use linux. Maybe you could use a virtual machine? Maybe its for windows but i do not know. wget <whatever> it will take that webpage and store it as a file called index.html by default. Then you can read the file with ifstream.

Sturm 270 Veteran Poster · Answer 10 · 2007-05-08T00:27:53+00:00

yes i just found out it is for windows see:
http://pages.interlog.com/~tcharron/wgetwin.html

Sturm 270 Veteran Poster · Answer 11 · 2007-05-08T00:29:55+00:00

heres a better mirror:
http://gnuwin32.sourceforge.net/packages/wget.htm

Here would be the code to get a webpage and read it.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () 
{
  string line;

  system("wget www.google.com");

  ifstream file ("index.html");

    while (!myfile.eof())
    {
      getline(file,line);
      cout<<line<<"\n";
    }
  file.close();
  return 0;
}

Sturm 270 Veteran Poster · Answer 12 · 2007-05-08T06:03:23+00:00

Somehow, I don't like the solution of using system() calls. Maybe it's because
system() calls are way slower than using libraries or hard-coding the socket calls yourself

Any client machine would be required to have wget installed

It leaves files all over the hard drive

It requires read/write access to the hard disk when in fact it should only need read access

The code that is required to load a webpage using a library like libcurl is really, quite trivial

I dunno, Salem's suggestion just appeals a lot more to me.

1. true, you have a point
2. any client would need libcurl
3. its a trivial problem that requires one line of code
4. its irrelevent
5. true, my suggestion is a quick hack (they have there purposes) libcurl would be better. (its hard to get some libraries for some compilers working on windows)

Sturm 270 Veteran Poster · Answer 13 · 2007-05-08T06:41:10+00:00

on a linux system libcurl would be a better option since it is easy to use libraries on a linux system, on windows its near impossible! That is why I recommended the use of wget.

CodyOebel -2 Junior Poster in Training · Answer 14 · 2007-05-08T07:33:35+00:00

Well yes wget does work for windows after all. The only problem is it doesnt work on the IntraNet page I am trying to read from. It's not reading into wget, but every other external internet webpage does !

http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all

That is the kind of link i am trying to link to lol . I can view that sites source via right click of the mouse. I cannot however view it with wget , unless theres something I am missing.

I understand about the index.html file it saves to the set path so if theres anything else that could help do this same procedure that would be great.

I dont want to have to port the wget.exe all over my network to use it, or when I let a fellow technician run the same software I am trying to stay hard coded as possible.

**once more this is for Windows XP Sp2** not linux, or unix.

Sturm 270 Veteran Poster · Answer 15 · 2007-05-08T18:19:32+00:00

Sturm 270 Veteran Poster

18 Years Ago

Im not exactly sure I understand?

kxh29 0 Junior Poster in Training · Answer 16 · 2007-05-08T19:19:45+00:00

Here is a link which may assist you:

www.webmonkey.com

They have a pretty dedicated site on Web, Web based interfaces,etc...

How To Read A Webpage ?

Recommended Answers Collapse Answers

All 21 Replies

Recommended Answers