I need to read in a webpage. Any places to start?


Could you be a little more specific?

Well, there's an intranet page that I want to grab. Basically it holds a list of names. I am going to read all those names and perform commands on them. I just need to know how to start streaming from the page, or which route would be best to read a webpage's contents. Almost as if I wanted to read its contents and paste them into Notepad, so I can start indexing stuff out of it to leave only what I want remaining (the names) to perform other stuff on.

Have you tried posting your question in Web Development instead of Software Development?

Nope, because I am not web designing.
lol..
Hmmm... maybe I do need to look into some ole HTML, LOL.
All I need to do is get everything from the website into a normal view-source format and voila, I can start manipulating stuff.

BTW, I'm the same person, if you don't already know, lol.
Any suggestions would be great, but I am still searching.

I guess maybe web designers will have more experience with intranets and web pages... I mean, I get the idea, but beyond that I'm blindfolded.

The easiest way would be to fork off a call to wget and pipe back its output.
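
For example, something along these lines (just a rough sketch: it assumes wget is on the PATH, and on Windows you would use _popen/_pclose instead):

#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    // -q keeps wget quiet, and -O - makes it write the page to stdout,
    // which popen() lets us read back through a pipe.
    FILE *pipe = popen("wget -q -O - http://www.example.com", "r");
    if (!pipe)
    {
        cerr << "could not start wget\n";
        return 1;
    }

    string page;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0)
        page.append(buf, n);
    pclose(pipe);

    cout << page << "\n";   // the whole page, never written to disk
    return 0;
}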

It's easy! (on Linux) First invoke the command:

wget www.yourstuff.com

It will create a file called index.html; then use ifstream to read the file.

wget www.mystuff.com

??????

OK, a few questions. First off, I am programming on Windows; does that make a difference with wget?

Second, if Windows is not going to be a problem and I can still use wget, then how do I get the information from it?

Like, once I use wget www.myIntraNetPage.com, how do I get it to save the contents to the hard drive, or anywhere for that matter, so I can begin manipulating the data?

Please, if you could post a small example of this!
Like: get this webpage, save the whole source to this destination,
end of story.

I am very anxious to use this ability.

You have to use Linux. Maybe you could use a virtual machine? Maybe it's available for Windows, but I do not know. wget <whatever> will take that webpage and store it as a file called index.html by default. Then you can read the file with ifstream.

Here's a better mirror:
http://gnuwin32.sourceforge.net/packages/wget.htm
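
Once you've installed it, the -O switch tells wget where to save the page instead of the default index.html, e.g. (the path here is just an example):

wget -O C:\page.html www.myIntraNetPage.com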

Here is the code to get a webpage and read it:

#include <cstdlib>   // for system()
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string line;

    // Fetch the page; wget saves it as index.html by default.
    system("wget www.google.com");

    // Read the saved file back in, line by line.
    ifstream file("index.html");
    while (getline(file, line))
    {
        cout << line << "\n";
    }

    file.close();
    return 0;
}

Somehow, I don't like the solution of using system() calls. Maybe it's because:

  • system() calls are way slower than using libraries or hard-coding the socket calls yourself
  • Any client machine would be required to have wget installed
  • It leaves files all over the hard drive
  • It requires read/write access to the hard disk when in fact it should only need read access
  • The code required to load a webpage using a library like libcurl is really quite trivial (see the sketch just below this list)
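
For instance, here is roughly all it takes with libcurl's easy interface (a sketch, with error checking trimmed for brevity):

#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl calls this for each chunk of the page it receives.
static size_t write_cb(char *data, size_t size, size_t nmemb, void *userp)
{
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main()
{
    std::string page;

    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();

    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &page);
    curl_easy_perform(curl);      // fetch the page into 'page'

    curl_easy_cleanup(curl);
    curl_global_cleanup();

    std::cout << page << "\n";    // nothing ever touched the disk
    return 0;
}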

I dunno, Salem's suggestion just appeals a lot more to me.



1. True, you have a point.
2. Any client would need libcurl.
3. It's a trivial problem that requires one line of code.
4. It's irrelevant.
5. True, my suggestion is a quick hack (they have their purposes); libcurl would be better. (It's hard to get some libraries working with some compilers on Windows.)

>Any client would need libcurl
Not if you statically link libcurl.

>It's a trivial problem that requires one line of code
Something that you neglected to implement in your last example.

>It's irrelevant
How so? If, when the program closes, nothing is changed on the hard drive, why make it need read/write permissions on the disk? Here's a better example: the program is installed on a Linux system in a directory such as /usr/bin. Now what happens when your program tries to save that file?

On a Linux system libcurl would be the better option, since it is easy to use libraries there; on Windows it's near impossible! That is why I recommended the use of wget.

Well, yes, wget does work on Windows after all. The only problem is it doesn't work on the intranet page I am trying to read from. It's not reading into wget, but every other external internet webpage does!

http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all

That is the kind of link I am trying to get to, lol. I can view that site's source via a right click of the mouse. I cannot, however, view it with wget, unless there's something I am missing.

I understand that the index.html file saves to the set path, so if there's anything else that could help do this same procedure, that would be great.

I don't want to have to port wget.exe all over my network to use it, or when I let a fellow technician run the same software. I am trying to stay as hard-coded as possible.

**Once more, this is for Windows XP SP2**, not Linux or Unix.


Probably something to do with the ampersands in the web address. On Linux you would put quotes around the URL or escape it; I'm not sure about similar techniques for Windows.
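
For example, on Linux:

wget "http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all"

and from inside a C++ program the inner quotes need escaping:

system("wget \"http://infoget/caav.asp?site=AGT13&sort=&qry=&nav=all\"");

Without the quotes the shell treats everything after the first & as a separate command, so wget only ever sees http://infoget/caav.asp?site=AGT13.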

I'm not exactly sure I understand.

Here is a link which may assist you:

www.webmonkey.com

They have a site pretty dedicated to the web, web-based interfaces, etc.
