Web Crawler and its uses


BlackPhoenix 0 Junior Poster in Training

15 Years Ago

Hi everybody, I'm interested in creating a web crawler, but I can't settle on what I'd like the program to do. It's mostly an exercise in the technology, and then in expanding it to achieve new, great things.

I am proficient in Python, so I will naturally be using that language along with the urllib2 module. I have some experience with it, and it's fantastic for pulling a webpage's source code, which can then be parsed.

So what we have so far:
- Python
- urllib2 module
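For reference, a minimal sketch of the page-fetching step. urllib2 exists only in Python 2; in Python 3 the same functionality lives in urllib.request, which this sketch uses instead. The helper name fetch_page, the User-Agent string, and the UTF-8 fallback are assumptions for illustration, not something from the original post.

```python
import urllib.request

def fetch_page(url, timeout=10.0):
    """Download one page and return its source as text (hypothetical helper)."""
    # Identify the crawler politely; this User-Agent value is an arbitrary example.
    req = urllib.request.Request(url, headers={"User-Agent": "toy-crawler/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8 (an assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be something like html = fetch_page("http://example.com/"); the returned string is what the parsing step works on.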

I will need to brush up on regex so that I can write the functions that parse the page source and extract all the URLs.
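Since the plan is regex-based, here is a rough sketch of the URL-extraction step. The pattern only looks for quoted href attributes inside <a> tags, and relative links are resolved against the page's own URL with urllib.parse.urljoin. Regular expressions are brittle on real-world HTML (unquoted attributes, comments, script blocks), so treat the pattern and the helper name extract_links as illustrative assumptions; the standard library's html.parser is the sturdier option.

```python
import re
from urllib.parse import urljoin

# Toy pattern: <a ... href="..."> or href='...'; group 2 is the attribute value.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)

def extract_links(html, base_url):
    """Return absolute URLs for the <a href> links found in html (hypothetical helper)."""
    links = []
    for match in HREF_RE.finditer(html):
        href = match.group(2).strip()
        # Skip in-page anchors and non-HTTP pseudo-links.
        if href and not href.startswith(("#", "javascript:", "mailto:")):
            links.append(urljoin(base_url, href))
    return links
```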

Now this is where my question really comes in. What types of things can I/Should I use a web crawler to do?

Throw some really interesting ideas at me! Thanks!




jwenting 1,905 duckman

15 Years Ago

The problem is that you just list a few buzzwords but don't seem to know what they mean, and apparently haven't gone to the trouble of finding out.
That shows a lack of interest in doing your own work, which leads to us being disinclined to help you.

vegaseat 1,735 DaniWeb's Hypocrite

11 Years Ago

Ah, just in time for Halloween.

A web crawler visits a given URL and retrieves any URLs from the hyperlinks on that page. It visits these URLs and collects more URLs and so on. Kind of spooky.
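That visit-collect-repeat loop is a breadth-first traversal. Below is a sketch under stated assumptions: fetch(url) returns a page's HTML and extract_links(html, base_url) returns absolute URLs (both hypothetical callables you supply), the crawl stays on the starting host by default, and it stops after max_pages pages. It returns the link graph as a dict mapping each visited URL to its outgoing links.

```python
from collections import deque
from urllib.parse import urlparse

def crawl(start_url, fetch, extract_links, max_pages=50, same_host=True):
    """Breadth-first crawl starting at start_url (illustrative sketch).

    fetch(url) -> html and extract_links(html, base_url) -> list of absolute
    URLs are supplied by the caller. Returns {visited_url: [outgoing links]}.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}           # every URL ever queued, so none is queued twice
    queue = deque([start_url])   # FIFO queue gives breadth-first order
    graph = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue             # skip pages that fail to download or decode
        links = extract_links(html, url)
        graph[url] = links
        for link in links:
            if same_host and urlparse(link).netloc != start_host:
                continue         # optionally stay on the starting site
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph
```

The seen set is what keeps the crawler from looping forever on pages that link back to each other. Passing fetch and extract_links in as parameters keeps the traversal testable offline; a real crawler should also honour robots.txt and rate-limit its requests.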

What you do with these URLs is up to you. You can collect all the images, do data mining, spy, steal information etc.
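As one example of the "collect all the images" idea, a sketch that pulls every <img src> from a single page. It uses the standard library's html.parser rather than a regex, since the parser copes with attribute order, letter case and quoting. The names ImageCollector and find_images are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageCollector(HTMLParser):
    """Collects absolute URLs of <img src=...> tags from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(urljoin(self.base_url, src))

def find_images(html, base_url):
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images
```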

Edited 11 Years Ago by vegaseat


BlackPhoenix 0 Junior Poster in Training · Answer 1 · 2010-02-25T05:02:54+00:00

No comments? Maybe this isn't the best board for me to be discussing this technology in.

Note to Mods: Maybe this should be moved to the Python boards? I'd rather not get in trouble for making a duplicate topic over there. Thanks.

Nick Evan 4,005 Industrious Poster Team Colleague Featured Poster · Answer 2 · 2010-02-25T15:13:16+00:00


Note to Mods: Maybe this should be moved to the Python boards?

That's fine with me. Moved.

Edited 15 Years Ago by Nick Evan

BlackPhoenix 0 Junior Poster in Training · Answer 3 · 2010-02-25T23:03:19+00:00

Jwenting, I'm not sure what your deal is, but I didn't throw around any buzzwords without knowing what they mean. I've been programming in Python for some time now and have worked on some very large projects. I know what Python is, I know what modules are, and I know what regex is, so what part of my post exactly did you have a problem with?

As for the point of the topic, all I wanted were some ideas related to: "What types of things can I/Should I use a web crawler to do?"

Please take your elitism elsewhere.

Max_5 0 Newbie Poster · Answer 4 · 2013-10-30T23:28:46+00:00

Do graph-y stuff:

https://www.coursera.org/course/sna
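The link graph a crawler builds is a natural input for that kind of network analysis. As a small sketch, assuming the crawl results are stored as a dict mapping each page URL to the list of URLs it links to (a hypothetical format), this ranks pages by in-degree, i.e. how many other crawled pages link to them.

```python
from collections import Counter

def most_linked(graph, top=5):
    """Rank URLs by in-degree within a {source: [targets]} link graph."""
    indegree = Counter()
    for source, targets in graph.items():
        for target in set(targets):  # count each linking page at most once
            if target != source:     # ignore self-links
                indegree[target] += 1
    return indegree.most_common(top)
```

In-degree is the simplest centrality measure; PageRank and similar measures refine the same idea by giving more weight to links from pages that are themselves well linked.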
