
Hi there, I'm thinking of creating a mini comparison website like MoneySuperMarket or GoCompare, but I have no clue how these sites manage to get all their information in the first place.

My first thought is web services or something similar to RSS feeds, but this is only my initial thought.

Any help would be appreciated :)

Last Post by griswolf

There are basically three techniques:

  1. You can ask users for feedback on things
  2. You can use a web service or RSS feed or the like if it is available
  3. You can write a (well-behaved) web crawler and scrape the data for yourself

Of course, you can use any or all of these techniques together.
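To make option 2 a bit more concrete, here is a minimal sketch of reading items out of an RSS 2.0 feed with Python's standard library. The feed content, the function name `parse_rss_items`, and the example.com URLs are all made up for illustration; a real provider's feed may use different fields, so treat this as a starting point only.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed text; in practice this would be downloaded from a
# provider that explicitly offers a feed for this purpose.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Provider Deals</title>
    <item>
      <title>Basic cover</title>
      <link>https://example.com/basic</link>
      <description>19.99 per month</description>
    </item>
    <item>
      <title>Premium cover</title>
      <link>https://example.com/premium</link>
      <description>34.50 per month</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of dicts (title, link, description), one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": item.findtext("description", default="").strip(),
        })
    return items


deals = parse_rss_items(SAMPLE_FEED)
for deal in deals:
    print(deal["title"], "->", deal["link"])
```

Once several providers are parsed into the same dict shape, comparing them is mostly a matter of sorting and displaying the combined list.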

It is important that if you try option 3, you obey all the rules: stay out of areas that the site marks as off-limits to crawlers, make sure that you aren't breaking copyright, etc.
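One common way a site marks areas to be avoided is a robots.txt file. As a rough sketch, Python's standard `urllib.robotparser` can check a URL against those rules before a crawler fetches it. The bot name `MiniCompareBot`, the helper names, and the sample rules below are assumptions for illustration, and passing this check says nothing about a site's terms of service or copyright.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; a real crawler should identify itself honestly.
USER_AGENT = "MiniCompareBot"

# Sample rules for illustration. A live crawler would instead call
# set_url("https://<site>/robots.txt") followed by read().
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent=USER_AGENT):
    """True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


rules = build_rules(SAMPLE_ROBOTS)
print(is_allowed(rules, "https://example.com/products"))
print(is_allowed(rules, "https://example.com/private/data"))
```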

Edited by griswolf: n/a


I see. I suppose a web crawler (option 3) could crawl Google search results for the information and display it in a user-friendly manner.

I hope that's legal.


No. Read the Google terms of service: http://www.google.com/accounts/TOS

5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.

On the other hand, Google does have a web services API that you are allowed to use: http://code.google.com/intl/en/more/

The point is that you need to actually research the terms of each web site you want to examine. Getting that information from Google took me maybe 5 minutes. You need to be able to do that for yourself.

And you need to understand how important it is: You can be sued. You can be prosecuted for malicious interference if your spider gets out of control and effectively commits a DoS attack, etc.
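One simple guard against a spider hammering a site is to enforce a minimum delay between requests. The sketch below is only an illustration: the class name `PoliteThrottle` and the 2 second default are assumptions, not a standard, and the clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap between consecutive requests to one site."""

    def __init__(self, min_interval_seconds=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        waited = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            remaining = self._min_interval - elapsed
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_request = self._clock()
        return waited


# Usage: call wait() immediately before every fetch.
throttle = PoliteThrottle(min_interval_seconds=2.0)
throttle.wait()  # the first call returns at once
```

A fixed delay is the crudest possible policy. A crawler should also cap the total number of pages it visits and stop on repeated errors, so that a bug cannot turn it into a flood of requests.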

Edited by griswolf: n/a
