I want to add search to a website. I have a plan for how to do it, but I am not sure how to go about indexing the site. My idea is to run a script over the pages that parses everything between the <p> and </p> tags and stores each paragraph as a separate row in a database.

I don't really know which functions would help me do this. If anyone could point me in the right direction on what I should be reading to make this happen, it would be very helpful.
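
For what it's worth, here is a rough sketch of the idea described above, written in Python with only the standard library (html.parser and sqlite3). The thread does not name a language or a schema, so the class name ParagraphExtractor, the helper names, and the paragraphs table layout are all assumptions made for illustration, and SQLite stands in for whatever database the site really uses.

```python
import sqlite3
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p>...</p> element in a page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraphs, in document order
        self._buffer = None    # text pieces of the open paragraph, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # an unclosed <p> ends when the next one starts
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Only keep text that appears while a paragraph is open.
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def extract_paragraphs(html):
    """Return the paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()            # pick up a trailing paragraph with no </p>
    return parser.paragraphs


def index_page(conn, url, html):
    """Store each paragraph of one page as its own row (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs "
        "(url TEXT, position INTEGER, body TEXT)"
    )
    rows = [(url, i, text) for i, text in enumerate(extract_paragraphs(html))]
    conn.executemany("INSERT INTO paragraphs VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Calling index_page(conn, url, html) once per page fills the table one paragraph per row. Fetching the pages themselves is left out here, since how you get at the HTML depends on the site.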

All 3 Replies

So you want to make a search engine, lol. There is a current thread about a similar thing at http://www.daniweb.com/forums/thread200918.html
It covers the steps for making a bot-controlled search engine and even has some sample scripts attached. Be aware, though, that there are two types of search engine: bot-controlled search engines and what I call database lookup search engines. The difference is that a bot-controlled search engine has a bot that scans the pages for data ahead of time, while a database lookup search engine queries the raw data in the original MySQL source live at search time, which isn't commonly used. Hope the above link helps.
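
To make the second type a bit more concrete, here is a minimal sketch of a database lookup search: the query runs straight against the content table at search time, with no bot or separate index involved. It is written in Python, with SQLite standing in for the MySQL source mentioned above, and the articles table, its columns, and the demo rows are all made up for illustration.

```python
import sqlite3


def search_articles(conn, term, limit=10):
    """Return (title, body) rows whose body contains `term`."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cursor = conn.execute(
        "SELECT title, body FROM articles "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY title LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return cursor.fetchall()


def make_demo_db():
    """Build a tiny in-memory table of invented rows to search against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [
            ("Crawling", "A bot scans each page and stores what it finds."),
            ("Lookup", "The query runs against the original table."),
            ("Discounts", "Save 100% of nothing."),
        ],
    )
    return conn
```

For example, search_articles(make_demo_db(), "bot") returns only the "Crawling" row. A plain LIKE scan like this is simple but reads every row on each search, which is one reason a prebuilt index tends to scale better.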

I decided to go with Sphider due to some time constraints on this project. It's open source and seems pretty easy to install and use. Thanks for all the help, though!
