Hi,

I am a newbie to Java and I want some help with designing a simple search engine in Java. The program would read from a text file and display the results. If I searched for a keyword, say "star", it should show me something like the following:

Star found
Star belongs to Planets

Here Planet is the topic and star is the sub-topic. Please help me with this. I started off with a string search but I am not able to proceed.

Sample Text File:
1. Planet
bvdbkdbvjkbdjkdbdjbvjdbjdbvdjbvk
1.a star
fkjnfkjnbfjknbfjknfjnbf

Post your code within [code] [/code] tags and specific questions regarding the parts that are troubling you.

Also consider that your sample text file does not give any hint of a hierarchical relationship between those two entries.
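
One way around that, assuming the numbering itself is meant to carry the hierarchy (a heading like "1. Planet" is a topic and "1.a star" is a sub-topic of topic 1), is to parse the headings and remember which topic each sub-topic belongs to. This is only a sketch under that assumption: the class name TopicIndex and the two regular expressions are made up for illustration, and the body lines between headings are simply ignored.

```java
import java.util.*;
import java.util.regex.*;

class TopicIndex
{
	// Assumed heading formats: "1. Planet" (topic) and "1.a star" (sub-topic of topic 1)
	private static final Pattern TOPIC = Pattern.compile("^(\\d+)\\.\\s+(\\S.*)$");
	private static final Pattern SUBTOPIC = Pattern.compile("^(\\d+)\\.[a-z]\\s+(\\S.*)$");

	// Maps each lower-cased sub-topic name to the name of its parent topic
	static Map<String, String> parse(List<String> lines)
	{
		Map<String, String> topicByNumber = new HashMap<String, String>();
		Map<String, String> parentOf = new HashMap<String, String>();

		for(String raw : lines)
		{
			String line = raw.trim();
			Matcher sub = SUBTOPIC.matcher(line);
			Matcher top = TOPIC.matcher(line);

			if(sub.matches())
			{
				// look up the topic that shares this sub-topic's number
				String parent = topicByNumber.get(sub.group(1));
				if(parent != null)
				{
					parentOf.put(sub.group(2).toLowerCase(), parent);
				}
			}
			else if(top.matches())
			{
				topicByNumber.put(top.group(1), top.group(2));
			}
		}
		return parentOf;
	}

	public static void main(String[] args)
	{
		List<String> lines = Arrays.asList(
			"1. Planet",
			"bvdbkdbvjkbdjkdbdjbvjdbjdbvdjbvk",
			"1.a star",
			"fkjnfkjnbfjknbfjknfjnbf");

		Map<String, String> parentOf = parse(lines);
		String keyword = "star";
		if(parentOf.containsKey(keyword))
		{
			System.out.println(keyword + " found");
			System.out.println(keyword + " belongs to " + parentOf.get(keyword));
		}
	}
}
```

If the real file marks the hierarchy some other way (indentation, for example), only the two patterns would need to change.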

I think I recall my professor saying--

"The way search engines work is through Hash-indexing. All you have to do to search is Hash-it-up!"

Or something along those lines.

It might be wise to look up HashMap or HashSet to see how you can locate certain values through indexing, though I've never tried this class myself.

The alternatives would mean using other kinds of maps or collections, and you'd most likely have to resort to linear searches or other search algorithms that aren't as efficient.

Check to see if the Hash idea will suit your needs. I'll try to experiment on this myself too.
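
To make the difference concrete, here is a small sketch (the class name LookupDemo and the sample lines are invented for illustration, not taken from your file). The linear search has to walk through every line for each query, while the hashed index is built once and each query after that is a single get().

```java
import java.util.*;

class LookupDemo
{
	// Linear search: scans every line for every query
	static List<String> linearSearch(List<String> lines, String word)
	{
		List<String> hits = new ArrayList<String>();
		for(String line : lines)
		{
			if(Arrays.asList(line.split("\\s+")).contains(word))
			{
				hits.add(line);
			}
		}
		return hits;
	}

	// Hash index: built once, maps each word to the lines that contain it
	static Map<String, List<String>> buildIndex(List<String> lines)
	{
		Map<String, List<String>> index = new HashMap<String, List<String>>();
		for(String line : lines)
		{
			for(String word : line.split("\\s+"))
			{
				List<String> hits = index.get(word);
				if(hits == null)
				{
					hits = new ArrayList<String>();
					index.put(word, hits);
				}
				if(!hits.contains(line))//a word repeated within one line is recorded once
				{
					hits.add(line);
				}
			}
		}
		return index;
	}

	public static void main(String[] args)
	{
		List<String> lines = Arrays.asList(
			"the sun is a star",
			"a planet orbits a star",
			"the moon");

		System.out.println(linearSearch(lines, "star"));   //scans all three lines
		System.out.println(buildIndex(lines).get("star")); //one hash lookup after the index is built
	}
}
```

Both calls return the same two lines; the difference only starts to matter when the file is large or you search many times.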

The code below can be used in conjunction with any .txt file or you can use the one I created that should be attached. Input hash.txt so you can see how this program works.

The way it works is much like a search engine or an index. Each line is tokenized, and every token is used as a key that maps to the line it came from. If a key is encountered again, the new line is joined to the value already stored under that key via String concatenation, so a lookup retrieves every line in which that text appears.

I only read from a file instead of indexing directly from an array because you wanted to create a search engine via a file.

import java.util.*;
import java.io.*;

public class TestHashMap
{
	public static void main(String[] args)
	{
		Hashtable<String, String> ht = new Hashtable<String, String>();
		Scanner kb = new Scanner(System.in);
		System.out.println("Enter the filename that you want to Search values for.");

		BufferedReader br = null;

		try
		{
			br = new BufferedReader(new FileReader(kb.nextLine()));//reads information from the file specified by user input
			System.out.println("The file was read. Processing information, please wait...");

			String line;
			while((line = br.readLine()) != null)//readLine() returns null once there are no more lines to read
			{
				String[] result = line.split("\\s+");//tokenizes the line into separate strings, splitting on runs of whitespace

				for(int i = 0; i < result.length; i++)
				{
					if(!ht.containsKey(result[i]))
					{
						ht.put(result[i], line);//assigns a key to the line
					}
					else
					{
						ht.put(result[i], line+"\n"+ht.get(result[i]));//if the key already has a value, put the new line
																	   //in front of the old value so both stay associated
																	   //with this key, emulating an index
					}
				}
			}
		}
		catch(Exception e)
		{
			System.out.println(e);
			System.exit(1);
		}

		System.out.println(ht);

		do
		{
			System.out.println("Enter a value to search for.\n");
			System.out.println(ht.get(kb.nextLine()));
			System.out.println("\nKeep searching? Enter any key to continue, or type <NO> to end the process");
		}while(!kb.nextLine().equalsIgnoreCase("<NO>"));

		try
		{
			br.close();
		}
		catch(Exception e)
		{
			System.out.println(e);
			System.exit(1);
		}

	}//end main
}//end class
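
To see what the table ends up holding, the same put-or-prepend logic can be run on an array of lines instead of a file. This is only a sketch: the class name ArrayIndexDemo is made up, and the two sample lines are copied from this thread.

```java
import java.util.*;

class ArrayIndexDemo
{
	// Builds the same kind of index as above, but from an array instead of a file
	static Hashtable<String, String> index(String[] lines)
	{
		Hashtable<String, String> ht = new Hashtable<String, String>();
		for(String line : lines)
		{
			for(String word : line.split("\\s+"))
			{
				String old = ht.get(word);
				//first occurrence stores the line; later ones go in front of the old value
				ht.put(word, old == null ? line : line + "\n" + old);
			}
		}
		return ht;
	}

	public static void main(String[] args)
	{
		String[] lines = {
			"Don't you dare get close to me! *slap*",
			"close the door, you're letting the hot air out!"
		};

		//prints both lines, the most recently indexed one first
		System.out.println(index(lines).get("close"));
	}
}
```

Note that the tokens keep their punctuation, so "me!" is a key but "me" is not, and the search is case sensitive.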

Here is an alternative chunk of code using a more flexible approach, Hashtable<String, ArrayList<String> >, which keeps the matching lines in a list instead of one concatenated String:

import java.util.*;
import java.io.*;

public class TestHashMap2
{
	public static void main(String[] args)
	{
		Hashtable<String, ArrayList<String> > ht = new Hashtable<String, ArrayList<String> >();
		Scanner kb = new Scanner(System.in);
		System.out.println("Enter the filename that you want to Search values for.");

		BufferedReader br = null;

		try
		{
			br = new BufferedReader(new FileReader(kb.nextLine()));//reads information from the file specified by user input
			System.out.println("The file was read. Processing information, please wait...");

			String line;
			while((line = br.readLine()) != null)//readLine() returns null once there are no more lines to read
			{
				String[] result = line.split("\\s+");//tokenizes the line into separate strings, splitting on runs of whitespace

				for(int i = 0; i < result.length; i++)
				{
					if(!ht.containsKey(result[i]))
					{
						ArrayList<String> temp = new ArrayList<String>(1);
						temp.add(line);
						ht.put(result[i], temp);//assigns a key to anonymous ArrayList that stores the value
					}
					else
					{
						ArrayList<String> temp = ht.get(result[i]);//if the key has already been assigned, that's OK
						temp.add(line);                            //just add the line to its ArrayList!
					}
				}
			}
		}
		catch(Exception e)
		{
			System.out.println(e);
			System.exit(1);
		}

		System.out.println(ht);

		do
		{
			System.out.println("Enter a value to search for.\n");
			System.out.println(ht.get(kb.nextLine()));
			System.out.println("\nKeep searching? Enter any key to continue, or type <NO> to end the process");
		}while(!kb.nextLine().equalsIgnoreCase("<NO>"));

		try
		{
			br.close();
		}
		catch(Exception e)
		{
			System.out.println(e);
			System.exit(1);
		}

	}//end main
}//end class
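
A possible refinement of the list-based version, sketched under my own assumptions (Java 8 or later for computeIfAbsent and getOrDefault, and the made-up class name NormalizedIndex): lower-case each token and strip the punctuation around it before using it as a key, so that a search for "once" also finds "Once" and a search for "slap" finds "*slap*".

```java
import java.util.*;

class NormalizedIndex
{
	private final Map<String, List<String>> index = new HashMap<>();

	// Strips leading/trailing non-alphanumeric characters, then lower-cases the token
	static String normalize(String token)
	{
		return token.replaceAll("^[^\\p{Alnum}]+|[^\\p{Alnum}]+$", "").toLowerCase(Locale.ROOT);
	}

	// Adds one line to the index under each of its normalized tokens
	void add(String line)
	{
		for(String token : line.split("\\s+"))
		{
			String key = normalize(token);
			if(key.isEmpty())
			{
				continue;//nothing left after stripping punctuation
			}
			List<String> lines = index.computeIfAbsent(key, k -> new ArrayList<>());
			if(!lines.contains(line))
			{
				lines.add(line);
			}
		}
	}

	// Returns the lines containing the word, or an empty list if there are none
	List<String> search(String word)
	{
		return index.getOrDefault(normalize(word), Collections.<String>emptyList());
	}

	public static void main(String[] args)
	{
		NormalizedIndex idx = new NormalizedIndex();
		idx.add("Once upon a time I saw someone falling down.");
		idx.add("I didn't mean Once but Twice!");

		System.out.println(idx.search("once"));
		System.out.println(idx.search("TWICE"));
	}
}
```

Returning an empty list for unknown words also avoids printing "null" when nothing matches.
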
Attachments: hash.txt
Once upon a time I saw someone falling down.
Player 1 wins!
Don't you dare get close to me! *slap*
Calling Dick Tracy! Calling Dick Tracy!
I didn't mean Once but Twice!
close the door, you're letting the hot air out!

Thank you so much for the help. I am yet to try this. I will check this and let you know. Thanks again for the help.
