Map words to line number in text file

 
Hi everyone,

I'm working on a small project and have run into some problems.
Briefly described, what I'm attempting to do is:

I have a rather large text file with a different sentence on each line, and I need to find all the words in each sentence and add them to some sort of index, so that the structure is key = word and value = line number(s). For example:

mistake | 0, 2, 4, 6
sun | 0, 1, 5, 10
rain | 3, 4, 10, 22

I need to make a tool that finds the occurrences of a word in a file, so if I search for the word "drum" it says: Drum occurs 5 times, in lines 3, 8, 9, 11, 32

I've played around with some different structures but unfortunately I haven't been successful.

I read my text file as such:

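// Needs java.io.*; file and lines are declared elsewhere.
// try-with-resources closes the reader even if readLine() throws.
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
    String str;
    while ((str = br.readLine()) != null) {
        lines.add(str);
    }
}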

Any help would be appreciated.

Sincerely
- Mestika

 
Have a look at HashMap to map words to lists of line numbers, and ArrayList to hold a list of line numbers. All the info is in the usual Java API docs.
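
For anyone following along, here is a minimal sketch of that suggestion. The class name WordIndex, the methods build and describe, and the sample sentences are made up for illustration. It assumes 0-based line numbers, case-insensitive matching, and that a line is listed once per word even if the word repeats on it; adjust those choices to taste.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class WordIndex {
    // Builds word -> list of 0-based line numbers.
    static Map<String, List<Integer>> build(List<String> lines) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int lineNo = 0; lineNo < lines.size(); lineNo++) {
            String lower = lines.get(lineNo).toLowerCase(Locale.ROOT);
            // Split on runs of anything that is not a letter or digit.
            for (String word : lower.split("[^\\p{L}\\p{N}]+")) {
                if (word.isEmpty()) {
                    continue; // split() yields "" when a line starts with punctuation
                }
                List<Integer> hits = index.computeIfAbsent(word, k -> new ArrayList<>());
                // Record each line only once per word.
                if (hits.isEmpty() || hits.get(hits.size() - 1) != lineNo) {
                    hits.add(lineNo);
                }
            }
        }
        return index;
    }

    // Formats a lookup result for display.
    static String describe(Map<String, List<Integer>> index, String word) {
        List<Integer> hits = index.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.<Integer>emptyList());
        return word + " occurs on " + hits.size() + " line(s): " + hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "The sun came out", "Rain, then sun", "A drum in the rain");
        Map<String, List<Integer>> index = build(lines);
        System.out.println(describe(index, "sun"));  // sun occurs on 2 line(s): [0, 1]
        System.out.println(describe(index, "drum")); // drum occurs on 1 line(s): [2]
    }
}
```

The list of lines read in the original post can be passed straight to build. If the output should be sorted by word, a TreeMap could be swapped in for the HashMap without changing anything else.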

 
Hi James,

thank you very much for your reply.
I'll take a look at HashMap and ArrayList. I had thought of using an ArrayList for the indexing, but I'll read the Javadoc first.

 
James' solution is the one that I would have suggested. It's a good one if the file is pretty static, which is what I think you're describing.

It occurs to me, though, that if the file is dynamic it's a bit hard to keep your lists correct. If the application is a word processor, for example, as soon as someone deletes a sentence on line 57, everything following is out of sync. I'm not sure what I'd do in that situation, but I think recalculating the whole index each time the file changes would be computationally expensive.
That's an interesting problem to consider, and you might learn some useful things by trying to solve it. (So might I, and I think I might try.)

 
Hey Jon,

thank you for your response. I'm certainly trying to follow the suggestion made by you and James with the HashMap and ArrayList.

You are right that I'm dealing with static files (right now, anyway), but you are also right that a dynamically changing file could become a problem, and how to deal with it is an interesting puzzle to solve.

I think, though, that I'll start by concentrating on the static file in the first round.

 
Just as a matter of convenience, there is a LineNumberReader available that will keep track of what line you are on - not sure if you need it, you may be just keeping your own counter, but I thought I'd bring it up.
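
A small sketch of how that might look, in case it helps. LineNumberDemo and linesContaining are names invented for this example, and the input is a StringReader so the snippet runs on its own; for a real file you would pass a FileReader instead. Note that getLineNumber() reports 1 after the first line has been read, so the sketch subtracts 1 to get 0-based numbers like the ones in the original post.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineNumberDemo {
    // Returns the 0-based numbers of the lines that contain needle.
    static List<Integer> linesContaining(Reader source, String needle) throws IOException {
        List<Integer> hits = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() has already advanced the counter past this line,
                // so subtract 1 to get the number of the line just read.
                if (line.contains(needle)) {
                    hits.add(reader.getLineNumber() - 1);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String text = "sun and rain\nonly rain\nsun again\n";
        System.out.println(linesContaining(new StringReader(text), "sun")); // [0, 2]
    }
}
```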

 
The good news is that if you need to change to some other means of tracking the lines / locations you will just change the Objects stored in the ArrayList; the HashMap side of things shouldn't be affected.

In fact, if I thought that was likely to happen I'd start out from day 0 with a class (e.g. Location) that, in its first iteration, just holds a line number but could always be improved.
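
To make that concrete, a first iteration might look something like the sketch below. Only the name Location comes from the suggestion above; LocationIndex, add and find are hypothetical names chosen for the example. The point is that the map is declared in terms of Location, so adding a column or character offset later only touches that one class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// First iteration: a location is just a line number.
final class Location {
    private final int line;

    Location(int line) {
        this.line = line;
    }

    int getLine() {
        return line;
    }

    @Override
    public String toString() {
        return "line " + line;
    }
}

class LocationIndex {
    // The map stores Location objects, not raw integers.
    private final Map<String, List<Location>> index = new HashMap<>();

    void add(String word, Location where) {
        index.computeIfAbsent(word, k -> new ArrayList<>()).add(where);
    }

    // Returns an empty list when the word was never indexed.
    List<Location> find(String word) {
        return index.getOrDefault(word, Collections.<Location>emptyList());
    }

    public static void main(String[] args) {
        LocationIndex idx = new LocationIndex();
        idx.add("drum", new Location(3));
        idx.add("drum", new Location(8));
        System.out.println(idx.find("drum")); // [line 3, line 8]
    }
}
```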
