
Map words to line number in text file

Hi everyone,

I'm working on a small project and have run into some problems.
Briefly described, what I'm attempting to do is:

I have a rather large text file with a different sentence on each line. I need to find all the words in each sentence and add them to some sort of index, so that the structure is: key = word and value = line number(s). For example:

mistake | 0, 2, 4, 6
sun | 0, 1, 5, 10
rain | 3, 4, 10, 22

I need to make a tool that finds the occurrences of a word in a file, so that if I search for the word "drum" it says: "drum" occurs 5 times, in lines 3, 8, 9, 11, 32

I've played around with some different structures but unfortunately I haven't been successful.

I read my text file as such:

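// Needs java.io.BufferedReader, java.io.FileInputStream, java.io.InputStreamReader.
// "file" and "lines" (a List<String>) are declared elsewhere.
// try-with-resources closes the reader even if readLine() throws.
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
     String str;
     // readLine() returns null at end of file
     while ((str = br.readLine()) != null) {
          lines.add(str);
     }
}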


Any help would be appreciated.

Sincerely
- Mestika

Mestika
Newbie Poster
11 posts since Mar 2010
Reputation Points: 10
Solved Threads: 0
 

Have a look at HashMap to map words to lists of line numbers, and ArrayList to hold a list of line numbers. All the info is in the usual Java API docs.
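
A minimal sketch of that suggestion, assuming the lines have already been read into a List<String>. The class name WordIndex, the lower-casing, and the split on non-letter characters are illustrative assumptions; the thread does not say how words should be delimited. Line numbers are 0-based, as in the example in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class WordIndex {

    // Builds a map from each word to the 0-based numbers of the lines it appears on.
    static Map<String, List<Integer>> build(List<String> lines) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int lineNo = 0; lineNo < lines.size(); lineNo++) {
            // Assumption: a "word" is a run of letters, compared case-insensitively.
            for (String word : lines.get(lineNo).toLowerCase().split("[^\\p{L}]+")) {
                if (word.isEmpty()) {
                    continue; // split() yields "" when a line starts with a non-letter
                }
                List<Integer> hits = index.computeIfAbsent(word, k -> new ArrayList<>());
                // Record each line only once per word, even if the word repeats on it.
                if (hits.isEmpty() || hits.get(hits.size() - 1) != lineNo) {
                    hits.add(lineNo);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The sun is out", "Rain, rain, go away", "Sun after rain");
        Map<String, List<Integer>> index = build(lines);
        System.out.println("sun  | " + index.get("sun"));
        System.out.println("rain | " + index.get("rain"));
    }
}
```

With the three sample lines above, "sun" maps to [0, 2] and "rain" to [1, 2]. A lookup for a word that never occurs returns null from index.get(), so a search tool would need to handle that case.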

JamesCherrill
Posting Genius
Moderator
6,370 posts since Apr 2008
Reputation Points: 2,130
Solved Threads: 1,073
 

Hi James,

thank you very much for your reply.
I'll take a look at HashMap and ArrayList. I had thought of using an ArrayList for the indexing, but I'll read the Javadoc first.

Mestika
 

James' solution is the one that I would have suggested. It's a good one if the file is pretty static, which is what I think you're describing.

It occurs to me, though, that if the file is dynamic it's a bit hard to keep your lists correct. If the application is a word processor, for example, as soon as someone deletes a sentence on line 57, every line number after it is out of sync. I'm not sure what I'd do in that situation, but I think rebuilding the whole index each time the file changes would be computationally expensive.
That's an interesting problem to consider, and you might learn some useful things by trying to solve it. (So might I, and I think I might try.)

jon.kiparsky
Posting Virtuoso
1,849 posts since Jun 2010
Reputation Points: 383
Solved Threads: 187
 

Hey Jon,

thank you for your response. I'm certainly trying to follow the suggestion made by you and James with the HashMap and ArrayList.

You are right that I'm dealing with static files (right now, anyway), and also that a dynamically changing file could become a problem, which makes it an interesting puzzle to solve.

I think, though, that I'll start by concentrating on the static file in the first round.

Mestika
 

Just as a matter of convenience, there is a LineNumberReader available that will keep track of what line you are on - not sure if you need it, you may be just keeping your own counter, but I thought I'd bring it up.
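
A small sketch of how LineNumberReader might be used, under some assumptions: the class name LineNumberDemo and the method linesContaining are made up for illustration, the input comes from a StringReader instead of a file, and matching is a plain case-sensitive substring test to keep it short.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
class LineNumberDemo {

    // Returns the numbers of the lines that contain the given text.
    static List<Integer> linesContaining(Reader source, String text) {
        List<Integer> hits = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // getLineNumber() counts the lines read so far, so the first line reports 1.
                if (line.contains(text)) {
                    hits.add(reader.getLineNumber());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "sun\nrain\nsun and rain\n";
        System.out.println(linesContaining(new StringReader(sample), "sun"));
    }
}
```

For the sample text this prints [1, 3]. Note that the count is 1-based after each readLine(); subtract 1 to get the 0-based numbering used in the original question.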

dononelson
Junior Poster in Training
62 posts since Mar 2010
Reputation Points: 13
Solved Threads: 15
 

The good news is that if you need to change to some other means of tracking the lines / locations you will just change the Objects stored in the ArrayList; the HashMap side of things shouldn't be affected.

In fact, if I thought that was likely to happen, I'd start out from day 0 with a class (e.g. Location) that in its first iteration just holds a line number, but could always be extended later.
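
One way that idea could look, as a sketch only. The thread names Location; LocationIndex, its methods, and the fields shown are hypothetical. The point is that the map stores Location objects, so adding a column or character offset later would change Location but not the map code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index keyed by word; the values are Location objects, not raw ints.
class LocationIndex {
    private final Map<String, List<Location>> index = new HashMap<>();

    // Records one occurrence of a word.
    void add(String word, Location where) {
        index.computeIfAbsent(word, k -> new ArrayList<>()).add(where);
    }

    // Returns all recorded locations, or an empty list for an unknown word.
    List<Location> find(String word) {
        return index.getOrDefault(word, Collections.emptyList());
    }

    public static void main(String[] args) {
        LocationIndex idx = new LocationIndex();
        idx.add("drum", new Location(3));
        idx.add("drum", new Location(8));
        System.out.println("drum occurs at " + idx.find("drum"));
    }
}

// First iteration: a Location is just a line number. Fields such as a column
// or a file name could be added here later without touching LocationIndex.
final class Location {
    private final int line;

    Location(int line) {
        this.line = line;
    }

    int getLine() {
        return line;
    }

    @Override
    public String toString() {
        return "line " + line;
    }
}
```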

JamesCherrill
 
