
Map words to line number in text file

Mestika

Hi everyone,

I'm working on a small project and have run into some problems.
Briefly described, what I'm attempting to do is:

I have a rather large text file with a different sentence on each line. I need to find all the words in each sentence and add them to some sort of index, so that the structure is: key = word and value = line number(s). For example:

mistake | 0, 2, 4, 6
sun | 0, 1, 5, 10
rain | 3, 4, 10, 22

I need to make a tool that finds the occurrences of a word in a file, so that if I search for the word "drum" it says: Drum occurs 5 times, in lines 3, 8, 9, 11, 32

I've played around with some different structures but unfortunately I haven't been successful.

I read my text file as such:

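List<String> lines = new ArrayList<>();
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
     String str;
     while ((str = br.readLine()) != null) {
          lines.add(str);
     }
}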

Any help would be appreciated.

Sincerely,
- Mestika

JamesCherrill

Have a look at HashMap to map words to lists of line numbers, and ArrayList to hold a list of line numbers. All the info is in the usual Java API docs.
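A minimal sketch of that suggestion, in case it helps to see the pieces together. The class and method names (WordIndex, build, describe) are made up for illustration, and the rule for what counts as a "word" (a run of letters or digits, case ignored) is an assumption; adjust it to your data. Line numbers are 0-based, as in the example in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class WordIndex {
    // Maps each word to the 0-based numbers of the lines it appears on.
    // A word that appears twice on one line is recorded twice.
    static Map<String, List<Integer>> build(List<String> lines) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int lineNo = 0; lineNo < lines.size(); lineNo++) {
            String lower = lines.get(lineNo).toLowerCase(Locale.ROOT);
            // Assumption: anything that is not a letter or digit separates words.
            for (String word : lower.split("[^\\p{L}\\p{Nd}]+")) {
                if (word.isEmpty()) {
                    continue; // split() can produce a leading empty string
                }
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(lineNo);
            }
        }
        return index;
    }

    // Formats a lookup result as a short report string.
    static String describe(Map<String, List<Integer>> index, String word) {
        List<Integer> hits = index.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.<Integer>emptyList());
        return word + " occurs " + hits.size() + " time(s), on line(s) " + hits;
    }
}
```

With the lines list from the question you would call WordIndex.build(lines) once and then WordIndex.describe(index, "drum") for each search. If you only want each line listed once per word, a Set of line numbers instead of a List would probably be the simpler choice.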

Mestika

Hi James,

Thank you very much for your reply.
I'll take a look at HashMap and ArrayList. I had thought of using an ArrayList for the indexing, but I'll read the Javadoc.

jon.kiparsky

James' solution is the one that I would have suggested. It's a good one if the file is pretty static, which is what I think you're describing.

It occurs to me, though, that if the file is dynamic it's a bit hard to keep your lists correct. If the application is a word processor, for example, as soon as someone deletes a sentence on line 57, every line number after it is out of sync. I'm not sure what I'd do in that situation, but I think recalculating the lists each time the file changes would be computationally expensive.
That's an interesting problem to consider, and you might learn some useful things by trying to solve it. (So might I, and I think I might)

Mestika

Hey Jon,

Thank you for your response. I'm certainly trying to follow the suggestion made by you and James with the HashMap and ArrayList.

You are right that I'm dealing with static files (right now, anyway), but you are also right that a dynamically changing file could cause problems, which makes it an interesting puzzle to solve.

I think, though, that I'll start by concentrating on the static file in the first round.

dononelson

Just as a matter of convenience, there is a LineNumberReader available that will keep track of what line you are on - not sure if you need it, you may be just keeping your own counter, but I thought I'd bring it up.
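For reference, a small sketch of how java.io.LineNumberReader could be used here. The NumberedLines class and its read method are hypothetical names for illustration only. One thing to watch: getLineNumber() starts at 0 and goes up by one after each line is read, so right after reading the first line it reports 1. Subtract 1 if you want the 0-based numbering used in the question.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class NumberedLines {
    // Returns each line prefixed with the number LineNumberReader reports
    // for it, e.g. "1: first line".
    static List<String> read(Reader source) {
        List<String> out = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The counter has already been incremented for this line.
                out.add(reader.getLineNumber() + ": " + line);
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For a file you could pass in a FileReader, or the InputStreamReader from the original post.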

JamesCherrill

The good news is that if you need to change to some other means of tracking the lines / locations you will just change the Objects stored in the ArrayList; the HashMap side of things shouldn't be affected.

In fact, if I thought that was likely to happen I'd start out from day 0 with a class (e.g. Location) that, in its first iteration, just holds a line number but could always be improved.
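Roughly what that might look like as a first iteration. This is only a sketch: Location is the name suggested above, while LocationIndex and its add and find methods are made up for illustration. The point is that the map stores Location objects, so a later version of Location (with a column, a paragraph id, or whatever turns out to be needed) should not require touching the HashMap code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// First iteration: a Location is nothing more than a line number.
final class Location {
    private final int line;

    Location(int line) {
        this.line = line;
    }

    int getLine() {
        return line;
    }

    @Override
    public String toString() {
        return "line " + line;
    }
}

// Hypothetical wrapper around the HashMap; it never looks inside Location.
class LocationIndex {
    private final Map<String, List<Location>> index = new HashMap<>();

    // Records one occurrence of a word.
    void add(String word, Location where) {
        index.computeIfAbsent(word, k -> new ArrayList<>()).add(where);
    }

    // Returns every recorded occurrence, or an empty list for unknown words.
    List<Location> find(String word) {
        return index.getOrDefault(word, Collections.<Location>emptyList());
    }
}
```

Usage would be along the lines of index.add("drum", new Location(3)) while reading the file, then index.find("drum") when searching.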
