Hello everyone,

I have a project which is to find the permuted index of each word in a textfile.
Im restricted from using STL btw.

I read in input.txt and have to output each word and corresponding line number and if it is found multiple times, it will have (numoftimes) it appears.

sample input:
hello there hello where
see you see me

sample output:
hello: 1(2)
there: 1
where: 1
see: 2(2)
... and so on

I want to know if what I plan to do is good or need some hints on how to do it.
First off, is there a way to find out the number of lines of a text file before reading it?

I was planning using a linked list, each node containing a string, and array[numoflines]
and each time I see the same word I update array[numoflines]++

for example if I see "hello", my node would have string as hello and aray[1] = 1
when I see it again it would be array[1] = 2 so I know it appears in line 1 twice.

Is this a good way to start or is not very efficient?

Recommended Answers

All 10 Replies

Member Avatar for iamthwee

Might be able to use the map

#include <iostream>
#include <map>
#include <string>
#include <fstream>
 
int main()
{
  std::ifstream in ( "c:/foo.txt" );

  std::map < std::string, int > count;
  std::string s;

  while ( in >> s )
  {
    ++count[s];
  }
  in.close();

  std::map <std::string, int>::iterator it;
  for ( it = count.begin(); it != count.end(); ++it )
    std::cout << it->first << "  " << it->second << std::endl;

  std::cin.get();
}

>>Im restricted from using STL btw.
That means you might as well be writing a C, not a C++ program because you can't use anything from fstream, iostream, string, vector, map, etc. etc.

With that restriction I would create a structure that contains an int for counter, another int for line number, and char* that contains the word. Each time the program reads a word it needs to find out if it is already in the linked list of those structures. If found, just bump the counter up by one. If not then add a new structure to the linked list.

The word may appear in multiple lines so I cant have only 1 int storing line number.

So linkedlist is the way to go?

>>So linkedlist is the way to go?
That's the way I'd do it. You don't have any idea how many items are in the file so an array is not a very good or efficient alternative. You could use a dymically allocated array but its not very efficient.

commented: I would agree +14

>>So linkedlist is the way to go?
That's the way I'd do it. You don't have any idea how many items are in the file so an array is not a very good or efficient alternative. You could use a dymically allocated array but its not very efficient.

Is this the only way to do it because I can see it being very inefficient.
I would have to compare every word I encounter to see if is in in the linked list, if it is update count, if is not add it to the linked list.

Is this the closest efficent way without using the STL library or there are other ways todo this?

>>or there are other ways todo this
Nope -- there is no magic way to do it. std::map is the ideal way, but you can't use it. If you put the nodes in sorted order then you could use some sort of binary search algorithm to locate the correct node if it exists. But that requires more slightly complex coding. Search google for binary search algorithms because they are fairly popular.

Should I go through the input file and add each word to the linked list, sort them lexicographically. Then go through each word and do binary search algorithm and update its line/count

or I should go as I go through each word. Each word I search the linkedlist, if is in it, update if is not add it to the list.(I would have pointers to the begging and the end)

add the structures in sorted order. When adding a new node search for the word that is less than the current word and insert the new node there. Very similar to finding the insert position of an array of integers. You don't need a sort algorithm to do this.

So as I go through each word, I insert it into the linked list lexicographically.

So it makes the search little faster and at most it will take N time if the word begins with Z.

So I would search at most twice right?
Once to find if it is in the list already, if not, another search to find position to insert.

As for the array for holding the line count.

I was thinking go through the input file to determine number of lines.

Then allocated an array of int of that size to each node.

Would that be a waste of memory/space?
is there a better way to do this?

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.