Determining line number while tokenizing a file stream

Question

Star Frost 0 Newbie Poster

15 Years Ago

I know this might seem like a repeat of an existing thread, but give me a moment to explain...

What I have so far is a function that reads in a script file and stores each word in an array of custom objects.

function:

void fileTester::readFile( string tempName )
{
    fileName.append( tempName );
    inFile.open( fileName.c_str() );
    string tempChar;
    int tempx = 0;
    if( !inFile )
    {
        cout << "There was an error opening the script file\n";
        cout << inFile;
    }
    else
    {
        cout << "File found!\n";
    }
    while( inFile.peek() != EOF )
    {
        inFile >> tokens[ tempx ].tokenString;
        tempx++;
    }
    codeLength = tempx;
}

object:

typedef struct
{
    string tokenString;
    int tokenID;
    int lineNumber;
} tokenStruct;
tokenStruct tokens[ 5000 ];

The problem at hand is that I can't find a way to store the line number each string came from in its token object without breaking the rest of the program. The solutions I have tried so far either break the words into pieces that the rest of my program can't understand or drop characters altogether. If you know of a simple alteration that doesn't interfere with the rest of my program, I would greatly appreciate it.

c++ file-stream file-system


Clinton Portis 211 Practically a Posting Shark

15 Years Ago

finding and storing the relevant line number the string was located at on the file in the token objects.

If I understand you correctly, you seem to be in need of keeping track of what line you are at whilst reading from a .txt file.

So, I will suggest a way to keep track of what line ye' are reading from.

It looks like you are streaming in single words at a time from the .txt file with >>... I would suggest using the <string> function getline() instead, so you can keep a counter of how many times ye' have called getline() and thusly know which line ye' are on.

This is what you have:

while( inFile.peek() != EOF )
{

     inFile >> tokens[ tempx ].tokenString;

     tempx++;

}

Try changing to this:

int tempx = 0;

while(getline(inFile, tokens[ tempx ].tokenString))
{    
     //Here 'tempx' can also be your "line counter"
     tempx++;
}

Now you have tempx as a line counter, refer to it as necessary to determine what line ye' are on.

But I bet you are asking yourself, "that's great clinton portis.. but now my token array contains entire lines instead of individual words."

So, to get around this, you can use <string> member functions to return individual words from your array (such as find(), find_first_of(), and substr()). Or you can perform array operations on your token array to extract individual words. You could also use strtok() (if you remember to copy the result of the .c_str() member into a writable char buffer first, since strtok() modifies the string it scans). In summary, there are many ways to parse the words you need from an entire line.
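One of those "many ways" can be sketched as follows. This is only an illustration, not the approach the thread settled on: it swaps in std::istringstream to split each line on whitespace rather than the find()/substr() calls mentioned above. The names Token and tokenizeWithLineNumbers are hypothetical, and a std::vector stands in for the fixed-size tokens array from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token record, modeled on the tokenStruct from the question.
struct Token {
    std::string text;
    int lineNumber;
};

// Read the stream one line at a time, then split each line on whitespace.
// The counter advances once per getline() call, so blank lines still
// count toward the numbering even though they produce no tokens.
std::vector<Token> tokenizeWithLineNumbers(std::istream& in)
{
    std::vector<Token> tokens;
    std::string line;
    int lineNumber = 0;

    while (std::getline(in, line)) {
        ++lineNumber;
        std::istringstream words(line);
        std::string word;
        while (words >> word) {          // >> skips any run of whitespace
            tokens.push_back(Token{word, lineNumber});
        }
    }
    return tokens;
}
```

Because operator>> skips leading whitespace on its own, indented lines and repeated spaces should not produce empty or padded tokens with this approach.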

Edited 15 Years Ago by Clinton Portis because: n/a

Tom Gunn 1,164 Practically a Master Poster

15 Years Ago

I just can't seem to fathom the last part of each line is being completely ignored...

I know the answer to this problem because it is a problem I create myself on a regular basis. :) On the last token, finish will be string::npos. After the last token, start should be string::npos so that you do not lose the last token. That means the loop should test on start, and you need to take care not to wrap beyond npos:

start = 0;
finish = tempToken[ x ].find_first_of( " ", start );
while( start != string::npos )
{
    tokens[ y ].tokenString = tempToken[ x ].substr( start, finish - start );
    tokens[ y ].lineNumber = x + 1;
    start = (finish != string::npos) ? finish + 1 : string::npos;
    finish = tempToken[ x ].find_first_of( " ", start );
    y++;
}
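For anyone who wants to try the loop in isolation, here is a minimal self-contained sketch of the same idea. The function name splitOnSpaces and the use of std::vector are assumptions made for the example; the indices are declared as std::string::size_type rather than int so that comparisons against npos need no conversion.

```cpp
#include <string>
#include <vector>

// Split one line on single spaces. The loop tests `start`, not `finish`,
// so the final token (where find_first_of returns npos) is still stored.
std::vector<std::string> splitOnSpaces(const std::string& line)
{
    std::vector<std::string> words;
    std::string::size_type start = 0;
    std::string::size_type finish = line.find_first_of(" ", start);

    while (start != std::string::npos) {
        // When finish is npos, substr() simply runs to the end of the line.
        words.push_back(line.substr(start, finish - start));
        // Step past the space, but never increment npos (it would wrap to 0).
        start = (finish != std::string::npos) ? finish + 1 : std::string::npos;
        finish = line.find_first_of(" ", start);
    }
    return words;
}
```

Note that this version treats every single space as a separator, so two spaces in a row yield an empty token between them, and an empty line yields one empty token.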


Star Frost 0 Newbie Poster · Answer 1 · 2009-10-26T23:28:41+00:00

...So, to get around this, you can use <string> member functions to return individual words from your array (such as find(), find_first_of(), and substr()). Or you can perform array operations on your token array to extract individual words. You could use strtok() (if you remember to use the .c_str() member to return a C-string.) In summary, there are many ways to parse the words you need from an entire line.

Well I tried your suggestion but I will be the first to admit that I don't know as much as I should about the string library. So far I have retooled the function to look like this:

void fileTester::readFile( string tempName )
{
    fileName.append( tempName );
    inFile.open( fileName.c_str() );
    string tempChar;
    string tempToken[ 500 ];
    int tempx = 0;
    int tempy = 0;
    string tempWord = "";
    int x, y = 0;
    int start = 0;
    int finish = 0;
    
    if( !inFile )
    {
        // blah blah blah
    }
    else
    {
        // and more blah
    }

    while( getline( inFile, tempToken[ tempy ] ) )
    {
        tempy++;
    }
    totalLines = tempy;
    
    for( x = 0; x < totalLines; x++ )
    {
        start = 0;
        finish = tempToken[ x ].find_first_of( " ", start );
        while( finish != string::npos )
        {
            tokens[ y ].tokenString = tempToken[ x ].substr( start, finish - start );
            tokens[ y ].lineNumber = x + 1;
            start = finish + 1;
            finish = tempToken[ x ].find_first_of( " ", finish + 1 );
            y++;
        }
    }
    codeLength = y;
}

I must be doing something wrong because an input of:
int var1 = 0
int var2 = 2

float var4 = 5.6

results in:
tokens[ 0 ].tokenString = 'int'
tokens[ 0 ].lineNumber = 1
tokens[ 1 ].tokenString = 'var1'
tokens[ 1 ].lineNumber = 1
tokens[ 2 ].tokenString = '='
tokens[ 2 ].lineNumber = 1
tokens[ 3 ].tokenString = 'int'
tokens[ 3 ].lineNumber = 2
tokens[ 4 ].tokenString = 'var2'
tokens[ 4 ].lineNumber = 2
tokens[ 5 ].tokenString = '='
tokens[ 5 ].lineNumber = 2
tokens[ 6 ].tokenString = 'float'
tokens[ 6 ].lineNumber = 4
tokens[ 7 ].tokenString = 'var4'
tokens[ 7 ].lineNumber = 4
tokens[ 8 ].tokenString = '='
tokens[ 8 ].lineNumber = 4

As you can plainly see, the line numbers work quite well; however, for reasons that I just can't seem to fathom, the last part of each line is being completely ignored... I am sure that I must be making some silly mistake, but I can't seem to figure out what.

Star Frost 0 Newbie Poster · Answer 2 · 2009-10-27T02:32:55+00:00

I know the answer to this problem...

That does indeed seem to do the trick! I especially liked the start = ( finish != string::npos ) ? finish + 1 : string::npos; part; that is something I was never taught how to do, and now I am going to look for places to use it. However, another problem has presented itself, one that was lost among all the others. Some of the tokens are being stored with blank spaces preceding them.
For instance:

while( var1 >= 0 )
{
var1 --
}

The 6th element in this case is stored as " var1" (not sure how to show it but there should be several spaces before var1.)

I tried to fix this by changing the start = 0; at the beginning of the for loop to start = tempToken[ x ].find_first_not_of( " " ); but it doesn't seem to help in the least... I am sure it's just my inexperience with <string> again, but this one makes little sense to me since it's just one function that SHOULD work...

Star Frost 0 Newbie Poster · Answer 3 · 2009-10-29T21:55:49+00:00

bump

Tom Gunn 1,164 Practically a Master Poster · Answer 4 · 2009-10-29T22:29:42+00:00

Some of the tokens are being stored with blank spaces preceding them.

That sounds like you need to skip leading whitespace at the token level. When resetting start, instead of incrementing it, walk it over all whitespace. The following is a stream of consciousness. It is meant to spark your imagination, not to be an actual plugin for your code:

// start = (finish != string::npos) ? finish + 1 : string::npos;
if ((start = finish) != string::npos)
{
    while (start < str.length() && isspace(str[start])) ++start;
    if (start == str.length()) start = string::npos;
}
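The same "walk over all whitespace" idea can be sketched as a complete function. This is one possible reading of the hint, not the code the original poster ended up with: it swaps the isspace() loop for find_first_not_of() with an explicit whitespace set, and the name splitSkippingWhitespace is hypothetical.

```cpp
#include <string>
#include <vector>

// Split a line into words, skipping any run of whitespace before each
// token, so indentation and repeated spaces never end up inside a token.
std::vector<std::string> splitSkippingWhitespace(const std::string& line)
{
    const char* const whitespace = " \t\r\n\v\f";
    std::vector<std::string> words;

    // Start at the first non-whitespace character (npos if there is none).
    std::string::size_type start = line.find_first_not_of(whitespace);
    while (start != std::string::npos) {
        std::string::size_type finish = line.find_first_of(whitespace, start);
        words.push_back(line.substr(start, finish - start));
        // Walk over the whole run of whitespace that follows the token.
        start = (finish != std::string::npos)
                    ? line.find_first_not_of(whitespace, finish)
                    : std::string::npos;
    }
    return words;
}
```

Including the tab character in the whitespace set matters if the script files are indented with tabs, since a search for " " alone would leave those in place. Blank lines and whitespace-only lines produce no tokens at all.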

Star Frost 0 Newbie Poster · Answer 5 · 2009-10-30T01:23:11+00:00

Thanks for all the help! With some tweaking and fiddling I managed to get the exact results I was looking for.
