Hello,
I need to write a function that takes a string and breaks it up into tokens (i.e. words, numbers, and punctuation). First it needs to break up the string and store the pieces in an array called Token[]. Obviously a for loop needs to be written, but should I just make the rest into if statements that break it up?

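    // Every parameter needs a type (int &pos), and there is no ';' before the body.
    // Reads the next token from Block starting at pos; returns false when none is left.
    bool getToken(const char Block[], char Token[], int &TokenType, int &pos)
    {
        while (Block[pos] == ' ')              // skip spaces before the token
            pos++;
        if (Block[pos] == '\0')
            return false;

        int j = 0;
        // >= and <= so that 'a', 'z', 'A' and 'Z' also count as letters
        while ((Block[pos] >= 'a' && Block[pos] <= 'z') || (Block[pos] >= 'A' && Block[pos] <= 'Z'))
            Token[j++] = Block[pos++];         // copy the word one character at a time

        if (j > 0)
            TokenType = 1;                     // a word
        else
        {
            Token[j++] = Block[pos++];         // not a letter: a one-character token for now
            TokenType = 0;
        }
        Token[j] = '\0';                       // terminate the C string
        return true;
    }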

Could you not just split the string using a space as delimiter?

What if you have something like "3rd" or "LOL!!!"?

Look up the is***() functions: isalpha(), isdigit(), ispunct(), etc.

They are in the ctype header (<cctype> in C++, <ctype.h> in C).
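For example, a tiny classifier built only on those functions could look like this. The helper name charClass and the 1/2/3 codes are just an illustration (they happen to match the Word/Number/Punctuation codes used later in this thread), not part of any library:

```cpp
#include <cctype>

// Hypothetical helper: 1 = letter, 2 = digit, 3 = punctuation, 0 = anything else
// (spaces, etc.). The cast to unsigned char keeps the <cctype> calls well defined
// for characters with negative char values.
int charClass(char c)
{
    unsigned char uc = static_cast<unsigned char>(c);
    if (std::isalpha(uc)) return 1;
    if (std::isdigit(uc)) return 2;
    if (std::ispunct(uc)) return 3;
    return 0;
}
```

With this, "3rd" is no problem: '3' is class 2 and 'r' is class 1, so a new token starts wherever the class changes.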

That's not a problem. Right now I am having trouble taking a Block (i.e. Block[] = "This block is #1!!!") and breaking it up into a token array, that is:
Token[0] = This
Token[1] = block
Token[2] = is
Token[3] = #
Token[4] = 1
Token[5] = !
Token[6] = !
Token[7] = !

I can't seem to get my for loops right for the transfer.

    for (int i = 0; i < strlen(Block); i++)
    {
        if (Block [i] == ' ')
            continue;
        else if ((Block[i] < 'z' && Block[i] > 'a') || (Block[i] < 'Z' && Block[i] > 'A'))
            for (int j = 0; j < 15; j++)
            {
                strcpy(Token[i][j], Block[i]);
            }
    }

1) You can't use strcpy() here. You are copying a substring (single characters), and strcpy() copies a whole null-terminated string.
2) Look up the functions I mentioned. They will make your life easier, unless, of course, you like stringing 15 conditionals inside one IF statement.
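If you do want a library call for copying part of Block, std::strncpy() takes a length, but it does not add the '\0' when it copies exactly that many characters, so you have to terminate the token yourself. A minimal sketch (copySubstring is a hypothetical name; it assumes dest has room for len + 1 characters and that start + len stays inside src):

```cpp
#include <cstring>

// Copy len characters of src, starting at index start, into dest,
// then add the terminating '\0' that strncpy() leaves out in this case.
void copySubstring(char dest[], const char src[], int start, int len)
{
    std::strncpy(dest, src + start, static_cast<std::size_t>(len));
    dest[len] = '\0';
}
```

For Block[] = "This block is #1!!!", copySubstring(tok, Block, 5, 5) leaves "block" in tok.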

As WaltP said, use the isalpha() and isdigit() functions.

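    int i = 0;
    // A while loop instead of a for loop: the inner loops already advance i,
    // so an extra i++ at the end would skip the character after each run.
    while (Block[i] != '\0')
    {
        if (isalpha(Block[i]))
        {
            while (isalpha(Block[i]))
            {
                /* store it in a variable */
                i++;
            }
        }
        else if (isdigit(Block[i]))
        {
            while (isdigit(Block[i]))
            {
                /* store it in a variable */
                i++;
            }
        }
        else
        {
            /* Create a function that checks for special characters like '.', ';' etc. */
            i++;
        }
    }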
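With the "store it in a variable" parts filled in, the skeleton might become something like the sketch below. The function name tokenize and the limits MAX_TOKENS and MAX_LEN are hypothetical; Type uses 1 = word, 2 = number, 3 = punctuation:

```cpp
#include <cctype>

const int MAX_TOKENS = 50;   // assumed maximum number of tokens
const int MAX_LEN = 101;     // assumed maximum token length + 1 for '\0'

// Splits Block into Token[][] and returns the number of tokens stored.
int tokenize(const char Block[], char Token[][MAX_LEN], int Type[])
{
    int i = 0, j = 0;
    while (Block[i] != '\0' && j < MAX_TOKENS)
    {
        unsigned char c = static_cast<unsigned char>(Block[i]);
        int k = 0;
        if (std::isalpha(c))
        {
            while (std::isalpha(static_cast<unsigned char>(Block[i])) && k < MAX_LEN - 1)
                Token[j][k++] = Block[i++];   // copy the whole word
            Type[j] = 1;
        }
        else if (std::isdigit(c))
        {
            while (std::isdigit(static_cast<unsigned char>(Block[i])) && k < MAX_LEN - 1)
                Token[j][k++] = Block[i++];   // copy the whole number
            Type[j] = 2;
        }
        else if (std::ispunct(c))
        {
            Token[j][k++] = Block[i++];       // one punctuation mark per token
            Type[j] = 3;
        }
        else
        {
            i++;                              // skip spaces and other characters
            continue;                         // nothing stored, so j is not advanced
        }
        Token[j][k] = '\0';                   // terminate right after the last char
        j++;                                  // advance only after a token was stored
    }
    return j;
}
```

For "This block is #1!!!" this yields the eight tokens listed earlier in the thread.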

Ok, so I kind of got it, but now I am having problems with the output. I don't know what's going on in the copying process.

    char Block[]="Earth is planet #3 from the Sun!!";

    while (Block[i])
    {
        if (isalpha(Block[i]))
        {
            Token[j][k] = Block [i];
            k++;
            Type[j] = 1;
        }
        else
        {
            numUsed++;
            Token[j][k+1]= '\0';
            j++;
            if (isalnum(Block[i]))
            {
                k = 0;
                Token[j][k] = Block[i];
                k++;    
                Type[j] = 2;
            }
            if (ispunct(Block[i]))
            {
                k = 0;
                Token[j][k] = Block[i];
                k++;    
                Type[j] = 3;
            }
            if(Block[i] = ' ')
            {
                k = 0;      
            }
        }
        i++;
    }

Output:

Earthv [1]
is [1]
planet  [1]
 [0]
# [3]
3 [2]
fromx [1]
the [1]
Sun█ [1]
! [3]
! [3]

I don't know why extra characters get appended to some of the tokens. BTW, the numbers in brackets show whether the token is a Word [1], Number [2], or Punctuation [3].
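A likely cause of the extra characters (e.g. "Earthv"): after k characters have been stored they sit in Token[j][0] to Token[j][k-1], so the terminator belongs in Token[j][k]. Writing it at Token[j][k+1] leaves Token[j][k] holding whatever the array happened to contain. A small sketch of the copy step with the '\0' in the right place (copyWord is a hypothetical helper; dest must be large enough):

```cpp
// Copies the characters of word into dest one at a time and terminates dest.
// After the loop, k equals the number of characters copied.
int copyWord(char dest[], const char word[])
{
    int k = 0;
    while (word[k] != '\0')
    {
        dest[k] = word[k];
        k++;
    }
    dest[k] = '\0';   // dest[k + 1] would leave dest[k] with stale data
    return k;
}
```

Even if dest was previously filled with junk such as 'v', copyWord(dest, "Earth") leaves exactly "Earth".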

How about copying all of a type in one go:

    while (Block[i])
    {
        while (isalpha(Block[i]))
        {
            Token[j][k] = Block [i];
            k++;
            i++;
            Token[j][k] = '\0';  // be sure to 'end the string'
            Type[j] = 1;
        }

This will copy the entire word up to the first non-alpha character and leave you set at the following character.

A slightly cleaner option:

    while (Block[i])
    {
        if (isalpha(Block[i]))
        {
            k = 0;
            while (isalpha(Block[i]))
            {
                Token[j][k++] = Block [i++];
            }
            Token[j][k] = '\0';  // be sure to 'end the string'
            Type[j] = 1;
        }

This is what I was saying. It copies letters until a non-alpha character is encountered.
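The same "copy the whole run" idea can be wrapped in one helper that works for any character class. This is only a sketch: copyRun, isLetter and isDigitChar are hypothetical names, and dest is assumed to be large enough for the run plus '\0':

```cpp
#include <cctype>

// Small predicates; the unsigned char cast keeps the <cctype> calls well defined.
bool isLetter(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool isDigitChar(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Copies the run of characters starting at Block[i] for which test() is true,
// terminates dest, and returns the index of the first character after the run
// (where the while loop above leaves i).
int copyRun(const char Block[], int i, char dest[], bool (*test)(char))
{
    int k = 0;
    while (Block[i] != '\0' && test(Block[i]))
        dest[k++] = Block[i++];
    dest[k] = '\0';
    return i;
}
```

On "1st!!", copyRun with isDigitChar copies "1" and returns 1; calling it again from there with isLetter copies "st" and returns 3. If the run is empty, i comes back unchanged, so the caller still has to advance past that character itself.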

@WaltP Thank you very much, it works well. Just one last thing: when I change the Block to "He came in 1st!!" the output is:

/*
He [1]
came [1]
in [1]
 [2293500]
st [1]
! [3]
! [3]
*/

My code is:

    char Block[]="He came in 1st!!";

    while (Block[i])
    {
        while (isalpha(Block[i]))
        {
            Token[j][k] = Block [i];
            k++;
            i++;
            Token[j][k] = '\0'; 
            Type[j] = 1;
        }
        numUsed++;
        j++;
        if (isalnum(Block[i]))
        {
            k = 0;
            Token[j][k] = Block[i];
            k++;    
            Type[j] = 2;
        }
        if (ispunct(Block[i]))
        {
            k = 0;
            Token[j][k] = Block[i];
            k++;    
            Type[j] = 3;
        }
        if(Block[i] = ' ')
        {
            k = 0;      
        }
        Token[j][k+1]= '\0';
        i++;
    }

I think the problem is that j gets incremented (j++) in the wrong spot...
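Tracing the loop above by hand with "He came in 1st!!" supports that. j++ runs on every pass of the outer loop, even a pass that stores nothing, so row 3 is skipped and printed with an unset Type (hence [2293500]). The '1' stored in row 4 is then overwritten by "st", because j is only advanced after the next word has already been copied into the same row. Also, `if(Block[i] = ' ')` assigns a space to Block[i] instead of comparing, so it is always true. A sketch that advances j exactly once per stored token (splitBlock, ROWS and COLS are hypothetical names):

```cpp
#include <cctype>

const int ROWS = 50;    // hypothetical table size
const int COLS = 101;   // hypothetical max token length + 1

// Returns the number of tokens stored (what numUsed was counting).
int splitBlock(const char Block[], char Token[][COLS], int Type[])
{
    int i = 0, j = 0;
    while (Block[i] != '\0' && j < ROWS)
    {
        unsigned char c = static_cast<unsigned char>(Block[i]);
        if (std::isalpha(c))
        {
            int k = 0;
            while (std::isalpha(static_cast<unsigned char>(Block[i])) && k < COLS - 1)
                Token[j][k++] = Block[i++];
            Token[j][k] = '\0';
            Type[j++] = 1;                     // a word was stored: next row
        }
        else if (std::isdigit(c) || std::ispunct(c))
        {
            Token[j][0] = Block[i++];          // one character per token, as in the post
            Token[j][1] = '\0';
            Type[j++] = std::isdigit(c) ? 2 : 3;
        }
        else
        {
            // Spaces end up here; note that == (not =) is the comparison.
            i++;                               // nothing stored, so j is not advanced
        }
    }
    return j;
}
```

For "He came in 1st!!" this gives He, came, in, 1 [2], st [1], ! [3], ! [3].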

Have you tried it this way?

        if (isalnum(Block[i]))
        {
            j++;
            k = 0;
            Token[j][k] = Block[i];
            k++;
            Type[j] = 2;
        }
        else if (ispunct(Block[i]))
        {
            j++;
            k = 0;
            Token[j][k] = Block[i];
            k++;
            Type[j] = 3;
        }
        else if (Block[i] == ' ')
        {
            k = 0;
        }

        Token[j][k+1] = '\0';

        i++;
    }

Why don't you use the same technique I showed you for spaces, numbers, and punctuation? What I gave you was an example, not a complete answer. What if your 'sentence' is:

Jonn Jonzz    2314 54th Street    High Plains, Mars --- out for lunch!!!
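That sentence has runs of spaces, multi-digit numbers like 2314, and repeated punctuation. A sketch that applies the "whole run in one go" technique to every class, here only counting the tokens (countTokens is a hypothetical name):

```cpp
#include <cctype>

// A run of spaces is skipped, a run of letters or a run of digits is one token,
// and each punctuation mark (or any other character) is its own token.
int countTokens(const char Block[])
{
    int i = 0, count = 0;
    while (Block[i] != '\0')
    {
        unsigned char c = static_cast<unsigned char>(Block[i]);
        if (std::isspace(c))
        {
            while (std::isspace(static_cast<unsigned char>(Block[i])))
                i++;                          // skip the whole run of spaces
            continue;                         // spaces are not tokens
        }
        if (std::isalpha(c))
        {
            while (std::isalpha(static_cast<unsigned char>(Block[i])))
                i++;                          // the whole word
        }
        else if (std::isdigit(c))
        {
            while (std::isdigit(static_cast<unsigned char>(Block[i])))
                i++;                          // the whole number, e.g. 2314
        }
        else
        {
            i++;                              // a single punctuation mark
        }
        count++;
    }
    return count;
}
```

On that sentence it counts 19 tokens: "54th" splits into 54 and th, and "---" gives three separate '-' tokens.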

@np_complete That doesn't work, because you only do j++ inside those branches, so it never happens for words.

@WaltP The way you showed me doesn't do j++, so it would just keep overwriting the same row of the array.

I wonder if there's a fix for that. I'm not going to do all your work. You still need to think. That's part of programming...

Ok, so thank you all for your help, I got it to work. However, when I split my program into functions it doesn't seem to work right. I don't know what I am doing wrong; there is no output, it's blank:

    #include <cstdlib>
    #include <iostream>
    #include <cstring>

    using namespace std;

    int pos = 0, numUsed = 0;
    const int MAX_ROWS = 50;
    const int MAX_NAME_LENGTH = 101;

    char getToken(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed, int pos);
    void Show(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed);

    int main()
    {
        int Type[MAX_ROWS];
        char Block[]="This is the 3rd time I am typing this. lol!!";
        char Token[MAX_ROWS][MAX_NAME_LENGTH];

        while (Block[pos])
        {
            getToken(Token, Block, Type, numUsed, pos);
        }
        Show(Token, Block, Type, numUsed);

        return 0;
    }

    char getToken(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed, int pos)
    {
        int j=0, k=0;
        if (isalpha(Block[pos]))
        {
            k=0;
            while (isalpha(Block[pos]))
            {
                Token[j][k++] = Block [pos++];
            }
            Token[j][k] = '\0'; //end the string
            Type[j] = 1;
            j++;
            numUsed++;
        }
        else if (isalnum(Block[pos]))
        {
            k=0;
            Token[j][k] = Block [pos];
            Token[j][k + 1] = '\0'; //end the string
            Type[j] = 2;
            pos++;
            j++;
            numUsed++;
        }
        else if (ispunct(Block[pos]))
        {
            k=0;
            Token[j][k] = Block [pos];
            Token[j][k + 1] = '\0'; //end the string
            Type[j] = 3;
            pos++;
            j++;
            numUsed++;
        }
        else
        {
            pos++;
        }
    }

    void Show(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed)
    {
        for (int i = 0; i < numUsed; i++)
        {
            cout << Token[i] << " [" << Type [i] << "]" << endl;
        }
    }
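A likely reason for the blank output: getToken() takes pos and numUsed as plain int parameters, which shadow the globals, so every call works on copies. The pos in main() never changes, while (Block[pos]) loops forever, and Show() is never reached. The local j is also reset to 0 on each call, so every token would land in Token[0], and a char function that falls off its end without returning a value is undefined behaviour. A sketch of a reworked getToken() that keeps the poster's names but passes pos and numUsed by reference and uses numUsed as the row index (the const on Block, the void return, and the full-table check are my own choices). Note the posted program also never includes <cctype>, which declares isalpha() and friends:

```cpp
#include <cctype>

const int MAX_ROWS = 50;
const int MAX_NAME_LENGTH = 101;

// pos and numUsed are references (int&), so the caller's variables advance.
void getToken(char Token[][MAX_NAME_LENGTH], const char Block[], int Type[],
              int &numUsed, int &pos)
{
    if (numUsed >= MAX_ROWS)                 // table full: skip the rest (assumed policy)
    {
        pos++;
        return;
    }
    unsigned char c = static_cast<unsigned char>(Block[pos]);
    int k = 0;
    if (std::isalpha(c))
    {
        while (std::isalpha(static_cast<unsigned char>(Block[pos])) && k < MAX_NAME_LENGTH - 1)
            Token[numUsed][k++] = Block[pos++];
        Token[numUsed][k] = '\0';            // end the string
        Type[numUsed++] = 1;
    }
    else if (std::isdigit(c) || std::ispunct(c))
    {
        Token[numUsed][0] = Block[pos++];    // one character per token, as in the original
        Token[numUsed][1] = '\0';
        Type[numUsed++] = std::isdigit(c) ? 2 : 3;
    }
    else
        pos++;                               // spaces and anything else
}
```

The loop in main() can stay as `while (Block[pos]) getToken(Token, Block, Type, numUsed, pos);` once the forward declaration matches this signature; for the posted Block it stores 14 tokens ("3rd" becomes 3 and rd).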