
String analysis

Hello,
I need to write a function that takes a string and breaks it up into tokens (i.e. words, numbers, and punctuation). First it needs to break up the string and store the pieces in an array called Token[]. Obviously a for loop needs to be written, but should I just write the rest as if statements that break it up?

bool getToken(char Block[], char Token[], int &TokenType, &pos);
{
    for (i = 0; i < strlen(Block); i++)
    {
        if (Block [i] == ' ')
            continue;
        else if ((Block[i] < 'z' && Block[i] > 'a') || (Block[i] < 'Z' && Block[i] > 'A'))
            for (j = 0; j )

            strcpy(Token[j], )
    }
}
ShEeRMiLiTaNt
Light Poster
49 posts since Mar 2012
Reputation Points: 10
Solved Threads: 0
Skill Endorsements: 0

Could you not just split the string using a space as delimiter?

Suzie999
Posting Whiz
319 posts since Jul 2010
Reputation Points: 49
Solved Threads: 15
Skill Endorsements: 0

What if you have something like "3rd" or "LOL!!!"?

ShEeRMiLiTaNt

Look up the is***() functions: isalpha(), isdigit(), ispunct(), etc.

They are in the <cctype> header (<ctype.h> in C).
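
As a rough illustration of that advice, here is a minimal sketch of classifying a single character with the <cctype> tests. The enum and the classify() name are assumptions made for this example; the codes 1/2/3 follow the Word/Number/Punctuation convention the poster uses later in the thread, and 0 for "anything else" is an assumption.

```cpp
#include <cctype>   // std::isalpha, std::isdigit, std::ispunct

// Hypothetical type codes: 1/2/3 mirror the thread's Word/Number/Punctuation
// convention; TOK_OTHER = 0 is an assumption for spaces and everything else.
enum TokenClass { TOK_OTHER = 0, TOK_WORD = 1, TOK_NUMBER = 2, TOK_PUNCT = 3 };

// Classify one character. The cast to unsigned char avoids undefined
// behaviour when char is signed and holds a negative value.
TokenClass classify(char c)
{
    unsigned char uc = static_cast<unsigned char>(c);
    if (std::isalpha(uc)) return TOK_WORD;
    if (std::isdigit(uc)) return TOK_NUMBER;
    if (std::ispunct(uc)) return TOK_PUNCT;
    return TOK_OTHER;   // spaces, control characters, ...
}
```

A single call replaces the pair of range comparisons in the original loop, which also missed 'a', 'z', 'A' and 'Z' because it used < and > rather than <= and >=.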

WaltP
Posting Sage w/ dash of thyme
Team Colleague
11,404 posts since May 2006
Reputation Points: 3,421
Solved Threads: 1,055
Skill Endorsements: 37

That's not a problem. Right now I am having trouble taking a Block (i.e. Block[] = "This block is #1!!!") and breaking it up into a token array, i.e.:
Token[0] = This
Token[1] = block
Token[2] = is
Token[3] = #
Token[4] = 1
Token[5] = !
Token[6] = !
Token[7] = !

I can't seem to get my for loops right for the transfer.

    for (int i = 0; i < strlen(Block); i++)
    {
        if (Block [i] == ' ')
            continue;
        else if ((Block[i] < 'z' && Block[i] > 'a') || (Block[i] < 'Z' && Block[i] > 'A'))
            for (int j = 0; j < 15; j++)
            {
                strcpy(Token[i][j], Block[i]);
            }
    }
ShEeRMiLiTaNt

1) You can't use strcpy() here. You are dealing with substrings, and strcpy() copies a whole null-terminated string (it also takes char pointers, not single characters like Token[i][j] and Block[i]).
2) Look up the functions I mentioned. They will make your life easier. Unless, of course, you like to string 15 conditionals inside one if statement.

WaltP

Seems you already have a thread running on this exact subject
with some decent advice in it.

http://www.daniweb.com/software-development/cpp/threads/416254/char-array-to-string-array

Suzie999

As WaltP said, use the isalpha() and isdigit() functions.

    /* i is advanced inside the branches, so the for statement has no i++ */
    for (int i = 0; Block[i] != '\0'; )
    {
        if (isalpha(Block[i]))
        {
            while (isalpha(Block[i]))
            {
                /* store it in a variable */
                i++;
            }
        }
        else if (isdigit(Block[i]))
        {
            while (isdigit(Block[i]))
            {
                /* store it in a variable */
                i++;
            }
        }
        else
        {
            /* Create a function that checks for special characters like '.', ';' etc. */
            i++;
        }
    }
np complete
Posting Whiz
385 posts since Sep 2010
Reputation Points: 18
Solved Threads: 36
Skill Endorsements: 0

OK, so I kind of got it, but now I am having problems with the output. I don't know what's going on in the copying process.

    char Block[]="Earth is planet #3 from the Sun!!";

    while (Block[i])
    {
        if (isalpha(Block[i]))
        {
            Token[j][k] = Block [i];
            k++;
            Type[j] = 1;
        }
        else
        {
            numUsed++;
            Token[j][k+1]= '\0';
            j++;
            if (isalnum(Block[i]))
            {
                k = 0;
                Token[j][k] = Block[i];
                k++;    
                Type[j] = 2;
            }
            if (ispunct(Block[i]))
            {
                k = 0;
                Token[j][k] = Block[i];
                k++;    
                Type[j] = 3;
            }
            if(Block[i] = ' ')
            {
                k = 0;      
            }
        }
        i++;
    }

Output:

Earthv [1]
is [1]
planet  [1]
 [0]
# [3]
3 [2]
fromx [1]
the [1]
Sun█ [1]
! [3]
! [3]

I don't know why it adds extra characters after some of the tokens. BTW, the number in brackets simply shows whether the token is a Word [1], Number [2], or Punctuation [3].

ShEeRMiLiTaNt

How about copying all of a type in one go:

    while (Block[i])
    {
        while (isalpha(Block[i]))
        {
            Token[j][k] = Block [i];
            k++;
            i++;
            Token[j][k] = '\0';  // be sure to 'end the string'
            Type[j] = 1;
        }

This will copy the entire word up to the first non-alpha character and leave you set at the following character.

A slightly cleaner option:

    while (Block[i])
    {
        if (isalpha(Block[i]))
        {
            k = 0;
            while (isalpha(Block[i]))
            {
                Token[j][k++] = Block [i++];
            }
            Token[j][k] = '\0';  // be sure to 'end the string'
            Type[j] = 1;
        }
WaltP

"A slightly cleaner option:"

This is what I was saying. It will copy a word until a non-alpha character is encountered.

np complete

@WaltP thank you very much, it works well. Just one last thing: when I change the Block to "He came in 1st!!" the output is:

/*
He [1]
came [1]
in [1]
 [2293500]
st [1]
! [3]
! [3]
*/

My code is:

    char Block[]="He came in 1st!!";

    while (Block[i])
    {
        while (isalpha(Block[i]))
        {
            Token[j][k] = Block [i];
            k++;
            i++;
            Token[j][k] = '\0'; 
            Type[j] = 1;
        }
        numUsed++;
        j++;
        if (isalnum(Block[i]))
        {
            k = 0;
            Token[j][k] = Block[i];
            k++;    
            Type[j] = 2;
        }
        if (ispunct(Block[i]))
        {
            k = 0;
            Token[j][k] = Block[i];
            k++;    
            Type[j] = 3;
        }
        if(Block[i] = ' ')
        {
            k = 0;      
        }
        Token[j][k+1]= '\0';
        i++;
    }

I am thinking the problem is that j is being incremented in the wrong spot...

ShEeRMiLiTaNt

Have you tried it this way?

if (isalnum(Block[i]))
{
   j++;
   k = 0;
   Token[j][k] = Block[i];
   k++;
   Type[j] = 2;
}

else if (ispunct(Block[i]))
{
   j++;
   k = 0;
   Token[j][k] = Block[i];
   k++;
   Type[j] = 3;
}
else if (Block[i] == ' ')   // == compares; a single = would assign a space to Block[i]
{
   k = 0;
}

Token[j][k+1]= '\0';

i++;

}
np complete

Why don't you use the same technique I showed you for SPACEs, Numbers, and Punctuation? What I gave you was an example, not a complete answer. What if your 'sentence' is

Jonn Jonzz    2314 54th Street    High Plains, Mars --- out for lunch!!!
WaltP

@np_complete that doesn't work because the j++ now happens only inside the number and punctuation branches, so j is never incremented for the words.

@WaltP The way you showed me doesn't j++ so it would just keep replacing the same array

ShEeRMiLiTaNt

"@WaltP The way you showed me doesn't j++ so it would just keep replacing the same array"

I wonder if there's a fix for that. I'm not going to do all your work. You still need to think. That's part of programming...

WaltP

OK, thank you all for your help. I got it to work; however, when I split my program into functions it doesn't seem to work right. I don't know what I am doing wrong: there is no output, it's blank:

#include <cstdlib>
#include <iostream>
#include <cstring>

using namespace std;

int pos = 0, numUsed = 0;
const int MAX_ROWS = 50;
const int MAX_NAME_LENGTH = 101;

char getToken(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed, int pos);
void Show(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed);

int main()
{
    int Type[MAX_ROWS];
    char Block[]="This is the 3rd time I am typing this. lol!!";
    char Token[MAX_ROWS][MAX_NAME_LENGTH];

    while (Block[pos])
    {
        getToken(Token, Block, Type, numUsed, pos);
    }
    Show(Token, Block, Type, numUsed);

    return 0;
}

char getToken(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed, int pos)
{
    int j=0, k=0;
    if (isalpha(Block[pos]))
    {
        k=0;
        while (isalpha(Block[pos]))
        {
            Token[j][k++] = Block [pos++];
        }
        Token[j][k] = '\0'; //end the string
        Type[j] = 1;
        j++;
        numUsed++;
    }
    else if (isalnum(Block[pos]))
    {
        k=0;
        Token[j][k] = Block [pos];
        Token[j][k + 1] = '\0'; //end the string
        Type[j] = 2;
        pos++;
        j++;
        numUsed++;
    }
    else if (ispunct(Block[pos]))
    {
        k=0;
        Token[j][k] = Block [pos];
        Token[j][k + 1] = '\0'; //end the string
        Type[j] = 3;
        pos++;
        j++;
        numUsed++;
    }
    else
    {
        pos++;
    }
}
void Show(char Token[][MAX_NAME_LENGTH], char Block[], int Type[MAX_ROWS], int numUsed)
{
    for (int i = 0; i < numUsed; i++)
    {
        cout << Token[i] << " [" << Type [i] << "]" << endl;
    }   
}
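
A likely cause of the blank output is parameter passing. getToken() receives numUsed and pos by value, and those parameters shadow the globals of the same name, so the pos tested by while (Block[pos]) in main() never advances and the loop never finishes. The sketch below isolates that effect; both function names are hypothetical and exist only for this illustration.

```cpp
// Takes pos by value: only the local copy is incremented,
// the caller's variable is left unchanged.
void advanceByValue(const char Block[], int pos)
{
    if (Block[pos] != '\0')
        pos++;
}

// Takes pos by reference: the caller's variable is incremented.
void advanceByReference(const char Block[], int &pos)
{
    if (Block[pos] != '\0')
        pos++;
}
```

Declaring the parameters of getToken() as int &numUsed and int &pos (in both the prototype and the definition) should let main() see the updates. Three other things in the posted code are worth checking: j is a local variable reset to 0 on every call, so every token lands in row 0 (numUsed could serve as the row index instead); the function is declared to return char but returns nothing, so void may be what is intended; and isalpha(), isalnum() and ispunct() are declared in <cctype>, which is not included.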
ShEeRMiLiTaNt
Question Answered as of 7 Months Ago by WaltP, np complete and Suzie999

© 2013 DaniWeb® LLC