Oct 27th, 2009
0

Splitting a string into tokens

I have a string and I need to split it into tokens.

A word can contain any letter or number, but it can also contain punctuation such as brackets.

I need to find each occurrence of punctuation, copy the string up to that point as a token, copy the punctuation mark as its own token, and so on until the end of the string is reached.

Here is my output for the string dressed(with)
Quote ...
vector 0: dressed
vector 1: (with)
vector 2: dressed(with
vector 3: )
vectors 0 and 3 are correct but the output should be
Quote ...
vector 0: dressed
vector 1: (
vector 2: with
vector 3: )
here is my code
C++ Syntax (Toggle Plain Text)
#include <string>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
    vector <string> vec;
    string str = "dressed(with)";
    string tmp;

    char punct[] = {'+','-','*','/','<','=','!','>','{','(',')','}',';',','};

    for (int i=0; i < sizeof(punct); i++)
    {
        unsigned int pos = str.find(punct[i], 0);

        if(pos != string::npos)
        {
            tmp.assign(str, 0, pos);
            vec.push_back(tmp);
            tmp.assign(str, pos, pos);
            vec.push_back(tmp);
        }
    }

    for(int a=0; a < vec.size(); a++)
    {
        cout << "vector " << a << ": " << vec.at(a) << endl;
    }

    return 0;
}
Reputation Points: 10
Solved Threads: 0
Junior Poster in Training
AdRock is offline Offline
64 posts
since Dec 2008
Oct 27th, 2009
2
Re: Splitting a string into tokens
Your search always starts from position 0, which is why tokens are being duplicated. You need to skip over the extracted tokens as you go. Something like this:
C++ Syntax (Toggle Plain Text)
#include <iostream>
#include <string>
#include <vector>

int main()
{
    using namespace std;

    string const punct = "+-*/<=!>{()};,";

    string str = "dressed(with)";
    string::size_type pos = 0;
    vector<string> vec;

    while (pos != string::npos)
    {
        // Find where the next punctuation character starts.
        string::size_type end = str.find_first_of(punct, pos);

        // Already on punctuation: the token runs to the next non-punctuation character.
        if (end == pos) end = str.find_first_not_of(punct, pos);

        // When end is npos, substr() copies through the end of the string.
        vec.push_back(str.substr(pos, end - pos));
        pos = end;
    }

    for (int a = 0; a < vec.size(); ++a)
    {
        cout << "vector " << a << ": " << vec.at(a) << '\n';
    }

    return 0;
}
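For reference, the original attempt has a second problem besides always searching from position 0: the third argument of `tmp.assign(str, pos, pos)` is a character count, not an end position. A minimal sketch of the difference follows; the helper names `assign_pos_pos` and `one_char_at` are made up for illustration and are not part of either program above.

```cpp
#include <string>

// Mirrors the original call: copies up to `pos` characters starting at `pos`.
std::string assign_pos_pos(const std::string& str, std::string::size_type pos)
{
    std::string tmp;
    tmp.assign(str, pos, pos);  // third argument is a length, not an end index
    return tmp;
}

// Copies exactly one character starting at `pos`.
std::string one_char_at(const std::string& str, std::string::size_type pos)
{
    std::string tmp;
    tmp.assign(str, pos, 1);
    return tmp;
}
```

With "dressed(with)" and position 7, the first helper returns "(with)", which matches "vector 1" in the original output, while the second returns just "(".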
Reputation Points: 1446
Solved Threads: 135
Practically a Master Poster
Tom Gunn is offline Offline
681 posts
since Jun 2009
Oct 27th, 2009
0
Re: Splitting a string into tokens
If I recall correctly (from a previous thread), the OP also wants each punctuation character to be in its own vector element (OP, please clarify), so if the string was "dressed(with))", the OP wants this:
  1. dressed
  2. (
  3. with
  4. )
  5. )

rather than
  1. dressed
  2. (
  3. with
  4. ))

so the OP would have to take it another step or two and further split strings with multiple consecutive punctuation characters into separate strings (again, that's if I was interpreting correctly from earlier threads). But Tom Gunn's code gets you one step closer regardless!
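To see the grouping concretely, here is a minimal sketch that wraps the loop from Tom Gunn's post in a function so that different inputs can be tried quickly. The function name `split_runs` is made up for illustration.

```cpp
#include <string>
#include <vector>

// Same loop as the earlier post, wrapped in a function: each token is either
// a run of non-punctuation characters or a run of punctuation characters.
std::vector<std::string> split_runs(const std::string& str, const std::string& punct)
{
    std::vector<std::string> vec;
    std::string::size_type pos = 0;

    while (pos != std::string::npos)
    {
        std::string::size_type end = str.find_first_of(punct, pos);
        if (end == pos) end = str.find_first_not_of(punct, pos);
        vec.push_back(str.substr(pos, end - pos));
        pos = end;
    }
    return vec;
}
```

Called with "dressed(with))" and the punctuation set "+-*/<=!>{()};,", it returns four tokens, the last of which is "))" rather than two separate ")" tokens.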
Featured Poster
Reputation Points: 2614
Solved Threads: 687
Posting Expert
VernonDozier is offline Offline
5,375 posts
since Jan 2008
Oct 27th, 2009
0
Re: Splitting a string into tokens
Yes, that is a good point and you are right. There may be an occurrence where that would be needed. Every punctuation mark should be in its own vector element.

How would I rewrite that?
Last edited by AdRock; Oct 27th, 2009 at 11:14 am.
AdRock
Oct 27th, 2009
0
Re: Splitting a string into tokens
Quote ...
How would i rewrite that?
Give it a try before asking for help. Depending on your experience solving problems, at least an hour to several hours of solid work at it should be the minimum. Honestly, if somebody tells you how to write the code every time, you will not learn anything substantial and you will end up asking for help with everything.
Tom Gunn
Oct 27th, 2009
0
Re: Splitting a string into tokens
Thanks.

I've just solved another problem I've had for ages, which was splitting the strings of each line into tokens.

That leads on to my current problem, where I need to split each single string into tokens.
AdRock
Oct 27th, 2009
0
Re: Splitting a string into tokens
I am struggling to come up with a solution for this, as everything I have tried either gives the same output or crashes the program.

This is how I understand it:

C++ Syntax (Toggle Plain Text)
string::size_type end = str.find_first_of(punct, pos);
Assign to the variable end the position of the first occurrence of any of the punctuation characters, starting from the first char.

C++ Syntax (Toggle Plain Text)
if (end == pos)
If the end variable is 0, then

C++ Syntax (Toggle Plain Text)
end = str.find_first_not_of(punct, pos);
assign to the variable end the position of the first occurrence of any non-punctuation character, starting from the first char.

It then loops around, starting at the new pos. Using this string
Quote ...
(dressed(with))
pos would be 0, 1, 8, 9, 13, but it stops at 13.

Do I have to perform another loop inside the while loop and have the vec.push_back inside it?

The way I thought it would work is that it finds a punctuation character at some pos and then goes through the while loop again.
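One way to check this reading of the loop is to record `pos` and `end` on every pass. The sketch below does that for the same loop; the `Step` struct and the function name `trace_loop` are made up for illustration. Note that both searches start from `pos`, not from the first character, and that the test is `end == pos` rather than `end == 0`.

```cpp
#include <string>
#include <vector>

// One pass of the loop: where it started, where the token ended, and the token.
struct Step
{
    std::string::size_type pos;
    std::string::size_type end;
    std::string token;
};

std::vector<Step> trace_loop(const std::string& str, const std::string& punct)
{
    std::vector<Step> steps;
    std::string::size_type pos = 0;

    while (pos != std::string::npos)
    {
        std::string::size_type end = str.find_first_of(punct, pos);
        if (end == pos) end = str.find_first_not_of(punct, pos);

        Step s;
        s.pos = pos;
        s.end = end;
        s.token = str.substr(pos, end - pos);
        steps.push_back(s);

        pos = end;
    }
    return steps;
}
```

For "(dressed(with))" the recorded pos values are 0, 1, 8, 9 and 13. On the last pass find_first_not_of returns npos because only punctuation remains, so the final token is "))" and the loop ends there.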
AdRock
Oct 28th, 2009
1
Re: Splitting a string into tokens
Here's what I think you should do. Don't worry for now about why or how the code did what it did (obviously, you can and probably should go over it later and figure out what it does and why, for your own personal knowledge). Test it out with all sorts of input and make sure it does what it's supposed to (break strings into "all non-punctuation" and "all punctuation" strings). If it does work for all possible test cases, go to the next step.

C++ Syntax (Toggle Plain Text)
  1. while (pos != string::npos)
  2. {
  3.     string::size_type end = str.find_first_of(punct, pos);
  4.
  5.     if (end == pos) end = str.find_first_not_of(punct, pos);
  6.
  7.     vec.push_back(str.substr(pos, end - pos));
  8.     pos = end;
  9. }

Break line 7 into two lines:

C++ Syntax (Toggle Plain Text)
  1. string newString = str.substr (pos, end - pos);
  2. vec.push_back (newString);

So you end up with this:

C++ Syntax (Toggle Plain Text)
  1. while (pos != string::npos)
  2. {
  3.     string::size_type end = str.find_first_of(punct, pos);
  4.
  5.     if (end == pos) end = str.find_first_not_of(punct, pos);
  6.
  7.     string newString = str.substr (pos, end - pos);
  8.     vec.push_back (newString);
  9.     pos = end;
  10. }

Now, if newString contains punctuation, you need to change it into one string for every character. If it doesn't, push the whole string as you do in line 8 above. Test whether it has any punctuation in it, as before, and act accordingly (push it onto the vector if it's all non-punctuation, split it further if it is punctuation):

C++ Syntax (Toggle Plain Text)
  1. if (newString.find_first_of (punct) == string::npos)
  2. {
  3.     // newString doesn't contain punctuation. Push it.
  4.     vec.push_back (newString);
  5. }
  6. else
  7. {
  8.     // newString is punctuation. Break newString into one-character strings and push each of them onto vec.
  9. }


So your job is:
  1. Try Tom Gunn's code out. Make sure it "works" for all possible test cases (i.e. change line 11 below for every possible test case you can think of and make sure the code "behaves"). I imagine it does. Tom Gunn's code generally does. But you need to verify that.
  2. If it does, look at my revised code below. Run it. See what it does. Now delete my line 31. Change line 30 so it does what you need it to do, which is to take a string like "****))" stored in newString, break it into six separate strings, and push them all onto the vec vector.

C++ Syntax (Toggle Plain Text)
  1. #include <iostream>
  2. #include <string>
  3. #include <vector>
  4.
  5. int main()
  6. {
  7.     using namespace std;
  8.
  9.     string const punct = "+-*/<=!>{()};,";
  10.
  11.     string str = "dressed(to**!impress{{)";
  12.     string::size_type pos = 0;
  13.     vector<string> vec;
  14.
  15.     while (pos != string::npos)
  16.     {
  17.         string::size_type end = str.find_first_of(punct, pos);
  18.
  19.         if (end == pos) end = str.find_first_not_of(punct, pos);
  20.
  21.         string newString = str.substr (pos, end - pos);
  22.
  23.         if (newString.find_first_of (punct) == string::npos)
  24.         {
  25.             // newString doesn't contain punctuation. Push it.
  26.             vec.push_back (newString);
  27.         }
  28.         else
  29.         {
  30.             // newString is punctuation. Break newString into one-character strings and push each of them onto vec.
  31.             vec.push_back ("PUNCTUATION");
  32.         }
  33.
  34.         pos = end;
  35.     }
  36.
  37.     for (int a = 0; a < vec.size(); ++a)
  38.     {
  39.         cout << "vector " << a << ": " << vec.at(a) << '\n';
  40.     }
  41.
  42.     return 0;
  43. }
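For the step left open at line 30, one possible approach is sketched below. It assumes a small helper that takes the punctuation run and appends one single-character string per character. The name `push_each_char` is made up here and is not part of the code above.

```cpp
#include <string>
#include <vector>

// Appends every character of `run` to `vec` as its own one-character string.
void push_each_char(const std::string& run, std::vector<std::string>& vec)
{
    for (std::string::size_type i = 0; i < run.size(); ++i)
    {
        vec.push_back(std::string(1, run[i]));  // string(count, char) constructor
    }
}
```

Calling `push_each_char(newString, vec);` in the else branch, in place of line 31, would turn a run such as "****))" into six separate entries.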
VernonDozier
Oct 28th, 2009
1
Re: Splitting a string into tokens
Quote ...
Try Tom Gunn's code out. Make sure it "works" for all possible test cases (i.e. change line 11 below for every possible test case you can think of and make sure the code "behaves".
It does not, as you proved. I did not consider adjacent punctuation in my haste to get my post out the door, and I ended up over-engineering the whole thing. Since punctuation is always a single character in this case, the simpler solution for matching punctuation works better:
C++ Syntax (Toggle Plain Text)
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> SplitOnPunct(std::string const& str,
                                      std::string const& punct)
{
    std::vector<std::string> vec;

    if (str.length() == 0) return vec;

    std::string::size_type pos, end;

    for (pos = 0; pos != std::string::npos; pos = end)
    {
        end = str.find_first_of(punct, pos);

        // On punctuation: the token is that single character. If it is also
        // the last character, set end to npos so the loop stops afterwards.
        if (end == pos && ++end == str.size()) end = std::string::npos;

        vec.push_back(str.substr(pos, end - pos));
    }

    return vec;
}

int main()
{
    std::vector<std::string> vec = SplitOnPunct("dressed(with)", "+-*/<=!>{()};,");

    for (std::vector<std::string>::size_type a = 0; a < vec.size(); ++a)
    {
        std::cout << "vector " << a << ": " << vec.at(a) << '\n';
    }
}
I still do not guarantee 100% correctness because it is hard to find my own mistakes. All of the basic test cases seem to work though.
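Since the remaining doubt is about edge cases, here is an alternative single-pass sketch to compare against. It is not Tom Gunn's method: it walks the string one character at a time, and the function name `SplitPerChar` is made up for illustration. It should give the same tokens for the inputs discussed in this thread, but treat it as a cross-check rather than a reference implementation.

```cpp
#include <string>
#include <vector>

// Walks the string once. Punctuation characters become one-character tokens;
// everything between them is collected into a word token.
std::vector<std::string> SplitPerChar(const std::string& str, const std::string& punct)
{
    std::vector<std::string> vec;
    std::string word;

    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        if (punct.find(str[i]) != std::string::npos)
        {
            if (!word.empty())
            {
                vec.push_back(word);  // flush the word collected so far
                word.clear();
            }
            vec.push_back(std::string(1, str[i]));
        }
        else
        {
            word += str[i];
        }
    }

    if (!word.empty()) vec.push_back(word);  // trailing word, if any
    return vec;
}
```

Inputs worth trying with either function include an empty string, a string with no punctuation, a string that is only punctuation, and strings that start or end with punctuation.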
Tom Gunn
