Splitting a string into tokens
Join Date: Dec 2008
Posts: 57
Solved Threads: 0
I have a string and I need to split it into tokens.
A word can contain any letter or number, but the string can also contain punctuation such as brackets.
I need to find each occurrence of punctuation, copy the string up to that point as a token, copy the punctuation mark as a token, and so on until the end of the string is reached.
Here is my output for the string dressed(with):
vector 0: dressed
vector 1: (with)
vector 2: dressed(with
vector 3: )
Vectors 0 and 3 are correct, but the output should be:
vector 0: dressed
vector 1: (
vector 2: with
vector 3: )
Here is my code:
    #include <string>
    #include <iostream>
    #include <vector>

    using namespace std;

    int main()
    {
        vector <string> vec;
        string str = "dressed(with)";
        string tmp;
        char punct[] = {'+','-','*','/','<','=','!','>','{','(',')','}',';',','};

        for (int i=0; i < sizeof(punct); i++)
        {
            unsigned int pos = str.find(punct[i], 0);

            if(pos != string::npos)
            {
                tmp.assign(str, 0, pos);
                vec.push_back(tmp);
                tmp.assign(str, pos, pos);
                vec.push_back(tmp);
            }
        }

        for(int a=0; a < vec.size(); a++)
        {
            cout << "vector " << a << ": " << vec.at(a) << endl;
        }
        return 0;
    }
#2 33 Days Ago
Your search always starts from position 0, which is why tokens are being duplicated. You need to skip over the extracted tokens as you go. Something like this:
    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        using namespace std;

        string const punct = "+-*/<=!>{()};,";
        string str = "dressed(with)";
        string::size_type pos = 0;
        vector<string> vec;

        while (pos != string::npos)
        {
            string::size_type end = str.find_first_of(punct, pos);

            if (end == pos)
                end = str.find_first_not_of(punct, pos);

            vec.push_back(str.substr(pos, end - pos));
            pos = end;
        }

        for (int a = 0; a < vec.size(); ++a)
        {
            cout << "vector " << a << ": " << vec.at(a) << '\n';
        }

        return 0;
    }
-Tommy (For Great Justice!) Gunn
Join Date: Jan 2008
Posts: 3,814
Solved Threads: 501
#3 33 Days Ago
If I recall correctly (from a previous thread), the OP also wants each punctuation character to be in its own vector element (OP, please clarify), so if the string were "dressed(with))", the OP wants this:
- dressed
- (
- with
- )
- )
rather than
- dressed
- (
- with
- ))
so the OP would have to take it another step or two and further split strings with multiple consecutive punctuation characters into separate strings (again, that's if I interpreted correctly from earlier threads). But Tom Gunn's code gets you one step closer regardless!
#4 33 Days Ago
How would I rewrite that?
Last edited by AdRock; 33 Days Ago at 11:14 am.
#5 33 Days Ago
How would I rewrite that?
-Tommy (For Great Justice!) Gunn
#7 32 Days Ago
I am struggling to come up with a solution for this, as everything I have tried either gives the same output or crashes the program.
This is how I understand it:
    string::size_type end = str.find_first_of(punct, pos);
assigns to the variable end the position of the first occurrence of any of the punctuation characters, starting from the first char
    if (end == pos)
if the end variable is 0 then
    end = str.find_first_not_of(punct, pos);
assigns to the variable end the position of the first occurrence of any non-punctuation character, starting from the first char
It then loops around, starting at the new pos, which using this string
    (dressed(with))
would be 0, 1, 8, 9, 13, but it stops at 13.
Do I have to perform another loop inside the while loop and have the vec.push_back inside it?
The way I thought it would be done is that it finds a punctuation character at some pos and then goes through the while loop again.
#8 32 Days Ago
Here's what I think you should do. Don't worry for now about why or how the code did what it did (obviously, you can and probably should go over it later and figure out what it does and why, for your own personal knowledge). Test it out with all sorts of input and make sure it does what it's supposed to (break strings into "all non-punctuation" and "all punctuation" strings). If it does work for all possible test cases, go to the next step.
    while (pos != string::npos)
    {
        string::size_type end = str.find_first_of(punct, pos);

        if (end == pos)
            end = str.find_first_not_of(punct, pos);
        vec.push_back(str.substr(pos, end - pos));
        pos = end;
    }
Break line 7 (the vec.push_back call) into two lines:
    string newString = str.substr (pos, end - pos);
    vec.push_back (newString);
So you end up with this:
    while (pos != string::npos)
    {
        string::size_type end = str.find_first_of(punct, pos);

        if (end == pos)
            end = str.find_first_not_of(punct, pos);
        string newString = str.substr (pos, end - pos);
        vec.push_back (newString);
        pos = end;
    }
Now, if newString contains punctuation, you need to change it into one string for every character. If it doesn't, push the whole string as you do in line 8 above. Test whether it has any punctuation in it, as before, and act accordingly (push it onto the vector if it's all non-punctuation, split it further if it is punctuation):
    if (newString.find_first_of (punct) == string::npos)
    {
        // newString doesn't contain punctuation. Push it.
        vec.push_back (newString);
    }
    else
    {
        // newString is punctuation. Break newString into one-character strings and push each of them onto vec.
    }
So your job is:
- Try Tom Gunn's code out. Make sure it "works" for all possible test cases (i.e. change line 11 below, the initialization of str, for every possible test case you can think of and make sure the code "behaves"). I imagine it does. Tom Gunn's code generally does. But you need to verify that.
- If it does, look at my revised code below. Run it. See what it does. Now delete my line 31 (the placeholder push of "PUNCTUATION"). Change line 30 (the comment above it) so it does what you need it to do, which is to take a string like "****))" stored in newString, break it into six separate strings, and push them all onto the vec vector.
    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        using namespace std;

        string const punct = "+-*/<=!>{()};,";

        string str = "dressed(to**!impress{{)";
        string::size_type pos = 0;
        vector<string> vec;

        while (pos != string::npos)
        {
            string::size_type end = str.find_first_of(punct, pos);

            if (end == pos)
                end = str.find_first_not_of(punct, pos);

            string newString = str.substr (pos, end - pos);
            if (newString.find_first_of (punct) == string::npos)
            {
                // newString doesn't contain punctuation. Push it.
                vec.push_back (newString);
            }
            else
            {
                // newString is punctuation. Break newString into one-character strings and push each of them onto vec.
                vec.push_back ("PUNCTUATION");
            }

            pos = end;
        }

        for (int a = 0; a < vec.size(); ++a)
        {
            cout << "vector " << a << ": " << vec.at(a) << '\n';
        }

        return 0;
    }
#9 32 Days Ago
Try Tom Gunn's code out. Make sure it "works" for all possible test cases (i.e. change line 11 below for every possible test case you can think of and make sure the code "behaves".
    #include <iostream>
    #include <string>
    #include <vector>

    std::vector<std::string> SplitOnPunct(std::string const& str, std::string const& punct)
    {
        std::vector<std::string> vec;

        if (str.length() == 0)
            return vec;

        std::string::size_type pos, end;

        for (pos = 0; pos != std::string::npos; pos = end)
        {
            end = str.find_first_of(punct, pos);

            if (end == pos && ++end == str.size())
                end = std::string::npos;

            vec.push_back(str.substr(pos, end - pos));
        }

        return vec;
    }

    int main()
    {
        std::vector<std::string> vec = SplitOnPunct("dressed(with)", "+-*/<=!>{()};,");

        for (std::vector<std::string>::size_type a = 0; a < vec.size(); ++a)
        {
            std::cout << "vector " << a << ": " << vec.at(a) << '\n';
        }
    }
All of the basic test cases seem to work though.
-Tommy (For Great Justice!) Gunn