Lexer- Tokenizer problem
Hi, I use the flex tool (http://www.gnu.org/software/flex/manual/) to generate a tokenizer, but I have the following problem (it has to do with the way that flex tokenizes the input):
FILE : flex.l
```lex
%{
#define WEB 0
#define SPACE 1
#define STRING 2
%}

string_component [0-9a-zA-Z \t\.!#$%^&()*@_]

%%

"daniweb"            {return WEB;}
[ \t\n]              {return SPACE;}
{string_component}+  {return STRING;}

%%

#include <iostream>
using namespace std;

int main()
{
    cout<<yylex()<<endl;
    cout<<yylex()<<endl;
    return 0;
}

int yywrap(void){return 1;}
```
Example file:
```
test_string daniweb
```
What I want is to have the above string tokenized as
STRING SPACE WEB
Instead, flex recognizes it as a single STRING, because it tries to match the longest input.
How can I fix this problem?
All ideas are welcome.
PS: to compile:
```shell
flex flex.l
g++ lex.yy.c
./a.out <example
```
Your string component matches spaces, and now you're complaining that you don't want to match spaces.
You can't have it both ways.
Thank you for answering (apparently, few people have read the post...).
Yes, you are right, it seems that I can't have it both ways... but from where I stand, I want to use flex to do the following:
Recognize some specific keywords (in the simplified example I provided, the keyword was "daniweb") and recognize everything else as a string. Any ideas on how I can do that?
PS: maybe start conditions could help me solve the problem? (I haven't understood them very well...)
PS2: in the beginning I thought it wouldn't be that difficult, but I was wrong...
What is this Flex? Some kind of regular expression library or something? Do you even need it, or can your problem be simplified?
Flex (The Fast Lexical Analyzer)
Flex is a fast lexical analyser generator. It is a tool for generating programs that perform pattern-matching on text. Flex is a non-GNU free implementation of the well known Lex program.
http://www.gnu.org/software/flex/
http://flex.sourceforge.net/
There's a way to set the precedence of regexes in flex. I don't remember the exact syntax, but you should put the keyword rule before the catch-all regex that you have defined there.
Unfortunately, I haven't found the solution... I worked around my problem by changing the grammar (i.e. the bison file), and finally I gave the project... Now, when I find the time, I will try to find a solution using start conditions.
Using Boost.Spirit may be much easier: http://www.boost.org/libs/spirit/doc/quick_start.html
```cpp
#include <boost/spirit/core.hpp>
#include <iostream>
#include <iterator>   // ostream_iterator
#include <string>
#include <vector>
#include <algorithm>
#include <boost/assign.hpp>
using namespace std ;
using namespace boost ;
using namespace boost::spirit ;
using namespace boost::assign ;

struct parse_it
{
  void operator() ( const string& str ) const
  {
    vector<string> tokens ;
    const char* cstr = str.c_str() ;
    size_t n = 0 ;

    // parse one token at a time; each alternative records its token name
    while( n < str.size() )
      n += parse( cstr + n,
                    (+space_p)         [ push_back_a( tokens, "SPACE" ) ]
                  | str_p("daniweb")   [ push_back_a( tokens, "WEB" ) ]
                  | str_p("lexer")     [ push_back_a( tokens, "LEX" ) ]
                  | str_p("tokenizer") [ push_back_a( tokens, "TOK" ) ]
                  | (+~space_p)        [ push_back_a( tokens, "STRING" ) ]
                ).length ;

    cout << '\n' << "parsed: " << str << "\ntokens: " ;
    copy( tokens.begin(), tokens.end(), ostream_iterator<string>(cout," ") ) ;
    cout << '\n' ;
  }
};

int main()
{
  vector<string> test_cases = list_of
      ( "test daniweb lexer xyz tokenizer lexer" )
      ( "daniweblexer tokenizerlexer abcd lexerlexer" )
      ( "daniwebtest lexerdaniweblexertest tokenizerxxx" ) ;
  for_each( test_cases.begin(), test_cases.end(), parse_it() ) ;
}

/**
>g++ -Wall -std=c++98 -I/usr/local/include keyword.cpp && ./a.out

parsed: test daniweb lexer xyz tokenizer lexer
tokens: STRING SPACE WEB SPACE LEX SPACE STRING SPACE TOK SPACE LEX

parsed: daniweblexer tokenizerlexer abcd lexerlexer
tokens: WEB LEX SPACE TOK LEX SPACE STRING SPACE LEX LEX

parsed: daniwebtest lexerdaniweblexertest tokenizerxxx
tokens: WEB STRING SPACE LEX WEB LEX STRING SPACE TOK STRING
*/
```
Last edited by vijayan121; Aug 30th, 2007 at 2:55 pm.






