Aug 28th, 2007

Lexer- Tokenizer problem

Hi, I use the flex tool (http://www.gnu.org/software/flex/manual/) to generate a tokenizer, but I have the following problem, which has to do with the way flex tokenizes the input:

File: flex.l

%{
#define WEB 0
#define SPACE 1
#define STRING 2
%}

string_component [0-9a-zA-Z \t\.!#$%^&()*@_]

%%

"daniweb" {return WEB;}
[ \t\n] {return SPACE;}
{string_component}+ {return STRING;}

%%

#include <iostream>

using namespace std;

int main()
{
    cout<<yylex()<<endl;
    cout<<yylex()<<endl;

    return 0;
}

int yywrap(void){return 1;}

Example file:
test_string daniweb

What I want is to have the above string tokenized as
STRING SPACE WEB
Instead, flex recognizes the whole line as a single STRING, because it tries to match the longest possible input.

How can I fix this problem?
All ideas are welcome.

PS: to compile:
flex flex.l
g++ lex.yy.c
./a.out <example

Posted by n.aggel

Aug 28th, 2007

Re: Lexer- Tokenizer problem

Your string component matches spaces, and now you're complaining that you don't want to match spaces.

You can't have it both ways.

Posted by Salem

Aug 29th, 2007

Re: Lexer- Tokenizer problem

Quote originally posted by Salem:
Your string component matches spaces, and now you're complaining that you don't want to match spaces.

You can't have it both ways.

Thank you for answering (apparently, few people have read the post...).

Yes, you are right, it seems that I can't have it both ways. But from where I stand, I want to use flex to do the following:

Recognize some specific keywords (in the simplified example I provided, the keyword was "daniweb") and recognize everything else as a string. Any ideas on how I can do that?

PS: Maybe start conditions could help me solve the problem? (I haven't understood them very well...)
PS2: In the beginning I thought it wouldn't be that difficult, but I was wrong...

Posted by n.aggel

Aug 29th, 2007

Re: Lexer- Tokenizer problem

What is this Flex? some kinda regular expression library or something. Do you even need it or can your problem be simplified?

Posted by iamthwee

Aug 29th, 2007

Re: Lexer- Tokenizer problem

Quote originally posted by iamthwee:
What is this Flex? some kinda regular expression library or something. Do you even need it or can your problem be simplified?
Flex

Flex (The Fast Lexical Analyzer)
Flex is a fast lexical analyser generator. It is a tool for generating programs that perform pattern-matching on text. Flex is a non-GNU free implementation of the well known Lex program.


http://www.gnu.org/software/flex/
http://flex.sourceforge.net/

Posted by n.aggel

Aug 29th, 2007

Re: Lexer- Tokenizer problem

Um ok, please explain this:

string_component [0-9a-zA-Z \t\.!#$%^&()*@_]

and what you think it does?

Posted by iamthwee

Aug 29th, 2007

Re: Lexer- Tokenizer problem

There's a way to set precedence of regex's in flex. I don't remember the exact syntax, but you should put it before your catchall regex that you have defined there.

Posted by nedrocks

Aug 30th, 2007

Re: Lexer- Tokenizer problem

Quote originally posted by nedrocks:
There's a way to set precedence of regex's in flex. I don't remember the exact syntax, but you should put it before your catchall regex that you have defined there.
I haven't seen what you mention in the manual...

Unfortunately, I haven't found the solution. I worked around my problem by changing the grammar (i.e. the bison file), and finally I gave the project... Now, when I find the time, I will try to find a solution using start conditions.

Posted by n.aggel

Aug 30th, 2007

Re: Lexer- Tokenizer problem

First you gotta know what your regular expressions are doing.

To me, string_component [0-9a-zA-Z \t\.!#$%^&()*@_] and the example you have given are contradictory, as Salem mentioned.

Posted by iamthwee

Aug 30th, 2007

Re: Lexer- Tokenizer problem

Using Boost.Spirit may be much easier: http://www.boost.org/libs/spirit/doc/quick_start.html

#include <boost/spirit/core.hpp>
#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <string>
#include <vector>
#include <algorithm>
#include <boost/assign.hpp>
using namespace std ;
using namespace boost ;
using namespace boost::spirit ;
using namespace boost::assign ;

// Tokenizes one string and prints the resulting token names.
struct parse_it
{
    void operator() ( const string& str ) const
    {
        vector<string> tokens ;
        const char* cstr = str.c_str() ;
        size_t n = 0 ;
        // Each parse() call consumes one token. The alternatives are tried
        // in the order written, so the keywords come before the catch-all.
        while( n < str.size() )
            n += parse( cstr + n,
                        (+space_p) [ push_back_a( tokens, "SPACE" ) ] |
                        str_p("daniweb") [ push_back_a( tokens, "WEB" ) ] |
                        str_p("lexer") [ push_back_a( tokens, "LEX" ) ] |
                        str_p("tokenizer") [ push_back_a( tokens, "TOK" ) ] |
                        (+~space_p) [ push_back_a( tokens, "STRING" ) ]
                      ).length ;
        cout << '\n' << "parsed: " << str << "\ntokens: " ;
        copy( tokens.begin(), tokens.end(),
              ostream_iterator<string>(cout," ") ) ;
        cout << '\n' ;
    }
};

int main()
{
    vector<string> test_cases = list_of
        ( "test daniweb lexer xyz tokenizer lexer" )
        ( "daniweblexer tokenizerlexer abcd lexerlexer" )
        ( "daniwebtest lexerdaniweblexertest tokenizerxxx" ) ;
    for_each( test_cases.begin(), test_cases.end(), parse_it() ) ;
}
/**
>g++ -Wall -std=c++98 -I/usr/local/include keyword.cpp && ./a.out

parsed: test daniweb lexer xyz tokenizer lexer
tokens: STRING SPACE WEB SPACE LEX SPACE STRING SPACE TOK SPACE LEX

parsed: daniweblexer tokenizerlexer abcd lexerlexer
tokens: WEB LEX SPACE TOK LEX SPACE STRING SPACE LEX LEX

parsed: daniwebtest lexerdaniweblexertest tokenizerxxx
tokens: WEB STRING SPACE LEX WEB LEX STRING SPACE TOK STRING
*/

Posted by vijayan121