
Hi, I use the flex tool (http://www.gnu.org/software/flex/manual/) to generate a tokenizer, but I have the following problem (it has to do with the way flex tokenizes the input):

FILE : flex.l

%{
		#define WEB 0
		#define SPACE 1
		#define STRING 2	
%}

string_component [0-9a-zA-Z \t\.!#$%^&()*@_]

%%

"daniweb"		              {return WEB;}
[ \t\n]			{return SPACE;}
{string_component}+	{return STRING;}

%%

#include <iostream>
			
using namespace std;
		
int main()
{	
	cout<<yylex()<<endl;
	cout<<yylex()<<endl;

	return 0;
}

int yywrap(void){return 1;}

Example file:

test_string daniweb

What I want is for the above input to be tokenized as
STRING SPACE WEB
but instead flex recognizes the whole line as a single STRING, because it always tries to match the longest possible input (and since the space and the letters of "daniweb" are themselves in string_component, the entire line is one match).

How can I fix this problem?
All ideas are welcome...

PS: to compile:

flex flex.l
g++ lex.yy.c
./a.out <example

Your string component matches spaces, and now you're complaining that you don't want to match spaces.

You can't have it both ways.


Your string component matches spaces, and now you're complaining that you don't want to match spaces.

You can't have it both ways.

Thank you for answering (apparently, few people have read the post...).

Yes, you are right, it seems I can't have it both ways... but from where I stand I want to use flex to do the following:

Recognize some specific keywords (in the simplified example I provided, the keyword was "daniweb") and treat everything else as a string... any ideas on how I can do that?

PS: maybe start conditions could help me solve the problem? (I haven't understood them very well...)
PS2: in the beginning I thought it wouldn't be that difficult, but I was wrong...


What is this Flex? Some kinda regular expression library or something? Do you even need it, or can your problem be simplified?


What is this Flex? Some kinda regular expression library or something? Do you even need it, or can your problem be simplified?

Flex

Flex (The Fast Lexical Analyzer)
Flex is a fast lexical analyser generator. It is a tool for generating programs that perform pattern-matching on text. Flex is a non-GNU free implementation of the well known Lex program.


http://www.gnu.org/software/flex/
http://flex.sourceforge.net/


There's a way to set the precedence of regexes in flex. I don't remember the exact syntax, but you should put the keyword rule before that catch-all regex you have defined there.
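
For what it's worth, as far as I know flex has no explicit precedence syntax: it always takes the longest possible match, and rule order only breaks ties between patterns that match the same number of characters. So listing "daniweb" first only helps once the catch-all can no longer swallow the keyword as part of a longer match. A minimal, untested sketch of the rules section (reusing the WEB/SPACE/STRING defines from the original flex.l):

"daniweb"	{ /* listed first: wins the tie when this rule and the
		     catch-all both match exactly "daniweb" */
		  return WEB; }
[ \t\n]+	{ return SPACE; }
[^ \t\n]+	{ /* catch-all: any run of non-whitespace; since it no longer
		     matches spaces, it cannot span the whole line */
		  return STRING; }

With the input "test_string daniweb" this should come out as STRING SPACE WEB (plus a final SPACE for the trailing newline).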


There's a way to set the precedence of regexes in flex. I don't remember the exact syntax, but you should put the keyword rule before that catch-all regex you have defined there.

I haven't seen what you mention in the manual...

Unfortunately I haven't found a solution... I worked around the problem by changing the grammar (i.e. the bison file) and handed the project in... When I find the time I will try to find a solution using start conditions.
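
For reference, start conditions are probably overkill for this particular problem (keyword rules plus a whitespace-free catch-all are enough), but since they came up, this is roughly their general shape (untested sketch; AFTERWEB is just a made-up state name, and the main()/yywrap() user section from the original flex.l is assumed underneath). With these rules the word immediately following "daniweb" always comes back as a STRING, even if it is "daniweb" again:

%{
	#define WEB	0
	#define SPACE	1
	#define STRING	2
%}

	/* %x declares an exclusive start condition; the rules tagged
	   <AFTERWEB> only apply after BEGIN(AFTERWEB) has been executed */
%x AFTERWEB

%%

"daniweb"		{ BEGIN(AFTERWEB); return WEB; }
[ \t\n]+		{ return SPACE; }
[^ \t\n]+		{ return STRING; }

<AFTERWEB>[ \t\n]+	{ return SPACE; }
<AFTERWEB>[^ \t\n]+	{ /* one word consumed, back to the normal rules */
			  BEGIN(INITIAL); return STRING; }

%%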


First you've got to know what your regular expressions are doing.

To me, string_component [0-9a-zA-Z \t\.!#$%^&()*@_] and the example you have given are contradictory, as salem mentioned.


Using Boost.Spirit may be much easier: http://www.boost.org/libs/spirit/doc/quick_start.html

#include <boost/spirit/core.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <iterator>   // for std::ostream_iterator
#include <algorithm>
#include <boost/assign.hpp>
using namespace std ;
using namespace boost ;
using namespace boost::spirit ;
using namespace boost::assign ;

struct parse_it
{
  void operator() ( const string& str ) const
  {
    vector<string> tokens ;
    const char* cstr = str.c_str() ;
    size_t n = 0 ;
    // alternatives separated by | are tried in order, so the keywords are
    // tested before the catch-all (+~space_p); each [ ... ] action appends
    // a tag to tokens when its sub-parser matches
    while( n < str.size() )
      n += parse( cstr + n,
                  (+space_p) [  push_back_a( tokens, "SPACE" ) ] |
                  str_p("daniweb") [ push_back_a( tokens, "WEB" ) ] |
                  str_p("lexer") [ push_back_a( tokens, "LEX" ) ] |
                  str_p("tokenizer") [ push_back_a( tokens, "TOK" ) ] |
                  (+~space_p) [ push_back_a( tokens, "STRING" ) ]
                ).length ;
    cout << '\n' << "parsed: " << str << "\ntokens: " ;
    copy( tokens.begin(), tokens.end(),
               ostream_iterator<string>(cout," ") ) ;
    cout << '\n' ;
  }
};
int main()
{
  vector<string> test_cases = list_of
                ( "test daniweb lexer xyz tokenizer lexer" )
                ( "daniweblexer tokenizerlexer abcd lexerlexer" )
                ( "daniwebtest lexerdaniweblexertest tokenizerxxx" ) ;
  for_each( test_cases.begin(), test_cases.end(), parse_it() ) ;
}
/**
>g++ -Wall -std=c++98 -I/usr/local/include keyword.cpp && ./a.out

parsed: test daniweb lexer xyz tokenizer lexer
tokens: STRING SPACE WEB SPACE LEX SPACE STRING SPACE TOK SPACE LEX

parsed: daniweblexer tokenizerlexer abcd lexerlexer
tokens: WEB LEX SPACE TOK LEX SPACE STRING SPACE LEX LEX

parsed: daniwebtest lexerdaniweblexertest tokenizerxxx
tokens: WEB STRING SPACE LEX WEB LEX STRING SPACE TOK STRING
*/

Man, I did not know that Boost had a parsing tool... unfortunately the project guidelines required me to use bison and flex!


First you've got to know what your regular expressions are doing.

To me, string_component [0-9a-zA-Z \t\.!#$%^&()*@_] and the example you have given are contradictory, as salem mentioned.

OK, maybe it is contradictory, but how can you express in flex the concept I wrote about earlier, i.e. recognize some specific tokens and treat everything else as a string?
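
One way this is commonly written in flex (an untested sketch; the keyword list, token numbers and the loop in main() are just an example, with token values starting at 1 because yylex() itself returns 0 at end of input): one rule per keyword, then a whitespace rule, then a catch-all for any other run of non-whitespace characters. Flex prefers the longest match and breaks ties in favour of the earlier rule, so a keyword standing on its own comes back as its own token, while something like "daniwebtest" comes back as a STRING:

%{
	#define WEB	1	/* keyword tokens */
	#define LEX	2
	#define TOK	3
	#define SPACE	4
	#define STRING	5	/* everything else */
%}

%%

"daniweb"	{ return WEB; }
"lexer"		{ return LEX; }
"tokenizer"	{ return TOK; }
[ \t\n]+	{ return SPACE; }
[^ \t\n]+	{ return STRING; /* any run of non-whitespace characters */ }

%%

#include <iostream>

using namespace std;

int main()
{
	int tok;
	while ( (tok = yylex()) != 0 )	/* yylex() returns 0 at end of input */
		cout << tok << endl;
	return 0;
}

int yywrap(void) { return 1; }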
