I am writing a program to count the occurrences of keywords and operators.

I keep all the keywords in one array and the operators in separate arrays, one for each category of operator:

char *kw[] = {"auto","break"..."while" };
char *Arth_op[] = { "%","*","+","-","/"};

The entries in each array are sorted by the ASCII values of their characters.

Using binary search and some support functions, I find the occurrences of all the keywords and operators.

Now the problem is that there are operators with two characters, like:

>=, <=, ++, --, ==, +=, -= and many more.

Here I cannot use my bsearch function to find the occurrences of an operator.

All the keywords and operators are arranged in their arrays in increasing ASCII order, which works for single-character operators.

But how do I handle the assignment and relational operators?

Please suggest a way to solve this problem, or tell me whether there is another method I could use.

Maybe I am missing something.

>Here I cannot use my bsearch function.

Why?

>All the keywords and operators are arranged in their arrays in increasing ASCII order, which works for single-character operators.

>But how do I handle the assignment and relational operators?

Can you just change the comparison function?
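
To make that suggestion concrete, here is a minimal sketch, assuming the tokens are compared as whole strings with strcmp() rather than as single characters. The table contents and the names ops, cmp_str and find_op are made up for illustration; they are not taken from the original program. One- and two-character operators sit in the same array, sorted in strcmp() order, so the standard bsearch() works unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table: one- and two-character operators together,
   sorted in strcmp() (byte-wise ASCII) order. Not the full C set. */
static const char *const ops[] = {
    "!=", "%", "*", "+", "++", "+=", "-", "--", "-=",
    "/", "<", "<=", "=", "==", ">", ">="
};

/* bsearch() passes a pointer to the key and a pointer to an array
   element; both point at a string pointer here. */
static int cmp_str(const void *key, const void *elem)
{
    const char *const *k = key;
    const char *const *e = elem;
    return strcmp(*k, *e);
}

/* Returns the index of tok in ops[], or -1 if tok is not in the table. */
int find_op(const char *tok)
{
    assert(tok != NULL);
    const char *const *hit = bsearch(&tok, ops,
                                     sizeof ops / sizeof ops[0],
                                     sizeof ops[0], cmp_str);
    return hit ? (int)(hit - ops) : -1;
}
```

With this table, find_op(">=") and find_op("+") both succeed, while find_op("=>") returns -1. The search itself does not care how long the operators are; the remaining work is deciding where each token starts and ends before looking it up.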

>Please suggest a way to solve this problem, or tell me whether there is another method I could use.

Usually, for problems like this, lex (the lexical analyzer generator) is the solution.

>Please suggest a way to solve this problem
It's easy to break the input down into comments, identifiers (which include keywords), operators, and literals. From there you can simply check each token against your keyword and operator lists to find matches.

A C compiler uses a greedy tokenizing algorithm (often called "maximal munch") to handle this kind of thing. Say you find a '=' character in the input. If the next character cannot extend it into a longer operator, you leave it alone. If it can, join it on and check the following character, continuing until a match fails. The characters joined before the failed match are your token, and the next token starts at the character that failed to match.
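
The greedy matching described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the operator table is a shortened, made-up one, the names is_op and match_op are hypothetical, and the loop relies on every prefix of an operator also being in the table (true for the operators listed, but something to check if you extend it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operator table for illustration; not the full C set. */
static const char *const ops[] = {
    "+", "++", "+=", "-", "--", "-=", "<", "<=", "=", "==", ">", ">="
};

/* Is the n-character string starting at s exactly one of the operators? */
static int is_op(const char *s, size_t n)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (strlen(ops[i]) == n && strncmp(ops[i], s, n) == 0)
            return 1;
    return 0;
}

/* Greedy match: returns the length of the longest operator at the
   start of s, or 0 if no operator starts there. */
size_t match_op(const char *s)
{
    assert(s != NULL);
    size_t len = 0;
    /* Keep extending while one more character still gives an operator. */
    while (s[len] != '\0' && is_op(s, len + 1))
        len++;
    return len;
}
```

For example, match_op(">=b") returns 2, so the token is ">=" and scanning resumes at 'b'; match_op("=+") returns 1, because "=+" is not in the table. The linear scan in is_op keeps the sketch short; it could be replaced by a binary search over a sorted table.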

The problem you encountered with operators is similar to one you'll find with keywords. If you aren't careful with the search (or with how you tokenize), you'll end up with false positives on identifiers like "first_break" (or any other similar construction).
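
One way to avoid those false positives is to collect each complete identifier first and only then compare it with the keyword list. The sketch below assumes a shortened, made-up keyword list, and the names kws, is_keyword and count_keywords are hypothetical. It also does not skip comments or string literals, which a real counter would need to do.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened keyword list for illustration. */
static const char *const kws[] = { "break", "if", "while" };

/* Is the n-character string starting at s exactly one of the keywords? */
static int is_keyword(const char *s, size_t n)
{
    for (size_t i = 0; i < sizeof kws / sizeof kws[0]; i++)
        if (strlen(kws[i]) == n && strncmp(kws[i], s, n) == 0)
            return 1;
    return 0;
}

/* Counts keywords in src, comparing only complete identifiers.
   Comments and string literals are NOT skipped in this sketch. */
int count_keywords(const char *src)
{
    assert(src != NULL);
    int count = 0;
    while (*src != '\0') {
        unsigned char c = (unsigned char)*src;
        if (isalpha(c) || c == '_') {
            /* Consume the whole identifier before looking it up. */
            const char *start = src;
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
            if (is_keyword(start, (size_t)(src - start)))
                count++;
        } else if (isdigit(c)) {
            /* Skip a number together with any letters attached to it. */
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
        } else {
            src++;
        }
    }
    return count;
}
```

Because "first_break" is read as a single identifier, it never matches "break"; count_keywords("first_break = 1; break;") counts only the final break statement.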
