Adnan Saleem 0 Newbie Poster

I am currently working on a Data Structures project that checks for similarities between two C++ source files. I plan to use a tokenization approach, but I am having trouble finding an efficient way to parse the source code so that I can convert it into tokens and cluster them in suffix trees.

Can anyone suggest an efficient approach to doing this in C++?
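
To make the idea concrete, here is a rough sketch of the tokenization step as I currently picture it. It is only a hand-written, single-pass scanner and not a full C++ lexer: the function name `tokenize`, the short keyword list, and the placeholder labels `ID`, `NUM`, `STR` and `CHR` are my own assumptions for illustration. Identifiers and literals are collapsed to placeholders so that renaming a variable does not change the token stream, and comments and whitespace are dropped. Preprocessor directives, multi-character operators and raw string literals are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: turn C++ source text into a normalized token stream.
// Identifiers become "ID", numbers "NUM", string/char literals "STR"/"CHR".
std::vector<std::string> tokenize(const std::string& src) {
    // Deliberately incomplete keyword list (an assumption, extend as needed).
    static const std::unordered_set<std::string> keywords = {
        "int", "char", "float", "double", "void", "bool", "if", "else",
        "for", "while", "do", "return", "class", "struct", "const",
        "break", "continue", "switch", "case"};

    std::vector<std::string> tokens;
    const std::size_t n = src.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(src[i]);

        if (std::isspace(c)) { ++i; continue; }

        // Skip a line comment up to (not including) the newline.
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            while (i < n && src[i] != '\n') ++i;
            continue;
        }
        // Skip a block comment; an unterminated one consumes the rest.
        if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            i += 2;
            while (i + 1 < n && !(src[i] == '*' && src[i + 1] == '/')) ++i;
            i = (i + 1 < n) ? i + 2 : n;
            continue;
        }
        // Identifier or keyword.
        if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) ||
                             src[i] == '_')) ++i;
            const std::string word = src.substr(start, i - start);
            tokens.push_back(keywords.count(word) ? word : std::string("ID"));
            continue;
        }
        // Numeric literal (loosely matched: digits, letters, dots).
        if (std::isdigit(c)) {
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) ||
                             src[i] == '.')) ++i;
            tokens.push_back("NUM");
            continue;
        }
        // String or character literal, honoring backslash escapes.
        if (c == '"' || c == '\'') {
            const char quote = src[i++];
            while (i < n && src[i] != quote) {
                if (src[i] == '\\' && i + 1 < n) ++i;  // skip escaped char
                ++i;
            }
            if (i < n) ++i;  // consume the closing quote
            tokens.push_back(quote == '"' ? "STR" : "CHR");
            continue;
        }
        // Anything else is emitted as a single-character punctuation token.
        tokens.push_back(std::string(1, src[i]));
        ++i;
    }
    return tokens;
}
```

With this sketch, `tokenize("int sum = a + 10;")` and `tokenize("int x = y + 3;")` would both give `int ID = ID + NUM ;`, which is the kind of sequence I would then feed into the suffix tree. The scan is a single left-to-right pass over the input, but I am not sure whether a hand-written scanner like this is the right choice compared to an existing lexer.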