Hi all,

I have a new university assignment on text mining, specifically sentiment analysis. I need to analyze some emails and forum posts to find out what the students are discussing.

I've read a lot of documentation on this and I know the concepts. I know Java, and I have discovered Lucene (a Java text indexing and search library).

My question is:

How do I get started?
I'd like to know the steps I will need to go through to do this work.

Please help me.

Here are a few initial thoughts.

Approach this in small steps. The first step is to make a list of the things you actually need to do. For example, how do you get the text from the forums? This is the first problem I would tackle. There are a number of sites on the web that collect forum discussions, such as Nabble and markmail.org.

Then you might start to develop a dictionary of words of interest. The words "turbocharger", "barbeque" and "null pointer" might be words or phrases of interest. "The", "of", "how", "when" would certainly not be. This would probably be initially built by hand as you think about some type of engine that can pull out words of significance on its own.
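As a rough illustration of the hand-built dictionary idea, here is a minimal Java sketch that skips stop words and counts words or two-word phrases of interest in a post. The class name `TermCounter`, the method `countTerms`, and both word lists are made up for this example; it uses only the standard library, not Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts hand-picked terms of interest in a piece of text.
class TermCounter {
    // Assumed, hand-built lists; a real project would load these from storage.
    static final Set<String> STOP_WORDS =
            Set.of("the", "of", "how", "when", "a", "is", "my");
    static final Set<String> TERMS_OF_INTEREST =
            Set.of("turbocharger", "barbeque", "null pointer");

    static Map<String, Integer> countTerms(String text) {
        // Lowercase, then split on anything that is not a letter or a digit.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        String previous = null;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(token)) {
                previous = null; // a stop word breaks any two-word phrase
                continue;
            }
            // Single-word match.
            if (TERMS_OF_INTEREST.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
            // Two-word phrase match, e.g. "null pointer".
            if (previous != null && TERMS_OF_INTEREST.contains(previous + " " + token)) {
                counts.merge(previous + " " + token, 1, Integer::sum);
            }
            previous = token;
        }
        return counts;
    }

    public static void main(String[] args) {
        String post = "How is the turbocharger? My turbocharger throws a null pointer.";
        System.out.println(countTerms(post)); // prints {turbocharger=2, null pointer=1}
    }
}
```

Exact string matching like this misses plurals and misspellings, which is one reason to move to a proper tokenizer and stemmer later.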

Obviously, this dictionary should reside in a relational database of some type. What would its structure look like?
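One possible answer, sketched in Java rather than as a database schema: each entry links a term to a topic with a weight, which corresponds to one row of a term-to-topic link table. The names `TermDictionary`, `Entry`, `add` and `lookup` are invented for this illustration, and the weights are arbitrary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for a term/topic table.
class TermDictionary {
    // One row: a term, the topic it hints at, and how strongly (assumed scale 0..1).
    static final class Entry {
        final String term;
        final String topic;
        final double weight;

        Entry(String term, String topic, double weight) {
            this.term = term;
            this.topic = topic;
            this.weight = weight;
        }
    }

    // Index by lowercased term so that lookups ignore case.
    private final Map<String, List<Entry>> byTerm = new HashMap<>();

    void add(String term, String topic, double weight) {
        String key = term.toLowerCase(Locale.ROOT);
        byTerm.computeIfAbsent(key, k -> new ArrayList<>())
              .add(new Entry(key, topic, weight));
    }

    // Returns every topic a term may point to; empty if the term is unknown.
    List<Entry> lookup(String term) {
        return byTerm.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
    }

    public static void main(String[] args) {
        TermDictionary dictionary = new TermDictionary();
        dictionary.add("cylinder", "engines", 0.9);
        dictionary.add("oil", "engines", 0.4);
        dictionary.add("oil", "cooking", 0.4);
        for (Entry entry : dictionary.lookup("Oil")) {
            System.out.println(entry.term + " -> " + entry.topic + " (" + entry.weight + ")");
        }
    }
}
```

The point of the many-to-many shape is that a single term such as "oil" can belong to several topics at once.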

You'll need to develop some type of intelligence engine that can make assumptions about what the discussion is regarding. For example, if you came across the words "cylinder", "burn" and "oil", the discussion would probably be about internal combustion engines of some sort. However, if you only ran across "burn" and "oil", how would you discern if the discussion were about cooking, furnaces, or engines?
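To make the ambiguity concrete, here is a deliberately naive Java sketch that scores each topic by how many of the found words belong to it and returns every topic tied for the best score. The class `TopicGuesser`, the method `bestTopics`, and the topic word lists are assumptions made up for this example, not a recommended classifier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical keyword-overlap topic guesser.
class TopicGuesser {
    // Assumed, hand-built topic vocabularies.
    static final Map<String, Set<String>> TOPIC_TERMS = new LinkedHashMap<>();
    static {
        TOPIC_TERMS.put("engines", Set.of("cylinder", "burn", "oil", "piston"));
        TOPIC_TERMS.put("cooking", Set.of("burn", "oil", "pan", "recipe"));
        TOPIC_TERMS.put("furnaces", Set.of("burn", "oil", "heating", "chimney"));
    }

    // Returns all topics sharing the highest overlap with the found terms.
    static List<String> bestTopics(Set<String> foundTerms) {
        int best = 0;
        List<String> winners = new ArrayList<>();
        for (Map.Entry<String, Set<String>> topic : TOPIC_TERMS.entrySet()) {
            int score = 0;
            for (String term : foundTerms) {
                if (topic.getValue().contains(term)) {
                    score++;
                }
            }
            if (score == 0) {
                continue; // no evidence for this topic at all
            }
            if (score > best) {
                best = score;
                winners.clear();
            }
            if (score == best) {
                winners.add(topic.getKey());
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        System.out.println(bestTopics(Set.of("cylinder", "burn", "oil"))); // prints [engines]
        System.out.println(bestTopics(Set.of("burn", "oil"))); // prints [engines, cooking, furnaces]
    }
}
```

A three-way tie for "burn" and "oil" is exactly the problem described above: the engine needs more context, or weighted terms, before it can decide.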

There are many, many more things to think about, but these, in my humble opinion, are some of the first ones I would start thinking through.

Hope this helps. Sounds like an interesting project - I hope you have more than a couple of weeks to pull it off!

Thank you for your answer. I'll start with my tokenizer and the Lucene package for Java, along with NLP (natural language processing) techniques. I'm a little afraid of this project, but I like its difficulty...
