I am doing a project in Java that analyzes tweets related to a particular topic and categorizes them as positive, negative, or neutral. I have been able to extract the tweets but do not know how to build a classifier to categorize them. How do I build a classifier, and how do I train it? Please help me out.

Sounds like AI stuff.
Most of your problem will be design.
When you have a design, we can help you with the Java coding part.
Good luck.

Do you need to take the tweets and decipher their meaning?

If so, go get a master's degree in Artificial Intelligence.
(Not to be discouraging, but that is truly very, very difficult to do.)

If not, use a loop and a switch:

(

for (int i = 0; i < /*Value*/; i++)
{
    switch (testValue)
    {
        case testThing:
            // blah
            break; // without break, execution falls through to the next case
        case testThing2:
            // something else
            break;
        default:
            // default action
            break;
    }
}

)

This would iterate through all of the tweets in an array, then test each one to find what type of tweet it is.

That code probably is not going to work as is, even if you put in the right values and variables. You will have to make that part on your own.

I have thought of preprocessing the tweets and then categorizing them according to the kinds of words or adjectives used. For example, "that was good" would be classified as positive. It obviously won't work for all cases, but I wanted to give it a try.
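
For what it's worth, the word-list idea above could be sketched roughly like this. It is only a minimal illustration: the class name LexiconSentiment and the two tiny word lists are made up for the example, and a real project would need a much larger list and some handling of negation ("not good").

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LexiconSentiment {
    // Tiny hand-picked word lists; these are assumptions for illustration only.
    private static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("good", "great", "love", "nice"));
    private static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("bad", "awful", "hate", "poor"));

    // Counts positive words minus negative words and maps the sign to a label.
    public static String classify(String tweet) {
        int score = 0;
        // Lowercase the tweet and split on anything that is not a letter.
        for (String word : tweet.toLowerCase().split("[^a-z]+")) {
            if (POSITIVE.contains(word)) {
                score++;
            } else if (NEGATIVE.contains(word)) {
                score--;
            }
        }
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("that was good")); // prints "positive"
    }
}
```

A tweet with no listed words, or with equal numbers of positive and negative words, comes out as neutral, which is one simple way to get the third category.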

One possible way is as you suggested in your first post: Build a massively parallel recognizer system and train it on data that you have screened yourself. This is a non-trivial task.

Another is to try to build a recognizer using grammar rules. In my experience, that can be done, though not easily and not perfectly. There have been many many programmer years expended on each of these two options.

The idea that tweets are in English (or any spoken/written language) is pretty funny. ROFLMAO. Speaking of which: Is "LOL" a positive comment, a negative one, or neutral? Probably depends on context, eh?

I was once tasked with building a filter to remove objectionable words from tweets. I suggested to my manager that the task was impossible, but he persisted. Being a manager, he didn't even understand my point when I told him FUH Q. Sigh.

I heard that a lot of buzz is going on here. I am pretty sure there are a lot of good open source projects, like Mahout and Mallet, which can be used to build excellent classifiers. I have done a little experimentation with them and got good results. I would recommend having a look at them.

Well, the project has already been completed by now. I couldn't use Mallet and other open source classifiers because I was actually trying to build one myself (though not an accurate one). I collected a sample of about 8000 tweets and preprocessed them: that included removing punctuation marks and stop words such as "a", "an", and "the", which don't carry any particular meaning, followed by reducing each word to its root. I then tagged the parts of speech of each word in a tweet using the jar file of the Stanford POS tagger, extracted the adjectives and adverbs from each tweet separately, and then manually tagged each tweet as positive or negative. This completed the "training" process. As for "testing", I used the Bayesian classifier because it involved only probability calculations and was much easier to understand and implement.
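
For anyone following along, a Bayesian classifier of the kind described above might look roughly like the sketch below. This is not the poster's actual code: the class name NaiveBayesSketch, the add-one (Laplace) smoothing, and the toy training data are all assumptions made for illustration. The features passed in are assumed to be the already-extracted adjectives and adverbs of each manually labelled tweet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSketch {
    // label -> (feature word -> how often it occurred in tweets with that label)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // label -> total number of feature words seen for that label
    private final Map<String, Integer> totalWords = new HashMap<>();
    // label -> number of training tweets with that label
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Record one manually labelled tweet, given as its extracted feature words.
    public void train(String label, String[] features) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String f : features) {
            counts.merge(f, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(f);
        }
    }

    // Return the label with the highest log-probability for the given features.
    public String classify(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log of the prior: fraction of training tweets with this label.
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String f : features) {
                int c = counts.getOrDefault(f, 0);
                // Add-one smoothing so an unseen word does not zero the product.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        // Toy training data (hypothetical); a real set would be thousands of tweets.
        nb.train("positive", new String[] {"good", "really"});
        nb.train("positive", new String[] {"great"});
        nb.train("negative", new String[] {"bad", "terribly"});
        nb.train("negative", new String[] {"awful"});
        System.out.println(nb.classify(new String[] {"good"})); // prints "positive"
    }
}
```

Working with sums of logarithms instead of multiplying raw probabilities avoids underflow when a tweet has many feature words.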

Hi Abita,

I want to know how you developed the classifier and how you created the training sets.
I would like some suggestions from you, since I am on the same track: I am also doing a similar project using tweets.
But I want to find similarity among sentences, that is, whether the sentences convey the same meaning. How can I do this using a classifier?
Please help me develop the classifier.
Can I use a Bayesian classifier?
Regards,
Siva.
