Machine Learning Email Prioritization - Python

I have been working on a priority email inbox written in Python, with the ultimate aim of using a machine learning algorithm to label (classify) a selection of emails as either important or unimportant. I will begin with some background information and then move on to my question.

I have so far developed code to extract data from each email and process it to identify the most important ones. This is done using the following email features:

Sender's Address Frequency
Thread Activity
Date Received (time between replies)
Common Words in body/subject
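
As a rough illustration of the first feature, sender address frequency could be computed along these lines. This is only a sketch: the function name `sender_frequency` and the sample addresses are assumptions made for the example, not taken from the code described in this post.

```python
from collections import Counter

def sender_frequency(senders):
    """Map each sender address to its share of all messages (0.0 to 1.0)."""
    # Normalise case and whitespace so the same address is counted once.
    counts = Counter(s.strip().lower() for s in senders)
    total = sum(counts.values())
    return {addr: n / total for addr, n in counts.items()}

# Hypothetical sample input: one sender address per received email.
senders = ["test@test.com", "rest@test.com", "Test@test.com", "best@test.com"]
freq = sender_frequency(senders)
```

A sender who appears more often gets a higher value, which can then feed into the overall rank alongside the other features.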

The code I currently have assigns each email a rank (a weighting between 0.1 and 1) based on its importance, and then applies a label of either 'important' or 'unimportant' (in this case just 1 or 0). An email is given priority status if its rank is > 0.5. This data is stored in a CSV file (as below).

 From           Subject       Body        Date          Rank    Priority 
 test@test.com  HelloWorld    Body Words  10/10/2012    0.67    1
 rest@test.com  ByeWorld      Body Words  10/10/2012    0.21    0
 best@test.com  SayWorld      Body Words  10/10/2012    0.91    1
 just@test.com  HeyWorld      Body Words  10/10/2012    0.48    0
 etc        …………………………………………………………………………
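
To make the layout above concrete, here is a minimal sketch of loading such a file and checking that each Priority value agrees with the rank > 0.5 rule. The comma delimiter, the exact column names, and the helper name `load_rows` are assumptions; the real file may differ.

```python
import csv
import io

# Stand-in for the training CSV; comma delimiter and header names are assumed.
SAMPLE = """From,Subject,Body,Date,Rank,Priority
test@test.com,HelloWorld,Body Words,10/10/2012,0.67,1
rest@test.com,ByeWorld,Body Words,10/10/2012,0.21,0
best@test.com,SayWorld,Body Words,10/10/2012,0.91,1
just@test.com,HeyWorld,Body Words,10/10/2012,0.48,0
"""

def load_rows(handle, threshold=0.5):
    """Read email rows and recompute the label implied by the rank threshold."""
    rows = []
    for record in csv.DictReader(handle):
        rank = float(record["Rank"])
        rows.append({
            "text": record["Subject"] + " " + record["Body"],
            "rank": rank,
            "priority": int(record["Priority"]),
            "expected": 1 if rank > threshold else 0,
        })
    return rows

rows = load_rows(io.StringIO(SAMPLE))
# Rows whose stored label disagrees with the threshold rule.
mismatches = [r for r in rows if r["priority"] != r["expected"]]
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle.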

I have two sets of email data (one for training, one for testing). The above applies to my training data. I am now attempting to train a learning algorithm so that I can predict the importance of the emails in the testing data.

To do this I have been looking at both scikit-learn and NLTK. However, I am having trouble applying what I have learnt in the tutorials to my own project. I have no particular requirements regarding which learning algorithm is used. Is it as simple as applying the following? And if so, how?

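from sklearn.svm import LinearSVC

# email.data: features of the training emails; email.target: their 0/1 priority labels
X, y = email.data, email.target

clf = LinearSVC()
clf = clf.fit(X, y)

# placeholder for the testing email data, prepared in the same way as X
X_new = testing_email_data

clf.predict(X_new)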
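
For what it is worth, one possible shape of this, sketched under assumptions: since `LinearSVC` needs numeric input, the subject and body text would first have to be turned into vectors, for example with scikit-learn's `TfidfVectorizer` chained in a pipeline. The toy training strings below are made up from the sample rows above, and the choice of TF-IDF is an assumption rather than a requirement.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data mirroring the CSV rows: subject + body text and 0/1 priority.
train_text = [
    "HelloWorld Body Words",
    "ByeWorld Body Words",
    "SayWorld Body Words",
    "HeyWorld Body Words",
]
train_y = [1, 0, 1, 0]

# The vectorizer converts raw text to numeric features; the classifier learns from them.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_text, train_y)

# Hypothetical testing emails, passed through the same pipeline.
test_text = ["HelloWorld Body Words", "NewWorld Body Words"]
predictions = model.predict(test_text)
```

Note that the Rank column is probably best left out of the features here: the Priority label is derived directly from it, so a model given Rank would only relearn the 0.5 threshold.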

ZeeeeeV
© 2013 DaniWeb® LLC