Naive Bayes text classification

Question

jancho1911 0 Newbie Poster

16 Years Ago

Hi!!!

I am writing a program that is supposed to use a Naive Bayes classifier to classify text into a few categories. This is the best I can do to explain; here is what I have done so far:

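import math

# The last line of each file is assumed to hold a count:
# D.txt the total number of documents, kat1.txt the number in category 1.
d = open('D.txt', 'r')
di = open('kat1.txt', 'r')

posleden = di.readlines()
total = d.readlines()
d.close()
di.close()

Di = posleden[-1]
D = total[-1]

# Prior probability of the category: P(ci) = |Di| / |D|
P = float(Di) / float(D)

# Open the output once; reopening it with 'w' later would erase P1
fajl = open('prob.txt', 'w')
fajl.write('P1=' + str(P) + '\n')

# readlines() already consumed the file, so loop over the saved lines
for rec in posleden:
    split_rec = rec.split('\t')
    if len(split_rec) > 1:
        print("word=%s, freq=%s" % (split_rec[0], split_rec[1].strip()))

        # Convert the frequency to a number before adding 1
        Prob = (float(split_rec[1]) + 1) / (float(Di) + float(D))
        fajl.write('Prob1 =' + str(Prob) + '\n')

fajl.close()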

:(

python

All 8 Replies

Stefano Mtangoo 455 Senior Poster

16 Years Ago

Mh, great man!
That was a great trick, one that always tricked me.
Thanks, Gribouillis, for asking. When one person asks, someone may answer, but many others around gain knowledge too.
Bravo!

woooee 814 Nearly a Posting Maven

16 Years Ago

I would use a dictionary with the category as key, and a list of all words in the subset as the value

Let V be the vocabulary of all words in the documents in D
For each category ci
----->ci=dictionary key
Let Di be the subset of documents in D in category ci
----->Di = value associated with each key = list of words in this category
P(ci) = |Di| / |D|
Let Ti be the concatenation of all the documents in Di
----->Already have this: the list is a concatenation in the sense that I think is meant here
Let ni be the total number of word occurrences in Ti
----->(not unique occurrences but all occurrences??)
----->ni = len(dictionary[ci]) i.e. the length of the list
For each word wj
Let nij be the number of occurrences of wj in Ti
----->You can loop through each key's list or use a_list.count(wj)
Let P(wj | ci) = (nij + 1) / (ni + |V|)
----->Not sure what all of this means, but its values should be found in the above calcs
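
The steps above can be sketched roughly as follows. This is only an illustration written for Python 3, not code posted in the thread: the function name `train_naive_bayes` and the input format (a dictionary mapping each category to a list of documents, each document being a list of words) are assumptions. Documents are kept separate here so that |Di| can be counted before they are concatenated into Ti.

```python
from collections import Counter


def train_naive_bayes(docs_by_category):
    """Estimate P(ci) and P(wj | ci) with add-one smoothing.

    docs_by_category maps a category to a list of documents;
    each document is a list of words (hypothetical input format).
    """
    # V: vocabulary of all words in all documents
    vocabulary = {w for docs in docs_by_category.values()
                  for doc in docs for w in doc}
    total_docs = sum(len(docs) for docs in docs_by_category.values())  # |D|

    priors = {}
    likelihoods = {}
    for category, docs in docs_by_category.items():
        priors[category] = len(docs) / total_docs   # P(ci) = |Di| / |D|
        ti = [w for doc in docs for w in doc]       # Ti: concatenated documents
        ni = len(ti)                                # all occurrences, not unique
        counts = Counter(ti)                        # nij for every word in Ti
        likelihoods[category] = {
            w: (counts[w] + 1) / (ni + len(vocabulary))  # (nij + 1) / (ni + |V|)
            for w in vocabulary
        }
    return priors, likelihoods


# Tiny made-up corpus, for illustration only
docs = {
    "sport": [["goal", "match"], ["goal", "team"]],
    "tech": [["code", "team"]],
}
priors, likelihoods = train_naive_bayes(docs)
print(priors["sport"])               # 2 of the 3 documents are "sport"
print(likelihoods["sport"]["goal"])  # (2 + 1) / (4 + 4) = 0.375
```

Because of the "+ 1" in the numerator, a word that never appears in a category still gets a small non-zero probability, and the probabilities over the vocabulary sum to 1 for each category.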

jlm699 320 Veteran Poster · Answer 1 · 2008-10-20T23:42:02+00:00

Please use code tags so that your indentation is not lost, and so that we may better read your posts.

Code tags go like this:
[code=python] # MY code goes between these tags!

[/code]

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 2 · 2008-10-20T23:49:24+00:00

jlm699, how can you show the tags without triggering their action?

jlm699 320 Veteran Poster · Answer 3 · 2008-10-20T23:59:15+00:00

Wrap the code tags in [noparse][/noparse] tags.

jancho1911 0 Newbie Poster · Answer 4 · 2008-10-21T17:28:27+00:00

Sorry about that...

Here are the formulas I need to implement. The thing is that I don't understand what is what...

Text Naive Bayes Algorithm

Let V be the vocabulary of all words in the documents in D
For each category ci ∈ C
    Let Di be the subset of documents in D in category ci
    P(ci) = |Di| / |D|
    Let Ti be the concatenation of all the documents in Di
    Let ni be the total number of word occurrences in Ti
    For each word wj ∈ V
        Let nij be the number of occurrences of wj in Ti
        Let P(wj | ci) = (nij + 1) / (ni + |V|)

And here is the code so far...

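import math

# The last line of V.txt is assumed to hold |V|, the vocabulary size
v = open('V.txt', 'r')
totalv = v.readlines()
v.close()
V = totalv[-1]

# The last line of freq1.txt is assumed to hold ni, the total word count
freq = open('freq1.txt', 'r')
totalf = freq.readlines()
freq.close()
Freq = totalf[-1]

# Open the output once; reopening with 'w' inside the loop erases earlier results
fajl = open('bayes.txt', 'w')

# readlines() already consumed the file, so loop over the saved lines
for rec in totalf:
    split_rec = rec.split('\t')
    if len(split_rec) > 1:
        print("zbor=%s, freq=%s" % (split_rec[0], split_rec[1].strip()))

        # P(wj | ci) = (nij + 1) / (ni + |V|)
        P = (float(split_rec[1]) + 1) / (float(Freq) + float(V))
        fajl.write('P1 =' + str(P) + '\n')

fajl.close()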
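
The formulas above only cover estimating P(ci) and P(wj | ci); the thread does not show how a new text is then assigned to a category. A common way to do it, sketched here under assumptions, is to add up the logarithms of the probabilities for each category and pick the highest total. The function name `classify`, the hand-made probability tables, and the choice to skip words outside the vocabulary are all illustrative, not taken from the posts.

```python
import math


def classify(words, priors, likelihoods):
    """Return the category with the highest log-probability score."""
    best_category, best_score = None, float("-inf")
    for category, prior in priors.items():
        # Sum logs instead of multiplying probabilities to avoid underflow
        score = math.log(prior)
        for w in words:
            # Assumption: words outside the vocabulary are simply ignored
            if w in likelihoods[category]:
                score += math.log(likelihoods[category][w])
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical, hand-made probabilities for two categories
priors = {"sport": 0.5, "tech": 0.5}
likelihoods = {
    "sport": {"goal": 0.6, "code": 0.1, "team": 0.3},
    "tech": {"goal": 0.1, "code": 0.6, "team": 0.3},
}
print(classify(["goal", "team"], priors, likelihoods))  # prints: sport
```

Working with logarithms matters once a document has many words, because the product of many small probabilities quickly becomes too small for a float to represent.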

jancho1911 0 Newbie Poster · Answer 5 · 2008-10-22T14:14:59+00:00

Thank you, it helped a lot...

callmerudy 0 Newbie Poster · Answer 6 · 2012-07-03T21:49:51+00:00

Hey, do you still have the code for your program?
