You need to pass the input text files on the command line.
Python code for some algorithms
#!/usr/bin/env python
import re
import sys
import math
import os


def word_frequencies(path):
    # Map each lower-cased word in the file to the number of times it occurs.
    with open(path, 'r') as text:
        words = re.findall(r'\w+', text.read().lower())
    freqs = {}
    for word in words:
        freqs[word] = freqs.get(word, 0) + 1
    return freqs


def norm_function(a):
    # Euclidean norm of a sequence of counts.
    return math.sqrt(sum(x * x for x in a))


os.system('clear')
print("$$$$$ THE DOCUMENT-DISTANCE PROBLEM $$$$$")
print("\n\nTHE NAMES OF THE SCRIPT & TWO DOCUMENTS : " + str(sys.argv))
for i in range(1, len(sys.argv)):
    for j in range(1, len(sys.argv)):
        freqs_1 = word_frequencies(sys.argv[i])
        freqs_2 = word_frequencies(sys.argv[j])
        norm_1 = norm_function(freqs_1.values())
        norm_2 = norm_function(freqs_2.values())
        # Dot product over the words the two documents share.
        s = sum(freqs_1[n] * freqs_2[n] for n in set(freqs_1) & set(freqs_2))
        # Rounding keeps the cosine within acos's domain despite float error.
        # An empty document has norm 0 and raises ZeroDivisionError here.
        x = round(s / (norm_1 * norm_2), 5)
        angle = math.acos(x)
        print("THE NORM OF DOCUMENT %s is : %s" % (sys.argv[i], norm_1))
        print("THE NORM OF DOCUMENT %s is : %s" % (sys.argv[j], norm_2))
        print("THE ANGLE BETWEEN TWO DOCUMENTS : %s" % angle)
print("\n$$$$$ THE-END $$$$$\n")
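For anyone who wants to try the idea without creating files first, here is a minimal sketch of the same calculation on two in-memory strings. It is not the original script: the function name `document_angle` is made up for illustration, it uses `collections.Counter` instead of hand-built dicts, and it assumes Python 3. The angle is the arccosine of the dot product of the two word-count vectors divided by the product of their norms.

```python
import math
import re
from collections import Counter


def document_angle(text_1, text_2):
    """Return the angle in radians between the word-count vectors of two strings."""
    counts_1 = Counter(re.findall(r"\w+", text_1.lower()))
    counts_2 = Counter(re.findall(r"\w+", text_2.lower()))
    if not counts_1 or not counts_2:
        # A text without words has norm 0, so the angle is undefined.
        raise ValueError("both texts must contain at least one word")
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_1[w] * counts_2[w] for w in counts_1.keys() & counts_2.keys())
    norm_1 = math.sqrt(sum(c * c for c in counts_1.values()))
    norm_2 = math.sqrt(sum(c * c for c in counts_2.values()))
    # Clamp so floating-point error cannot push the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_1 * norm_2)))
    return math.acos(cosine)


if __name__ == "__main__":
    print(document_angle("the cat sat", "the cat sat"))
    print(document_angle("red green", "blue yellow"))
```

Texts with the same word proportions give an angle at or very near 0, and texts with no words in common give pi/2, so smaller angles mean more similar documents.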
aditya369 2 Newbie Poster
Tcll commented: doing some model hacking for a game, where the format requires a BST, this could be useful. :) +2