I have a project to build a text comparer. I've downloaded almost 90,000 texts from a site, parsed them, and stored them in a file (150 MB). A new text arrives in another file and should be compared with all the others, and the program should return the 20 most similar texts, together with their IDs.
The problem is that the comparison is too slow. The program opens the file and reads one article at a time into a list (word by word), then compares it with the new text, which is also stored in a list. The comparison looks something like this:
    common = 0
    for word1 in list1:
        for word2 in list2:
            if word1 == word2:
                common += 1
Is there a way to speed things up?
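
One direction that might help, as a rough sketch rather than a tested solution: the nested loop does len(list1) * len(list2) comparisons per article, but the same number can be obtained from word counts. Since the loop counts every matching pair, its result equals the sum, over shared words, of the word's count in one list times its count in the other. The names `common_count` and `top_similar`, and the assumption that articles arrive as `(article_id, word_list)` pairs, are hypothetical and only for illustration.

```python
import heapq
from collections import Counter


def common_count(counts_a, counts_b):
    """Return the same value as the nested loop, computed from word counts."""
    # Iterate over the counter with fewer distinct words.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    # Each shared word contributes (occurrences in a) * (occurrences in b).
    return sum(n * counts_b[w] for w, n in counts_a.items() if w in counts_b)


def top_similar(new_words, articles, k=20):
    """Return the k best (score, article_id) pairs, highest score first.

    `articles` is assumed to be an iterable of (article_id, word_list) pairs.
    """
    new_counts = Counter(new_words)  # built once, reused for every article
    scored = (
        (common_count(new_counts, Counter(words)), article_id)
        for article_id, words in articles
    )
    # nlargest keeps only k items in memory instead of sorting all scores.
    return heapq.nlargest(k, scored)
```

Because `articles` can be a generator, the file can still be read one article at a time. Whether this raw pair count is a good similarity measure is a separate question; the sketch only reproduces the existing score with less work.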