I'm trying to take two text files, compare them, then output the matches into a new text file. I've read the thread started by the1last, but that's outputting differences, not matches. I just started learning Python a couple of days ago, so I don't know what syntax or modules I should use.
Any thoughts or suggestions would be appreciated.

I'm new to Python as well, but I think I could offer some help.

Well if you're looking to find lines that match up, instead of words, then you can do something like this:

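file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
file3 = open("file3.txt", "a")

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if i == j:
            # write() takes a single string, so build it first
            file3.write("FILE 1: " + i)
            file3.write("FILE 2: " + j)

# close the files so the output is flushed to disk
file1.close()
file2.close()
file3.close()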
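
If the files get large, the nested loops above compare every line against every other line. Here is a rough sketch of one alternative, assuming it is fine to ignore trailing newlines and to report each matching line only once. The function name common_lines is made up for this example.

```python
def common_lines(lines1, lines2):
    """Return the lines of lines1 that also occur in lines2, in lines1 order."""
    # A set makes each "is this line in file 2?" check fast.
    # Trailing newlines are stripped so a last line without "\n" still matches.
    wanted = {line.rstrip("\n") for line in lines2}
    matches = []
    written = set()  # lines already reported, to avoid duplicates
    for line in lines1:
        text = line.rstrip("\n")
        if text in wanted and text not in written:
            matches.append(text)
            written.add(text)
    return matches
```

To try it on the files from the question, pass it the two open file objects (or the lists from readlines()) and write each returned line plus "\n" to the output file.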

Now if you are talking about words, then you can probably go through each list and use <string>.split() on it, so you have each word in a separate list item, but I'm not sure if that would work. Just an idea, really.
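
The split() idea does seem workable. A minimal sketch, assuming that words should be compared case-insensitively and that punctuation at the edges of a word should be ignored (both are my assumptions, as is the function name common_words):

```python
import string


def common_words(text1, text2):
    """Return the sorted words that appear in both texts."""
    def words(text):
        # split() breaks on any whitespace; strip() removes punctuation
        # from both ends of each word.
        stripped = (w.strip(string.punctuation) for w in text.lower().split())
        return {w for w in stripped if w}

    # "&" on two sets keeps only the items present in both.
    return sorted(words(text1) & words(text2))
```

Each argument would be the full contents of one file, for example what read() returns.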

Hope I helped.

Thanks for the replies.
I tried the code that Zebibyte suggested, but I'm getting the following error:

Traceback (most recent call last):
  File "/home/zach/bin/ps3.py", line 7, in <module>
    list1 = file1.readLines()
AttributeError: 'file' object has no attribute 'readLines'
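
The last line of the traceback points at the cause: Python names are case-sensitive, and the file method is spelled readlines, all lowercase. A quick way to check this yourself, using io.StringIO as an in-memory stand-in for a real file (that stand-in is my choice for the example, not something from the script):

```python
import io

# io.StringIO behaves like an open text file, so no file on disk is needed.
fake_file = io.StringIO("first line\nsecond line\n")

print(hasattr(fake_file, "readlines"))  # True
print(hasattr(fake_file, "readLines"))  # False

# The lowercase spelling returns a list with one string per line.
lines = fake_file.readlines()
print(lines)
```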

Thanks to both of you for your help.

Here is the final working version of the code:

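#!/usr/bin/env python

file1 = open("list1.txt", "r")
file2 = open("list2.txt", "r")
# "w" starts results.txt fresh on every run
file3 = open("results.txt", "w")

list1 = file1.readlines()
list2 = file2.readlines()

file3.write("The following entries appear in both lists: \n")

for i in list1:
    for j in list2:
        # The post is cut off after the inner loop; the lines below
        # are an assumed completion, not the poster's own code.
        if i == j:
            file3.write(i)

file1.close()
file2.close()
file3.close()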

I need to compare the text in one txt file against another txt file containing a list of words (one word/expression per line), and write the words that are present in the text file but NOT in the word list to a new file.

This way I can focus on new words to build up a dictionary for my pupils.
Thanks in advance.

This is probably not the fastest approach, and it requires that the words and one input file fit easily in memory, but here is a starting point:

import string

prevwords = set()
newwords = set()
drop = string.punctuation + string.digits

for filename in ('test.txt', 'advsh12.txt'):
    print('Loading %s' % filename)
    # read the whole file into memory
    with open(filename) as infile:
        inputstring = infile.read()
    prevwords = prevwords.union(newwords)
    newwords = set(word.strip(drop)
                   for word in inputstring.lower().split())
    newwords = newwords.difference(prevwords)
    ## first round: all words in the first file; after that, only the words new in the last file
    print('The %i new words are:\n%s' % (len(newwords), '\t'.join(sorted(newwords))))
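
Since the question asked for the new words to end up in a file, here is one way the same idea might be wrapped up, as a sketch only. The names new_words and write_new_words and the three path parameters are hypothetical. It also assumes single words: a multi-word expression in the word list will not match anything, because the text is split on whitespace.

```python
import string


def new_words(text, known_words):
    """Return the sorted words from text that are not in known_words."""
    drop = string.punctuation + string.digits
    known = {w.strip().lower() for w in known_words}
    found = {w.strip(drop) for w in text.lower().split()}
    found.discard("")  # left over when a token was only punctuation or digits
    return sorted(found - known)


def write_new_words(text_path, wordlist_path, out_path):
    """Hypothetical helper: read both files, write one new word per line."""
    with open(text_path) as f:
        text = f.read()
    with open(wordlist_path) as f:
        known = f.read().splitlines()
    words = new_words(text, known)
    with open(out_path, "w") as out:
        out.write("\n".join(words) + "\n")
    return words
```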

Quick, well explained and most helpful!!

I have a problem here. I'm comparing 2 files which have the artist name and the song name. I want to display the artists common to both files and then display all the songs under each of those artists from both files. Can anybody help me asap?

"Ozzy Ozbourne", "Crazy Train"
"Michael Jackson", "Beat It"
"Metallica", "Enter Sandman"
"Stevie Ray Vaughan", "Mary had a little Lamb, (original)"
"Nirvana", "Smells Like Teen Spirit"
"Pantera", "Floods"
"Metallica", "Unforgiven III"
"Guns N' Roses", "Welcome to the jungle"


"The Train", "Drive By"
"Pitbull", "Back in time"
"Metallica", "Nothing Else Matters"
"Avcii", "Levels"
"Bon Jovi", "It's my Life"
"Iron Maiden", "Dance of Death"
"Breaking Benjamin", "Evil Angel"
"Rammstein", "Amerika"
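
For data laid out like the two lists above, one possible approach is to let the csv module do the parsing and then group the songs per artist. This is only a sketch: the function names are made up, and it assumes every line is a quoted artist, a comma, and a quoted song title.

```python
import csv
from collections import defaultdict


def songs_by_artist(lines):
    """Map each artist to a list of songs, from '"Artist", "Song"' lines."""
    songs = defaultdict(list)
    # skipinitialspace=True makes the reader accept the space after the comma,
    # and the quotes keep a comma inside a song title from splitting it.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) == 2:  # skip blank or malformed lines
            artist, title = row
            songs[artist].append(title)
    return songs


def common_artist_songs(lines1, lines2):
    """Return {artist: songs from both files} for artists found in both."""
    first = songs_by_artist(lines1)
    second = songs_by_artist(lines2)
    return {artist: first[artist] + second[artist]
            for artist in sorted(first.keys() & second.keys())}
```

Both functions take any iterable of lines, so two open file objects can be passed in directly.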

Do not hijack an old thread; start your own and link to the old one.