
Find matches in two text files, output them into a third text file

Hi,
I'm trying to take two text files, compare them, and then write the matches to a new text file. I've read the thread started by the1last, but that one outputs the differences, not the matches. I only started learning Python a couple of days ago, so I don't know what syntax or modules I should use.
Any thoughts or suggestions would be appreciated.
Thanks.

zmjman08
Newbie Poster
4 posts since Aug 2009
Reputation Points: 10
Solved Threads: 0
 

I'm new to Python as well, but I think I could offer some help.

Well, if you're looking for matching lines rather than matching words, you can do something like this:

file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
file3 = open("file3.txt", "a")

file1.seek(0,0)
file2.seek(0,0)

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if i == j:
            # write() takes a single string, so join the label and the line
            file3.write("FILE 1: " + i)
            file3.write("FILE 2: " + j)

file1.close()
file2.close()
file3.close()


Now, if you are talking about words, you could go through each list and call <string>.split() on each line, so that every word ends up as a separate list item. I haven't tested that, though; it's just an idea.
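One way to follow the split() idea for a word-level comparison is to turn each file into a set of words and intersect the two sets. This is a minimal sketch that assumes case should be ignored and that leading/trailing punctuation is not part of a word; the function names words_in and common_words are illustrative only.

```python
import string


def words_in(lines):
    """Return the set of lowercase words found in an iterable of lines.

    Punctuation is stripped from both ends of each word, so "Apple," and
    "apple" count as the same word.
    """
    words = set()
    for line in lines:
        for word in line.lower().split():
            word = word.strip(string.punctuation)
            if word:  # skip tokens that were pure punctuation, e.g. "--"
                words.add(word)
    return words


def common_words(lines1, lines2):
    """Return the words that appear in both inputs, sorted alphabetically."""
    return sorted(words_in(lines1) & words_in(lines2))


# Usage with real files (the file names are placeholders):
# with open("file1.txt") as f1, open("file2.txt") as f2, \
#         open("file3.txt", "w") as out:
#     for word in common_words(f1, f2):
#         out.write(word + "\n")
```

Because file objects iterate line by line, the same functions work on open files and on plain lists of strings.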

Hope I helped.

Zebibyte
Newbie Poster
8 posts since Aug 2009
Reputation Points: 10
Solved Threads: 1
 

Zebibyte's code idea is correct, although lines 5 and 6 (the seek() calls) are unnecessary, since a file opened for reading already starts at position 0.

sravan953
Posting Whiz in Training
243 posts since May 2009
Reputation Points: 2
Solved Threads: 30
 

Thanks for the replies.
I tried the code that Zebibyte suggested, but I'm getting the following error:

Traceback (most recent call last):
  File "/home/zach/bin/ps3.py", line 7, in <module>
    list1 = file1.readLines()
AttributeError: 'file' object has no attribute 'readLines'

zmjman08
Newbie Poster
4 posts since Aug 2009
Reputation Points: 10
Solved Threads: 0
 

readLines() is a typo: Python names are case-sensitive, and the 'l' should not be capitalized. Use readlines() instead.

sravan953
Posting Whiz in Training
243 posts since May 2009
Reputation Points: 2
Solved Threads: 30
 

Oh yeah, thanks.

zmjman08
Newbie Poster
4 posts since Aug 2009
Reputation Points: 10
Solved Threads: 0
 

Thanks to both of you for your help.

Here is the final working version of the code:

#!/usr/bin/env python

file1 = open("list1.txt", "r")
file2 = open("list2.txt", "r")
# "w" empties results.txt first, so results from earlier runs don't pile up
file3 = open("results.txt", "w")

list1 = file1.readlines()
list2 = file2.readlines()

file3.write("The following entries appear in both lists: \n")

for i in list1:
    for j in list2:
        if i == j:
            file3.write(i)

file1.close()
file2.close()
file3.close()
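The nested loop compares every line of list1 with every line of list2, which gets slow for long files. A set-based lookup is a common alternative; the sketch below swaps in that technique (it is not what the poster used), and common_lines is a hypothetical name. As a design choice of this sketch, it reports each shared line only once and strips the trailing newline, so a last line without "\n" still matches.

```python
def common_lines(lines1, lines2):
    """Return lines that appear in both inputs, in the order of lines1.

    Building a set from lines2 makes each membership test fast, instead of
    rescanning all of lines2 for every line of lines1.
    """
    # Strip the trailing newline so a final line without "\n" still matches.
    lookup = set(line.rstrip("\n") for line in lines2)
    result = []
    seen = set()  # report each shared line only once
    for line in lines1:
        line = line.rstrip("\n")
        if line in lookup and line not in seen:
            seen.add(line)
            result.append(line)
    return result


# Usage with the file names from the post above:
# with open("list1.txt") as f1, open("list2.txt") as f2, \
#         open("results.txt", "w") as out:
#     out.write("The following entries appear in both lists: \n")
#     for line in common_lines(f1, f2):
#         out.write(line + "\n")
```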
zmjman08
Newbie Poster
4 posts since Aug 2009
Reputation Points: 10
Solved Threads: 0
 

What code would I have to add or replace if I wanted to compare words instead of lines?

eldeingles
Newbie Poster
6 posts since Aug 2010
Reputation Points: 10
Solved Threads: 0
 
pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
 

I need to compare the text in one .txt file against another .txt file containing a list of words (one word or expression per line), and write the words that appear in the text file but are NOT in the word list to a new file.

This way I can focus on new words to build up a dictionary for my pupils.
Thanks in advance.

eldeingles
Newbie Poster
6 posts since Aug 2010
Reputation Points: 10
Solved Threads: 0
 

This is probably not the fastest approach, and it requires that the word list and one input file fit comfortably in memory, but here is a starting point:

import string

prevwords = newwords = set()
# characters stripped from both ends of each word
drop = string.punctuation + string.digits

for filename in ('test.txt', 'advsh12.txt'):
    print('-' * 60)
    print('Loading %s' % filename)
    with open(filename) as f:
        inputstring = f.read()
    prevwords = prevwords.union(newwords)
    newwords = set(word.strip(drop)
                   for word in inputstring.lower().split())
    # tokens made only of punctuation/digits become '', which is not a word
    newwords.discard('')
    newwords = newwords.difference(prevwords)
    # first round: every word in the first file;
    # later rounds: only the words not seen in any earlier file
    print('The %i new words are:\n%s' % (len(newwords), '\t'.join(sorted(newwords))))
pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
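The reply above prints the new words; the question also asked for them to be written to a new file, with the word list holding one entry per line. A minimal sketch of that, assuming case and surrounding punctuation/digits should be ignored as in the reply above. The names unknown_words and write_unknown_words, and any file paths passed to them, are hypothetical.

```python
import string


def unknown_words(text, known_lines):
    """Return the words in `text` that are not in the word list, sorted.

    `known_lines` is an iterable of lines with one entry per line. Multi-word
    expressions in the list are kept as whole lines, so they do not exclude
    their individual words (a limitation of this sketch).
    """
    known = set(line.strip().lower() for line in known_lines)
    found = set()
    for token in text.lower().split():
        # strip punctuation and digits from the ends, as in the reply above
        word = token.strip(string.punctuation + string.digits)
        if word and word not in known:
            found.add(word)
    return sorted(found)


def write_unknown_words(text_path, wordlist_path, out_path):
    """Write the unknown words to out_path, one per line, and return them."""
    with open(text_path) as text_file, open(wordlist_path) as wordlist:
        words = unknown_words(text_file.read(), wordlist)
    with open(out_path, "w") as out:
        for word in words:
            out.write(word + "\n")
    return words
```

For example, write_unknown_words("text.txt", "wordlist.txt", "new_words.txt") would create new_words.txt with one unknown word per line; the file names are placeholders.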
 

This question has already been solved
