Find matches in two text files, output them into a third text file

Question

zmjman08 0 Newbie Poster

15 Years Ago

Hi,
I'm trying to take two text files, compare them, then output the matches into a new text file. I've read the thread started by the1last, but that's outputting differences, not matches. I just started learning Python a couple of days ago, so I don't know what syntax or modules I should use.
Any thoughts or suggestions would be appreciated.
Thanks.

python


Zebibyte 0 Newbie Poster

15 Years Ago

I'm new to Python as well, but I think I could offer some help.

Well if you're looking to find lines that match up, instead of words, then you can do something like this:

file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
file3 = open("file3.txt", "a")

file1.seek(0,0)
file2.seek(0,0)

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if i == j:
            file3.write("FILE 1: " + i)  # write() accepts a single string
            file3.write("FILE 2: " + j)

file1.close()
file2.close()
file3.close()

Now if you are talking about words, then you can probably go through each list and use <string>.split() on it, so you have each word in a separate list item, but I'm not sure if that would work. Just an idea, really.
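The split() idea above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name common_words is made up, and it assumes that "matching" means the exact same whitespace-separated token (same case, same attached punctuation) occurs in both texts.

```python
def common_words(text1, text2):
    """Return the sorted words that occur in both texts.

    Assumption: a "word" is any whitespace-separated token, compared
    exactly (case and attached punctuation are not normalized).
    """
    words1 = set(text1.split())
    words2 = set(text2.split())
    # Set intersection keeps only the tokens present in both sets.
    return sorted(words1 & words2)
```

To use it on files, read each one with open(name).read() and pass the two strings in.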

Hope I helped.


sravan953 · Answer 1 · 2009-08-15T12:53:14+00:00

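Zebibyte's code idea is correct, although lines 5 & 6 (the two seek(0,0) calls) are unnecessary, I think: a file that has just been opened for reading already starts at position 0.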

zmjman08 0 Newbie Poster · Answer 2 · 2009-08-17T19:27:56+00:00

Thanks for the replies.
I tried the code that Zebibyte suggested, but I'm getting the following error:

Traceback (most recent call last):
  File "/home/zach/bin/ps3.py", line 7, in <module>
    list1 = file1.readLines()
AttributeError: 'file' object has no attribute 'readLines'

sravan953 · Answer 3 · 2009-08-17T19:44:43+00:00

readLines() is a typo; the 'l' should not be capitalized. So: readlines()

zmjman08 0 Newbie Poster · Answer 4 · 2009-08-17T20:34:07+00:00

Oh yeah, thanks.

zmjman08 0 Newbie Poster · Answer 5 · 2009-08-17T21:00:32+00:00

Thanks to both of you for your help.

Here is the final working version of the code:

#!/usr/bin/env python

file1 = open("list1.txt", "r")
file2 = open("list2.txt", "r")
# Open the results file in write mode so each run starts from an empty file
file3 = open("results.txt", "w")

list1 = file1.readlines()
list2 = file2.readlines()

file3.write("The following entries appear in both lists: \n")

for i in list1:
    for j in list2:
        if i == j:
            file3.write(i)
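For larger files, the nested loops above compare every line of one file against every line of the other. A set-based variant is sketched below as one possible alternative, not as the thread's method. The names matching_lines and write_matches are made up for illustration. It differs from the loop version in two ways that are assumptions on my part: trailing newlines are stripped before comparing (so a last line without a newline can still match), and each matching line is reported only once.

```python
def matching_lines(lines1, lines2):
    """Return the lines of lines1 that also occur in lines2.

    Lines are compared without their trailing newline, the order of
    lines1 is kept, and each match is reported once.
    """
    in_second = set(line.rstrip("\n") for line in lines2)
    matches = []
    reported = set()
    for line in lines1:
        stripped = line.rstrip("\n")
        if stripped in in_second and stripped not in reported:
            matches.append(stripped)
            reported.add(stripped)
    return matches


def write_matches(path1, path2, out_path):
    """Write the lines common to both input files to out_path."""
    # "with" closes the files automatically, even if an error occurs.
    with open(path1) as f1, open(path2) as f2:
        matches = matching_lines(f1, f2)
    with open(out_path, "w") as out:
        out.write("The following entries appear in both lists:\n")
        for line in matches:
            out.write(line + "\n")
    return matches
```

With the file names from the post, the call would be write_matches("list1.txt", "list2.txt", "results.txt").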

eldeingles 0 Newbie Poster · Answer 6 · 2010-08-08T01:42:27+00:00

What code would I have to add/replace if I wanted to compare words and not lines?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 7 · 2010-08-08T01:56:29+00:00

Use difflib.
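The post does not say which part of difflib was meant, so the following is only a guess at one way to apply it to words: difflib.SequenceMatcher can report runs of words that appear in the same order in both texts. The function name matching_word_runs is hypothetical. Note that this finds in-order matches, which is not the same as "every word that occurs anywhere in both files".

```python
import difflib


def matching_word_runs(text_a, text_b):
    """Return the words that SequenceMatcher aligns between two texts."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False turns off the heuristic that ignores very frequent
    # items in long sequences.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    common = []
    for block in matcher.get_matching_blocks():
        # Each block is (a, b, size); the final block has size 0.
        common.extend(words_a[block.a:block.a + block.size])
    return common
```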

eldeingles 0 Newbie Poster · Answer 8 · 2010-08-08T03:12:10+00:00

I need to compare the text in a txt file against another txt file containing a list of words (one word/expression per line), and write the words that appear in the text file but are NOT in the word list to a new file.

This way I can focus on new words to build up a dictionary for my pupils.
Thx in advance.

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 9 · 2010-08-08T03:53:27+00:00

This is probably not the fastest approach, and it requires that the words and one input file fit easily in memory, but here is a starting point:

import string

prevwords = newwords = set()
drop = string.punctuation+string.digits

for filename in ('test.txt','advsh12.txt'):
    print('-'*60)
    print('Loading %s' % filename)
    inputstring=open(filename).read()
    prevwords = prevwords.union(newwords)
    newwords = set(word.strip(drop)
                   for word in inputstring.lower().split())
    newwords = newwords.difference(prevwords)
    ## first pass: all words in the first file; later passes: only the words new to that file
    print('The %i new words are:\n%s' % (len(newwords),'\t'.join(sorted(newwords))))
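The code above prints its results; the request also asked for the new words to be written to a file. A possible variation is sketched below. The names new_words and write_new_words are invented for this example, and it assumes single words only: a multi-word expression on one line of the list would not be matched by this token-by-token comparison.

```python
import string


def new_words(text, known_words):
    """Return the sorted words of text that are not in known_words.

    Words are lowercased, and surrounding punctuation and digits are
    stripped before comparing.
    """
    drop = string.punctuation + string.digits
    known = set(word.strip().lower() for word in known_words)
    found = set(token.strip(drop).lower() for token in text.split())
    found.discard("")  # tokens made only of punctuation or digits
    return sorted(found - known)


def write_new_words(text_path, wordlist_path, out_path):
    """Write the words of the text file that are missing from the list."""
    with open(text_path) as f:
        text = f.read()
    with open(wordlist_path) as f:
        known = f.read().splitlines()
    words = new_words(text, known)
    with open(out_path, "w") as out:
        out.write("\n".join(words) + "\n")
    return words
```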

santosh2430 0 Newbie Poster · Answer 10 · 2012-07-03T12:11:13+00:00

I have a problem here. I'm comparing 2 files which have the artist name and the song name. I want to display the common artist from both files and then display all the songs under that artist from both the files. Can anybody help me asap?
FILE1:

"Ozzy Ozbourne", "Crazy Train"
"Michael Jackson", "Beat It"
"Metallica", "Enter Sandman"
"Stevie Ray Vaughan", "Mary had a little Lamb, (original)"
"Nirvana", "Smells Like Teen Spirit"
"Pantera", "Floods"
"Metallica", "Unforgiven III"
"Guns N' Roses", "Welcome to the jungle"

FILE2:

"The Train", "Drive By"
"Pitbull", "Back in time"
"Metallica", "Nothing Else Matters"
"Avcii", "Levels"
"Bon Jovi", "It's my Life"
"Iron Maiden", "Dance of Death"
"Breaking Benjamin", "Evil Angel"
"Rammstein", "Amerika"

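This question did not get an answer in the thread. One possible approach is sketched here, assuming every line has the form "artist", "song" as in the samples above. The function names are hypothetical. The csv module is used instead of a plain split(",") because one of the song titles contains a comma inside its quotes.

```python
import csv
from collections import defaultdict


def songs_by_artist(lines):
    """Map each artist to the list of their songs."""
    songs = defaultdict(list)
    # skipinitialspace=True ignores the space after each comma, so the
    # quoted song title is parsed as a single field.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) >= 2:  # skip blank or malformed lines
            songs[row[0]].append(row[1])
    return songs


def common_artist_songs(lines1, lines2):
    """Return {artist: songs from both inputs} for artists in both."""
    first = songs_by_artist(lines1)
    second = songs_by_artist(lines2)
    shared = sorted(set(first) & set(second))
    return {artist: first[artist] + second[artist] for artist in shared}
```

Both functions accept any iterable of lines, so two open file objects can be passed in directly. For the sample data above, the only artist in both files is Metallica.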