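You can use lists or 2 sets, but you would want both set1.difference(set2) and set2.difference(set1), because each one only finds the lines that are missing from the other. You can also set up a process that reads both files in step, the way the merge phase of a merge sort does, but the set solution seems more Pythonic. Which one to use depends on how large the files are: sets hold both files in memory, while the merge approach only looks at one line from each file at a time but needs both files to be sorted.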
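Both approaches can be sketched roughly as follows (Python 3). The function names `set_differences` and `merge_differences` and the file names in the usage comment are made up for illustration. The set version ignores line order and collapses duplicate lines. The merge version assumes both inputs are already sorted in the order Python's `<` uses for strings, and it walks each input only once.

```python
def set_differences(lines1, lines2):
    """Order-insensitive comparison: (lines only in 1, lines only in 2).

    Duplicate lines collapse to one, and both inputs are held in memory.
    """
    set1 = {line.rstrip("\n") for line in lines1}
    set2 = {line.rstrip("\n") for line in lines2}
    return sorted(set1.difference(set2)), sorted(set2.difference(set1))


def merge_differences(sorted1, sorted2):
    """Merge-style comparison of two already-sorted sequences.

    Walks both inputs once, front to back, like the merge step of merge sort,
    so it can consume open file objects without loading whole files.
    Yields ("-", item) for items only in sorted1, ("+", item) for items only in sorted2.
    """
    end = object()  # sentinel marking an exhausted input
    it1, it2 = iter(sorted1), iter(sorted2)
    a, b = next(it1, end), next(it2, end)
    while a is not end or b is not end:
        if b is end or (a is not end and a < b):
            yield ("-", a)
            a = next(it1, end)
        elif a is end or b < a:
            yield ("+", b)
            b = next(it2, end)
        else:  # equal items appear in both inputs: skip both
            a, b = next(it1, end), next(it2, end)


print(set_differences(["a\n", "b\n", "c\n"], ["b\n", "c\n", "d\n"]))
# (['a'], ['d'])
print(list(merge_differences(["a", "b", "c"], ["b", "c", "d"])))
# [('-', 'a'), ('+', 'd')]

# Usage with two hypothetical, already-sorted files:
# with open("file1.txt") as f1, open("file2.txt") as f2:
#     for code, line in merge_differences((l.rstrip("\n") for l in f1),
#                                         (l.rstrip("\n") for l in f2)):
#         print(code, line)
```

The merge version treats repeated lines as separate items, so a line that occurs twice in one file and once in the other is reported once.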
woooee
Nearly a Posting Maven
2,454 posts since Dec 2006
Reputation Points: 777
Solved Threads: 714
Vegaseat left this example of the difflib module somewhere in the code snippets:
# find the difference between two texts
# vegaseat 6/2/2005 (originally tested with Python 2.4)
import difflib
text1 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""
text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
"""
# create a list of lines in text1, keeping the line endings
text1Lines = text1.splitlines(True)
print("Lines of text1:")
for line in text1Lines:
    print(line, end='')
print()
# ditto for text2
text2Lines = text2.splitlines(True)
print("Lines of text2:")
for line in text2Lines:
    print(line, end='')
print()
diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))
print('-' * 50)
print("Lines different in text1 from text2:")
for line in diffList:
    # Differ marks lines that appear only in text1 with a leading '-'
    if line[0] == '-':
        print(line, end='')
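To see what the `line[0] == '-'` test in the loop above is filtering, here is a small sketch (Python 3) that runs Differ.compare() on two line pairs taken from the texts above and prints its raw output. Per the difflib documentation, every output line starts with a two-character code: '- ' for a line unique to the first sequence, '+ ' for a line unique to the second, '  ' for a line common to both, and '? ' for a guide line (present in neither input) that points at intraline changes. Differ only adds '? ' lines when two lines are similar enough, as with Spell/Sell here.

```python
import difflib

# Two line pairs borrowed from the texts above: one changed, one identical.
old = ['Different Ways to Spell "Bob"\n', "Spotted Owl Recipes by the EPA\n"]
new = ['Different Ways to Sell "Bob"\n', "Spotted Owl Recipes by the EPA\n"]

result = list(difflib.Differ().compare(old, new))
# The first two characters of each output line are the change code.
codes = [line[:2] for line in result]

for line in result:
    print(line, end="")
```

So the snippet's loop prints only the lines unique to text1; lines that exist only in text2 come out of the same compare() call with a '+ ' prefix.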
bumsfeld
Nearly a Posting Virtuoso
1,445 posts since Jul 2005
Reputation Points: 404
Solved Threads: 184
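With some additions to the data, note that it reports "1. first different line" as a difference even though that line is in both texts, and it doesn't report "Another line that is different" at all. Differ compares the two texts as sequences, in file order, so a line that sits at a different position in each text can show up as a difference. Sorting text1Lines and text2Lines first should solve that problem, although it may not make a difference for your file, since it appears to be in ascending date order already. The second problem comes from the loop itself: Differ marks lines that are only in text2 with "+", and the loop only prints "-" lines. So if there can be lines in the 2nd file that are not in the first, you will also have to report those, for example with a diffList = list(diffInstance.compare(text2Lines, text1Lines)) pass. In general, when comparing, we want to know how the comparison is actually done.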
#!/usr/bin/env python3
# find the difference between two texts
# vegaseat 6/2/2005 (originally tested with Python 2.4)
import difflib
text1 = """The World's Shortest Books:
Human Rights Advances in China
Add some text lines that are not in either
1. first different line
2. line 2 added
3. also a third
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""
text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
Another line that is different
1. first different line
"""
# create a list of lines in text1, keeping the line endings
text1Lines = text1.splitlines(True)
##text1Lines.sort() ## uncomment to sort
print("Lines of text1:")
for line in text1Lines:
    print(line, end='')
print()
# ditto for text2
text2Lines = text2.splitlines(True)
##text2Lines.sort() ## uncomment to sort
print("Lines of text2:")
for line in text2Lines:
    print(line, end='')
print()
diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))
print('-' * 50)
print("Lines different in text1 from text2:")
for line in diffList:
    # only lines unique to text1 start with '-'; lines unique to text2 start with '+'
    if line[0] == '-':
        print(line, end='')
print()
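As an alternative to running compare() a second time with the arguments swapped, here is a minimal sketch (Python 3) that collects both kinds of difference in a single Differ pass and can optionally sort first. `differ_split` and its `sort_first` parameter are hypothetical names, not part of difflib. It uses sorted(), which returns new lists, so unlike the commented-out .sort() calls above it leaves the caller's lists untouched.

```python
import difflib


def differ_split(lines1, lines2, sort_first=False):
    """Return (only_in_1, only_in_2) from a single Differ.compare() pass.

    Hypothetical helper: with sort_first=True, sorted copies of both inputs
    are compared, so a line that only changed position is not reported.
    """
    if sort_first:
        lines1, lines2 = sorted(lines1), sorted(lines2)
    only1, only2 = [], []
    for line in difflib.Differ().compare(lines1, lines2):
        if line.startswith("- "):
            only1.append(line[2:])  # line appears only in lines1
        elif line.startswith("+ "):
            only2.append(line[2:])  # line appears only in lines2
        # "  " (common) and "? " (intraline hint) lines are skipped
    return only1, only2


text1Lines = ["x\n", "a\n"]
text2Lines = ["a\n", "x\n", "new\n"]
# "a" is in both lists but at different positions, so it shows up on both sides
print(differ_split(text1Lines, text2Lines))
# (['a\n'], ['a\n', 'new\n'])
print(differ_split(text1Lines, text2Lines, sort_first=True))
# ([], ['new\n'])
```

Reading the '+ ' lines this way gives much the same information as a second compare(text2Lines, text1Lines) pass, without diffing the files twice.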
woooee