DaniWeb IT Discussion Community

the1last Nov 13th, 2007 9:04 pm
Trying to compare the contents of two text files and save the difference
 
I have two text files containing multiple lines of text from a datalogger, and I need to compare the two files and save the difference into a third text file.

For example:

text1:
10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834

text2:
10/12/01, 43:34:34, 6453
10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834
10/16/01, 05:34:26, 8323

text3:
10/12/01, 43:34:34, 6453
10/16/01, 05:34:26, 8323

I am able to accomplish this using a bash script, but since the rest of my code is in Python I would rather stick to using just Python. Any advice would be great!

Thanks

woooee Nov 13th, 2007 10:15 pm
Re: Trying to compare the contents of two text files and save the difference
 
You can use lists or two sets. With sets you would want both set1.difference(set2) and set2.difference(set1), so that lines unique to either file are caught. You could also set up a process that reads both files in step, the way a merge sort would, but the set solution seems more pythonic. It depends on how large the files are, though.
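
A minimal sketch of the set approach described above, using the sample data from the first post. It is written for Python 3 (the snippets elsewhere in this thread are Python 2), and the names line_difference and diff_lines are made up for illustration. Note that sets discard duplicate lines and the original file order, so the result is sorted as plain text here, which is not guaranteed to be chronological for dates in this format.

```python
# Set-based comparison (Python 3 sketch).
# The sample data comes from the original post; the function and
# variable names are illustrative assumptions, not from the thread.

text1 = """10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834
"""

text2 = """10/12/01, 43:34:34, 6453
10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834
10/16/01, 05:34:26, 8323
"""


def line_difference(lines_a, lines_b):
    """Return the lines that appear in exactly one input, sorted as text."""
    set_a, set_b = set(lines_a), set(lines_b)
    only_a = set_a.difference(set_b)  # in the first file only
    only_b = set_b.difference(set_a)  # in the second file only
    return sorted(only_a | only_b)


diff_lines = line_difference(text1.splitlines(), text2.splitlines())
for line in diff_lines:
    print(line)
```

With real files, the two strings would be replaced by the lines read from each file, and diff_lines could be written to the third file.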

bumsfeld Nov 13th, 2007 11:46 pm
Re: Trying to compare the contents of two text files and save the difference
 
Vegaseat left this example of the difflib module somewhere in the code snippets:
# find the difference between two texts
# tested with Python24  vegaseat  6/2/2005

import difflib

text1 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""

text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
"""

# create a list of lines in text1
text1Lines = text1.splitlines(1)
print "Lines of text1:"
for line in text1Lines:
  print line,

print

# ditto for text2
text2Lines = text2.splitlines(1)
print "Lines of text2:"
for line in text2Lines:
  print line,

print 

diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))

print '-'*50
print "Lines different in text1 from text2:"
for line in diffList:
  if line[0] == '-':
    print line,

the1last Nov 14th, 2007 2:56 am
Re: Trying to compare the contents of two text files and save the difference
 
Thanks for the advice guys. Using the difflib module, things are up and running nicely. My only question at this point is how the module would react to files with many entries (say > 2000). I haven't had a chance to set up a test run like this yet, but I plan to soon.
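
For reference, here is a rough sketch of how the difflib approach might be applied to files on disk rather than to strings, saving the result to a third file. It is written for Python 3, the function name write_difference and the file names are hypothetical, and the temporary directory exists only so that the demo cleans up after itself. It makes no claim about speed; the same function could be pointed at larger files to try that out.

```python
# File-based difflib comparison (Python 3 sketch).
# write_difference and the file names below are illustrative assumptions.
import difflib
import os
import tempfile


def write_difference(path1, path2, out_path):
    """Write the lines found in only one of the two files to out_path."""
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()

    changed = []
    for entry in difflib.Differ().compare(lines1, lines2):
        # "- " marks a line only in file 1, "+ " a line only in file 2.
        # "  " (common line) and "? " (hint line) entries are skipped.
        if entry.startswith(("- ", "+ ")):
            changed.append(entry[2:])

    with open(out_path, "w") as out:
        out.writelines(changed)
    return changed


# Demo with the sample data from the first post.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "text1.txt")
    p2 = os.path.join(tmp, "text2.txt")
    p3 = os.path.join(tmp, "text3.txt")
    with open(p1, "w") as f:
        f.write("10/13/01, 21:34:23, 4324\n"
                "10/14/01, 09:12:32, 3423\n"
                "10/15/01, 04:45:54, 7834\n")
    with open(p2, "w") as f:
        f.write("10/12/01, 43:34:34, 6453\n"
                "10/13/01, 21:34:23, 4324\n"
                "10/14/01, 09:12:32, 3423\n"
                "10/15/01, 04:45:54, 7834\n"
                "10/16/01, 05:34:26, 8323\n")
    result = write_difference(p1, p2, p3)
    with open(p3) as f:
        saved = f.read()

print(saved, end="")
```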

woooee Nov 14th, 2007 12:52 pm
Re: Trying to compare the contents of two text files and save the difference
 
With some additions to the data, note that it reports "1. first different line" as a difference even though that line appears in both texts, and it does not find "Another line that is different". Sorting text1Lines and text2Lines should solve the first problem, since difflib seems to compare in file order. This may not matter for your data, since the file appears to be in ascending date order already. The second problem arises because the loop only prints lines starting with '-', that is, lines in text1 that are not in text2. If there are lines in the 2nd file that are not in the first, you will also have to add a diffList = list(diffInstance.compare(text2Lines, text1Lines)) pass. In general, when comparing, we want to know how the tool is comparing.
#!/usr/bin/python

# find the difference between two texts
# tested with Python24  vegaseat  6/2/2005

import difflib

text1 = """The World's Shortest Books:
Human Rights Advances in China
Add some text lines that are not in either
1. first different line
2. line 2 added
3. also a third
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""

text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
Another line that is different
1. first different line
"""

# create a list of lines in text1
text1Lines = text1.splitlines(1)
##text1Lines.sort()                ## uncomment to sort
print "Lines of text1:"
for line in text1Lines:
  print line,
print

# ditto for text2
text2Lines = text2.splitlines(1)
##text2Lines.sort()                ## uncomment to sort
print "Lines of text2:"
for line in text2Lines:
  print line,
print 

diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))

print '-'*50
print "Lines different in text1 from text2:"
for line in diffList:
  if line[0] == '-':
    print line,
print
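
To illustrate the ordering effect discussed in this post, here is a small Python 3 sketch with made-up sample lines (before, after, removed and added are all illustrative names). A line that merely moves to a different position shows up in the Differ output as both a '-' and a '+' entry, while a set comparison treats the two texts as identical. Checking both prefixes in one pass is one possible alternative to running compare() a second time with the arguments swapped.

```python
# Order sensitivity of difflib.Differ vs. sets (Python 3 sketch).
# The sample lines are invented for illustration.
import difflib

before = ["alpha\n", "moved\n", "beta\n", "gamma\n"]
after = ["alpha\n", "beta\n", "gamma\n", "moved\n"]

diff = list(difflib.Differ().compare(before, after))

# Lines Differ considers present only in the first / second sequence.
removed = [entry[2:] for entry in diff if entry.startswith("- ")]
added = [entry[2:] for entry in diff if entry.startswith("+ ")]

print("removed:", removed)
print("added:  ", added)

# A set comparison ignores position, so the moved line is not reported.
print("set difference:", set(before) ^ set(after))
```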


All times are GMT -4.