Trying to compare the contents of two text files and save the difference

the1last

I have two text files containing multiple lines of text from a datalogger, and I need to compare the two files and save the difference into a third text file.

For example:

text1:
10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834

text2:
10/12/01, 43:34:34, 6453
10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834
10/16/01, 05:34:26, 8323

text3:
10/12/01, 43:34:34, 6453
10/16/01, 05:34:26, 8323

I am able to accomplish this using a bash script, but since the rest of my code is in Python I would rather stick to using just Python. Any advice would be great!

Thanks

woooee

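You can use lists or two sets. With sets you would want both set1.difference(set2) and set2.difference(set1) (set1.symmetric_difference(set2) gives you both at once). You could also read both files in step, the way the merge step of a merge sort does (this needs both files sorted), but the set solution seems more Pythonic. Which one to use depends on how large the files are, though.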
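A minimal Python 3 sketch of the set approach, assuming each line of the log is one complete record. The function names and the file names used below (text1.txt, text2.txt, text3.txt) are made up for illustration. Instead of printing the sets directly, it uses them for membership tests so the output keeps the order the lines have in the files.

```python
def read_lines(path):
    """Read a text file into a list of lines with trailing newlines stripped."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def lines_only_in_one(lines1, lines2):
    """Return the lines that appear in only one of the two lists.

    Contains the same lines as set1.symmetric_difference(set2), but keeps
    the order in which they appear in the inputs (and any repeated lines).
    """
    set1, set2 = set(lines1), set(lines2)
    only_in_1 = [line for line in lines1 if line not in set2]  # like set1 - set2
    only_in_2 = [line for line in lines2 if line not in set1]  # like set2 - set1
    return only_in_1 + only_in_2


def write_difference(path1, path2, out_path):
    """Write the lines found in only one of the two files to out_path."""
    diff = lines_only_in_one(read_lines(path1), read_lines(path2))
    with open(out_path, "w") as out:
        for line in diff:
            out.write(line + "\n")
    return diff
```

With the example data from the first post saved as text1.txt and text2.txt, write_difference("text1.txt", "text2.txt", "text3.txt") should write the two lines shown as text3.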

bumsfeld

Vegaseat left this example of the difflib module somewhere in the code snippets:

# find the difference between two texts
# tested with Python24   vegaseat  6/2/2005

import difflib

text1 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""

text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
"""

# create a list of lines in text1 (True keeps the line endings)
text1Lines = text1.splitlines(True)
print "Lines of text1:"
for line in text1Lines:
  print line,

print

# ditto for text2
text2Lines = text2.splitlines(True)
print "Lines of text2:"
for line in text2Lines:
  print line,

print  

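# Differ prefixes each line: '- ' only in text1, '+ ' only in text2,
# '  ' in both, '? ' intraline hints
diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))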

print '-'*50
print "Lines different in text1 from text2:"
for line in diffList:
  if line[0] == '-':
    print line,
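The snippet above only prints the lines marked '-' (present in text1 but not matched in text2). In the original question the extra lines are in the second file, and Differ marks those with '+'. A minimal Python 3 sketch using the same Differ.compare() call that collects both kinds and strips the two-character prefix; the function names are hypothetical, and write_diff_file assumes every line, including the last, ends with a newline.

```python
import difflib


def differing_lines(lines1, lines2):
    """Return (removed, added) lines according to difflib.Differ.

    removed: lines marked '- ' (in lines1 but not matched in lines2)
    added:   lines marked '+ ' (in lines2 but not matched in lines1)
    Common lines ('  ') and intraline hint lines ('? ') are skipped.
    """
    removed, added = [], []
    for line in difflib.Differ().compare(lines1, lines2):
        code, text = line[:2], line[2:]
        if code == "- ":
            removed.append(text)
        elif code == "+ ":
            added.append(text)
    return removed, added


def write_diff_file(path1, path2, out_path):
    """Write every line that differs between the two files to out_path."""
    with open(path1) as f1, open(path2) as f2:
        removed, added = differing_lines(f1.readlines(), f2.readlines())
    with open(out_path, "w") as out:
        out.writelines(removed + added)
```

Unlike the set version, Differ works on the order of the lines, so a line that only moved can show up as both removed and added.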

the1last

Thanks for the advice, guys. Using the difflib module, things are up and running nicely. My only question at this point is how the module would cope with files with many entries (say > 2000). I haven't had a chance to set up a test run like this yet, but I plan to soon.
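On the question of larger files: the merge-style approach mentioned earlier in the thread walks two sorted lists once, comparing the current line of each, much like the merge step of a merge sort. A minimal Python 3 sketch, assuming both inputs have already been sorted the same way (for example with sorted()); the function name merge_difference is made up here. Each occurrence of a line is matched at most once, so repeated lines count as separate entries.

```python
def merge_difference(sorted1, sorted2):
    """Walk two sorted lists of lines in step, like the merge step of merge sort.

    Returns (only_in_1, only_in_2). Both inputs must already be sorted.
    """
    i, j = 0, 0
    only_in_1, only_in_2 = [], []
    while i < len(sorted1) and j < len(sorted2):
        if sorted1[i] == sorted2[j]:
            # Line present in both: advance both cursors
            i += 1
            j += 1
        elif sorted1[i] < sorted2[j]:
            # Everything after sorted2[j] is larger, so sorted1[i] has no partner
            only_in_1.append(sorted1[i])
            i += 1
        else:
            only_in_2.append(sorted2[j])
            j += 1
    # Whatever is left in either list has no partner in the other one
    only_in_1.extend(sorted1[i:])
    only_in_2.extend(sorted2[j:])
    return only_in_1, only_in_2
```

The same loop could read lines from two open files instead of lists, which avoids holding both files in memory at once.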

woooee

With some lines added to the data, note that it reports "1. first different line" as a difference even though that line is in both texts, and it doesn't list "Another line that is different" at all. Differ compares the lines in file order, so sorting text1Lines and text2Lines first should solve the first problem. This may not make a difference for your data, since the file appears to be in ascending date order already. The second problem comes from printing only the lines that start with '-': a line that is in the 2nd file but not in the first is marked with '+' instead, so you also have to print those lines (or run a second comparison, diffList = list(diffInstance.compare(text2Lines, text1Lines)), and print its '-' lines). In general, when comparing, we want to know how the comparison is done.

#!/usr/bin/python

# find the difference between two texts
# tested with Python24   vegaseat  6/2/2005

import difflib

text1 = """The World's Shortest Books:
Human Rights Advances in China
Add some text lines that are not in either
1. first different line
2. line 2 added
3. also a third
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""

text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
Another line that is different
1. first different line
"""

# create a list of lines in text1 (True keeps the line endings)
text1Lines = text1.splitlines(True)
##text1Lines.sort()                 ## uncomment to sort
print "Lines of text1:"
for line in text1Lines:
  print line,
print

# ditto for text2
text2Lines = text2.splitlines(True)
##text2Lines.sort()                 ## uncomment to sort
print "Lines of text2:"
for line in text2Lines:
  print line,
print  

diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))

print '-'*50
print "Lines different in text1 from text2:"
for line in diffList:
  if line[0] == '-':
    print line,
print
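The file-order effect described in this post can be seen in isolation with a smaller example. A Python 3 sketch (the helper name changed_lines is hypothetical): a line that only moved position is reported as both removed and added, while sorting both lists first lets Differ match it.

```python
import difflib


def changed_lines(lines1, lines2, sort_first=False):
    """Return the '- ' and '+ ' lines from difflib.Differ, optionally sorting first."""
    if sort_first:
        # After sorting, lines common to both lists appear in the same
        # relative order, so a line that merely moved can be matched.
        lines1, lines2 = sorted(lines1), sorted(lines2)
    return [line for line in difflib.Differ().compare(lines1, lines2)
            if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    a = ["x\n", "y\n", "z\n"]
    b = ["y\n", "z\n", "x\n"]  # same lines, "x" moved to the end
    print(changed_lines(a, b))                   # "x" shows up as removed and added
    print(changed_lines(a, b, sort_first=True))  # nothing is reported
```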

vani priya

Hello,

I am very new to Python and am learning the language these days. At present I have to compare the two files below and generate a third file with the percentage difference between the values.

Kindly help.

With my best regards,
Vani
File1
*ID4U.1 = 3.2516E-11
*ID4U.2 = 9.6499E-15
*ID4U.3 = 9.6499E-15
*ID4U.4 = 9.6499E-15
*ID4U.5 = 9.6499E-15
*ID4U.6 = 9.6499E-15
*ID4U.7 = 9.6499E-15
*ID4U.8 = 1.4720E-14
*ID4U.9 = 2.9930E-14
*ID4U.10 = 1.1154E-13
... up to *ID4U.146

File2
id4u.1 = 7.4778456e-10
id4u.2 = 7.4778308e-10
id4u.3 = 7.4778228e-10
id4u.4 = 7.4778228e-10
id4u.5 = 7.4778228e-10
id4u.6 = 7.4778228e-10
id4u.7 = 7.4778228e-10
id4u.8 = 7.4778939e-10
id4u.9 = 7.4780360e-10
id4u.10 = 7.4788812e-10
... up to id4u.146

Example:
((value of *ID4U.1 - value of id4u.1) / value of *ID4U.1) * 100, i.e.

((3.2516E-11 - 7.4778456e-10) / 3.2516E-11) * 100
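A minimal Python 3 sketch of this calculation, assuming every data line has the form name = value and that names match once the leading '*' is dropped and case is ignored (*ID4U.1 and id4u.1). The function names and output format are made up for illustration. It computes ((v1 - v2) / v1) * 100 as in the example and skips names whose first value is 0, where the percentage is undefined.

```python
def parse_values(lines):
    """Map normalised names (no leading '*', lower case) to float values."""
    values = {}
    for line in lines:
        if "=" not in line:
            continue  # skip blank lines and notes such as "up to ..."
        name, value = line.split("=", 1)
        key = name.strip().lstrip("*").lower()
        values[key] = float(value)  # float() ignores surrounding whitespace
    return values


def percent_differences(lines1, lines2):
    """Return [(name, pct)] with pct = ((v1 - v2) / v1) * 100 for names in both inputs."""
    values1 = parse_values(lines1)
    values2 = parse_values(lines2)
    results = []
    for key, v1 in values1.items():
        if key in values2 and v1 != 0:  # percentage undefined when v1 is 0
            v2 = values2[key]
            results.append((key, (v1 - v2) / v1 * 100))
    return results


def write_percent_file(path1, path2, out_path):
    """Write one 'name = pct%' line per matching name to out_path."""
    with open(path1) as f1, open(path2) as f2:
        results = percent_differences(f1, f2)
    with open(out_path, "w") as out:
        for key, pct in results:
            out.write("%s = %.4f%%\n" % (key, pct))
```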

Editor's note:
Please don't hijack older threads with your problems. Write your own thread, title it properly and state your problem and code you have tried.

Question Answered as of 4 Years Ago by woooee, vani priya and bumsfeld

radk

The example code is very simple and useful. Thanks for it.