Trying to compare the contents of two text files and save the difference


the1last

  #1
Nov 13th, 2007
I have two text files containing multiple lines of text from a datalogger, and I need to compare the two files and save the difference into a third text file.

For example:

text1:
10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834

text2:
10/12/01, 43:34:34, 6453
10/13/01, 21:34:23, 4324
10/14/01, 09:12:32, 3423
10/15/01, 04:45:54, 7834
10/16/01, 05:34:26, 8323

text3:
10/12/01, 43:34:34, 6453
10/16/01, 05:34:26, 8323

I am able to accomplish this using a bash script, but since the rest of my code is in Python, I would rather stick to using just Python. Any advice would be great!

Thanks

woooee

  #2
Nov 13th, 2007
You can use lists or two sets. With sets you would want both set1.difference(set2) and set2.difference(set1), since each call only returns the items that are missing from the other set. You could also set up a process to read both files in step, like a merge sort would, but the set solution seems more Pythonic. Which one is better depends on how large the files are, though.
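The two-set idea can be sketched roughly as follows. This is only an illustration, not woooee's code: the function name `save_difference` and its parameters are made up for the example, and it assumes each file fits comfortably in memory. Instead of calling `set1.difference(set2)` directly, it filters the original lists against the sets so the output keeps the order of the input files.

```python
def save_difference(path1, path2, out_path):
    """Write the lines found in only one of the two files to out_path.

    Hypothetical helper: lines unique to the first file are written first,
    followed by lines unique to the second file, each in file order.
    """
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.read().splitlines()
        lines2 = f2.read().splitlines()

    # Sets give fast membership tests; the lists preserve the original order.
    set1, set2 = set(lines1), set(lines2)
    only_in_1 = [line for line in lines1 if line not in set2]
    only_in_2 = [line for line in lines2 if line not in set1]

    with open(out_path, "w") as out:
        for line in only_in_1 + only_in_2:
            out.write(line + "\n")

    return only_in_1, only_in_2
```

Because membership is tested on whole lines, a line that appears in both files is never reported, no matter where it sits in either file.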

bumsfeld

  #3
Nov 13th, 2007
Vegaseat left this example of the difflib module somewhere in the code snippets:
# find the difference between two texts
# original by vegaseat, 6/2/2005 (tested with Python 2.4);
# the print calls here use Python 3 syntax

import difflib

text1 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""

text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
"""

# create a list of lines in text1, keeping the line endings
text1Lines = text1.splitlines(True)
print("Lines of text1:")
for line in text1Lines:
    print(line, end='')

print()

# ditto for text2
text2Lines = text2.splitlines(True)
print("Lines of text2:")
for line in text2Lines:
    print(line, end='')

print()

diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))

print('-' * 50)
print("Lines different in text1 from text2:")
for line in diffList:
    # a leading '-' marks a line that is in text1 but not in text2
    if line[0] == '-':
        print(line, end='')
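Applied to the data logger lines from the first post, the same difflib approach might look like the sketch below. The helper names `changed_lines` and `save_changed_lines` are invented for this example. It relies on `difflib.Differ` prefixing lines unique to the first sequence with `'- '` and lines unique to the second with `'+ '`, so keeping both prefixes covers both directions in one pass.

```python
import difflib

text1_lines = [
    "10/13/01, 21:34:23, 4324\n",
    "10/14/01, 09:12:32, 3423\n",
    "10/15/01, 04:45:54, 7834\n",
]
text2_lines = [
    "10/12/01, 43:34:34, 6453\n",
    "10/13/01, 21:34:23, 4324\n",
    "10/14/01, 09:12:32, 3423\n",
    "10/15/01, 04:45:54, 7834\n",
    "10/16/01, 05:34:26, 8323\n",
]


def changed_lines(old_lines, new_lines):
    """Return the lines Differ marks as removed ('- ') or added ('+ ')."""
    result = []
    for entry in difflib.Differ().compare(old_lines, new_lines):
        # Skip unchanged lines ('  ') and intraline hint lines ('? ').
        if entry.startswith(("- ", "+ ")):
            result.append(entry[2:])  # drop the two-character prefix
    return result


def save_changed_lines(old_lines, new_lines, out_path):
    """Hypothetical helper: write the changed lines to a third file."""
    with open(out_path, "w") as out:
        out.writelines(changed_lines(old_lines, new_lines))


text3_lines = changed_lines(text1_lines, text2_lines)
print("".join(text3_lines), end="")
```

Calling `save_changed_lines(text1_lines, text2_lines, "text3.txt")` would then write the two differing lines to a file; the file name is only a placeholder.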

the1last

  #4
Nov 14th, 2007
Thanks for the advice, guys. Using the difflib module, things are up and running nicely. My only question at this point is how the module would react to files with many entries (say > 2000). I haven't had a chance to set up a test run like this yet, but I plan to soon.
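One way to try this without real logger files is to generate synthetic lines, as in the sketch below. The line format and the `make_log` helper are invented for the test, and the result says nothing about timing on real data. The difflib documentation describes its matching as quadratic time in the worst case, so inputs that differ heavily may behave very differently from this mostly-identical pair.

```python
import difflib


def make_log(count):
    """Build `count` unique, made-up logger lines (format is an assumption)."""
    return [f"entry {i:05d}, 00:00:00, {i}\n" for i in range(count)]


old_lines = make_log(2500)
# The second "file" has one extra line at the start and one at the end.
new_lines = ["extra first line\n"] + old_lines + ["extra last line\n"]

added = [
    entry[2:]
    for entry in difflib.Differ().compare(old_lines, new_lines)
    if entry.startswith("+ ")
]

print(len(old_lines), "lines compared;", len(added), "found only in the second list")
```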

woooee

  #5
Nov 14th, 2007
With some additions to the data, note that it reports "1. first different line" as a difference when it is not, and it doesn't find "Another line that is different". Sorting text1Lines and text2Lines should solve the first problem, since it seems to be comparing in file order. This may not make a difference for your data, since the file appears to be in ascending date order already. If there are lines in the 2nd file that are not in the first, then you will also have to add a second pass with the arguments swapped:
diffList = list(diffInstance.compare(text2Lines, text1Lines))
In general, when comparing, we want to know how the comparison is being done.
#!/usr/bin/python

# find the difference between two texts
# original by vegaseat, 6/2/2005 (tested with Python 2.4);
# the print calls here use Python 3 syntax

import difflib

text1 = """The World's Shortest Books:
Human Rights Advances in China
Add some text lines that are not in either
1. first different line
2. line 2 added
3. also a third
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""

text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond: Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
Another line that is different
1. first different line
"""

# create a list of lines in text1, keeping the line endings
text1Lines = text1.splitlines(True)
##text1Lines.sort() ## uncomment to sort
print("Lines of text1:")
for line in text1Lines:
    print(line, end='')
print()

# ditto for text2
text2Lines = text2.splitlines(True)
##text2Lines.sort() ## uncomment to sort
print("Lines of text2:")
for line in text2Lines:
    print(line, end='')
print()

diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))

print('-' * 50)
print("Lines different in text1 from text2:")
for line in diffList:
    # a leading '-' marks a line that is in text1 but not in text2
    if line[0] == '-':
        print(line, end='')
print()
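The file-order behaviour described in this post can be reproduced on a much smaller input. The sketch below uses made-up lines and is not taken from the thread. It compares what `difflib.Differ` flags with what plain set differences report when one line merely changes position. It also reads the `'+ '` entries, which is where `Differ` puts lines found only in the second sequence, so a loop that prints only `'-'` entries never shows them.

```python
import difflib

# "moved" and "beta" appear in both lists, but in a different order.
a = ["alpha\n", "moved\n", "beta\n"]
b = ["alpha\n", "beta\n", "extra\n", "moved\n"]

diff = list(difflib.Differ().compare(a, b))

# Order-sensitive view: lines Differ marks as removed from a or added in b.
differ_removed = [entry[2:] for entry in diff if entry.startswith("- ")]
differ_added = [entry[2:] for entry in diff if entry.startswith("+ ")]

# Order-insensitive view: plain set differences in both directions.
set_removed = set(a) - set(b)
set_added = set(b) - set(a)

print("Differ removed:", differ_removed)
print("Differ added:  ", differ_added)
print("set removed:   ", sorted(set_removed))
print("set added:     ", sorted(set_added))
```

Every line of `a` also exists in `b`, so the set view reports nothing as removed, while `Differ` still flags a line because it sits at a different position. Which behaviour is wanted depends on whether line order matters for the data.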
Last edited by woooee; Nov 14th, 2007 at 12:59 pm.

©2003 - 2009 DaniWeb® LLC