Python: Compare two CSV files - Output differences/additions

Question

Terry_8 14 Newbie Poster

7 Years Ago

I have two CSV files containing client records. I need to compare them and write to a third file both the rows (records) whose values differ between the two files and the rows in the second file that are not in the first file.

Example: File 1:
KeyField,Name,City, Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Pittsburgh,40567, 22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,76543,H86

File 2:
KeyField,Name,City,Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Boston, 40567,22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,7654,H86
678,Isabel,Guadalajara,87654,M111

The result should be a file containing:

345,George,Boston,40567,22
678,Isabel,Guadalajara,87654,M111

The following code gets me in the neighborhood as a visual check:

import difflib

# Read both files and split them into rows
with open('original.csv') as f, open('new.csv') as f1:
    str1 = f.read().splitlines()  # one entry per row (split() would also break rows at spaces)
    str2 = f1.read().splitlines()

d = difflib.Differ()  # compare and just print
diff = list(d.compare(str1, str2))
print('\n'.join(diff))
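To produce the requested file instead of a visual check, one option is to keep only the lines that difflib.Differ marks with '+ ' (present only in the second sequence) and write those out. This is a minimal sketch, assuming both files fit in memory; the function name added_or_changed is hypothetical. Because Differ compares whole lines, a row that differs only by spaces is reported as well.

```python
import difflib


def added_or_changed(old_lines, new_lines):
    """Return the lines that difflib.Differ marks as present only in new_lines.

    Differ prefixes each output line with '  ' (in both), '- ' (only in old),
    '+ ' (only in new) or '? ' (intraline hint). Only the '+ ' lines are kept,
    with the two-character prefix removed.
    """
    diff = difflib.Differ().compare(old_lines, new_lines)
    return [line[2:] for line in diff if line.startswith('+ ')]


if __name__ == '__main__':
    # A few rows from the sample data in the question
    old = ['123,Fred,Chicago,60558,A2',
           '345,George,Pittsburgh,40567, 22']
    new = ['123,Fred,Chicago,60558,A2',
           '345,George,Boston, 40567,22',
           '678,Isabel,Guadalajara,87654,M111']
    for line in added_or_changed(old, new):
        print(line)
```

For the real files, pass the splitlines() lists read from original.csv and new.csv, then write each returned line to different.csv followed by a newline.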

Can somebody suggest a quick solution, please?

rproffitt 2,706 https://5calls.org

7 Years Ago

This sounds like the old discussion at https://www.daniweb.com/programming/software-development/threads/96638/trying-to-compare-the-contents-of-two-text-files-and-save-the-difference, which covered a few ideas.

rproffitt 2,706 https://5calls.org

7 Years Ago

As to the file output, that's something basic, I feel. Some folks want every line of code needed for their app. Also, what is that additional record?

I see you are new here, so if you are looking for a complete app that hits all the marks without writing the code yourself, just go ahead and add that detail.

Terry_8 14 Newbie Poster · Answer 1 · 2018-01-09T22:52:55+00:00

Yes, the above is very similar, but it doesn't write to a file, nor does it seem to output the one additional record in the second file.

Terry_8 14 Newbie Poster · Answer 2 · 2018-01-10T17:03:54+00:00

After some testing, here's an elegantly simple solution:

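# Read in the original and new files
with open('original.csv') as orig, open('new.csv') as new:
    orig_lines = set(orig)   # every line of the original, for fast lookup
    new_lines = list(new)    # the new file's lines, in order

# In new but not in orig: changed rows plus added rows.
# Lines are compared exactly, so a row that differs only by spaces is reported too.
# A list comprehension (instead of set(new) - set(orig)) keeps the file's row order.
bigb = [line for line in new_lines if line not in orig_lines]

# To see results in console if desired
print(bigb)

# Write to output file (the with block closes it automatically)
with open('different.csv', 'w') as file_out:
    file_out.writelines(bigb)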
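The set-difference approach compares whole lines, so it cannot tell a changed record from a new one, and stray spaces (like " 22" or " 40567" in the sample data) make rows look different. A keyed comparison with the standard csv module is one way around both issues. This is a minimal sketch, assuming KeyField is unique in each file and every row is well formed; clean and compare_by_key are hypothetical helper names. With the sample data it also reports 567, whose Zip is 76543 in File 1 but 7654 in File 2.

```python
import csv
import io

# Sample data from the question (note the stray spaces in some fields)
OLD = """KeyField,Name,City, Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Pittsburgh,40567, 22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,76543,H86
"""

NEW = """KeyField,Name,City,Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Boston, 40567,22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,7654,H86
678,Isabel,Guadalajara,87654,M111
"""


def clean(row):
    """Strip whitespace from the header names and values of a DictReader row."""
    return {k.strip(): (v or '').strip() for k, v in row.items() if k is not None}


def compare_by_key(old_rows, new_rows, key='KeyField'):
    """Return (changed, added): rows of new_rows that differ from, or are
    missing in, old_rows when the two are matched on the `key` column."""
    old_by_key = {}
    for row in old_rows:
        r = clean(row)
        old_by_key[r[key]] = r
    changed, added = [], []
    for row in new_rows:
        r = clean(row)
        if r[key] not in old_by_key:
            added.append(r)
        elif r != old_by_key[r[key]]:
            changed.append(r)
    return changed, added


if __name__ == '__main__':
    changed, added = compare_by_key(csv.DictReader(io.StringIO(OLD)),
                                    csv.DictReader(io.StringIO(NEW)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=['KeyField', 'Name', 'City', 'Zip', 'Location'],
                            lineterminator='\n')
    writer.writerows(changed + added)  # no header row, matching the desired output
    print(out.getvalue(), end='')
```

For the real files, replace the StringIO objects with open('original.csv', newline='') and open('new.csv', newline=''), and point the writer at open('different.csv', 'w', newline=''). Keeping changed and added in separate lists also makes it easy to write them to different files if needed.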

pty 882 Posting Pro · Answer 3 · 2018-01-11T08:54:46+00:00

I use diff for this kind of thing.
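For the same kind of report from Python, the standard library's difflib.unified_diff produces output in the unified format that `diff -u original.csv new.csv` prints. This is a minimal sketch; the wrapper name unified and the default file labels are hypothetical.

```python
import difflib


def unified(old_text, new_text, old_name='original.csv', new_name='new.csv'):
    """Return a unified diff of two texts, similar in layout to `diff -u`."""
    return ''.join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


if __name__ == '__main__':
    # Two rows from the sample data in the question
    old = 'KeyField,Name,City,Zip,Location\n567,Doc,Birmingham,76543,H86\n'
    new = 'KeyField,Name,City,Zip,Location\n567,Doc,Birmingham,7654,H86\n'
    print(unified(old, new), end='')
```

Lines starting with '-' exist only in the first file and lines starting with '+' only in the second; identical inputs give an empty string.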

pty 882 Posting Pro · Answer 4 · 2018-01-11T13:11:14+00:00

[Image: example output from a diff tool]

If you use a merging tool like Meld, you can interactively (and graphically) merge the two files, combining rows that differ only by whitespace and copying rows that exist on one side but not the other.
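A rough, non-interactive approximation of that merge in Python: strip whitespace from every field, key rows on the first column (KeyField), and copy rows that appear on only one side. This is a minimal sketch under stated assumptions; merge_rows is a hypothetical name, and letting the second file win when a key's values disagree is a simplification of choosing per row in a merge tool.

```python
import csv
import io
import sys


def merge_rows(old_rows, new_rows):
    """Merge two iterables of CSV rows (lists of strings), keyed on column 0.

    Fields are stripped first, so rows that differ only by whitespace collapse
    into one. When a key exists on both sides with different values, the new
    row wins (an assumption). Rows on only one side are copied.
    Order: old keys first, then keys found only in new.
    """
    merged = {}
    for row in list(old_rows) + list(new_rows):
        if not row:  # skip blank lines, which csv.reader yields as []
            continue
        clean = [field.strip() for field in row]
        merged[clean[0]] = clean  # re-assigning keeps the key's first position
    return list(merged.values())


if __name__ == '__main__':
    old = 'KeyField,Name,City, Zip,Location\n345,George,Pittsburgh,40567, 22\n'
    new = ('KeyField,Name,City,Zip,Location\n345,George,Boston, 40567,22\n'
           '678,Isabel,Guadalajara,87654,M111\n')
    rows = merge_rows(csv.reader(io.StringIO(old)), csv.reader(io.StringIO(new)))
    csv.writer(sys.stdout, lineterminator='\n').writerows(rows)
```

Because the header row is keyed on "KeyField" like any other row, the two headers (which differ only by a space before Zip) merge into one.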

Terry_8 14 Newbie Poster · Answer 5 · 2018-01-12T17:19:09+00:00

Thank you. So far my solution meets my needs, but I will try these other suggestions as well.

Nani_2 15 Newbie Poster · Answer 6 · 2019-05-28T04:53:30+00:00

I need Python (Jupyter Notebook) code to plot the correlation between two CSV files.

Nani_2 15 Newbie Poster · Answer 7 · 2019-05-28T05:00:14+00:00

I need Python (Jupyter Notebook) code to plot histograms of the CSV data; the data will be 400001x16.

Neat project (Jupyter), but when posting a new question, you should start a new thread and don't spare the details.

Nani_2 15 Newbie Poster · Answer 8 · 2019-05-28T05:02:55+00:00

Prepare a correlation between two CSV files.