Have two CSV files containing client records and need to compare the two and then output to a third file those rows where there are differences to the values within the record (row) as well as output those records (rows) on the second file that are not on first file .

Example: File 1:
KeyField,Name,City, Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Pittsburgh,40567, 22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,76543,H86

File 2:
KeyField,Name,City,Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Boston, 40567,22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,7654,H86
678,Isabel,Guadalajara,87654,M111

The results should create a file containing :

345,George,Boston,40567,22
678,Isabel,Guadalajara,87654,M111

The following code gets me in the neighborhood as a visual check:

import os
import difflib
f=open('original.csv','r')  #open a file
f1=open('new.csv','r') #open another file to compare
str1=f.read()
str2=f1.read()
str1=str1.split()  #split the words in file by default through the spce
str2=str2.split()
d=difflib.Differ()     # compare and just print
diff=list(d.compare(str1,str2))
print '\n'.join(diff)

Can somebody suggest a quick solution, please?

Recommended Answers

All 10 Replies

Yes, the above is very similar but it doesn't put to a file nor does it seem to put out the one additional record on the second file.

As to the file output, that's something basic I feel. There are folk that want every line of code needed for their app. Also, what is that additional record?

I see you are new here so if you are looking for a complete app that hits all the marks without you writing code, just go ahead and add that detail.

With some testing, here's the elegantly simple solution:

import os

# Read in the original and new file          
orig = open('original.csv','r')
new = open('new.csv','r')

#in new but not in orig
bigb = set(new) - set(orig)

# To see results in console if desired
print(bigb)

# Write to output file    

with open('different.csv', 'w') as file_out:
    for line in bigb:
        file_out.write(line)

#close the files  
orig.close()    
new.close()    
file_out.close()
commented: That's short and sweet. +14

I use diff for this kind of thing.

An example output with a diff tool.

Screen_Shot_2018-01-11_at_13_08_21.png

If you use a merging tool like Meld you can interactively (and graphically) merge the two files together, combining rows that are only differ by whitespace and copying rows that exist on one side but not the other.

Thank you. While thus far, my solution meets the needs, I will try these other suggestions as well.

i need a python (jupyter notebook)code for correlation between two csv files in plots

i need a python (jupyter notebook)code for correlation between two csv files in plots
i need a python (jupyter notebook)code for histogram in csv files data will be 400001x16

commented: Neat project (Jupyter) but when posting a new question, you should make a new post and don't spare the details. +15

Prepare a correlation between two csv files

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.