0

Have two CSV files containing client records and need to compare the two and then output to a third file those rows where there are differences to the values within the record (row) as well as output those records (rows) on the second file that are not on first file .

Example: File 1:
KeyField,Name,City, Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Pittsburgh,40567, 22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,76543,H86

File 2:
KeyField,Name,City,Zip,Location
123,Fred,Chicago,60558,A2
234,Mary,Orlando,12376,4L6
345,George,Boston, 40567,22
456,Peter,Topeka,00341,234
567,Doc,Birmingham,7654,H86
678,Isabel,Guadalajara,87654,M111

The results should create a file containing :

345,George,Boston,40567,22
678,Isabel,Guadalajara,87654,M111

The following code gets me in the neighborhood as a visual check:

import os
import difflib
f=open('original.csv','r')  #open a file
f1=open('new.csv','r') #open another file to compare
str1=f.read()
str2=f1.read()
str1=str1.split()  #split the words in file by default through the spce
str2=str2.split()
d=difflib.Differ()     # compare and just print
diff=list(d.compare(str1,str2))
print '\n'.join(diff)

Can somebody suggest a quick solution, please?

3
Contributors
7
Replies
38
Views
1 Week
Discussion Span
Last Post by Terry_8
0

Yes, the above is very similar but it doesn't put to a file nor does it seem to put out the one additional record on the second file.

0

As to the file output, that's something basic I feel. There are folk that want every line of code needed for their app. Also, what is that additional record?

I see you are new here so if you are looking for a complete app that hits all the marks without you writing code, just go ahead and add that detail.

Edited by rproffitt: Added clarification.

1

With some testing, here's the elegantly simple solution:

import os

# Read in the original and new file          
orig = open('original.csv','r')
new = open('new.csv','r')

#in new but not in orig
bigb = set(new) - set(orig)

# To see results in console if desired
print(bigb)

# Write to output file    

with open('different.csv', 'w') as file_out:
    for line in bigb:
        file_out.write(line)

#close the files  
orig.close()    
new.close()    
file_out.close()
Votes + Comments
That's short and sweet.
0

An example output with a diff tool.

Screen_Shot_2018-01-11_at_13_08_21.png

If you use a merging tool like Meld you can interactively (and graphically) merge the two files together, combining rows that are only differ by whitespace and copying rows that exist on one side but not the other.

0

Thank you. While thus far, my solution meets the needs, I will try these other suggestions as well.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.