I'm trying to compare two CSV files, mark the differences, and write the result to an output file. However, my code seems to read only the last line of sample1.csv and sample2.csv, as you can see below:

Sample1.csv
Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix

Sample2.csv
Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Wakanda
Earth,1234,Pete,Nebula,Montgomery
Earth,1234,Pete,Nebula,Carlo

Current Output
History,Planet,Account,Name,Station,City
Changed,Earth,1234,Pete,Nebula,Carlo

Expected Output
History,Planet,Account,Name,Station,City
Changed,Earth,1234,Pete,Nebula,Wakanda
Changed,Earth,1234,Pete,Nebula,Montgomery
Changed,Earth,1234,Pete,Nebula,Carlo

Here is the code I have:

import csv

# Index the old file by its first column.
with open('old.csv', newline='') as f_old:
    csv_old = csv.reader(f_old, delimiter=',')
    header = next(csv_old)
    old_data = {row[0]: row for row in csv_old}

# Index the new file the same way.
with open('new.csv', newline='') as f_new:
    csv_new = csv.reader(f_new, delimiter=',')
    header = next(csv_new)
    new_data = {row[0]: row for row in csv_new}

# Compare the two sets of first-column keys.
set_new_data = set(new_data)
set_old_data = set(old_data)
added = [['Added'] + new_data[v] for v in set_new_data - set_old_data]
deleted = [['Deleted'] + old_data[v] for v in set_old_data - set_new_data]
in_both = set_old_data & set_new_data
changed = [['Changed'] + new_data[v] for v in in_both if old_data[v] != new_data[v]]
print(changed)

# Write the labelled rows, sorted by their original columns.
with open('difference.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter=',')
    csv_output.writerow(['History'] + header)
    csv_output.writerows(sorted(added + deleted + changed, key=lambda x: x[1:]))

Does anyone know how to get the expected output? Any help is appreciated Thanks!

All 6 Replies

The idea for the code originally came from that thread. That code does the comparison perfectly, but I need to add a new column and mark the respective change on every row of the output, which it can't do, so I extended it. However, the code I have right now only works if the values in the first column of the sample files are all different; as soon as the same value appears in the first column of more than one row, the code breaks.
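
The behaviour described here is consistent with how the dictionaries in the question are built: keying on row[0] keeps only one row per distinct first-column value, so with the sample data every row lands on the same key. A small sketch, with the Sample2.csv rows inlined as a string (an assumption made only so it runs without files):

```python
import csv
import io

# Rows from Sample2.csv, inlined so the sketch runs without files.
sample2 = """Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Wakanda
Earth,1234,Pete,Nebula,Montgomery
Earth,1234,Pete,Nebula,Carlo
"""

reader = csv.reader(io.StringIO(sample2, newline=''))
header = next(reader)

# Same construction as in the question: a later row with the same
# first-column value replaces the earlier one under that key.
new_data = {row[0]: row for row in reader}

print(len(new_data))      # 1
print(new_data['Earth'])  # ['Earth', '1234', 'Pete', 'Nebula', 'Carlo']
```

Only the last "Earth" row survives, which matches the single "Carlo" line in the current output.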

I think your specification needs a lot of work. "I'm trying to compare 2 different CSV files, mark those differences respectively, then produce it as an output." Your example looks more like a merge than a compare.

I think you should run the code and see that it actually compares, as I stated before. The program compares two files and prints the differences, except when rows have identical values in the first column. What I am trying to do is handle that condition. Do you understand my question?
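
One possible way to handle identical first-column values is to compare the files row by row instead of through a dictionary. This is only a sketch under a stated assumption: rows in the two files correspond by position (first data row with first data row, and so on), which the thread does not confirm. diff_rows is a hypothetical helper name, and the sample data is inlined so the example runs without files.

```python
import csv
import io
from itertools import zip_longest

def diff_rows(old_rows, new_rows):
    """Compare rows by position (an assumption) and label each difference."""
    result = []
    for old, new in zip_longest(old_rows, new_rows):
        if old is None:          # extra row only in the new file
            result.append(['Added'] + new)
        elif new is None:        # extra row only in the old file
            result.append(['Deleted'] + old)
        elif old != new:         # same position, different content
            result.append(['Changed'] + new)
    return result

old_text = """Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix
Earth,1234,Pete,Nebula,Phoenix
"""
new_text = """Planet,Account,Name,Station,City
Earth,1234,Pete,Nebula,Wakanda
Earth,1234,Pete,Nebula,Montgomery
Earth,1234,Pete,Nebula,Carlo
"""

old_reader = csv.reader(io.StringIO(old_text, newline=''))
new_reader = csv.reader(io.StringIO(new_text, newline=''))
header = next(old_reader)
next(new_reader)  # skip the header of the new file

changes = diff_rows(list(old_reader), list(new_reader))

# Write to an in-memory buffer; a real script would open difference.csv.
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(['History'] + header)
writer.writerows(changes)
print(out.getvalue())
```

With the sample data this yields one Changed row per input line, in file order. If rows should instead be matched on some identifying column, a different key (or a combination of columns) would be needed, and the positional approach would not apply.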

Reverse engineering? I'll bow out now. I don't mind a challenge, but if members don't take the time to write out what they need and instead want others to reverse engineer it from their code, well, let's see who will do that.

I'll think about this for a bit. But as presented, the spec looks off.

Thanks..