Hello all, this is my first post here.

I seem to have a problem dealing with data, csvs and dictionaries.

Here is a snippet of the contents of file 1 (Node,x,y,z):
1,19.0,18.2,22.4
2,8.4,9.4,10.2
4,22.2,2.5,3.1
7,3.2,6.1,4.3
10,3.1,4.2,33.7
11,3.7,23.8,23.4

Here is a snippet of the contents of file 2 (Node,x,y,z):
1,4.2,7.3,9.5
2,4.1,8.4,9.2
4,2.3,4.6,3.2
11,2.5,2.7,32.6

In reality, file 1 has over 10000 lines of data and file 2 has over 2000 lines of data.

Here is my code so far:

import sys
import csv

#Place file1 into a dictionary
f1 = csv.reader(open('file1.csv','rb'))
f1_dict_data = {}
for node1, x1, y1, z1 in f1:
    f1_dict_data[node1] = x1,y1,z1
print "Length : %d" % len (f1_dict_data)

#Place file2 into a dictionary
f2 = csv.reader(open('file2.csv', 'rb'))
f2_dict_data = {}
for node2, x2, y2, z2 in f2:
    f2_dict_data[node2] = x2,y2,z2
print "Length : %d" % len (f2_dict_data)

# write compared dictionaries to file3
NewCoWriter = csv.writer(open('file3.csv', 'wb', buffering=0))

#Comparing the dictionaries
for key in f1_dict_data:
    if key in f2_dict_data:
        f1_dict_data[key] = f2_dict_data[key]
        a1 = key, f2_dict_data[key]
        NewCoWriter.writerow(a1)
    else:
        a2 = key, f1_dict_data[key]
        NewCoWriter.writerow(a2)

These are the results produced by the code so far:
11,"('2.5', '2.7', '32.6')"
10,"('3.1', '4.2', '33.7')"
1,"('4.2', '7.3', '9.5')"
2,"('4.1', '8.4', '9.2')"
4,"('2.3', '4.6', '3.2')"
7,"('3.2', '6.1', '4.3')"

Outcome
Basically these are the results I am after, except for the following problems:

Problems
1) I can't seem to find a way to write the data in .csv format with all keys and values separated by a delimiter, e.g. 11,2.5,2.7,32.6. Perhaps it would be a better idea to write it out to a formatted .txt file instead? I used dictionaries, as was suggested to me, because they let me compare the two files.
2) I want to sort the data produced in file 3 by "Node" in ascending order. I have tried several ways of doing so without any success.
3) For some reason, when I run this code on the 1000+ line .csv files, all of the dictionary values seem to get jumbled up.

I would really appreciate input from the members. Thanks.

All 6 Replies

Unfortunately I do not know the csv module very well, but maybe you are trying to do something like this (without file input, to leave something for you to do as well):

from collections import OrderedDict

data1 = """
1,19.0,18.2,22.4
2,8.4,9.4,10.2
4,22.2,2.5,3.1
7,3.2,6.1,4.3
10,3.1,4.2,33.7
11,3.7,23.8,23.4
""".splitlines()

data2 = """
1,4.2,7.3,9.5
2,4.1,8.4,9.2
4,2.3,4.6,3.2
11,2.5,2.7,32.6
""".splitlines()


def make_dict(data):
    return OrderedDict((d.split(',',1)[0],d) for d in data if d)

updated = make_dict(data1)
updated.update(make_dict(data2))

print('\n'.join(updated.values()))

"""Output:
1,4.2,7.3,9.5
2,4.1,8.4,9.2
4,2.3,4.6,3.2
7,3.2,6.1,4.3
10,3.1,4.2,33.7
11,2.5,2.7,32.6
"""

Wow, thank you pyTony, the code is brilliant. How could I modify the code if I want to read the data from .csv files instead of from inside the code? Thanks again.

You would use for example

with open('the_csv.csv') as infile:
    updated = make_dict(infile)

or similar

You must then change the make_dict slightly as file has the newlines intact:

def make_dict(data):
    return OrderedDict((d.split(',',1)[0],d.rstrip()) for d in data if d.strip())
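
For reference, one way the pieces above could fit together is sketched below. It is only a sketch under a few assumptions: it uses Python 3 syntax, merge_files is a made-up helper name, and the file names are passed in by the caller. Note that a blank line read from a file is "\n" rather than an empty string, so the lines are stripped before the emptiness test.

```python
from collections import OrderedDict


def make_dict(lines):
    """Map each node id (the text before the first comma) to its full line."""
    # strip() removes the trailing newline; lines that are then empty are skipped
    stripped = (line.strip() for line in lines)
    return OrderedDict((line.split(',', 1)[0], line) for line in stripped if line)


def merge_files(path1, path2):
    """Return the lines of path1, replaced by the line from path2 where the node matches."""
    with open(path1) as infile:
        updated = make_dict(infile)
    with open(path2) as infile:
        # update() keeps the position of existing keys and appends new ones at the end
        updated.update(make_dict(infile))
    return list(updated.values())
```

Calling print('\n'.join(merge_files('file1.csv', 'file2.csv'))) would then print the merged lines in the order of file 1. Any node that exists only in file 2 ends up at the end of the list.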

Thanks, I am trying to incorporate the open file code into your code above without much success, and I am not sure where I am going wrong. How could I incorporate the open file code into the main code? I have two .csv files, where data1 is in file1.csv and data2 is in file2.csv.

Thanks so much for your help!

If you are not able to integrate this simple code, I think you should spend some more time on the basics before moving on to more ambitious things, for example by reviewing the tutorial at http://docs.python.org/tutorial/

I can't seem to find a solution for writing the data into a .csv format where all keys and values are separated by a delimiter, e.g. 11,2.5,2.7,32.6 ? Perhaps it would be a better idea to write it out into a formatted .txt file instead?

Opening the output as a plain text file and writing the comma-separated lines to it yourself would be a good solution here:

f3 = open('file3.csv', "w") 
...

x, y, z = f1_dict_data[key] 
if key in f2_dict_data:
    x, y, z = f2_dict_data[key] 
f3.write("%s, %s, %s, %s\n" % (key, x, y, z))
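
If you would rather stay with the csv module, a possible alternative is to hand writerow a flat list. The quoted tuple in the original output appears because writerow received only two items, the key and a tuple, and the tuple was written out as its string form. The sketch below assumes Python 3 syntax (the code in this thread is Python 2); write_rows is a made-up helper name, and io.StringIO only stands in for the real output file.

```python
import csv
import io


def write_rows(outfile, f1_dict_data, f2_dict_data):
    """Write one flat Node,x,y,z row for every key in f1_dict_data."""
    writer = csv.writer(outfile, lineterminator='\n')
    for key in f1_dict_data:
        # Prefer the values from file 2 when the node exists there
        x, y, z = f2_dict_data.get(key, f1_dict_data[key])
        writer.writerow([key, x, y, z])


# Small in-memory demonstration with two of the sample nodes
f1_dict_data = {'11': ('3.7', '23.8', '23.4'), '7': ('3.2', '6.1', '4.3')}
f2_dict_data = {'11': ('2.5', '2.7', '32.6')}
buffer = io.StringIO()
write_rows(buffer, f1_dict_data, f2_dict_data)
print(buffer.getvalue())
```

With a real file, open it with open('file3.csv', 'w', newline='') in Python 3 and pass the file object in place of the buffer.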

I want to sort the data produced in file 3 by "Node" in ascending order. I have tried several ways of doing so without any success.

First, "Node" appears to be an integer, so convert it to an integer before using it as the dictionary key; otherwise it sorts and compares as a string and gives different results. Then sort the keys and access the dictionary in sorted order. This assumes that you do not know whether the original files are in order or not.
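
To see why the conversion matters, here is a small illustration using the node numbers from the sample data (Python 3 syntax, which is an assumption since the thread's code is Python 2):

```python
nodes = ['1', '2', '4', '7', '10', '11']

as_strings = sorted(nodes)             # compares character by character
as_integers = sorted(nodes, key=int)   # compares the numeric values

print(as_strings)    # ['1', '10', '11', '2', '4', '7']
print(as_integers)   # ['1', '2', '4', '7', '10', '11']
```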

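for node1, x1, y1, z1 in f1:
    f1_dict_data[int(node1)] = x1,y1,z1

# f2_dict_data needs integer keys as well, otherwise "key in f2_dict_data" never matches
for node2, x2, y2, z2 in f2:
    f2_dict_data[int(node2)] = x2,y2,z2

# sorted() returns the keys in ascending order
for key in sorted(f1_dict_data):
    x, y, z = f1_dict_data[key]
    if key in f2_dict_data:
        x, y, z = f2_dict_data[key]
    f3.write("%d,%s,%s,%s\n" % (key, x, y, z))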
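
Putting the steps of this reply together, a rough end-to-end sketch might look like the following. It assumes Python 3 syntax (the thread's code is Python 2) and input files that contain only Node,x,y,z rows; load_nodes and merge_sorted are hypothetical helper names.

```python
import csv


def load_nodes(path):
    """Read a Node,x,y,z file into {int node: (x, y, z)}, skipping blank rows."""
    with open(path, newline='') as infile:
        return {int(row[0]): tuple(row[1:4]) for row in csv.reader(infile) if row}


def merge_sorted(path1, path2, out_path):
    """Write the nodes of path1 in ascending order, using path2's values where present."""
    nodes = load_nodes(path1)
    replacements = load_nodes(path2)
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for key in sorted(nodes):
            # Fall back to the file 1 values when file 2 has no such node
            x, y, z = replacements.get(key, nodes[key])
            writer.writerow([key, x, y, z])
```

A call such as merge_sorted('file1.csv', 'file2.csv', 'file3.csv') would then write file3.csv sorted by node. As in the loop above, nodes that appear only in file 2 are not written.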