

I have a similar requirement.
We have a huge CSV file (around 2 GB) that has to be loaded into our database.
We're concerned with only two of its 32 fields, say col1 and col2.
Before loading into the database, we need to check whether col2 has changed for the same value of col1, compared with the data already in the table (as extracted from the database).
Example : say below is the format of the file:
And say below is the extract of col1 and col2 from the table into which the file loads (before the file load):
As we can see, the identities AAAAAAA and CCCCCCC have moved from type Y (as stored in the table) to type X (as seen in the file).
So these records have to be identified so that the table can be updated with the new types.

Please let me know the simplest way to do this...
Also, do suggest whether querying the database directly would be better than extracting from the database and comparing the file with the extract.

You can read through the first file line by line, and for each line check whether it represents a required update to the database. It doesn't matter much that the first file is huge, because you only hold one line in memory at a time.
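To illustrate the streaming idea, here is a minimal sketch in Python (the thread's DBI reference suggests Perl, but the pattern is language-agnostic). The file path, the column positions of col1/col2, and the `current_types` mapping are all assumptions for the sake of the example:

```python
import csv

def changed_records(csv_path, current_types):
    """Stream the big CSV and yield (col1, col2) pairs whose col2 value
    differs from what the table currently holds.

    current_types: dict mapping col1 -> col2 as currently in the table.
    Only one CSV row is held in memory at a time, so file size is not
    a concern here.
    """
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            col1, col2 = row[0], row[1]  # assumed positions of col1/col2
            stored = current_types.get(col1)
            # Yield only identities that exist in the table with a
            # different type; brand-new identities are skipped.
            if stored is not None and stored != col2:
                yield col1, col2
```

The generator keeps memory flat regardless of CSV size; only the lookup structure (`current_types`) grows with the table, which is exactly the limitation discussed next.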

However, if the second file extracted from the database is also huge, that poses a problem: the simplest approach would be to load it all into a hash or similar in-memory structure so you can look up each key from your first file, but that structure could exceed your available memory. In that case you're better off looking up each key directly in the original database table. If it's a relational database, you can download DBI and the appropriate DBD module from CPAN and write a query that looks up each key and retrieves its associated value from the table.
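The per-key lookup described above (prepare one query, execute it once per key) looks roughly like this. It's sketched here in Python with an SQLite stand-in rather than Perl's DBI, and the table and column names are hypothetical, so adjust them to the real schema:

```python
import csv
import sqlite3

def find_updates(csv_path, conn, table="target_table"):
    """For each CSV row, look up col1 in the database and collect the
    rows where the stored col2 differs from the file's col2.

    One query per key keeps memory use constant no matter how large the
    table is. In Perl/DBI the equivalent would be a prepared statement
    executed once per key.
    """
    cur = conn.cursor()
    updates = []
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            col1, col2 = row[0], row[1]  # assumed positions of col1/col2
            cur.execute(f"SELECT col2 FROM {table} WHERE col1 = ?", (col1,))
            hit = cur.fetchone()
            if hit is not None and hit[0] != col2:
                updates.append((col1, col2))
    return updates
```

For a 2 GB file this trades memory for round trips; an index on col1 in the database makes each lookup cheap, and the collected pairs can then drive the UPDATE statements.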
