

I have a similar requirement.
We have a huge CSV file (around 2 GB) that has to be loaded into our database.
We're concerned with only two of its 32 fields, say col1 and col2.
Before loading into the database, we need to check whether col2 has changed for the same value of col1, compared with the data already in the table (as extracted from the database).
Example : say below is the format of the file:
And say below is the extract of col1 and col2 from the table into which the file loads (before the file load):
As we can see, the identities AAAAAAA and CCCCCCC have moved from type Y (as stored in the table) to type X (as seen in the file).
So these records have to be identified so that the table can be updated with the new types.

Please let me know the simplest way to do this...
Also, do suggest whether querying the database directly would be better than extracting from the database and comparing the file with the extract.

You can read through the first file line by line, and for each line check whether it represents a required update to the database. It doesn't matter much that the first file is huge, because you only hold one line in memory at a time.
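To illustrate the streaming idea, here is a minimal sketch in Python (the thread's DBI reference suggests Perl, but the pattern is language-agnostic). The file path, the column positions of col1/col2, and the `current_types` mapping are all assumptions for the sake of the example:

```python
import csv

def changed_records(csv_path, current_types):
    """Stream the big CSV and yield (col1, col2) pairs whose col2 value
    differs from what the table currently holds.

    current_types: dict mapping col1 -> col2 as currently in the table.
    Only one CSV row is held in memory at a time, so file size is not
    a concern here.
    """
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            col1, col2 = row[0], row[1]  # assumed positions of col1/col2
            stored = current_types.get(col1)
            # Yield only identities that exist in the table with a
            # different type; brand-new identities are skipped.
            if stored is not None and stored != col2:
                yield col1, col2
```

The generator keeps memory flat regardless of CSV size; only the lookup structure (`current_types`) grows with the table, which is exactly the limitation discussed next.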

However, if the second file extracted from the database is also huge, that poses a problem: the simplest approach would be to load it all into a hash or similar in-memory structure so you can look up each key from your first file, but that structure could exceed your available memory. In that case you're better off looking up each key directly in the original database table. If it's a relational database, you can download DBI and the appropriate DBD module from CPAN and write a query that looks up each key and retrieves its associated value from the table.
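The per-key lookup described above (prepare one query, execute it once per key) looks roughly like this. It's sketched here in Python with an SQLite stand-in rather than Perl's DBI, and the table and column names are hypothetical, so adjust them to the real schema:

```python
import csv
import sqlite3

def find_updates(csv_path, conn, table="target_table"):
    """For each CSV row, look up col1 in the database and collect the
    rows where the stored col2 differs from the file's col2.

    One query per key keeps memory use constant no matter how large the
    table is. In Perl/DBI the equivalent would be a prepared statement
    executed once per key.
    """
    cur = conn.cursor()
    updates = []
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            col1, col2 = row[0], row[1]  # assumed positions of col1/col2
            cur.execute(f"SELECT col2 FROM {table} WHERE col1 = ?", (col1,))
            hit = cur.fetchone()
            if hit is not None and hit[0] != col2:
                updates.append((col1, col2))
    return updates
```

For a 2 GB file this trades memory for round trips; an index on col1 in the database makes each lookup cheap, and the collected pairs can then drive the UPDATE statements.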
