0

Hi everyone,

I have a very large CSV file. 51427 lines to be exact.

I need to import the entire file into a MySQL database, however, the script times out due to server settings and slow connection (and maybe other reasons that I am not aware of).

So - I am now passing parameters START and LIMIT via address bar to import, like this:

http://my.server.address/import.php?start=10000&limit=1000

This reads the entire CSV file into an array, and starts at line 10000 of the array and inserts into the database until it reaches line 11000, and then terminates the script.

This works very nicely, however, I am not happy having to reach the entire 51427 lines of the CSV file into an array before processing.

Is there not a way where I can only read the required lines into an array? That would speed things up significantly.

Thanks in advance,

Kobus

8
Contributors
19
Replies
20
Views
7 Years
Discussion Span
Last Post by pakkupakku
Featured Replies
  • 1

    You shouldn't read in the whole file in your memory, this will result in an enormous memory-usage. Just 1. open the file 2. read in 1 line 3. parse it 4. store it, 5. goto step 2 Read More

0

Hi,

Sorry I have been absent. I didn't get a mail that there was a response to my mail. I have in the meantime just set my execution time and memory usage via script after consulting with our server experts. They raised it temporarily for me.

I was using a PHP Script to do the importing. If someone has a solution for this, it would be welcome for future occurrences of this.

Thanks!

Kobus

1

You shouldn't read in the whole file in your memory, this will result in an enormous memory-usage. Just
1. open the file
2. read in 1 line
3. parse it
4. store it,
5. goto step 2

0

n_e,

This is why I made the post. I want to know HOW to read an arbitrary amount of lines. the

fgetscsv()

function does not allow this.

Kobus

0

Ok. How about something like this:

<?php
$fh = fopen("myfile.csv", "r");

while ($line = fgets($fh)) { // Read one line at-a-time until end of file
        $words =  explode  (';', $line ); // break the line down to words, by separating on a ';'
        echo "This line consisted of the following words:\n";
        foreach ($words as $word) { //display all words
                echo "Word: ". $word. "\n";
        }
        // update the database here. 
}
fclose($fh);
?>

Edited by n_e: n/a

0

Thank you, n_e, but this still does not allow me to read, say, line 42595 from the file without having to read all the lines before it. That is my actual problem.

Kobus

0

why would you send 42000 lines if you want 1
the script that creates the csv file for upload, could it be smart enough to check for modified data,
then import all the lines of the much smaller file
I think I am suggesting a proceedural change to the data collection

database has a crc column
on update from remote, the list of id and crc are compared to local versions and only updated lines sent
may not be possible, thought process not any actual code

Edited by almostbob: n/a

0

why would you send 42000 lines if you want 1
the script that creates the csv file for upload, could it be smart enough to check for modified data,
then import all the lines of the much smaller file
I think I am suggesting a proceedural change to the data collection

database has a crc column
on update from remote, the list of id and crc are compared to local versions and only updated lines sent
may not be possible, thought process not any actual code

The thing is, I do not want only 1 record. I want all the lines, but I need to split the import into 1000 at a time. But because I split the execution to 1000 records at a time, I have to read all of them from the CSV, traverse the array, and start at, say 20000 and insert 1000 into the database. Next time, I start at 21000, and insert another 1000 in the database. But everytime, I need to read the entire file. That was before using ini_set to set memory and execution time. I had the sys admins allow me to do that once-off.

This is a semi-once-off import. We will do a whole dump once a month or so, and then future updates will be incremental, i.e. only changed records.

Thanks for your response.

Kobus

0

Could you turn on warnings and errors to see what happens?
Your code?

The script was timing out, as I have said. Executing too long. But that was solved by the sys admins allowing me to do a once-off import and setting directives via ini_set.

The base question is, which seems a bit hidden in the thread, was whether I can read an arbitrary amount of lines, say, lines 20000 to 21000, without having to read the entire file. fgetscsv() does not allow this.

Thanks for your reply.

Kobus

0

having a better understanding of the problem,, still no solution, more thoughts that again may not be appropriate
Is it possible on the source/script generating the .csv file to have it create sequential files each consisting of a subset of records
eg, id 0-999; 1000-1999; ...... x000-x999
and process 10(example) smaller files to keep execution time below timeout for each iteration of the scriptupload.csv :: contains a single line to tell the script how many files there are in the data set so the script knows how to handle
upload0001.csv
upload0002.csv
upload0003.csvwould give a simple way to jump to a known record

Edited by almostbob: n/a

0

having a better understanding of the problem,, still no solution, more thoughts that again may not be appropriate
Is it possible on the source/script generating the .csv file to have it create sequential files each consisting of a subset of records
eg, id 0-999; 1000-1999; ...... x000-x999
and process 10(example) smaller files to keep execution time below timeout for each iteration of the scriptupload.csv :: contains a single line to tell the script how many files there are in the data set so the script knows how to handle
upload0001.csv
upload0002.csv
upload0003.csvwould give a simple way to jump to a known record

Hi,

That can probably be done, but I would need to ask the client to do that, and they may not like it. Perhaps it is not such a bad idea. I will see what they say. Thanks!

Kobus

0

Thank you, xylude. My script that I currently uses is exactly the same as yours, just more complicated, as I need to make provision for different field names, and transform data before inserting.

The problem that I do have is that I have to read the entire CSV file into an array, make the changes to the array, and then write the array to CSV. My question was not how to import though, but how to prevent having to read the entire file into an array, only a arbitrary amount of rows.

But thanks anyways - I have found a work-around. The sys admins will set the server permissions for me on a monthly basis. The only problem now is that I can't fire it from a cron job which was the original plan, but that is okay.

I am considering this thread closed now, as there is obviously not a solution to my real problem.

Thanks for everyone's responses. :-)

Kobus

0

Just before you close this as unsolvable, it seems that what would be needed is some kind of intelligent client that would be able to break up a large file into pieces of an arbitrary size and then upload the pieces and provide appropriate info to the php program on the other end.

I found a Java applet that is supposed to do most of this. It seems that is can break a large file into pieces and upload each piece. What it doesn't seem to do (but should theoretically be possible) is to upload one or more pieces but not the whole thing. They say that you can use it as a <input type=file... replacement. So, in terms of how to deal with large files this would seem to be a step in the right direction. I don't have any experience dealing with Java applets but I'm going to have to give this a try.
http://jumploader.com/index.html

1

Perhaps a cron job could be set by your administrators, under their higher privileged account, to change the execution and memory limits, run your job, then change the execution and memory limits back. Might that work?

0

Just before you close this as unsolvable, it seems that what would be needed is some kind of intelligent client that would be able to break up a large file into pieces of an arbitrary size and then upload the pieces and provide appropriate info to the php program on the other end.

I found a Java applet that is supposed to do most of this. It seems that is can break a large file into pieces and upload each piece. What it doesn't seem to do (but should theoretically be possible) is to upload one or more pieces but not the whole thing. They say that you can use it as a <input type=file... replacement. So, in terms of how to deal with large files this would seem to be a step in the right direction. I don't have any experience dealing with Java applets but I'm going to have to give this a try.
http://jumploader.com/index.html

Hi Chris,

I have not worked with Java in 10 years, so I am not even going to try this. I thank you for your time though - the time required for me to do re-engineer the Java code, combined with the deadline I have, does not make it feasible.

Thank you anyways for this. I will keep it in mind for future reference.

Kobus

0

Perhaps a cron job could be set by your administrators, under their higher privileged account, to change the execution and memory limits, run your job, then change the execution and memory limits back. Might that work?

Hi scaifer,

Thank you. I will bring that to their attention and see if they see that as a viable solution to the problem. I think they will, as they were willing to set the permissions for me on a one-time basis.

Have a great day!

Kobus

0

i want to do the same : [The thing is, I do not want only 1 record. I want all the lines, but I need to split the import into 1000 at a time. But because I split the execution to 1000 records at a time, I have to read all of them from the CSV, traverse the array, and start at, say 20000 and insert 1000 into the database. Next time, I start at 21000, and insert another 1000 in the database. But everytime, I need to read the entire file. That was before using ini_set to set memory and execution time. I had the sys admins allow me to do that once-off.]


<<Email Snipped>>

Edited by __avd: Email snipped.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.