
Hello everyone,

I am trying to merge a large number of CSV files into one using the following awk command, but once I go above about 10,000 files it fails with an "Argument list too long" error.

Can someone suggest how I can get past this limit? I am looking to merge about 100,000 CSV files.

awk '{print $0"\t"FILENAME}' *.csv >  merged.csv

Thanks.


To me this sounds like our old question of how to merge that many files. While I won't ask why you don't just cat them, I would break the job up a little, with say 26, 36 or 62 commands in your script. Suppose I did a..z (a*.csv, b*.csv ... z*.csv), then 0 to 9, and so on.

Remember that this will take a long time; you know what happens to folder and file access speed when you put tens of thousands of files in one folder. You can help it along by writing merged.csv to another folder.
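Here is a minimal sketch of the split-by-first-character idea, assuming bash and a hypothetical helper called merge_by_prefix. It runs one awk call per leading character (a-z, A-Z, 0-9) and tags each line with ", " plus its file name. Two caveats: names starting with any other character (such as _ or -) are skipped, and if most names share a first character (file_1.csv, file_2.csv, ...), that single batch is as large as before and can hit the same error.

```shell
#!/usr/bin/env bash
# Sketch of the split-by-first-character idea. merge_by_prefix is a
# hypothetical helper: $1 = folder with the CSV files, $2 = output file
# (absolute path, outside that folder, so the globs never pick it up).
merge_by_prefix() (
    # Runs in a subshell, so the cd and shopt below do not leak out.
    cd "$1" || exit 1
    shopt -s nullglob            # a prefix with no matches expands to nothing
    : > "$2"                     # start from an empty output file
    for p in {a..z} {A..Z} {0..9}; do
        batch=( "$p"*.csv )      # only names starting with this character
        (( ${#batch[@]} )) || continue
        # One awk call per batch; each batch must still fit under ARG_MAX.
        awk '{ print $0 ", " FILENAME }' "${batch[@]}" >> "$2"
    done
)
```

Usage: `merge_by_prefix /absolute/path/to/csv/directory ~/merged.csv`. Keeping the output outside the CSV folder matters here: a merged.csv inside it would match the m*.csv batch, and awk would read the file it is appending to.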


This is most likely a command-line length problem. I'd use a bash for loop and, as rproffitt pointedly didn't suggest, use cat to append each candidate file to merged.csv.
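A minimal sketch of the for-loop suggestion, assuming bash and a hypothetical helper named merge_with_cat. The limit is enforced when a program is started with exec(), and you can see its value with `getconf ARG_MAX`. A glob in a for loop is expanded inside bash itself, and each cat call gets a single name, so no call comes near that limit. Starting 100,000 cat processes is slower than one command, but it works. Note that cat copies lines unchanged, so this version does not record the source file name.

```shell
#!/usr/bin/env bash
# Sketch of the bash for-loop suggestion. merge_with_cat is a hypothetical
# helper: $1 = folder with the CSV files, $2 = output file (absolute path,
# kept outside that folder so cat never reads its own output).
merge_with_cat() (
    cd "$1" || exit 1
    shopt -s nullglob        # an empty folder simply produces an empty file
    # The glob is expanded inside bash and never passed to exec(), and each
    # cat call receives one name, so the argument-list limit is not reached.
    for f in *.csv; do
        cat -- "$f"
    done > "$2"              # the loop's output is redirected once
)
```

One thing to watch with cat: if a file's last line has no trailing newline, it runs into the first line of the next file.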


Thanks for the suggestions.

The reason I didn't use cat (or maybe I could?) is that I also want to append the name of the file as a column in my merged.csv file. For example, take these two CSV files:

file_1.csv
1, 2, 3
2, 3, 4

file_2.csv
a, b, c
b, c, d

After the merge, I would want something like this:

merged.csv
1, 2, 3, file_1.csv
2, 3, 4, file_1.csv
a, b, c, file_2.csv
b, c, d, file_2.csv

The reason is that I need to keep track of which file each record came from.
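One way to keep the original awk approach and still get the file-name column is sketched below. It assumes bash plus an xargs that supports -0 (GNU and BSD both do), and merge_with_names is a hypothetical helper name. The names travel through a pipe instead of the command line: printf is a bash builtin, so the full list never goes through exec(), and xargs starts as many awk processes as needed, each with an argument list that fits. awk prints ", " plus FILENAME, matching the example above, and it also ends a last line that lacks a newline, which cat would not do.

```shell
#!/usr/bin/env bash
# Sketch that keeps the original awk program but feeds the names through a
# pipe. merge_with_names is a hypothetical helper: $1 = folder with the CSV
# files, $2 = output file (absolute path, outside that folder).
merge_with_names() (
    cd "$1" || exit 1
    shopt -s nullglob
    files=( *.csv )                      # expanded inside bash: no exec, no limit
    if (( ${#files[@]} == 0 )); then
        : > "$2"                         # nothing to merge
        exit 0
    fi
    # printf is a bash builtin, so the full list never goes through exec();
    # xargs -0 reads the NUL-separated names and starts as many awk
    # processes as needed, each with an argument list that fits the limit.
    printf '%s\0' "${files[@]}" |
        xargs -0 awk '{ print $0 ", " FILENAME }' > "$2"
)
```

Because xargs runs the awk processes one after another, the output keeps the sorted order of the glob.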


You could start with

find /absolute/path/to/csv/directory -maxdepth 1 -name "*.csv" > ~/csvlist.txt

This creates a file listing all the CSV files to process.

Edit: Then have a script read csvlist.txt and run the command

awk '{print $0"\t"FILENAME}' fubar.csv >> ~/merged.csv

for each file name it finds. Note that your command inserts a tab, not a comma, before the file name in the CSV file. Check the CSV dialect and encoding.
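A sketch of such a script, assuming bash and a hypothetical function named merge_from_list. It differs from the command above in three labeled ways: it appends a comma rather than a tab, it writes only the bare file name (find writes full paths, so FILENAME would otherwise contain the whole path), and it redirects the whole loop once instead of using >> on every call, which gives the same result with a single open of the output file.

```shell
#!/usr/bin/env bash
# Sketch of the list-driven approach. merge_from_list is a hypothetical
# helper: $1 = csvlist.txt (one path per line, as written by the find
# command), $2 = output file.
merge_from_list() {
    local list=$1 out=$2 f
    # Read one path per line; IFS= and -r keep spaces and backslashes intact.
    while IFS= read -r f; do
        # find wrote full paths, so strip the directory part once per file
        # and append ", name" to every line, as in the requested output.
        awk 'FNR == 1 { name = FILENAME; sub(/.*\//, "", name) }
             { print $0 ", " name }' "$f"
    done < "$list" > "$out"
}
```

Usage: `merge_from_list ~/csvlist.txt ~/merged.csv`. find lists files in directory order, not sorted order; pipe its output through sort before writing csvlist.txt if the order of the merged records matters.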

Edit: changed the command to avoid * in its arguments.

Edited by Gribouillis
