awk command gives "Argument too long" error.

new_2_java -3 Junior Poster

9 Years Ago

Hello everyone,

I am trying to merge a large number of CSV files into one using the following awk command, but when I increase the number of files beyond 10,000, it gives me an "argument too long" error.

Can someone suggest how I can get past this limit? I am looking to merge about 100,000 CSV files.

awk '{print $0"\t"FILENAME}' *.csv >  merged.csv

Thanks.

shell-scripting unix

rproffitt 2,706 https://5calls.org

9 Years Ago

To me this sounds like our old question of how to merge that many files. While I won't ask why not cat it, I would break it up a little with, say, 26, 36 or 62 commands in your script. Suppose I did a to z (a*.csv, b*.csv ... z*.csv), then 0 to 9, and so on.

Remember that this will take a long time, since directory and file operations slow down when you put tens of thousands of files in one folder. You can help it along by putting merged.csv in another folder.
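
The prefix-splitting idea above could be sketched roughly as follows. This is only an illustration: the function name `merge_by_prefix` and its two arguments are hypothetical, and it assumes file names start with a lowercase letter or a digit (the 36-command variant). Files starting with any other character are skipped, and if every name shares one prefix (for example `file_`), a single batch can still exceed the limit.

```shell
# merge_by_prefix SRC_DIR OUT_FILE   (hypothetical helper, not from the thread)
# Runs one awk command per leading character so that each command
# receives only a slice of the file list.
merge_by_prefix() {
    src_dir=$1
    out_file=$2          # assumed to live outside SRC_DIR
    : > "$out_file"      # start with an empty output file
    for p in a b c d e f g h i j k l m n o p q r s t u v w x y z \
             0 1 2 3 4 5 6 7 8 9; do
        # Expand only the files whose names start with this character.
        set -- "$src_dir"/"$p"*.csv
        # With no match the pattern stays literal; skip that case.
        [ -e "$1" ] || continue
        # Same awk program as in the question: append a tab and the file name.
        awk '{ print $0 "\t" FILENAME }' "$@" >> "$out_file"
    done
}
```

It would be called as `merge_by_prefix /path/to/csv /some/other/dir/merged.csv`, with the output kept in a different folder as suggested above.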

Gribouillis 1,391 Programming Explorer

9 Years Ago

Use python

rubberman 1,355 Nearly a Posting Virtuoso Featured Poster · Answer 1 · 2016-05-24T12:21:47+00:00

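This is most likely a command line length problem: the shell expands *.csv into one argument per file, and the resulting list exceeds the system limit on argument length. I'd use a bash for loop and, although rproffitt didn't suggest it, I would use cat to append each candidate file to merged.csv.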
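
A minimal sketch of such a loop, assuming the CSV files sit in one directory and the output goes somewhere else. The function name `merge_with_cat` and its arguments are made up for illustration. The glob in a `for` loop is expanded inside the shell itself and never passed to a single command, so the argument limit does not apply; the cost is one `cat` process per file.

```shell
# merge_with_cat SRC_DIR OUT_FILE   (hypothetical helper, not from the thread)
merge_with_cat() {
    src_dir=$1
    out_file=$2          # assumed to live outside SRC_DIR
    # The shell expands the pattern internally, one file per iteration.
    for f in "$src_dir"/*.csv; do
        # With no match the pattern stays literal; skip that case.
        [ -e "$f" ] || continue
        cat "$f"
    done > "$out_file"   # a single redirection for the whole loop
}
```

Called as `merge_with_cat /path/to/csv /some/other/dir/merged.csv`. Note that plain `cat` does not record which file each line came from.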

new_2_java -3 Junior Poster · Answer 2 · 2016-05-24T19:06:33+00:00

Thanks for the suggestions.

The reason I didn't use 'cat' (or maybe I could?) is that I also want to append the name of the file as a column in my "merged.csv" file. For example, given these two CSV files:

file_1.csv
1, 2, 3
2, 3, 4

file_2.csv
a, b, c
b, c, d

After merge, I would want something like this:

merged.csv
1, 2, 3, file_1.csv
2, 3, 4, file_1.csv
a, b, c, file_2.csv
b, c, d, file_2.csv

The reason is that I need to keep track of which file each record came from.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 3 · 2016-05-24T19:42:50+00:00

You could start with

find /absolute/path/to/csv/directory -maxdepth 1 -name "*.csv" > ~/csvlist.txt

This would create a file listing all the csv files to process.

Edit: Then you would have a script traverse csvlist.txt and run the command

awk '{print $0"\t"FILENAME}' fubar.csv >>  ~/merged.csv

for each filename it encounters. Note that your command seems to insert a tab instead
of a comma in the csv file. Check the csv dialect and encoding.

Edit: changed command to avoid * in arguments
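
The two steps above could be combined roughly like this. It is a sketch under a few assumptions: the function name `merge_from_list` is hypothetical, file names contain no newlines, and the output file lives outside the source directory. Unlike the command above, it appends ", " plus the bare file name (directory stripped) so the result matches the sample merged.csv shown earlier; that separator is a guess at the intended CSV dialect.

```shell
# merge_from_list SRC_DIR OUT_FILE   (hypothetical helper, not from the thread)
merge_from_list() {
    src_dir=$1
    out_file=$2                      # assumed to live outside SRC_DIR
    list_file=$(mktemp)              # temporary stand-in for ~/csvlist.txt
    # Step 1: write the list of CSV files, one path per line.
    find "$src_dir" -maxdepth 1 -name '*.csv' > "$list_file"
    : > "$out_file"                  # start with an empty output file
    # Step 2: run awk once per listed file, so no command line grows large.
    while IFS= read -r f; do
        # Strip the directory part of FILENAME, then append it as a column.
        awk '{ n = FILENAME; sub(/.*\//, "", n); print $0 ", " n }' "$f" >> "$out_file"
    done < "$list_file"
    rm -f "$list_file"
}
```

Called as `merge_from_list /absolute/path/to/csv/directory ~/merged.csv`. The order of files in the output follows whatever order `find` returns, which is not guaranteed to be sorted.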
