Hi Daniweb members,

I have two files that are not of equal dimensions and I need to match them on a unique key much like an SQL natural join. The files are space delimited and contain no headings.

File 1
rs121 10001 A G
rs125 10002 C T
rs126 10003 T A

File 2

rs121 11001
rs122 11002
rs126 11004
rs125 11003

Required result:

file 3:
rs121 11001 A G
rs125 11003 C T
rs126 11004 T A

#all the fields from file two replace the first two columns of file 1 and the rest of file 1 is maintained as is.

The two files would have about 600 000 rows and 6042 columns so it is rather a large task to accomplish. I've started using awk:

awk 'BEGIN {OFS=" "} NR==FNR { a[$2]=$0 ;next} { for ( i in a) { if ($2==i) { print a[i],$4,$5} }}' file2 file1   >  file3

This produces the following result:

file3
rs121 11001
A G
rs125 11003
C T
rs126 11004
T A

How do I remove the carriage returns? I think this is the problem?

I need to include another loop that will tell awk to print to the end of the file, I suppose something like this:

awk '{for(i=3;i<=NF;++i)print $i}'

but I'm not sure where to insert it into the above loop?

Many thanks,
Newbi

Unless using awk is mandatory, I'd recommend using join .

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.