Hello

I want to sort a file and then uniq it, ignoring the first field.

So I have:

REF | FOR | SUR
TLT090991|STEPHEN|GRIFFITHS
TLT090992|STEPHEN|GRIFFITHS

So I want to uniq, but ignore the REF field.

I had this, but it doesn't work:

cat $FILE | sort -t '|' -k2,3 | uniq > output

So what actual output are you expecting from that?

Does it matter which of TLT090991 or TLT090992 you get?

Reading the manual page seems to offer some ideas.

You can also "cut" out the first field before you sort, if you want :)
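
A minimal sketch of that "cut" idea, assuming the REF value is not needed in the output at all. The sample data is the one from the first post; the temporary file and the variable name deduped are only there for illustration:

```shell
# Sample data from the first post, written to a temporary file
FILE=$(mktemp)
printf '%s\n' \
  'REF | FOR | SUR' \
  'TLT090991|STEPHEN|GRIFFITHS' \
  'TLT090992|STEPHEN|GRIFFITHS' > "$FILE"

# Drop field 1 (REF), then sort and keep one copy of each remaining line
deduped=$(cut -d '|' -f 2- "$FILE" | sort -u)
printf '%s\n' "$deduped"

rm -f "$FILE"
```

The trade-off is that the REF column is gone from the result, so this only fits if you don't care which reference you keep.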

, Mike

Sorry?

I must have missed a post, because I could swear that you never posted any indication that you'd bothered to RTFM for uniq.

I posted a link at #2, and that's all the spoon-feeding you're going to get until you post something more concrete as to what you tried, the results (or lack thereof).

That's OK, no need to apologise.

cat $FILE |wc -l
3
cat $FILE | sort -t '|' -k2,3 |tail -2| uniq > output

That is, tail takes the number of lines minus 1, so the first line is dropped.

Is this what you want?

Maybe, but you're 3 YEARS TOO LATE to make a difference

Hi all! This can be done with a fairly simple one-liner!

As Salem pointed out, all the clues are in the man page, but I personally found the solution in 'sort' rather than 'uniq':

# The test file with your sample data
-> cat test.txt 
REF | FOR | SUR
TLT090991|STEPHEN|GRIFFITHS
TLT090992|STEPHEN|GRIFFITHS

#Test run with one-line sort command
-> sort -t\| -k2 -u test.txt 
REF | FOR | SUR
TLT090991|STEPHEN|GRIFFITHS

A trip through the man page for 'sort' reveals the following:

-t, --field-separator=SEP
use SEP instead of non-blank to blank transition

-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)

-u, --unique
with -c, check for strict ordering; without -c, output only the first of an equal run


TL;DR version:
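-t sets the field separator, in this case "|" (escaped with \ so the shell doesn't treat it as a pipe)
-k2 starts the sort key at field 2; with no end position given, the key runs to the end of the line
-u prints only the first line of each run of lines whose keys compare equal (so field 1 is ignored)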
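
One thing to watch: the header line gets sorted along with the data. A rough sketch of a variation that passes the header through untouched and limits the key to fields 2 and 3, assuming the first line of the file is always the header (the temporary file and the variable name result are just for the example):

```shell
# Sample data from the first post, written to a temporary file
FILE=$(mktemp)
printf '%s\n' \
  'REF | FOR | SUR' \
  'TLT090991|STEPHEN|GRIFFITHS' \
  'TLT090992|STEPHEN|GRIFFITHS' > "$FILE"

# Line 1 (header) is printed as-is; the remaining lines are sorted and
# de-duplicated on fields 2 through 3 only, so REF is ignored.
result=$(head -n 1 "$FILE"; tail -n +2 "$FILE" | sort -t '|' -k2,3 -u)
printf '%s\n' "$result"

rm -f "$FILE"
```

Which of the two REF values survives is up to sort, so don't rely on it being a particular one.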

I hope this helps!
-G
