I have a file with a large amount of data in it, and some rows are repeated. The basic idea is: sort the data, remove duplicates based on the first field, and then print the whole row.

I have tried the following, but it did not help:

#!/bin/sh
if [ $# -ne 1 ]
then
    echo "Usage - $0  file-name"
    exit 1
fi
if [ -f $1 ]
then
    echo "$1 file exist"
    sort -u $1 > results.cvs
    awk '!x[$0]++' results.cvs > results-new.cvs
else
    echo "Sorry, $1 file does not exist"
fi

Input data and expected output data are attached.

I am trying this on HP-UX.

I would use the sort utility for this one.

Actually, looking at your script, you already use sort.
Why are you using a temp file, results.cvs? Wouldn't it make more sense to pipe the result of sort directly into awk?

Like this:

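#!/bin/sh
# Expect exactly one argument: the input file name.
if [ $# -ne 1 ]
then
	echo "Usage - $0  file-name"
	exit 1
fi

touch results.cvs

# Quote "$1" so file names containing spaces are handled correctly.
if [ -f "$1" ]
then
	echo "$1 file exist"
	# Let sort read the file and pipe its output straight into awk.
	sort -u "$1" | awk '!x[$0]++' > results.cvs
else
	echo "Sorry, $1 file does not exist"
fi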

Note: you really should have proper exit statuses in your program.

Actually, I am new to shell scripting.
You mean to say:

sort -u | awk '!x[$0]++'  $1 > results-new.cvs

If I had to write this in VBScript, I would do it like this:

Option Explicit
Dim objFSO, strInputFile, strOutFile, objTextFile
Dim strData, strLine, arrLines, i, j, firstLine, secondLine, arrIds
CONST ForReading = 1  
Const ForAppending = 8 
strInputFile = "inputfile.cvs"
strOutFile = "outputfile.cvs"
Set objFSO = CreateObject("Scripting.FileSystemObject")
strData = objFSO.OpenTextFile(strInputFile,ForReading).ReadAll
arrLines = Split(strData,vbCrLf)
Set objFSO = Nothing
for i = 0 to UBound(arrlines)
    if arrlines(i) <> "" then
       firstLine = split(arrlines(i), ",")
       for j = i + 1 to UBound(arrlines)
           if arrlines(j) <> "" then
              secondLine = split(arrlines(j), ",")
              if(firstLine(0) = secondLine(0)) then
                 arrlines(j) = ""
              end if
           end if
        next
    end if
next

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile(strOutFile, ForAppending, True)

for each strline in arrlines
    if strline <> "" then
        objTextFile.WriteLine(strline)
    end if
next
objTextFile.Close
Set objFSO = Nothing

Almost. $1 belongs with the sort command:

sort -u $1 | awk '!x[$0]++'   > results-new.cvs
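
To see why the placement of $1 matters, here is a small sketch. The demo file and its three rows are made up for illustration. When awk is given a file operand it reads that file and ignores its standard input, so in the first form sort's output is thrown away, and sort itself, having no operand, reads whatever stdin the script happens to have.

```shell
#!/bin/sh
# Illustrative only: the demo file and its contents are made-up sample data.
demo="${TMPDIR:-/tmp}/demo.$$"
printf '%s\n' 'b' 'a' 'b' > "$demo"

# Wrong: awk reads the file directly, so the rows come out unsorted.
# sort reads stdin, redirected from /dev/null here so it does not wait on the terminal.
wrong=$(sort -u < /dev/null | awk '!x[$0]++' "$demo")

# Right: sort reads the file and awk filters what sort writes to the pipe.
right=$(sort -u "$demo" | awk '!x[$0]++')

# Unquoted on purpose so each result prints on one line.
echo "wrong:" $wrong    # prints: wrong: b a
echo "right:" $right    # prints: right: a b

rm -f "$demo"
```

Both forms still compare whole lines ($0), so neither one looks at the first field only.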

I have never used VBScript. It looks extravagant.

The worst thing you can do while bash shell scripting is think like a 'traditional' programmer. Things are done differently in bash scripting.

I am not getting the desired output here! I am still getting the duplicates.

My requirement is to compare the first field of each and every row.

When I executed this, I got three records:

awk 'x[$1]++ == 1 { print $1 " is duplicated"}' inputfile.cvs

Is there any way to compare only the first field and print the full data?

That's because awk is reading the file. You need to have sort read the file and pipe the results to awk, like I showed you in the postings above.

The .cvs files are comma-delimited, right? You may have to set the sort field separator with -t.
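
For what it's worth, here is a minimal sketch of two ways to keep one full row per first field. It assumes the file is comma-delimited with no quoted fields that contain commas. The function names and the sample rows are made up for illustration, since the attached input is not shown in the thread.

```shell
#!/bin/sh
# Hypothetical helpers: names and sample data are illustrative only.

# Keep the first row seen for each value of field 1, preserving input order.
# Inside the single quotes $1 is awk's first field; "$1" outside is the file name.
dedupe_first_field_awk() {
    awk -F, '!seen[$1]++' "$1"
}

# Sort on field 1 only and keep one row per distinct key.
# Which of several rows with the same key survives -u depends on the sort implementation.
dedupe_first_field_sort() {
    sort -t, -k1,1 -u "$1"
}

sample="${TMPDIR:-/tmp}/dedupe-sample.$$"
printf '%s\n' '102,bob,x' '101,alice,y' '102,bob,z' '101,alice,w' > "$sample"

echo "awk, input order kept:"
dedupe_first_field_awk "$sample"

echo "sort, ordered by first field:"
dedupe_first_field_sort "$sample"

rm -f "$sample"
```

The awk form needs no sort at all if the original row order is acceptable. The sort form is the one that needs -t, because without it sort does not split fields on commas.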

Thank you! :) Now I understand! This is working fine!
