
CSV file duplicate removal

I have a bit of an issue trying to extract some data from a CSV file using Perl. I can sort the file and remove any duplicates, leaving only 4 or 5 rows of data. My problem is that the original file contains a lot more columns, and when I run this script against it, it finds that every row is unique.

I have the following fields in the original file:
LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID

The data I need to extract and sort is in the OP, PROBE_CARD and TESTER_ID fields.

How can I go about this?

The code that I use after manually deleting the fields that I do not require is as follows:

#!/usr/bin/perl -w

use strict;
my $csvfile = 'probecards.csv';
my $newfile = 'new.csv';
my $fieldnames = 1;
open (IN, "<$csvfile")  or die "Couldn't open input CSV file: $!";
open (OUT, ">$newfile") or die "Couldn't open output file: $!";
my $header;
$header = <IN> if $fieldnames;
my @data = sort <IN>;
while( <IN> ) {
    push @data, join "\t", (split /\t/)[4,5,8];
}
print OUT $header;
my $n = 0;
my $lastline = '';
foreach my $currentline (@data) {

  next if $currentline eq $lastline;
  print OUT $currentline;
  $lastline = $currentline;
  $n++;
}
close IN; close OUT;
print "Processing complete. In = " . scalar @data . " records, Out = $n records\n";

Input CSV file:

LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID
200812,12630M196,139,2660,S25E3N36,88BCRA,16/03/2008 12:05,IN01
200812,12630M196,1,2660,S25E3N36,88BLBHDA,16/03/2008 13:04,IN01
200812,12630M196,508,2660,S25E3N36,88BCRA,16/03/2008 13:41,IN01
200812,12630M196,437,2660,S25E3N35,88CLNHCC,16/03/2008 14:18,IN04
200812,12630M196,465,2660,S25E3N36,88BCRA,16/03/2008 15:34,IN01
200812,12630M196,27,2660,S25E3N36,88BCRA,16/03/2008 18:00,IN01
200812,12630M196,18,2660,S25E3N27,88BCRA,16/03/2008 19:03,IN03
200812,12630M196,11,2660,S25E3N36,88BCRA,17/03/2008 14:37,IN01
200812,12620M189,526,2660,S25E3N36,8PMVCVAE,17/03/2008 15:21,IN01
200812,12630M196,167,2660,S25E3N36,88BCRA,17/03/2008 19:02,IN01
200812,12630M196,652,2660,S25E3N36,88BCRA,17/03/2008 19:39,IN01
200812,12630M196,765,2660,S25E3N36,88CLNHCC,17/03/2008 20:15,IN01

Output required:

OP,PROBE_CARD,TESTER_ID
2660,S25E3N36,IN01
2660,S25E3N27,IN03
2660,S25E3N35,IN04


Any help would be gratefully received.

I know it's something to do with how the duplicates are removed in the "next if" line,
but I cannot sort it out... maybe I have been looking at this too long.

rgs
colin

LODEY
Newbie Poster
1 post since Mar 2008
Reputation Points: 10
Solved Threads: 0
 

You have also posted this question on DEVSHED, where it has a number of replies.

KevinADC
Posting Shark
921 posts since Mar 2006
Reputation Points: 246
Solved Threads: 67
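
For anyone landing on this thread, here is a minimal sketch of one possible approach. It is not the answer given on DEVSHED. In the posted script, `sort <IN>` reads the whole file in one go, so the `while (<IN>)` loop never runs and complete lines get compared; those always differ in ID or TEST_START. The script also splits on tabs although the file is comma-separated, and in the header shown the wanted fields sit at zero-based positions 3, 4 and 7 rather than 4, 5 and 8. The sketch below looks the columns up by name, trims each row before comparing, and sorts by the last wanted column to match the order of the required output. The helper name `unique_rows` and the in-memory sample are assumptions made for illustration, and the plain `split /,/` assumes no field contains a quoted comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# unique_rows is a hypothetical helper, not part of the original script.
# It reads comma-separated lines from a filehandle, keeps only the named
# columns, and returns the header plus each distinct combination once.
sub unique_rows {
    my ($fh, @wanted) = @_;

    # Read the header first and map each column name to its position.
    my $header = <$fh>;
    die "Input is empty\n" unless defined $header;
    $header =~ s/\r?\n\z//;
    my @names = split /,/, $header;
    my %pos;
    @pos{@names} = 0 .. $#names;

    my @idx;
    for my $name (@wanted) {
        die "No such column: $name\n" unless exists $pos{$name};
        push @idx, $pos{$name};
    }

    # Cut each row down to the wanted columns BEFORE comparing, so rows
    # that differ only in ID or TEST_START collapse into one.
    my (%seen, @rows);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;        # strip Unix or Windows line ending
        next unless length $line;    # skip blank lines
        my @picked = (split /,/, $line, -1)[@idx];
        push @rows, \@picked unless $seen{ join ',', @picked }++;
    }

    # Order by the last wanted column, then by the whole row.
    @rows = sort {
        $a->[-1] cmp $b->[-1] or "@$a" cmp "@$b"
    } @rows;

    return join(',', @wanted), map { join ',', @$_ } @rows;
}

# Demo on a few rows from the post, read through an in-memory filehandle.
my $sample = <<'CSV';
LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID
200812,12630M196,139,2660,S25E3N36,88BCRA,16/03/2008 12:05,IN01
200812,12630M196,437,2660,S25E3N35,88CLNHCC,16/03/2008 14:18,IN04
200812,12630M196,465,2660,S25E3N36,88BCRA,16/03/2008 15:34,IN01
200812,12630M196,18,2660,S25E3N27,88BCRA,16/03/2008 19:03,IN03
CSV

open my $in, '<', \$sample or die "Couldn't open sample data: $!";
print "$_\n" for unique_rows($in, qw(OP PROBE_CARD TESTER_ID));
close $in;
```

To run this against the real file, the in-memory handle could be swapped for `open my $in, '<', 'probecards.csv'` and the returned lines printed to a handle opened on `new.csv`. If the data may ever contain quoted fields with embedded commas, a CSV parser such as the Text::CSV module from CPAN would be a safer choice than `split`.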
 

This question has already been solved
