have a bit of an issue trying to obtain some data from a csv file using PERL. I can sort the file and remove any duplicates leaving only 4 or 5 rows containing data. My problem is that the data contained in the original file contains a lot more columns and when I try ro run this script it finds that all the data is unique.

I have the following fields within the orignal file:
LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID

The data which I need to obtain and sort is within the op,probecard and tester_id fields.

How can I go about this?

The code that I use after manually deleting the fields that i do not require is as follows:

#!/usr/bin/perl -w

use strict;
my $csvfile = 'probecards.csv';
my $newfile = 'new.csv';
my $fieldnames = 1;
open (IN, "<$csvfile")  or die "Couldn't open input CSV file: $!";
open (OUT, ">$newfile") or die "Couldn't open output file: $!";
my $header;
$header = <IN> if $fieldnames;
my @data = sort <IN>;
while( <IN> ) {
    push @data, join "\t", (split /\t/)[4,5,8];
}
print OUT $header;
my $n = 0;
my $lastline = '';
foreach my $currentline (@data) {

  next if $currentline eq $lastline;
  print OUT $currentline;
  $lastline = $currentline;
  $n++;
}
close IN; close OUT;
print "Processing complete. In = " . scalar @data . " records, Out = $n records\n";
iNPUT CSV file:

LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID
200812,12630M196,139,2660,S25E3N36,88BCRA,16/03/2008 12:05,IN01
200812,12630M196,1,2660,S25E3N36,88BLBHDA,16/03/2008 13:04,IN01
200812,12630M196,508,2660,S25E3N36,88BCRA,16/03/2008 13:41,IN01
200812,12630M196,437,2660,S25E3N35,88CLNHCC,16/03/2008 14:18,IN04
200812,12630M196,465,2660,S25E3N36,88BCRA,16/03/2008 15:34,IN01
200812,12630M196,27,2660,S25E3N36,88BCRA,16/03/2008 18:00,IN01
200812,12630M196,18,2660,S25E3N27,88BCRA,16/03/2008 19:03,IN03
200812,12630M196,11,2660,S25E3N36,88BCRA,17/03/2008 14:37,IN01
200812,12620M189,526,2660,S25E3N36,8PMVCVAE,17/03/2008 15:21,IN01
200812,12630M196,167,2660,S25E3N36,88BCRA,17/03/2008 19:02,IN01
200812,12630M196,652,2660,S25E3N36,88BCRA,17/03/2008 19:39,IN01
200812,12630M196,765,2660,S25E3N36,88CLNHCC,17/03/2008 20:15,IN01

Output required:

OP,PROBE_CARD,TESTER_ID
2660,S25E3N36,IN01
2660,S25E3N27,IN03
2660,S25E3N35,IN04

Any help would be grateful

I know its something to do with the duplicates being removed within the next if
but i cannot sort it out...maybe I have been looking at this to long

rgs
colin

You have this question also on DEVSHED where it has a number of replies.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.