Hi all,

I'm trying to set up code that will allow me to do the following:

Read in a csv file,
In columns 3 and 5 there is similar information, I want to count how many times each unique entry is in either of these columns. For example:

col 3: col 5:

1 6
2 4
1 2
2 1
3 8
4 9
5 3
3 1
5 6
6 6
6 6
4 3
6 4
6 4
6 4
6 5
6 6
6 8
6 1
6 9

From these two columns i would think that 6 is the most common (Although I haven't checked!) But with perl is it possible to count how often each unique value appears in both column 3 and 5 and then grep all the data so that just lines that contain the most common value will appear?

Hope that makes sense!

Thanks

Recommended Answers

All 13 Replies

A hash with the values as keys and a one up number as the values.

Thanks for your reply, but I should have said i'm a novice perl programmer, could you explain further?

Thanks

As in what?

You read the file, and for every line you pull the two column values you're looking for and then one up a hash entry with them.

i.e.

hashVariableForColumnThreeValues{columnThreeValueVariable}++;
hashVariableForColumnFiveValues{columnFiveValueVariable}++;

Ok well i've been trying something like this. Am I on the right tracks?

Thanks again!

#!/usr/bin/perl
    #use strict;
    #use warnings;
    #use Text::CSV;
    
my $dat_file = 'my_file.csv';
@data =('col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7');

open (F, $dat_file), print "File Opened\n\n" || die ("Could not open file");

print $data[1];

while ($line = <F>)
  {
    ($column1,$column2,$column3,$column4,$column5,$column6) = split ',', $line;
 
  $column3{%column3_num}++;
  $column5{%column5_num}++;

print(column3_num);

  close (F);

Again there may be some silly errors in this but I am only learning perl at present!

Thanks

$column3 not %column3_num

Also, split works as long as there are no commas in any of the data fields. You would be better of using that CSV library.

Then, you also need to pull out keys from the hashmaps that relate to the maximum values.

Using the sample data you provided that has two columns delimited by a space, running the following script:

#!/usr/bin/perl
use strict;
use warnings;

*ARGV = *DATA if scalar(@ARGV) == 0;

my %count; #Hash containing count of occurences of each unique value

while (<>) {
    my ($col_x_val, $col_y_val) = split; #splits record by space (default delimiter) 
    $count{$col_x_val}++;#Increment count if found in column 'x'
    $count{$col_y_val}++;#Increment count if found in column 'y'
}

foreach (sort how_to_sort keys %count){
    print "$_ occurs $count{$_} times.\n";
}

sub how_to_sort{ #Specify numeric sort by counted value in descending order
    $count{$b} <=> $count{$a};
}
__DATA__
1 6
2 4
1 2
2 1
3 8
4 9
5 3
3 1
5 6
6 6
6 6
4 3
6 4
6 4
6 4
6 5
6 6
6 8
6 1
6 9

gives the following output:

6 occurs 15 times.
4 occurs 6 times.
1 occurs 5 times.
3 occurs 4 times.
2 occurs 3 times.
5 occurs 3 times.
8 occurs 2 times.
9 occurs 2 times.

Thanks for your help.

The columns I put in was just for demonstration. Really it will be read from a comma delimited file (column 3 and 5) But apart from that its what i'm looking for?

Is there much other adjustment?

Well, look at the input and the output and then study the code and decide for yourself.

You were on the right track and should continue doing it on your own. You learn nothing when you simply grab code from somewhere else with no understanding of how it works (which that last post makes clear).

Thanks for your help.

The columns I put in was just for demonstration. Really it will be read from a comma delimited file (column 3 and 5) But apart from that its what i'm looking for?

Is there much other adjustment?

No other adjustment that I'm aware of. You already know how to split on something other than space and save the desired columns into scalar variables. For very simple CSV data like your example split works OK, but for more complex data that you'll have to handle some day you'll be better off using a module that parses the CSV into columns for you. As masijade already pointed out, time invested in learning how to use Text::CSV; or one of the other csv parser modules will pay off if you ever have to deal with column values like 1,302 or "eats, shoots, and leaves" that have embedded commas that you do NOT want to split on.

Ok, I have got the variables counting, but now I am having trouble sorting them.
I have looked at numerous tutorial websites and keep getting the same method but when I do implement it does not work. It is not giving an error, just not printing anything to screen,

#!/usr/bin/perl
    #use strict;
    #use warnings;
    #use Text::CSV;
   
my $dat_file = 'file.csv';
 my $index = {};
  my $index_test = {};
@data =('col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7');

open (F, $dat_file), print "File Opened\n\n" || die ("Could not open file");


while ($line = <F>)
  {


    ($data1,$data2,$data3,$data4,$data5,$data6) = split ',', $line;

if (exists($index{$data3}))
{
  $index->{ $data3 } = 1;

}
else
{
    $index->{ $data3 } =$index->{ $data3 }+1 ;

}

if (exists($index{$data5}))
{
  $index->{ $data5 } = 1;

}
else
{
    $index->{ $data5 } = $index->{ $data5 }+1 ;

}


}

foreach $value (sort {$index{$a} cmp $index{$b} }
          	keys %index)
{
     print "$value =>			$index{$value}\n";
}

  close (F);
 
  print"\n";

This is the code I have, Its the parts at the end where I am trying to sort and print the list to screen but it displays nothing.

Can anyone see what might be wrong with this?

Thanks in advance.

N

Ok, I have got the variables counting, but now I am having trouble sorting them.
I have looked at numerous tutorial websites and keep getting the same method but when I do implement it does not work. It is not giving an error, just not printing anything to screen,

#!/usr/bin/perl
    #use strict;
    #use warnings;
    #use Text::CSV;
   
my $dat_file = 'file.csv';
 my $index = {};
  my $index_test = {};
@data =('col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7');

open (F, $dat_file), print "File Opened\n\n" || die ("Could not open file");


while ($line = <F>)
  {


    ($data1,$data2,$data3,$data4,$data5,$data6) = split ',', $line;

if (exists($index{$data3}))
{
  $index->{ $data3 } = 1;

}
else
{
    $index->{ $data3 } =$index->{ $data3 }+1 ;

}

if (exists($index{$data5}))
{
  $index->{ $data5 } = 1;

}
else
{
    $index->{ $data5 } = $index->{ $data5 }+1 ;

}


}

foreach $value (sort {$index{$a} cmp $index{$b} }
          	keys %index)
{
     print "$value =>			$index{$value}\n";
}

  close (F);
 
  print"\n";

This is the code I have, Its the parts at the end where I am trying to sort and print the list to screen but it displays nothing.

Can anyone see what might be wrong with this?

Thanks in advance.

N

You said earlier "I want to count how many times each unique entry is in either of these columns [columns 3 and 5]." I don't see how your code is supposed to accomplish this. You don't need to test if the value exists (i.e. has already been counted and saved in your hash) -- just count it. You can say $index{$data3}++; and if that entry doesn't exist in your hash, it does now and has a value of 1. If it already exists in your hash, it now has been incremented by 1. That's much simpler than what your code seems to attempt. I think what is happening is nothing is being saved in your hash and so there is nothing to print.

Also, you have commented out the use strict; and the use warnings; -- why? It's nearly always better to have these enabled and to fix the errors and warnings that you get.

Another thing I noticed is your sort, {$index{$a} cmp $index{$b} is doing an alpha comparison on the values. If you want to sort numeric values numerically ( so that 2 comes after 10), use <=> instead of cmp.

Ok, I have now changed this to a database and have it working (sorting the values). However for some reason it only works with small files and not the large one that I need.
It is giving me errors/warnings that I think might be causing the problem, these are

Use of uninitialized value in addition (+) at simplify.pl line 41.
Use of uninitialized value in addition (+) at simplify.pl line 52.
Use of uninitialized value in addition (+) at simplify.pl line 41.
Use of uninitialized value in addition (+) at simplify.pl line 52.
Use of uninitialized value in addition (+) at simplify.pl line 52.
HASH(0x8647968)


Line 41 and 52 are as follows:

34. if (exists($index{$data3})) 
35.    { 
36.       $index->{ $data3 } = 1; 
37. 
38.    } 
39.    else 
40.    { 
41.        $index->{ $data3 } = $index->{ $data3 }+1 ; 
42. 
43.    }  
44. 
45.    if (exists($index{$data5})) 
46.    { 
47.        $index->{ $data5 } = 1; 
48. 
49.   } 
50.    else 
51.    { 
52.       $index->{ $data5 } = $index->{ $data5 }+1 ; 
53. 
54.    }

I have tried a few things to sort out these problems, some of them get rid of the messages but then the code does not sort the values etc.

Any ideas where the problem might be with these errors?

Thanks!
N

#!/usr/bin/perl
use strict;
use warnings;

my $index;
my $data5 = 'whatever';

#Because no value has yet been assigned to $index->{ $data5 }
# $index->{ $data5 } is currently uninitialized

$index->{ $data5 } = $index->{ $data5 }+1 ;
print '$index->{ $data5 } is equal to ' . "$index->{ $data5 }\n";

#Outputs the following:
#$index->{ $data5 } is equal to 1
#Use of uninitialized value in addition (+) at /home/david/Programming/Perl/temp.pl line 8.
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.