I have a tab-delimited file:

ID     NAME   FAMILYTAG   EFFECT
001  John      Black               Positive
002  Kate      Rhodes,Mich           Positive
003  Aaron   Sunders               Negative
004  Shirley  Rhodes               Negative
005  Dexter    Sunders,Hark        Positive

I want to read in this file (which is actually much larger) and then read in a name, e.g. Kate. I want the script to read Kate, recognize her family tag (i.e. that it contains "Rhodes"), and then output the other family member, i.e. Shirley. Is there a way to do this? The output file will look like this:

Kate  Rhodes 
Shirley Rhodes

Really, you could use any RDBMS to solve this. However, it was real fun doing it with Perl. The code below does what you want, but note that it only uses the name to find other family members and print them **as you wanted it**, not the other way round!
Here we go:

#!/usr/bin/perl
use warnings;
use strict;
use Carp qw(croak);

croak
"Usage: script.pl db_name_file.txt, you must specify the file that contained names to search from"
  unless defined $ARGV[0];    ## error msg if input file is not specified

print "Enter name to search: ";
my %has_val;
chomp( my $name = <STDIN> );    ## get name to search from user
my $input_file = $ARGV[0];
open my $fh, '<', $input_file
  or croak "can't open this file:$!";    ## open the file to read from
while (<$fh>) {
 next if !/^\d+?/;
  if (m/\d+?\s+?(.+?\s+?.+?)(\s+?|,).+?/) {
    my ( $name_id, $fam_name ) = split /\s+?/, $1;
    $has_val{$name_id} = $fam_name;
  }
}
close $fh or croak "can't close this file:$!";    ## close file

$name = ucfirst( lc($name) );    ## this will yield first letter capital always
foreach my $fnd ( keys %has_val ) {
  if ( $fnd eq $name ) {
    my $surname = $has_val{$fnd};
    while ( my ( $key, $value ) = each %has_val ) {
        print $key, '  ', $value, $/ if $surname eq $value;
    }
  }
}

Hi,
Thanks for the reply. I am not quite sure what this line means,

"Usage: script.pl db_name_file.txt, you must specify the file that contained names to search from"

I understand it is my input file with all the names and family tags so am I to input it like this:

"Usage: script.pl families.txt, you must specify the file that contained names to search from" ?

What is script.pl in that case? Sorry for such a basic query, but I am new to Perl and until now I have been dealing with straightforward inputs like the following:

my $input_gene_file = "/Users/Documents/myfolder/file.txt";
die "Cannot open $input_gene_file\n" unless (open(IN, $input_gene_file));

Can you please elaborate on this? Please let me know.
Thanks in advance

Hi perllearner007,

Let me start by saying that to run this program from your command-line interface (CLI), all you need to do is this:
script.pl families.txt [where script.pl is whatever name you give your Perl program and families.txt is whatever name you give the family names file you are searching]

I am not quite sure what this line means,

"Usage: script.pl db_name_file.txt, you must specify the file that contained names to search from"

I understand it is my input file with all the names and family tags so am I to input it like this:

"Usage: script.pl families.txt, you must specify the file that contained names to search from" ?

What is script.pl in that case?

Please note that lines 6-8 in my initial post are a single statement:

    croak "Usage: script.pl db_name_file.txt, you must specify the file that contained names to search from" unless defined $ARGV[0]; 

The above statement uses croak from the Carp module, which reports the error from the perspective of the caller and so can be better than plain die at telling you where and what the error is.
So what does it say?
It says: "Tell any user who runs this program (script.pl, or whatever name you give it) without specifying the data file (families.txt, or whatever name you give it) that the program must be run with the script name followed by the data file name."
So unless that instruction is followed, the program does not run.

Sorry for such a basic query, but I am new to Perl and until now I have been dealing with straightforward inputs like the following:

my $input_gene_file = "/Users/Documents/myfolder/file.txt";

The above might be straightforward, but it is not dynamic. What happens if and when the path to the file changes? Obviously, the file can't be found, even though only the path has changed. So it is not good practice to hard-code your path into your program.

die "Cannot open $input_gene_file\n" unless (open(IN, $input_gene_file));

Of course, this is Perl and there is more than one way to get a job done. However, some ways are a lot better than others (not for perl, but for humans, the programmer included).
The code above runs fine, all else being equal, but its real purpose (opening a file) is hidden inside the unless condition. It is also good to report why the open failed, using $!, rather than only that it failed.
So it is a lot better (for humans) to write and read something like this:

open my $fh,'<',$input_gene_file or die "Cannot open $input_gene_file:$!";

I prefer the three-argument form of open to the two-argument form, and so do several Perl programmers I know.
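
To make the two points above concrete (three-argument open, and taking the path from the command line instead of hard-coding it), here is a minimal sketch. The helper name read_lines is hypothetical and only for illustration; it is not part of the script posted earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): open a file with the
# three-argument form of open and return its lines without trailing newlines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path
      or die "Cannot open $path: $!";    # $! explains why the open failed
    chomp( my @lines = <$fh> );
    close $fh or die "Cannot close $path: $!";
    return @lines;
}

# Take the path from the command line instead of hard-coding it.
if (@ARGV) {
    my @lines = read_lines( $ARGV[0] );
    print scalar(@lines), " lines read from $ARGV[0]\n";
}
else {
    print "Usage: $0 families.txt\n";
}
```

Run it as perl script.pl families.txt; if the file moves, only the argument changes, not the program.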

Please, check the following for more:
perldoc -f open
perldoc Carp

I hope this helps.

Thank you so much, 2teez, for such an elaborate explanation. This makes it a lot clearer.

Hi perllearner007,
I am glad you got that. Please mark the post as solved if your question is answered and you are fully satisfied. Thank you.

Hi 2teez, I have understood your explanation and the code runs too however, I keep getting this msg:
-bash: kate: command not found

2teez when I run your script and enter Kate when prompted I get no output. No error but no output either. I made a few changes in the loop that builds the hash and now it works OK for me.

perllearner007 I couldn't recreate your error but you might as well try the following modified version which works OK for me.

#!/usr/bin/perl
use warnings;
use strict;
use Carp qw(croak);
croak
"Usage: script.pl db_name_file.txt, you must specify the file that contained names to search from"
  unless defined $ARGV[0];    ## error msg if input file is not specified
print "Enter name to search: ";
my %has_val;
chomp( my $name = <STDIN> );    ## get name to search from user
my $input_file = $ARGV[0];
open my $fh, '<', $input_file
  or croak "can't open this file:$!";    ## open the file to read from

while (<$fh>) {
    next if !/^\d+?/;
    #Had to make changes here to get it to work for me d5e5
    chomp;
    my ($name_id, $fam_tag) = (split /\s+/)[1,2];
    my ($fam_name) = split /,/, $fam_tag;
    $has_val{$name_id} = $fam_name;
    #End of d5e5's changes
}
close $fh or croak "can't close this file:$!";    ## close file
$name = ucfirst( lc($name) );    ## this will yield first letter capital always

foreach my $fnd ( keys %has_val ) {
    if ( $fnd eq $name ) {
        my $surname = $has_val{$fnd};
        while ( my ( $key, $value ) = each %has_val ) {
            print $key, '  ', $value, $/ if $surname eq $value;
        }
    }
}

d5e5:

2teez when I run your script and enter Kate when prompted I get no output. No error but no output either. I made a few changes in the loop that builds the hash and now it works OK for me.

Oops, human error! In my original post I tested against an edited copy of the raw data given by perllearner007, thinking that the tab before each line was not supposed to be there, i.e. that it was only there because the data was posted with tabs on DaniWeb. However, that is not an excuse!

So, to make the code work with or without the initial tab, just comment out line 17 (or remove it altogether):
that is, instead of

next if !/^\d+?/;

put this:

#next if !/^\d+?/;

from my original post and all shall be smiling again! Thanks d5e5 for spotting that!

perllearner007:

Hi 2teez, I have understood your explanation and the code runs too however, I keep getting this msg:
-bash: kate: command not found

How do you run your script from your terminal?
Please don't forget that on a Linux-like OS, to run the script on its own without typing perl before it, you have to do:

chmod +x script.pl

then use like this:

./script.pl families.txt

Then you are prompted for a name to search for, and you give Kate or whichever name you want!
So, how are you running your script from the command line?

In my terminal first I am directing it to the directory and then using perl scriptname.pl families.txt
It asks me for the name...I input Kate and then it gives me that command not found. :(

So, I've been trying to make this alternate solution work too.. can you point out where might be the problem?

#!/usr/bin/perl
use strict;
use warnings;

my %names;
my %families;
my $input_file = "/Users/myfolder/families.txt";
die "Cannot open $input_file\n" unless (open(IN, $input_file));

while (<IN>) {



my @fields = split /\t/,$_;


    my $id = $fields[1];

        my $name = $fields[2];

    my $familytag= $fields[3];

        my $effect = $fields[4];



  for my $Tag (split /\t/, $familytag) {
    push @{ $names{$name} }, $Tag;
    push @{ $families{$Tag} }, $name;
  }
}

while () {

  print "\nName: ";
  chomp (my $name = <>);
  last unless $name =~ /\S/;
  print "\n";

  if (my $Tags = $names{$name}) {
    for my $Tag (@$Tags) {
      my $names = $families{$Tag};
      next unless @$names > 1;
      printf "%s %s\n", $_, $Tag for @$names;
    }
  }
  else {
    warn qq (No name "$name" found);
  }
}

When I run it in the terminal it says,

Name:

And when I input Kate, it says

No name Kate found

which seems weird because it is in the input file.

Line 40 probably doesn't do whatever you intend it to do. if ( my $Tags = $names{$name} ) { declares a new variable and assigns the value of $names{$name} to it, and tests this assignment with an if. That won't give an error message but it doesn't make much sense.

The following works for me. Make sure you change the path to your own directory before running it.

#!/usr/bin/perl
use warnings;
use strict;

use Tie::File;
use Fcntl 'O_RDONLY';

#Change the following path to input directory and file if necessary
my $filename = '/home/david/Programming/data/families.txt';

# open an existing file in read-only mode
tie my @families, 'Tie::File', $filename, mode => O_RDONLY or die "$!";

#Infinite loop, exit by entering blank input
while (1){
    print "\nName: ";
    chomp( my $name = <> );
    last unless $name =~ /\S/;
    print "\n";
    my %fam_names;
    foreach (@families){
        my @fields = split;
        my ($firstname, $family_tag) = @fields[1, 2];
        next unless $firstname eq $name;
        my ($lastname) = split /,/, $family_tag;
        undef $fam_names{$lastname};
    }

    foreach my $k (keys %fam_names){
        foreach my $rec (@families){
            my @fields = split /\s+/, $rec;
            my ($firstname, $family_tag) = @fields[1, 2];
            my ($lastname) = split /,/, $family_tag;
            print "$firstname $lastname\n" if $lastname eq $k;
        }
    }
}

Can you please let me know what your output looks like? Here is what I get:

Name: Kate


Name: Shirley


Name: Dexter


Name: 

It takes in the name and immediately asks me for the next name. At least it now recognises the name, because initially it did not even recognise Kate and kept telling me no name was found.

When I run the above script in my terminal, searching for names that exist in the file (Kate, etc.) and one that doesn't (Ogden) I get the following output:

david@david-laptop:~/Programming/data$ perl ../Perl/temp01.pl

Name: Kate

Kate Rhodes
Shirley Rhodes

Name: Shirley

Kate Rhodes
Shirley Rhodes

Name: Ogden

Name: Dexter

Aaron Sunders
Dexter Sunders

Name:
david@david-laptop:~/Programming/data$

I don't know why it doesn't work for you. Are you sure your input file exists in the specified path and contains the expected records? One way to debug is to add print statements to the script at strategic points to print the contents of some of the variables (see "How do I debug my Perl programs?"). Also, the fact that the script doesn't tell you when a name is not found is a bug, not a feature. I forgot to check for that case and give a message when the name doesn't match any of the records.

Try running the following, modified version of the script:

#!/usr/bin/perl
use warnings;
use strict;

use Tie::File;
use Fcntl 'O_RDONLY';

#Change the following path to input directory and file if necessary
my $filename = '/home/david/Programming/data/families.txt';

# open an existing file in read-only mode
die "Unable to find $filename" unless -e $filename;
tie my @families, 'Tie::File', $filename, mode => O_RDONLY or die "$!";

#added for debugging
my $count = @families;
print "$filename contains the following $count lines:\n", join "\n", @families, "\n";
#-----

#Infinite loop, exit by entering blank input
while (1){
    print "\nName: ";
    chomp( my $name = <> );
    last unless $name =~ /\S/;
    print "\n";
    my %fam_names;
    foreach (@families){
        my @fields = split;
        my ($firstname, $family_tag) = @fields[1, 2];
        next unless $firstname eq $name;
        my ($lastname) = split /,/, $family_tag;
        undef $fam_names{$lastname};
    }

    #added for debugging
    my $found = keys %fam_names;
    #-----

    if ($found == 0){
        print "Nothing found for $name\n";
    }

    foreach my $k (keys %fam_names){
        foreach my $rec (@families){
            my @fields = split /\s+/, $rec;
            my ($firstname, $family_tag) = @fields[1, 2];
            my ($lastname) = split /,/, $family_tag;
            print "$firstname $lastname\n" if $lastname eq $k;
        }
    }
}

perllearner007,

In my terminal first I am directing it to the directory and then using perl scriptname.pl families.txt
It asks me for the name...I input Kate and then it gives me that command not found. :(

You are not supposed to direct it to any directory. families.txt is the file containing all the data, as originally shown by you.
All you need to do from your CLI is just:

perl scriptname.pl families.txt

That is all! Then you input the name when asked for.

And in case your families.txt, containing names like Kate and co., is located somewhere else, then you can do:

perl scriptname.pl /Users/myfolder/families.txt

That is all.

d5e5,
When I checked your script I got exactly what perllearner007 got, just the input and blank lines, until I changed line 46 of the code in your last post from:

my ($firstname, $family_tag) = @fields[1, 2];

to:

my ( $firstname, $family_tag ) = @fields[ 2, 3 ];

then it worked.

Also, the fact that the script doesn't tell you when a name is not found is a bug, not a feature. I forgot to check for that case and give a message when the name doesn't match any of the records.

I will disagree with that, on the grounds that the initial question did not ask for it; one can simply include it to make the program more robust. So for me it is more of a feature than a bug!

2teez, when I made your change the script didn't work for me anymore. It just prints blank lines and prompts for Name again. I don't understand why it works for you. Maybe the default behavior of split has changed, or maybe the Tie::File module that comes with my version of Perl works differently. My version of Perl is 5.14 subversion 2 for i686 linux.

When I run the following simplified script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

while (<DATA>){
    my @fields = split;
    print Dumper(\@fields);
}

__DATA__
002  Kate      Rhodes,Mich           Positive

My output is:

$VAR1 = [
          '002',
          'Kate',
          'Rhodes,Mich',
          'Positive'
        ];
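
One possible explanation for the different results, offered as a sketch rather than a confirmed diagnosis: a bare split (equivalent to split ' ') skips leading whitespace, while split /\s+/ returns an empty first field when the line starts with whitespace. The script above uses both forms, so on data with leading whitespace the two loops would see the name in different positions. The sample line below, with two leading spaces, is an assumption about how the pasted data might look.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample record with leading whitespace (an assumption about the pasted data).
my $line = "  002  Kate      Rhodes,Mich           Positive";

# split ' ' (what a bare `split` does on $_) skips leading whitespace.
my @awk_style = split ' ', $line;

# split /\s+/ keeps an empty leading field when the line starts with whitespace.
my @regex_style = split /\s+/, $line;

print "split ' '   : name is field 1 -> $awk_style[1]\n";
print "split /\\s+/ : name is field 2 -> $regex_style[2]\n";
```

On a line without leading whitespace both forms return the same four fields, which would explain why the script behaves differently depending on how the data file was saved.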

d5e5,
I had the same output with your last post that is:

 $VAR1 = [
            '002',
            'Kate',
            'Rhodes,Mich',
            'Positive'
         ];

However, trying out your previous solution on ActiveState Perl 5.14.1 on 32-bit Windows 7, Strawberry Perl 5.14.2.1 (32-bit) on Windows XP, and Perl 5.14.1 on Arch Linux yielded the same result as stated earlier; I had to change line 46 before I could get the desired output. Maybe regex could come to the rescue (just as in my initial post) to make it a one-time solution for all?!

I wonder if the data gets modified in the process of posting to and copying from Daniweb?

When I select the data provided at the top of this article, copy and paste it into my text editor and save I get the attached (please see the File Attachment.) The families.txt file that I've been working with has no spaces to the left of the data. I didn't remove them so if perllearner007's original records start with spaces, then transferring to and from the Daniweb post somehow stripped them. If you will post the families.txt file you are testing with as a File Attachment then I can test my script with it.

I think if we can verify that we're all testing with the same data we can settle on a script that works for all, whether or not the script makes use of regex or the split function.
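
One way to compare data files without relying on attachments is to print each line with its whitespace made visible. The following is a minimal sketch using Data::Dumper's Useqq setting; the helper name visible and the two sample lines are hypothetical, chosen only to show a line with and without a leading tab.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq  = 1;    # show tabs and other control characters as escapes
$Data::Dumper::Terse  = 1;    # print the value without the "$VAR1 =" prefix
$Data::Dumper::Indent = 0;    # keep each value on one line

# Hypothetical helper: return a line with its whitespace made visible.
sub visible {
    my ($line) = @_;
    return Dumper($line);
}

# Two sample lines (assumed data): one clean, one with a leading tab.
my @lines = (
    "002\tKate\tRhodes,Mich\tPositive",
    "\t003\tAaron\tSunders\tNegative",
);

print visible($_), "\n" for @lines;
```

In the real scripts the same helper could be applied to each line read from families.txt, so that a stray leading tab or space shows up immediately.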

The attached file worked as expected with your code, but the code failed with the data provided by perllearner007 as posted at the top of this thread and copied directly into Notepad on Windows 7 (without changing anything)!
By and large, I think this thread is solved, unless perllearner007 has other things in mind!

Thanks for helping test this. I couldn't understand why my script worked for me and not for perllearner007.

Just as you did, I selected and copied the data provided by perllearner007 as posted on top of this post and pasted it into Komodo edit to make the above attached file, without changing anything. Komodo edit doesn't automatically strip leading spaces and my browser (Google Chrome) shouldn't make a difference, but who knows?

I agree that perllearner007's task is essentially solved. If we added some regex to strip out leading spaces before parsing each data line we could easily make a script that would work for data having or not having leading spaces. See perlfaq4 for one way to strip leading and trailing spaces.
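
A minimal sketch of that idea, assuming the same column layout as the sample data (ID, NAME, FAMILYTAG, EFFECT). The helper name parse_record is hypothetical; the substitution is the usual whitespace-trimming idiom.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (first name, first family tag) for a data line,
# or an empty list for the header line, blank lines and incomplete records.
sub parse_record {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;         # strip leading and trailing whitespace
    return unless $line =~ /^\d/;    # skip the header and blank lines
    my ( $name, $tag ) = ( split /\s+/, $line )[ 1, 2 ];
    return unless defined $tag;      # skip records with missing columns
    my ($family) = split /,/, $tag;  # keep only the first tag, as the scripts above do
    return ( $name, $family );
}

# Sample lines (assumed data): a header, a space-indented record, a tab-separated record.
my @sample = (
    "ID     NAME   FAMILYTAG   EFFECT",
    "  002  Kate      Rhodes,Mich           Positive",
    "004\tShirley\tRhodes\tNegative",
);

for my $line (@sample) {
    my ( $name, $family ) = parse_record($line) or next;
    print "$name $family\n";
}
```

With these three sample lines the loop prints Kate Rhodes and Shirley Rhodes, whether or not a record starts with whitespace.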

Thank you both 2teez and D5e5. It turns out there was a problem with my input file. I have fixed it and the code works fine. Thank you both for your patience.
