Hi,
I'd like to create a perl script that takes two input files, one being a master list of users/attributes, the other being a newly uploaded list.
I'd like two output files, one being a file with new users (not in the master list) as well as updated users (changed attributes). The 2nd output file should be an updated master list file.
I believe this would be best done with a hash table. I originally attempted with a nested loop but the performance was terrible as expected.

Here is an example of the two input files:

MASTER FILE
username,lastname,location,password --just a header for reference

adam,adams,alabama,alpha
brian,benson,wyoming,bravo
chad,carson,california,charlie
daniel,davis,delaware,delta

UPLOAD FILE

adam,adams,alabama,alpha
daniel,davis,texas,delta
frank,fortune,florida,foxtrot

OUTPUT 1

daniel,davis,texas,delta,UPDATE
frank,fortune,florida,foxtrot,NEW

OUTPUT 2 NEW MASTER FILE

adam,adams,alabama,alpha
brian,benson,wyoming,bravo
chad,carson,california,charlie
daniel,davis,texas,delta
frank,fortune,florida,foxtrot

Some fields will be case sensitive (password) while others will not be. Some can also contain numbers.
If anyone can help me with a start I would be most appreciative.
-Adam

Hi sayerada,

I believe this would be best done with a hash table

I think you are on track.
See the related article "Search a file with multi-element contents of another file" as a pointer.
Also, show what you have been able to do and where you are having problems.
And finally, if I may ask, what should happen when different entries share the same username or lastname?

Thanks for your reply and related article suggestion.
So far all I've really got is a nested loop version of what I'm trying to do, which I can attach, although it is lengthy.

As far as troubles, I'm having difficulty understanding how to store the lines into the hash table, access them for comparison, and print the output I'm looking for. All the hash table tutorials I read are very confusing to me.

The primary key is the username. This cannot be duplicated and should be matched first before looking at the attributes.
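
For readers who find hash tutorials confusing, the idea in the question can be reduced to a few lines. This is a minimal sketch, not code from this thread: it assumes records in the `username,lastname,location,password` layout shown above, uses in-memory lists instead of files, compares whole lines case-sensitively, and the sub name `diff_users` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare upload records against master records.
# Both arguments are array references of
# "username,lastname,location,password" lines.
sub diff_users {
    my ( $master_lines, $upload_lines ) = @_;

    # Store each master line in a hash, keyed by username (the primary key).
    my %master;
    for my $line (@$master_lines) {
        my ($username) = split /,/, $line;
        $master{$username} = $line;
    }

    my @changes;
    for my $line (@$upload_lines) {
        my ($username) = split /,/, $line;
        if ( !exists $master{$username} ) {
            push @changes, "$line,NEW";       # username not in master
        }
        elsif ( $master{$username} ne $line ) {
            push @changes, "$line,UPDATE";    # same username, changed attributes
        }
        $master{$username} = $line;    # the uploaded record replaces the old one
    }

    # Updated master list, sorted by username.
    my @new_master = map { $master{$_} } sort keys %master;
    return ( \@changes, \@new_master );
}

my @master = (
    'adam,adams,alabama,alpha',
    'brian,benson,wyoming,bravo',
    'chad,carson,california,charlie',
    'daniel,davis,delaware,delta',
);
my @upload = (
    'adam,adams,alabama,alpha',
    'daniel,davis,texas,delta',
    'frank,fortune,florida,foxtrot',
);

my ( $changes, $new_master ) = diff_users( \@master, \@upload );
print "$_\n" for @$changes;       # the "OUTPUT 1" records
print "$_\n" for @$new_master;    # the "OUTPUT 2" records
```

Each lookup in `%master` takes roughly constant time, so the work grows with the number of lines rather than with the product of the two file sizes, which is what makes the nested loop slow.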

Hi Again sayerada,

Below is a complete Perl program that does what you want. However, you need to understand subroutines, hashes and references in Perl.
It would have been helpful if you had posted your code; at least we could go over it and see where it goes wrong.
Anyway, you can use this as a guide and compare it with what you are doing.
NOTE: The code below could still be hacked further.
Also note that I used nothing outside the Perl core, that is, no extra CPAN modules. This makes the code lengthy and unnecessarily verbose, and it might not be very efficient with very large data [ I have not tested that claim. Large data is relative anyway ]. However, it works!
Enough talk, peruse the code below!

#!/usr/bin/perl
use warnings;
use strict;
use 5.10.0;    # needed for smart matching '~~' (experimental since 5.18, deprecated in 5.38)

my $user_and_attr_ref = {};    # initialize hash reference
open my $fh, '<', "master_file.txt" or die "can't open file: $!";
while (<$fh>) {
    chomp;
    my ( $username, $lastname, $location, $passwd ) = split /,/, $_, 4;

    # use username as a key in the hash of Array
    push @{ $user_and_attr_ref->{$username} }, qq{$lastname,$location,$passwd};
}
close $fh or die "can't close file:$!";

my $file = 'upload_file.txt';

#return two files, using a subroutine file_outputs
my ( $output_file1, $output_file2 ) = file_outputs($file);

#using an anonymous hash reference in a subroutine write_out_new_file
# twice, file variable, filename and a subroutine reference was passed

write_out_new_file(
    {
        file     => $output_file1,
        filename => 'new_master_file.txt',
        code     => \&check_file_to_write,
    }
);
write_out_new_file(
    {
        file     => $output_file2,
        filename => 'show_update_file.txt',
        code     => sub { return $_[0]; },
    }
);

sub file_outputs {
    my ($filename) = @_;
    my ( $new_master_file_str, $update_file_str ) = ( q{}, q{} ); # q{} means ''
    open my $fh, '<', $filename or die "can't open file $filename: $!";
    while (<$fh>) {
        chomp;
        my ( $username, $lastname, $location, $passwd ) = split /,/, $_, 4;
        if ( exists $user_and_attr_ref->{$username} ) {
            my @user_data_in_upload_file = qq{$lastname,$location,$passwd};
            my @user_data_in_master_file =
              grep { $_ } @{ $user_and_attr_ref->{$username} };

            if ( @user_data_in_upload_file ~~ @user_data_in_master_file ) {
                $new_master_file_str .=
                  qq{$username,$lastname,$location,$passwd} . $/;
            }
            else {
                my $modified_user_data =
                  compare_user_attr( \@user_data_in_master_file,
                    \@user_data_in_upload_file );
                $update_file_str .=
                  $username . ",@{$modified_user_data},UPDATE" . $/;
                $new_master_file_str .=
                  $username . ",@{$modified_user_data}" . $/;
            }
        }
        else {
            $update_file_str .=
              qq{$username,$lastname,$location,$passwd,NEW} . $/;
            $new_master_file_str .=
              qq{$username,$lastname,$location,$passwd} . $/;
        }
    }
    close $fh or die "can't close file:$!";
    return $new_master_file_str, $update_file_str if wantarray;
}

sub compare_user_attr {
    my ( $master_file, $upload_file ) = @_;

    foreach my $user_attr ( 0 .. $#$master_file ) {
        if ( $master_file->[$user_attr] ne $upload_file->[$user_attr] ) {
            $master_file->[$user_attr] = $upload_file->[$user_attr];
        }
    }
    return $upload_file;
}

sub write_out_new_file {

    my ($file) = @_;
    open my $fh, '>', $file->{'filename'} or die "can't open file:$!";
    print {$fh} $file->{'code'}->( $file->{'file'} );
    close $fh or die "can't close file:$!";
}

sub check_file_to_write {
    my $new_user_and_attr_ref = {};    #initialize a hash reference
    foreach ( split "\n", $_[0] ) {
        my ( $username, $lastname, $location, $passwd ) = split /,/, $_, 4;
        push @{ $new_user_and_attr_ref->{$username} },
          qq{$lastname,$location,$passwd};
    }

    my @key = sort keys %$user_and_attr_ref;

    for (@key) {
        if ( !exists $new_user_and_attr_ref->{$_} ) {
            push @{ $new_user_and_attr_ref->{$_} },
              @{ $user_and_attr_ref->{$_} }, $/;
        }
    }
    my $file = q{};
    foreach my $sorted_data ( sort keys %$new_user_and_attr_ref ) {
        $file .= sprintf "%s,%s\n", $sorted_data,
          @{ $new_user_and_attr_ref->{$sorted_data} };
    }
    return $file;
}

To run this script, place the Perl program, the master file called "master_file.txt" and the upload file called "upload_file.txt" in the same directory. Then run your Perl script. You will get two new files called 'new_master_file.txt' and 'show_update_file.txt'.

OUTPUT

adam,adams,alabama,alpha
brian,benson,wyoming,bravo
chad,carson,california,charlie
daniel,davis,texas,delta
frank,fortune,florida,foxtrot

Hope this helps.

The Second OUTPUT file:

daniel,davis,texas,delta,UPDATE
frank,fortune,florida,foxtrot,NEW

Thanks the code does work! It didn't however compensate for case insensitive fields. I used the lc() function on the case insensitive fields to get around that.

I am now realizing that it is okay for the username to be valid at multiple locations. How do I go about making the primary key a combination of username and location?

I have modified the code to account for the additional fields/formatting for my data files but must have missed something somewhere as I'm getting a "Use of uninitialized value in string ne at line 62"

I'd post my data files but they are data sensitive. I could email them to you if it would be beneficial.

Here is the modified code

#!/usr/bin/perl
use warnings;
use strict;
use 5.10.0;    # needed for smart matching '~~'
my $user_and_attr_ref = {};    # initialize hash reference
open my $fh, '<', "master_file.csv" or die "can't open file";
while (<$fh>) {
    chomp;
    my ( $username, $lastname, $location, $passwd, $grade ) = split /,/, $_, 5;
    # use username as a key in the hash of Array
    push @{ $user_and_attr_ref->{lc($username)} },"$lastname,$location,$passwd"; #here is where we change attributes we look for on updates
}
close $fh or die "can't close file:$!";
my $file = 'upload_file.csv';
#return two files, using a subroutine file_outputs
my ( $output_file1, $output_file2 ) = file_outputs($file);
#using an anonymous hash reference in a subroutine write_out_new_file
# twice, file variable, filename and a subroutine reference was passed
write_out_new_file(
    {
        file     => $output_file1,
        filename => 'new_master_file.csv',
        code     => \&check_file_to_write,
    }
);
write_out_new_file(
    {
        file     => $output_file2,
        filename => 'show_update_file.csv',
        code     => sub { return $_[0]; },
    }
);
sub file_outputs {
    my ($filename) = @_;
    my ( $new_master_file_str, $update_file_str ) = ( q{}, q{} ); # q{} means ''
    open my $fh, '<', $filename or die "can't open file";
    while (<$fh>) {
        chomp;
        my ( $action, $location, $username, $passwd, $firstname, $mi, $lastname, $grade, $email, $active ) = split /,/, $_, 10;
        if ( exists $user_and_attr_ref->{lc($username)} ) {
            my @user_data_in_upload_file = "lc($lastname),$location,$passwd";
            my @user_data_in_master_file = grep { $_ } values @{ $user_and_attr_ref->{lc($username)} };
            if ( @user_data_in_upload_file ~~ @user_data_in_master_file ) {
                $new_master_file_str .= "$username,$lastname,$location,$passwd,$grade" . $/;
            }
            else {
                my $modified_user_data = compare_user_attr( \@user_data_in_master_file, \@user_data_in_upload_file );
                $update_file_str .= "U,$location,$username,$passwd,$firstname,$mi,$lastname,$grade,$email,$active\n"; # . ",@{$modified_user_data},UPDATE" . $/;
                $new_master_file_str .= $username . ",@{$modified_user_data}" . $/;
            }
        }
        else {
            $update_file_str .= "A,$location,$username,$passwd,$firstname,$mi,$lastname,$grade,$email,$active\n";  #qq{$username,$lastname,$location,$passwd,NEW} . $/;
            $new_master_file_str .= "$username,$lastname,$location,$passwd,$grade" . $/;
        }
    }
    return $new_master_file_str, $update_file_str if wantarray;
}
sub compare_user_attr {
    my ( $master_file, $upload_file ) = @_;
    foreach my $user_attr ( 0 .. $#$master_file ) {
        if ( $master_file->[$user_attr] ne $upload_file->[$user_attr] ) {
            $master_file->[$user_attr] = $upload_file->[$user_attr];
        }
    }
    return $upload_file;
}
sub write_out_new_file {
    my ($file) = @_;
    open my $fh, '>', $file->{'filename'} or die "can't open file:$!";
    print {$fh} $file->{'code'}->( $file->{'file'} );
}
sub check_file_to_write {
    my $new_user_and_attr_ref = {};    #initialize a hash reference
    foreach ( split "\n", $_[0] ) {
        my ( $username, $lastname, $location, $passwd, $grade ) = split /,/, $_, 5;
        push @{ $new_user_and_attr_ref->{lc($username)} },qq{$lastname,$location,$passwd};
    }
    my @key = sort keys %$user_and_attr_ref;
    for (@key) {
        if ( !exists $new_user_and_attr_ref->{$_} ) {
            push @{ $new_user_and_attr_ref->{$_} },
              @{ $user_and_attr_ref->{$_} }, $/;
        }
    }
    my $file = q{};
    foreach my $sorted_data ( sort keys %$new_user_and_attr_ref ) {
        $file .= sprintf "%s,%s\n", $sorted_data,
          @{ $new_user_and_attr_ref->{$sorted_data} };
    }
    return $file;
}

Hi,
Nice one. Thanks, though I know the code works.

It didn't however compensate for case insensitive fields. I used the lc() function on the case insensitive fields to get around that.

Note that the lc EXPR function only returns EXPR in lowercase; it does not by itself make a comparison case-insensitive. For that, both values being compared have to be lowercased, or you can use a regex with the i modifier.
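
To make this concrete, here is a minimal sketch of a mixed comparison. It also shows a related pitfall: a function call such as lc(...) is not interpolated inside a double-quoted string, so only the variable is substituted and the text `lc(` and `)` stays literal. The sub name `same_record` and the choice of which fields ignore case are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two "lastname,location,password" strings.
# Assumption: lastname and location ignore case, the password does not.
sub same_record {
    my ( $left, $right ) = @_;
    my ( $l_last, $l_loc, $l_pass ) = split /,/, $left;
    my ( $r_last, $r_loc, $r_pass ) = split /,/, $right;

    # Lowercase BOTH sides of each case-insensitive field before comparing.
    return lc($l_last) eq lc($r_last)
        && lc($l_loc) eq lc($r_loc)
        && $l_pass eq $r_pass;
}

my $lastname = 'Davis';

# lc() is NOT called inside double quotes; only $lastname is interpolated.
my $wrong = "lc($lastname),texas";       # literal text: lc(Davis),texas
my $right = lc($lastname) . ',texas';    # davis,texas
print "$wrong\n$right\n";

my $first  = same_record( 'Davis,Texas,delta', 'davis,TEXAS,delta' );
my $second = same_record( 'Davis,Texas,delta', 'davis,texas,Delta' );
print 'case differs in name/location only: ', ( $first  ? 'same' : 'different' ), "\n";
print 'case differs in password:           ', ( $second ? 'same' : 'different' ), "\n";
```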

I am now realizing that it is okay for the username to be valid at multiple locations. How do I go about making the primary key a combination of username and location?

To do that you will probably have to use a hash of hashes (here, a hash of hashes of arrays), like so:

    push @{ $user_and_attr_ref->{$username}{$location} },
        qq{$lastname,$location,$passwd} if defined $username;

Please check perldoc perldsc for detailed information. It's actually very simple.
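
Two common ways to key on username plus location can be sketched as follows. Both are illustrations under assumptions, not code from this thread: the sample records are invented, the `|` separator is assumed never to occur in either field, and lowercasing the key assumes both fields are case-insensitive.

```perl
use strict;
use warnings;

my @lines = (
    'adam,adams,alabama,alpha',
    'adam,adams,texas,alpha',        # same username, different location
    'brian,benson,wyoming,bravo',
);

my ( %by_pair, %nested );
for my $line (@lines) {
    my ( $username, $lastname, $location, $passwd ) = split /,/, $line;

    # Option 1: a flat hash whose key joins both fields.
    my $key = join '|', lc($username), lc($location);
    $by_pair{$key} = "$lastname,$passwd";

    # Option 2: a hash of hashes, username first, then location.
    $nested{ lc($username) }{ lc($location) } = "$lastname,$passwd";
}

print "flat key: $_\n" for sort keys %by_pair;
for my $user ( sort keys %nested ) {
    my @locations = sort keys %{ $nested{$user} };
    print "nested:   $user => @locations\n";
}
```

The flat key is the smaller change when only existence checks are needed; the nested form makes it easy to list every location for one username.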

I have modified the code to account for the additional fields/formatting for my data files but must have missed something somewhere as I'm getting a "Use of uninitialized value in string ne at line 62"

I suggest two things:
a.) Remove the LIMIT argument (i.e. the 5) from the split call and use split without it, then
b.) print out the result to see how many fields each line has, like so:

print join "\n" => split /,/, $_;

You can then see the number of fields present within each $_.
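
A variation on suggestion (b) is to check the field count before using the fields, so that short or blank lines are reported instead of producing undefined values later. This is a sketch only: the expected count of 10 is taken from the ten variables in the split of the modified file_outputs, while the sub name `parse_upload_line` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

my $EXPECTED_FIELDS = 10;    # assumption: matches the 10-variable split in file_outputs

# Hypothetical helper: return the fields of one line, or an empty list
# if the line does not have the expected number of fields.
sub parse_upload_line {
    my ($line) = @_;
    chomp $line;

    # A LIMIT of -1 keeps trailing empty fields, so "a,b,," counts as 4 fields.
    my @fields = split /,/, $line, -1;
    return @fields == $EXPECTED_FIELDS ? @fields : ();
}

my @sample = (
    'A,texas,daniel,delta,Dan,Q,Davis,9,dan@example.com,Y',
    'A,texas,frank,foxtrot',    # too short: later variables would be undef
    '',                         # blank line
);

my $line_no = 0;
for my $line (@sample) {
    $line_no++;
    my @fields = parse_upload_line($line);
    if (@fields) {
        print "line $line_no: ok, username=$fields[2]\n";
    }
    else {
        warn "line $line_no: skipped, wrong number of fields\n";
    }
}
```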

So Adam, this is the help you asked for. Please make it worthwhile! Make this script work to your taste and mark this thread SOLVED!!! [ This thread is basically solved; all you need to do is add one or two lines here and there and you are up and running ;-) ]

If you are still having any other issue let us know.
Thanks, hope this helps.
