Hello!

It's been a great while since I last used DaniWeb and have come back with a quandry in hopes I can get some guru support on...

My perl is very beginner. I used to code in ++ so I know the general rule of programming; however, I do not know the correct syntaxes so I'll be as specific as possible on what I'm trying to achieve.

I have a file named "people.txt"
Example snnippit of the txt file:

Name:Bob Gray
Weight:300lbs
Eyes:Blue
Location:Burmingham,AL
Name Sara Goody
Weight:100lbs
EyesGreen
Name:Bob Dollan
Weight:100lbs
Name:Ron Alberta
Location:Omaha,NE
ApplicationID:10012
ApplicationID:10013
ApplicationID:10013

I want to take the above file and pull out people who have BOTH a NAME & LOCATION and print it to a csv file and ignore the rest. I also want to append the ApplicationID to the csv as well.. Preferably in one column and each ID per row (doesn't have to associate to any persons name).

When I open the CSV, I am looking for "Name" "Location" "IDs" as columns in my csv file.

I'm not sure where to start on this and have been reading a perl book for dummies for two days now.

Any help would be greatly appreciated.

-Jon

Recommended Answers

All 11 Replies

Hi Serpterion,
I believe what you wanted, could be easily done in Perl.
That been said, before any code is written I noticed that the number of IDs do not tally with the numbers of Names with Location, Please, clearify this, since, you want "Name" "Location" "IDs" as columns in your csv file.

If I may suggest, if you could get Learning Perl 6th Edition, or Beginning Perl, I believe these a lot better than " perl book for dummies".

Thanks.

EDIT: Forgot to mention. The form they submit does not perform a verification so I get reports with people with just first names and no last, full names and no cities, or just ID's.. So thats why I looked to Perl to help sort the jargon out of the text file and spit out only the Name and City if Name and City "Location" have something populated on the lines. If they forgot to enter in their name and city, it will not post a blank line, it just goes to the next line.

I'll see if those books are at my local B&N.. The list I'm trying to pull out is twofold.. It's a webform people fill out and submit in for a subscription and tracking tool within my company. The web developer who mashed the tool together didn't do a good job on how it is stored and no one knows PHP in my office to correct it (and the dev moved on). We're working on a long term fix, but in the mean time, I'm dealing with the hand i'm dealt with.

The text file that gets mailed to me daily has people who put in a tracking log for their progress... I was tasked to pull people into an excel together of those who are participating and put it on a wiki page to let others see who is in competition with them in their area. The winners in each region of those who lost the most weight will win an iPad by QIII of this year.

The ID's are plastered near the top of the text file and are completely random of who they are. I've been creating an excel that looks like this manually:

Name City ID
Bob Allan Atlanta,GA B12932KL
Sarah Fesner Atlanta,GA D12393AF
Marcel Adams Havertown, PA B19293KL
B10293AD
B12384MN
D28384AK
C21293AL
C21221AF

Not all people inputed all the webform data, I can only put up those who properly put up their name and region they work out of OR their ID's. I have a ID Database which I can batch query the ID's and retrieve their name and location "City" and then do a quick check within excel of those who are unique or not on the ID list and delete the not unique from the ID Queried list and append it to my master excel of participants. They dont mean much to the users in the corresponding rows; however, it works well for me to have it this way.

Thanks!

Jon

Multi-line records with no record separator make it tricky. As for writing CSV output it's worth installing the Text::CSV module if you don't already have it.

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();

#slurp file into one string because it contains multi-line records
my $string = do {
    local $/;
    my $filename = 'people.txt';
    open my $fh, '<', $filename or die "Failed to open $filename: $!";
    <$fh>
};
$string =~ s/\n/!sep!/g;#Replace newlines with !sep! marker
#Break the string into records beginning with "\n"
$string =~ s/(Name|ApplicationID)/\n$1/g;

#Process each 'record'
foreach (split "\n", $string){
    next if m/^$/;#Remove empty 'record'
    my ($id) = m/^ApplicationID:.*?(\d+)(?:!sep!)?$/g;
    my $status;
    my @columns;
    if ($id){
        push @columns,undef,undef,$id;#Make two empty columns followed by id
        $status = $csv->combine(@columns);    # combine columns into a string
        print $csv->string(), "\n";          # print the combined string
        next;
    }
    my ($name,$loc) = m/^Name[:|\s](.+?)(?:!sep!).*?Location:(.+?)(?:!sep!)?$/g;
    if ($loc){
    push @columns,$name,$loc;
    $status = $csv->combine(@columns);    # combine columns into a string
    print $csv->string(), "\n";          # print the combined string
    }

}

For me, running the above script and reading your sample input prints the following:

"Bob Gray","Burmingham,AL"
"Ron Alberta","Omaha,NE"
,,10012
,,10013
,,10013

Hi Serpterion,

d5e5 has hacked together a good solution, am only putting up mine in case you do not have the module Text::CSV I don't have it too, I have Text::CSV_XS instead, however they do the same thing with some different I suppose or it turns out that you can't install it!

Since we all know what comma-seperated-values files are, we can as well generate it manually using Perl like so:

#!/usr/bin/perl
use warnings;
use strict;
use Readonly;

Readonly my $space => q{ };

my @arrayID;
my $profile_data = {};
my $name;

my $input_file = 'contestant.txt';

open my $fh, '<', $input_file or die "can't open file:$!";
while (<$fh>) {
    if (/Name/) {
        $name = substr $_, ( ( index $_, q{:} ) + 1 ), $#_;
    }
    elsif (/Location/) {
        my $location = substr $_, ( ( index $_, q{:} ) + 1 ), $#_;
        $profile_data->{$name} = $location;
    }
    elsif (/Application/) {
        my $id = substr $_, ( ( index $_, q{:} ) + 1 ), $#_;
        push @arrayID, qq[\"$space\",\"$space\",\"$id\"], $/;
    }
    else { next }
}
close $fh or die "can't close file:$!";

my @names_and_location;
push @names_and_location, qq[\"$_\"\,\"$profile_data->{$_}\",\"$space\"], $/
  for keys %{$profile_data};

print $_ foreach @names_and_location, @arrayID;

OUTPUT:
"Bob Gray","Burmingham,AL"," "
"Ron Alberta","Omaha,NE"," "
" "," ","10012"
" "," ","10013"
" "," ","1001"

Since, your initial intention is to have a CSV file with "Name","Location","IDs"

The following modified version of my script outputs the ids so they start at the top of your id column.

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();

#slurp file into one string because it contains multi-line records
my $string = do {
    local $/;
    my $filename = 'people.txt';
    open my $fh, '<', $filename or die "Failed to open $filename: $!";
    <$fh>
};
$string =~ s/\n/!sep!/g;#Replace newlines with !sep! marker
#Break the string into records beginning with "\n"
$string =~ s/(Name|ApplicationID)/\n$1/g;

#Process each 'record'
my @lines;
my @ids;
foreach (split "\n", $string){
    next if m/^$/;#Remove empty 'record'
    my ($id) = m/^ApplicationID:.*?(\d+)(?:!sep!)?$/g;
    my $status;
    my @columns;
    if ($id){
        push @ids,$id;
        next;
    }
    my ($name,$loc) = m/^Name[:|\s](.+?)(?:!sep!).*?Location:(.+?)(?:!sep!)?$/g;
    if ($loc){
        push @columns,$name,$loc;
        push @lines, [@columns];
    }
}
while (@lines || @ids){
    my ($name, $loc,$line, $id);
    if (@lines){
        $line = shift @lines;
        ($name,$loc) = @$line;
    }
    if (@ids){
        $id = shift @ids;
    }
    my @columns;
    push @columns, $name,$loc,$id;
    my $status = $csv->combine(@columns);    # combine columns into a string
    print $csv->string(), "\n";          # print the combined string
}

Outputs:
"Bob Gray","Burmingham,AL",10012
"Ron Alberta","Omaha,NE",10013
,,10013

Hi d5e5,

I enjoyed your last modified code from line 38 downwards! That is cool! The only concern I have is that since the IDs are randomized and seperated from the names and locations. I don't know if the name and location will end up matching the IDs. ( My guess though - I might be wrong )
Ofcourse, obviously that issues is not from your code, but from the raw data given. The code might work ofcourse for the above problem but I don't know if that works for the very problem.
Just saying.
Nice one!

Hi 2teez,

doesn't have to associate to any persons name

I don't know the reason for Serpterion's requirement to append application ids onto records for persons that probably have no relation to those ids. I would consider the column of ids as logically separate from the name and city records. Perhaps putting them there saves the trouble of putting them in a separate spreadsheet. I wouldn't recommend arranging the data that way if there's any chance of its becoming a long-term application that you would eventually have to explain to someone else. A data base analyst, for example, wouldn't like it.

Thanks for the replies back. Putting them on the same sheet saves me time from flipping back and forth between the two. I have a MySQL DB in which I can batch query the ID's. Since the ID's are not sequential in the file I recieved, it's hard to determine when the next ID shows up. I could have a string of 4 to 6 ID's then a person's name/city. It's all depending on when someone used the tool to report their participation.

If I took my first post's text example, I would record all the ID's in one excel and both Bob's and Ron's information would be recorded in an excel. Sarah would get dropped from the list as she failed to enter in enough data for us to register her.

Do you have a solution now to the Perl aspect of your question? If so please mark this topic solved.

Hi Serpterion,

it's hard to determine when the next ID shows up. I could have a string of 4 to 6 ID's then a person's name/city

If I may suggest, using Perl, you could 'pull out' all the IDs irrespective of where, in the file you got. like so:

if (/Application/) {
  .........
}

Then batch query them to get names and location from the your Database.
I believe that could make your sorting process a bit easy!

Thanks for all the support. I was able to modify the scripts above to pull precisely what I needed. This was a superior learning experience and am now tackling something I feel is a bit more complex (extracting info out of a router config when the search matches).

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.