Basic parsing problem...Help plz

Question

jacquelinek 0 Newbie Poster

14 Years Ago

I have a text file as input.
It is a list that looks like this:

A12345
B153875
C34893
...
...

and I have a database file which looks like this:

A12345 detail information
nvonafwenfovosdncsjdnfoewhuwerhwieufhiudhfisdfnsd
sdofnowerugfeuhgfurhgiuwerhfjdshfiasdhifheruwufhi
irgfiweurgf

A246 detail information
isdofnowerugfeuhgfurhgiuwerhfjdshfiadhifheruwufhi
wgerjgneiguihuhdnvkjdnvkjbdegiauberiubgieubgridfb
ooogrngoawerngiauengugbuivrug

B153875 detail information
wgerjgneiguihuvkwwjddnvkegtiaugberijubgieubgridfb
eragnowergnoweungfiousdhiuhsdnjkfnsk

C34893 detail information
fnweuraiwerbgivjbdbvurgfuwherugtheurhguhweriguhdg
sdgnasoughiueghaiwuh

...
...
...

My goal now is to find every name from the list (A12345, B153875, C34893, etc.) in the database and create an output file like this, containing each name with an _XXX suffix followed by its contents:

A12345_XXX
nvonafwenfovosdncsjdnfoewhuwerhwieufhiudhfisdfnsd
sdofnowerugfeuhgfurhgiuwerhfjdshfiasdhifheruwufhi
irgfiweurgf

B153875_XXX
wgerjgneiguihuvkwwjddnvkegtiaugberijubgieubgridfb
eragnowergnoweungfiousdhiuhsdnjkfnsk

C34893_XXX
fnweuraiwerbgivjbdbvurgfuwherugtheurhguhweriguhdg
sdgnasoughiueghaiwuh
...
...

How should I approach this?
(Fortunately, both the namelist and the database are in alphabetical order.)

My code so far only covers the filehandle part, something like this:

($v1, $v2, $v3) = @ARGV;
# $v1 is the namelist file
# $v2 is the database filename
# $v3 is the desired output filename

open (FILEHANDLE, $v1) || die "Cannot open $v1: $!";
open (DATABASE, $v2) || die "Cannot open $v2: $!";
open (RESULTS, ">$v3") || die "Cannot open $v3: $!";

......
......
......

close (FILEHANDLE);
close (DATABASE);
close (RESULTS);
exit;

Request help!

perl

d5e5 109 Master Poster · Answer 1 · 2009-10-22T02:51:07+00:00

This may work for you. Because the files appear in your post without code tags, it's hard to know whether the records are separated by spaces, carriage returns, or line-feed characters, or whether they are fixed or variable length.
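If you are unsure what separates the records, one way to check is to print the file contents with the invisible characters spelled out. The snippet below is only a rough sketch; show_invisibles is a made-up helper name, and the sample string stands in for your real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: spell out carriage returns, line feeds and tabs
# so you can see what actually separates the records.
sub show_invisibles {
    my ($text) = @_;
    $text =~ s/\r/\\r/g;      # carriage return -> literal \r
    $text =~ s/\n/\\n\n/g;    # line feed -> literal \n, keeping a real newline
    $text =~ s/\t/\\t/g;      # tab -> literal \t
    return $text;
}

# Made-up sample with Windows-style line endings.
print show_invisibles("A12345 detail information\r\nabc\r\n\r\n");
```

To inspect a real file, read its contents into a string first and pass that string to the helper.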

I attached the input files used to test the following script, plus the resulting output file.

#!/usr/bin/perl -w
#RegExSlurp.pl for jacquelinek
use strict;

my ($v1, $v2, $v3) = @ARGV;
#$v1 is the namelist file
#$v2 is the database filename
#$v3 is the desired output filename
open (FILEHANDLE, $v1) || die "Cannot open $v1: $!";
open (DATABASE, $v2) || die "Cannot open $v2: $!";
open (RESULTS, ">$v3") || die "Cannot open $v3: $!";
my @namelist = <FILEHANDLE>; #Read entire namelist file into an array;

my $save = $/; #To restore after undef
undef $/; #Enter "file-slurp mode"
my $db = <DATABASE>; #Read entire DATABASE into $db string
$/ = $save; #Restore default record separator
print "\n";
foreach my $i (@namelist) {
    chomp($i);
    # \Q...\E treats any regex metacharacters in the name literally
    if ($db =~ /^(\Q$i\E) detail information\s*([a-z\r\n\s]+)/m){
        #print "Name is: $1\n details are: $2\n";
        print RESULTS "${1}_XXX\n$2";
    }
    else {
        print RESULTS "$i not found in database\n";
    }
}

close (FILEHANDLE);
close (DATABASE);
close (RESULTS);
exit;
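
If the database is large, running one regex search over the whole file for every name may get slow. An alternative is to walk through the database one record at a time and look each name up in a hash. What follows is only a sketch, assuming the records are separated by blank lines as they appear in the post and that every record starts with "<name> detail information". The sub name extract_records is made up for illustration, and it works on strings rather than filehandles so it can be tried without any files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): takes the namelist
# text and the database text and returns the output text.
# Assumes records are separated by one or more completely empty lines.
sub extract_records {
    my ($namelist_text, $db_text) = @_;

    # Build a lookup table of wanted names, ignoring empty lines.
    my %wanted;
    for my $name (split /\r?\n/, $namelist_text) {
        $wanted{$name} = 1 if length $name;
    }

    my $out = '';
    # Split on runs of empty lines, similar to what "paragraph mode"
    # ($/ = "") does when reading from a filehandle.
    for my $record (split /(?:\r?\n){2,}/, $db_text) {
        # First line is the header, the rest is the detail text.
        my ($header, $body) = split /\r?\n/, $record, 2;
        next unless defined $header && defined $body;

        my ($name) = $header =~ /^(\S+) detail information\s*$/ or next;
        next unless $wanted{$name};

        $body =~ s/\s+\z//;    # trim trailing whitespace and newlines
        $out .= "${name}_XXX\n$body\n\n";
    }
    return $out;
}

# Small demonstration with made-up data in the same layout as the post.
my $names = "A12345\nB153875\n";
my $db    = "A12345 detail information\nabc\ndef\n\n"
          . "A246 detail information\nghi\n\n"
          . "B153875 detail information\njkl\n";
print extract_records($names, $db);
```

Unlike the script above, this version silently skips names that are not in the database, and it does not depend on the details being lowercase letters only. It also does not rely on either file being sorted.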