I have a Perl parsing problem: I have to find the keyvalue for a given KEYWORD KEYNAME.

I have a file containing KEYWORD KEYNAME on one line, followed by keyvalue on a subsequent line.

An added problem is that the file allows Perl- and shell-style comments using #, i.e. everything after a # character until the end of the line is a comment. The file also allows blank lines between KEYWORD KEYNAME and keyvalue. Also, as long as KEYWORD and KEYNAME are on the same line, there can be any amount of whitespace between them.

The good thing is that both KEYWORD KEYNAME and keyvalue have to start at the beginning of a line.

For example, all of the following are valid.

Example1 – I have to find the MACHINENAME value which is atab15
########## comment
KEYWORD MACHINENAME #comment
## comment
atab15 # comment

Example2 – I have to find the TARKETSYSTEM value sunSolaris
KEYWORD TARKETSYTEM
sunSolaris

Example3 – I have to find the HIHO value Goodbye
KEYWORD HIHO
Goodbye

Any ideas?

#!/usr/bin/perl
#parse_file_kv.pl
use strict;
use warnings;

#For convenience I read from DATA section. You can open a file instead.
my $file = do { local $/; <DATA> }; #Slurp the whole input; local restores $/ automatically

$file =~ s/#.*//g; #Remove comments (everything from # to the end of the line)
$file =~ s/^KEYWORD\s+//mg; #Remove KEYWORD at the start of a line plus the whitespace after it
my %h = $file =~ m/\w+/g; #Remaining words alternate key, value; read them into hash %h

for (keys %h){
    print "KeyWord $_ has value $h{$_}.\n";
}
print "\n";

my @kws2find = qw(MACHINENAME TARKETSYSTEM HIHO);

foreach ( @kws2find ){
    find_value($_);
}

sub find_value{
    my $kw = shift @_;
    if (exists $h{$kw}){
        print "Value of $kw is $h{$kw}\n";
    }else{
        print "Keyword $kw is not found in hash\n";
    }
}

__DATA__
#Example1 – I have to find the MACHINENAME value which is atab15
########## comment
KEYWORD MACHINENAME #comment
## comment
atab15 # comment

#Example2 – I have to find the TARKETSYSTEM value sunSolaris
KEYWORD TARKETSYTEM
sunSolaris

#Example3 – I have to find the HIHO value goodbye
KEYWORD HIHO
Goodbye

This gives the following output (TARKETSYSTEM is reported as not found because the sample data spells that key TARKETSYTEM):

KeyWord MACHINENAME has value atab15.
KeyWord TARKETSYTEM has value sunSolaris.
KeyWord HIHO has value Goodbye.

Value of MACHINENAME is atab15
Keyword TARKETSYSTEM is not found in hash
Value of HIHO is Goodbye
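
One fragility of pairing up every remaining word is that a single stray non-comment word shifts all the key/value pairs. As a minimal sketch of one way around that (not the method used in the script above), the slurped text can be matched with a single regex anchored on the KEYWORD line. The helper name `parse_pairs` and the sample text are hypothetical, and the sketch assumes keys and values are single `\w+` tokens.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_pairs is a hypothetical helper name, not part of the original script.
sub parse_pairs {
    my ($text) = @_;
    $text =~ s/#.*//g;    # strip comments; "." stops at the newline
    my %pairs;
    # KEYWORD must start a line and the key must be on that same line.
    # \s* then skips any blank lines before the value word.
    while ( $text =~ /^KEYWORD[ \t]+(\w+)[ \t]*\n\s*(\w+)/mg ) {
        $pairs{$1} = $2;
    }
    return %pairs;
}

my $sample = <<'END';
stray words before the first keyword
KEYWORD MACHINENAME #comment
## comment
atab15 # comment

KEYWORD   HIHO

Goodbye
END

my %pairs = parse_pairs($sample);
print "$_ => $pairs{$_}\n" for sort keys %pairs;
```

With the sample text above, the stray first line is ignored because only lines that begin with KEYWORD can start a match.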

key.txt:

########## comment
KEYWORD MACHINENAME #comment
## comment
atab15 # comment

KEYWORD TARKETSYTEM
sunSolaris

KEYWORD HIHO

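Goodbye

use strict;
use warnings;

open(my $fh, '<', 'key.txt') or die "Cannot open key.txt: $!";
my $matched = 0;    # true while waiting for the value of the last KEYWORD seen
my %hash;
my $keyword;
while (<$fh>) {
    chomp;
    s/\s*#.*//;          # strip a comment and any whitespace before it
    next unless /\S/;    # skip blank and comment-only lines
    if (/^KEYWORD\s+(\w+)/) {
        $keyword = $1;
        $matched = 1;
    } elsif ($matched) {
        s/\s+$//;        # trim trailing whitespace from the value
        $hash{$keyword} = $_;
        $matched = 0;
    }
}
close($fh);

for (keys %hash) {
    print "$_ $hash{$_}\n";
}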

Output:

MACHINENAME atab15
TARKETSYTEM sunSolaris
HIHO Goodbye

I love the fact that David and I came up with two separate solutions that both work. Gotta love Perl. I thought about the "slurping" solution and then decided against it, not because it is wrong or inefficient (David showed it is very efficient); I just went with the "flagging" idea instead. His regexes are probably better than mine; regexes were never my strong suit.

Thanks, Mike. On the other hand, your solution would not trip up if the file happened to have some extra non-comment words before the first KEYWORD literal. Plus, not slurping lets you handle gigantic data files that might otherwise fill up memory. (I have no idea how big a file it would take to cause a problem, but it's always good to have another approach available if the need arises.)
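
To illustrate the memory point, here is a minimal sketch of a line-by-line lookup for a single key that stops reading as soon as the value is found. The helper name `find_key_value` is hypothetical, and an in-memory filehandle stands in for a real file; the sketch assumes the value is the first word on its line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_key_value is a hypothetical helper; it reads one line at a time
# and returns as soon as the wanted key's value is seen.
sub find_key_value {
    my ( $fh, $wanted ) = @_;
    my $armed = 0;    # true once "KEYWORD $wanted" has been seen
    while ( my $line = <$fh> ) {
        $line =~ s/#.*//;             # drop a trailing or whole-line comment
        next unless $line =~ /\S/;    # skip blank and comment-only lines
        if ( $line =~ /^KEYWORD\s+(\w+)/ ) {
            $armed = ( $1 eq $wanted );
        }
        elsif ( $armed && $line =~ /^(\w+)/ ) {
            return $1;                # stop here; the rest of the file is never read
        }
    }
    return;                           # key not present
}

# An in-memory filehandle stands in for a real file here.
my $data = "stray\nKEYWORD MACHINENAME #comment\n## comment\natab15 # comment\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $value = find_key_value( $fh, 'MACHINENAME' );
close $fh;
print defined $value ? "MACHINENAME is $value\n" : "MACHINENAME not found\n";
```

Because only one line is held at a time, memory use stays flat regardless of file size, and the stray first line is skipped since no KEYWORD has been seen yet.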