Hello Everyone

I am having some trouble parsing an XML document with a Perl script.
I have a file like the attached one (I have taken just a part of the original file,
as it is too big to post here and is hard to analyze manually).
What I have to do is count the number of different authors publishing a paper
in a single year.
To count this, I can use either the issue printdate or the cpyrtdate.

The trouble is that I am not able to pass a variable into the XPath expression. With a hard-coded date the line looks like this:
my $nodeset = $xp->findnodes('//issue[@printdate="1917-07-00"]/..//author');
What I want is something like this:
my $nodeset = $xp->findnodes('//issue[@printdate="$x"]/..//author');
where $x comes from a list containing all the years, like 1917, 1913, etc.

I am using this code, but it is not helping much.

use XML::XPath;
my $file = 'Aj.xml';
my $xp = XML::XPath->new(filename=>$file);
my $nodeset = $xp->findnodes('//issue[@printdate="1917-07-00"]/..//author');
my @date;
if (my @nodelist = $nodeset->get_nodelist) {
    @date = map($_->string_value, @nodelist);
    @date = sort(@date);
    local $" = "\n";
    print "I found these authors:\n@date\n";
}

I have analyzed the file manually and the things to be considered are as follows:
issue printdate="1913-01-00"
Author names:
DavidW.Cornelius.
FrederickSlate

issue printdate="1913-02-00"
Author names:
DavidW.Cornelius.
LachlanGilchrist

issue printdate="1917-08-00"
Author names:
H.W.Nichols.

issue printdate="1917-07-00"
Author names:
JohnZeleny

So, the output I want should look like this:

Year No. of different authors publishing in a single year
1913 3
1917 2

I am kind of stuck on this. Can somebody please help?

Thanks
Aj

All 3 Replies

I found this incredibly difficult as I've done very little XML. I tried using XPath but don't know enough about XPath. Finally, I tried XML::Simple.

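#!/usr/bin/perl
#parse_xml_simple.pl
use strict;
use warnings;

# use module
use XML::Simple;

# create object (direct method call instead of indirect "new XML::Simple")
my $xml = XML::Simple->new;

# read XML file; ForceArray => 1 makes every nested element an array ref,
# hence the ->[0] steps below
my $file = '/home/david/Programming/Perl/data/Aj.xml';
my $data = $xml->XMLin($file,
                        ForceArray => 1,
                        KeyAttr    => {},
                      );

# $count{year}{author name} = number of articles
my %count;
foreach my $article (@{$data->{article}}) {
    my $year = $article->{cpyrt}->[0]->{cpyrtdate}->[0]->{date};
    my $author = $article->{authgrp}->[0]->{author}->[0];   # first author only
    my $autname;

    # join the name parts; sort the keys so the same author always
    # yields the same string (hash order is not stable)
    foreach ( sort keys %{$author} ){
        $autname .= $$author{$_}->[0];
    }

    $count{$year}->{$autname}++ if defined($autname);
}

# print the number of distinct authors per year
foreach (sort keys %count){
    my $ctr = keys %{$count{$_}};
    print "$_ $ctr\n";
}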

This gives the following output:

1913 3
1917 2
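
The counting step can also be looked at separately from the XML parsing. Here is a minimal sketch that uses the (date, author) pairs Aj listed by hand in the question instead of reading Aj.xml. The @records list and the count_authors_by_year name are made up for illustration, and it assumes every date starts with a four-digit year.

```perl
use strict;
use warnings;

# (printdate, author) pairs copied from the manual analysis in the question.
my @records = (
    [ '1913-01-00', 'DavidW.Cornelius.' ],
    [ '1913-01-00', 'FrederickSlate'    ],
    [ '1913-02-00', 'DavidW.Cornelius.' ],
    [ '1913-02-00', 'LachlanGilchrist'  ],
    [ '1917-08-00', 'H.W.Nichols.'      ],
    [ '1917-07-00', 'JohnZeleny'        ],
);

# Hypothetical helper: returns a hash ref of year => number of distinct authors.
sub count_authors_by_year {
    my @pairs = @_;
    my %seen;    # $seen{year}{author} = times seen
    for my $pair (@pairs) {
        my ($date, $author) = @$pair;
        my ($year) = $date =~ /^(\d{4})/ or next;    # skip malformed dates
        $seen{$year}{$author}++;
    }
    my %count;
    $count{$_} = keys %{ $seen{$_} } for keys %seen;    # keys in scalar context = count
    return \%count;
}

my $counts = count_authors_by_year(@records);
print "$_ $counts->{$_}\n" for sort keys %$counts;
```

Because the author name is used as a hash key, an author who appears in two issues of the same year is only counted once, which is what the expected 1913 3 / 1917 2 result needs.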

ajaj_p5, I slightly changed your code in the XPath and data-processing lines.

use strict;
use warnings;

use XML::XPath;
my $file = 'Aj.xml';
my $xp = XML::XPath->new(filename=>$file);

print "Year\tNo. of different authors publishing in a single year\n";
foreach my $year (1913, 1917) {
    ### Start each year with an empty hash
    my %hash = ();

    ### Double quotes interpolate $year; \@ stops Perl reading @printdate as an array
    my $nodeset = $xp->findnodes("//issue[contains(\@printdate, '$year')]/..//author");
    my @nodelist = $nodeset->get_nodelist;

    ### Use the author names as hash keys so duplicates collapse
    $hash{$_->string_value}++ for @nodelist;
    print $year, "\t", scalar(keys %hash), "\n";
}
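
For anyone wondering why the original attempt found nothing: Perl does not interpolate variables inside single-quoted strings, so '//issue[@printdate="$x"]/..//author' searches for the literal text $x. Double quotes fix that, but then the @ of @printdate has to be escaped. Here is a minimal sketch of just the string-building step. The helper name xpath_for_year is made up for illustration, and starts-with() is swapped in for contains() on the assumption that printdate always begins with the four-digit year.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::XPath): build the XPath
# expression that selects the authors for one year.
sub xpath_for_year {
    my ($year) = @_;
    # Double quotes interpolate $year; \@ keeps Perl from treating
    # @printdate as an array.
    return "//issue[starts-with(\@printdate, '$year')]/..//author";
}

my $x = 1917;

# Single quotes: no interpolation, so the query contains the text $x.
my $literal = '//issue[@printdate="$x"]/..//author';

print "$literal\n";
print xpath_for_year($x), "\n";
```

The resulting string can be passed to $xp->findnodes() in place of the hard-coded expression. Note that an exact match on @printdate="1917" would still find nothing, since the attribute values look like 1917-07-00, which is why a prefix or substring test is used.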

k_manimuthu,

Yep, I like your snippet. That should provide the dates. ;)
