This snippet

while(<>) {
        for my $word ( split(/\s+/) ) {
                $idx{$word}{$page_number} = 1;
        }
        $page_number++ if ( /\014/ );
}

is part of a Perl script that indexes words by page (I'm trying to rewrite it in Python). I can guess what some of it does, but I do not fully understand it. In particular, can someone explain the meaning of ( /\014/ )?
Thanks



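\014 is octal for the form feed character (ASCII 12, also written \f), which is conventionally used as a page break in plain text files. It is not a carriage return or newline. So /\014/ is a regexp looking for that character in $_, and the script increments $page_number whenever the current line contains one. In other words it is counting pages, not lines. If you did want line numbers, Perl has a variable that holds the current input line number: '$.'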
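
Since the goal is a Python rewrite, here is a small sketch of how the same check could look there. The sample string `line` is made up for illustration; only the `\014` test comes from the original script.

```python
import re

# "\014" is an octal escape; it is the same character as "\f" (form feed, ASCII 12).
form_feed = "\014"
assert form_feed == "\f"
assert ord(form_feed) == 12

# Hypothetical sample: the last line of a page, ending with a page break.
line = "last words of page one\f\n"

# Rough equivalent of Perl's /\014/ applied to the current line ($_).
has_page_break = re.search(r"\014", line) is not None
print(has_page_break)
```

A plain substring test such as `"\f" in line` would do the same job without a regular expression.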

Thanks KevinADC.
Anyway, here is the complete script:

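#!/usr/bin/perl

use warnings;
use strict;

my $page_number = 1;
my %idx;        # word => { page number => 1 }

# read the input line by line (files named on the command line, or STDIN)
while(<>) {
        # record every whitespace-separated word under the current page
        for my $word ( split(/\s+/) ) {
                $idx{$word}{$page_number} = 1;
        }
        # \014 is a form feed (page break), so later lines belong to the next page
        $page_number++ if ( /\014/ );
}

# print each word, padded to 20 columns, followed by its sorted page numbers
foreach my $word (sort keys %idx) {
        printf "%-20s ", $word;
        print comma_sep(sort {$a <=> $b} keys %{$idx{$word}});
        print "\n";
}

# join the arguments into a single string separated by ", "
sub comma_sep {
        return join(", ", @_);
}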

As output it gives something like

word1                1, 4, 6
word2                5, 9, 23
word3                7, 44, 88

where the numbers are the page numbers.
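
For the Python rewrite, a minimal sketch of the same logic might look like the following. The function names `build_index` and `format_index` and the sample data are my own, not part of the original script, and `str.split()` differs slightly from Perl's `split(/\s+/)` in that it never yields an empty leading field for lines that start with whitespace.

```python
from collections import defaultdict


def build_index(lines):
    """Map each word to the set of page numbers it appears on."""
    index = defaultdict(set)
    page_number = 1
    for line in lines:
        # Record every whitespace-separated word under the current page.
        for word in line.split():
            index[word].add(page_number)
        # A form feed (\014) anywhere in the line ends the current page.
        if "\f" in line:
            page_number += 1
    return index


def format_index(index):
    """Return one line per word: the word padded to 20 columns, then its pages."""
    return [
        "%-20s %s" % (word, ", ".join(str(p) for p in sorted(pages)))
        for word, pages in sorted(index.items())
    ]


# Hypothetical two-page sample; "\f" marks the page break.
sample = ["alpha beta\n", "beta gamma\f\n", "alpha delta\n"]
for entry in format_index(build_index(sample)):
    print(entry)
```

To read from files named on the command line or from standard input, as Perl's `<>` does, the standard library's `fileinput.input()` can be passed to `build_index` in place of `sample`.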
