Hi

I am a Perl novice. I am currently dealing with a problem and wondering whether it can be solved using Perl.

I have a folder containing a number of text files. Every time a file is created, a timestamp is added at the end of its name, preceded by an "_", for example "apple_20101210091255.txt".

Where:
========
apple = item name
20101210091255 = timestamp, format: yyyymmddhhmmss
txt = filetype


Now the folder contains the following files:

apple_20101221134352.txt
apple_20101221131735.txt
apple_20110401091633.txt
orange_20111221131731.txt
orange_20111237131755.txt
banana_20111221131709.txt
coconut_20110401110735.txt

The script should create an array which holds the latest file for each item (apple, orange, banana, coconut) based on the timestamp.

Is this possible?
I believe there are Perl gurus out there who could solve the problem with little effort.

Please share your knowledge....

Boshu


It can be solved, yes. One question for you is - when you say "holds the latest file" do you mean the file name or the file contents?

Here's a way with your data and assuming the extension will always be txt:

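use strict;
use warnings;
my %hash;	# item name => latest timestamp seen so far
while(<DATA>){
	chomp;
	# "apple_20101221134352.txt" -> ("apple", "20101221134352.txt")
	my ($fruit,$end)=split(/_/);
	$hash{$fruit}=0 if(!$hash{$fruit});
	# "20101221134352.txt" -> ("20101221134352", "txt")
	my ($stamp,$ext)=split(/\./,$end);
	# keep the highest (most recent) timestamp for this item
	if($stamp>$hash{$fruit}){
		$hash{$fruit}=$stamp;
	}
}
# rebuild each file name from the item name and its latest timestamp
for (keys %hash){
	print "Last $_ = $_"."_"."$hash{$_}.txt\n";
}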


__DATA__
apple_20101221134352.txt
apple_20101221131735.txt
apple_20110401091633.txt
orange_20111221131731.txt
orange_20111237131755.txt
banana_20111221131709.txt
coconut_20110401110735.txt

Output:

Last coconut = coconut_20110401110735.txt
Last banana = banana_20111221131709.txt
Last apple = apple_20110401091633.txt
Last orange = orange_20111237131755.txt

BTW, I don't think there's a 12/37 in 2011 - looks like a typo in the filename.

It can be solved, yes. One question for you is - when you say "holds the latest file" do you mean the file name or the file contents?

It's the file that was created last, which can be seen from the timestamp in its name.

In my case, looking at the files with the key name apple:
apple_20101221134352.txt
apple_20101221131735.txt
apple_20110401091633.txt

The latest file is apple_20110401091633.txt (created on 1st April 2011 at 09:16:33).

Regards
Boshu

Sorry for that typo. You noticed it correctly. I created all these names manually for illustration purposes.

All these files with txt extension will be located in a folder under C:\fruits\.

With my little knowledge of Perl I shall try to grab all the files of the right type (txt) into an array and incorporate your suggested code.

Thanks a lot, this is a great help and sharing!

-- Boshu

Here I have added code to read only the .txt files from a directory. The rest is the same.

use strict;
use warnings;
my %hash;

# Directory that holds the files
my $dir='e:/dani/csv/fruit';

# Read the names of the .txt files in $dir
opendir(DIR, $dir) || die "Cannot open the $dir : $!";
my @fruit = grep /\.txt$/, readdir(DIR);
closedir(DIR);

# Get the latest timestamp for each item
for (@fruit) {
	chomp;
	my ($fruit,$end)=split(/\_/);
	$hash{$fruit}=0 if(!$hash{$fruit});
	my $stamp = $1 if ($end=~ m{(\d+)});
	if($stamp>$hash{$fruit}){
		$hash{$fruit}=$stamp;
	}
}
for (keys %hash){
	print "Last $_ = $_"."_"."$hash{$_}\.txt\n";
}

Most Appreciated!
Thanks to both of you.

I have tested both sets of code and they cater to my needs very well.

May I ask, though, what the following lines are doing in the background:

#
$hash{$fruit}=0 if(!$hash{$fruit});
#
my $stamp = $1 if ($end=~ m{(\d+)});
#
if($stamp>$hash{$fruit}){
#
$hash{$fruit}=$stamp;
#

Best regards,
Boshu
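
The four lines asked about above can be read roughly as follows. This is only an annotated sketch: the sample values given to $fruit and $end are assumptions for illustration, and the match is written in list context rather than as "my $stamp = $1 if (...)", because perlsyn documents a "my" combined with a statement modifier as having undefined behaviour.

```perl
use strict;
use warnings;

my %hash;    # item name => latest timestamp seen so far

# Sample values, assumed for illustration (in the thread they come from split)
my ($fruit, $end) = ('apple', '20110401091633.txt');

# First time this item is seen: start from 0 so the numeric
# comparison below always has a defined value to compare against.
$hash{$fruit} = 0 if !$hash{$fruit};

# Capture the first run of digits in $end, i.e. the timestamp.
# In list context the match returns the captured text, or nothing
# (leaving $stamp undefined) when $end contains no digits.
my ($stamp) = $end =~ m{(\d+)};

# Keep the timestamp only if it is newer than the one stored so far.
if (defined $stamp && $stamp > $hash{$fruit}) {
    $hash{$fruit} = $stamp;
}

print "$fruit => $hash{$fruit}\n";
```

Run over every file name in turn, these steps leave the highest timestamp for each item in %hash.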

Feedback and question:

I have now tested the script against the following input files in the C:\fruits dir:

C:\>dir /b c:\fruits
apple_20101221134353.txt
apple_20110501120333.txt
apple_20111221134352.txt
banana_20110601112045.txt
banana_20111221134352.txt
coconut_20101221134342.txt
coconut_20101221134352.txt
kiwi_19991221134352.txt
kiwi_20101221134352.txt
olive.txt
olive_20101221134310.txt
olive_20101221134311.txt
olive_20101221134352.txt

Note: One file somehow ended up in the folder without a timestamp.

Script output:
C:\>perl get_latest_file.pl
Use of uninitialized value in split at get_latest_file.pl line 29.
Use of uninitialized value in numeric gt (>) at get_latest_file.pl line 32.
olive.txt_0.txt
coconut_20101221134352.txt
apple_20111221134352.txt
kiwi_20101221134352.txt
banana_20111221134352.txt
olive_20101221134352.txt
C:\>

I am not sure what the warnings mean, but I hope it is nothing serious. It brings the result I wanted.

my @fruit = grep /\.txt$/, readdir(DIR);

The above code greps all the .txt files, but you want only the files that carry a timestamp. So replace that line in the previous code with the one below.

# grep only .txt files whose name has an underscore followed by digits
my @fruit = grep /\_\d+\.txt$/, readdir(DIR);
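
Putting the pieces of this thread together, a minimal sketch could look like the following. The helper name latest_per_item and the hard-coded list of names are assumptions made for illustration (in practice the list would come from readdir), and the pattern assumes 14-digit timestamps as in the sample names.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a list of file names
# and returns a hash of item name => newest file name for that item.
sub latest_per_item {
    my @names = @_;
    my (%stamp_of, %file_of);
    for my $name (@names) {
        # Accept only names shaped like item_<14 digits>.txt; anything
        # else (for example olive.txt) is skipped, so no warnings arise.
        next unless $name =~ /^(.+)_(\d{14})\.txt$/;
        my ($item, $stamp) = ($1, $2);
        # Fixed-width digit strings sort chronologically as strings,
        # so "gt" is enough and no placeholder value of 0 is needed.
        if (!exists $stamp_of{$item} || $stamp gt $stamp_of{$item}) {
            $stamp_of{$item} = $stamp;
            $file_of{$item}  = $name;
        }
    }
    return %file_of;
}

# Sample names taken from the listing earlier in the thread
my @names = qw(
    olive.txt
    olive_20101221134310.txt
    olive_20101221134352.txt
    kiwi_19991221134352.txt
    kiwi_20101221134352.txt
);

my %newest = latest_per_item(@names);
print "Last $_ = $newest{$_}\n" for sort keys %newest;
```

Storing the full file name next to the timestamp avoids rebuilding the name afterwards, and sorting the keys gives the same output order on every run.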