I'm having trouble with the different separators in this one, and with picking out just the lines I want, keyed on the .dat filename. I have an autogenerated flat file that is an export of a directory structure. I want to parse the flat file and pull values only from the lines that contain a file name with the extension .dat. From each of those lines I want to collect the Date-Time, the Size, and a portion of the filename, and skip everything else.

.Example Flat File.

Volume in drive E is NEW Data
Volume Serial Number is 901D-960F

Directory of E:\My_DATA\LooK_Here\Year_2008\02_21_08

02/20/2008 12:52 PM <DIR> .
02/20/2008 12:52 PM <DIR> ..
02/02/2008 11:35 AM 851,744 01ID0801.dat
02/05/2008 11:35 AM 61 01ID0801.sta
02/09/2008 11:36 AM 299,216 01ID0823.dat
02/09/2008 11:36 AM 61 01ID0823.sta
02/10/2008 11:36 AM 373,018 01ID0827.dat
02/10/2008 11:36 AM 61 01ID0827.sta
02/11/2008 11:37 AM 49,258 01ID0855.dat
02/11/2008 11:37 AM 61 01ID0855.sta
02/15/2008 11:37 AM 427,803 01ID0861.dat
02/15/2008 11:37 AM 61 01ID0861.sta
02/18/2008 11:37 AM 282,035 01ID0865.dat
02/18/2008 11:38 AM 61 01ID0865.sta
1604 File(s) 386,639,292 bytes
2 Dir(s) 78,292,127,744 bytes free

.End Flat File Example.

.Output Desired.
Record 1
Date-Time = 02/02/2008 11:35 AM
Size = 851,744
Name = 0801 (This is always 4 characters to left of the .)
Record 2
Date-Time = 02/09/2008 11:36 AM
Size = 299,216
Name = 0823 (This is always 4 characters to left of the .)
etc.

Thanks for any help on this.

what have you tried so far?

I assume you are working from the flat file because you can't read the directory directly. (If you can, use readdir: you get just the file name and can check the extension.)

Since the flat file is not really laid out as records (such as CSV) with a consistent field separator (e.g. a comma), you will probably have the most success with regular expressions. Alternatively, just get the last 4 characters of each line; if they equal '.dat', that is a line you're after.
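A minimal sketch of the "last 4 characters" idea, in case it helps. The sub names is_dat_line and short_name are made up for this example, and it assumes every .dat name has at least 4 characters before the dot, as the question states.

```perl
use strict;
use warnings;

# True if the line ends in ".dat" (case-insensitive), using only substr.
sub is_dat_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;                  # drop trailing newline, CR, spaces
    return 0 if length($line) < 4;
    return lc(substr($line, -4)) eq '.dat' ? 1 : 0;
}

# The 4 characters immediately to the left of the trailing ".dat".
# Assumes the line has already passed is_dat_line.
sub short_name {
    my ($line) = @_;
    $line =~ s/\s+\z//;
    return substr($line, -8, 4);         # last 8 chars are "NNNN.dat"
}

my @sample = (
    '02/02/2008 11:35 AM 851,744 01ID0801.dat',
    '02/05/2008 11:35 AM 61 01ID0801.sta',
    '1604 File(s) 386,639,292 bytes',
);

for my $line (@sample) {
    next unless is_dat_line($line);
    print short_name($line), "\n";       # prints 0801
}
```

This only tells you which lines to keep and what the short name is; splitting out the date and size still needs a regex or a split on whitespace.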

You need to break up the line into parts?

Use a regular expression to catch each bit...

m/^\s*(\d{2}\/\d{2}\/\d{4}\s+\d{2}:\d{2}\s+\w{2})\s+([\d,]+)\s+(.*\.dat)\s*$/i

you can then get the parts as:
$date = $1;
$size = $2;
$filename = $3;
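Putting the regex approach together, here is a rough sketch that prints records in the layout asked for. The sub name parse_dat_line is hypothetical, the sample lines are copied from the question, and the pattern assumes the date, time, and size always appear in that order, separated by one or more spaces.

```perl
use strict;
use warnings;

# Parse one listing line. Returns (date_time, size, short_name) for a .dat
# line, or an empty list for anything else.
sub parse_dat_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^\s*
        (\d{2}/\d{2}/\d{4}) \s+          # date, MM/DD/YYYY
        (\d{1,2}:\d{2}\s+[AP]M) \s+      # time with AM or PM
        ([\d,]+) \s+                     # size, digits and commas
        .*? (.{4}) \.dat                 # keep the 4 chars before .dat
        \s*\z
    }xi;
    return ("$1 $2", $3, $4);
}

my @listing = (
    '02/20/2008 12:52 PM <DIR> .',
    '02/02/2008 11:35 AM 851,744 01ID0801.dat',
    '02/05/2008 11:35 AM 61 01ID0801.sta',
    '02/09/2008 11:36 AM 299,216 01ID0823.dat',
    '1604 File(s) 386,639,292 bytes',
);

my $count = 0;
for my $line (@listing) {
    # List assignment returns the number of values; 0 means no match.
    my ($when, $size, $name) = parse_dat_line($line) or next;
    printf "Record %d\nDate-Time = %s\nSize = %s\nName = %s\n",
        ++$count, $when, $size, $name;
}
```

To run it against the real file, replace the @listing loop with a while loop over a filehandle opened on the flat file. The `<DIR>` lines drop out on their own because `<DIR>` is not a run of digits and commas.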

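It's easier to do an actual readdir:

opendir(my $DIRHANDLE, $directory_to_read) || die "Bad things etc: $!";
while (my $file = readdir($DIRHANDLE))
{
    # readdir returns bare names, so prepend the directory before testing
    my $path = "$directory_to_read/$file";
    # if the $file is a plain file and ends in .dat
    if (-f $path && $file =~ m/\.dat$/i)
    {
        my $size = -s $path;
        # last-modified time in epoch seconds; one option for the date
        my $mtime = (stat $path)[9];
    }
}
closedir($DIRHANDLE);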

You can find Perl's file test operators here: http://perldoc.perl.org/functions/-X.html
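For the date, one option (an assumption on my part, not the only way) is to take the modification time from stat and format it with POSIX strftime so it looks like the listing. The helpers listing_time and commify are hypothetical names for this sketch, and the AM/PM text from %p can vary with locale.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format epoch seconds like the listing: MM/DD/YYYY HH:MM AM (local time).
sub listing_time {
    my ($epoch) = @_;
    return strftime('%m/%d/%Y %I:%M %p', localtime $epoch);
}

# Insert thousands separators: 851744 becomes 851,744.
sub commify {
    my ($n) = @_;
    1 while $n =~ s/^(\d+)(\d{3})/$1,$2/;
    return $n;
}

my $dir = shift // '.';                  # directory from the command line
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
for my $file (sort grep { /\.dat\z/i } readdir $dh) {
    my $path = "$dir/$file";
    next unless -f $path;                # skip anything that is not a file
    my ($size, $mtime) = (stat _)[7, 9]; # reuse the stat done by -f
    printf "Date-Time = %s\nSize = %s\nName = %s\n",
        listing_time($mtime), commify($size), substr($file, -8, 4);
}
closedir $dh;
```

Run it as `perl script.pl E:/My_DATA/...` with whatever directory applies; with no argument it reads the current directory.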

I do a lot of this type of thing in my day to day administration. Usually, I have control over the format of the flat file that is autogenerated, and I use a ':' or '|' char to delimit the fields.

Now, I know this isn't a Perl answer, but for something like this task, I'd use awk to take care of this type of scanning. It's pretty easy to do.

With awk, each line is treated as a record and split into fields on whatever delimiter you specify (the default is whitespace). You then test the line, pick the fields you want, and print them. The output can be redirected to a new file or appended to an existing one.

I find it a lot easier to handle this type of task.

Send me an email, I'd be happy to give you an example.

Sorry, I couldn't include an example earlier; I was using a shared Windows workstation. Here's an example of an awk command that does the scanning and returns the output you posted, with the exception of extracting just the portion of the file's name that you want. awk can do that as well, but this was a quickie, and I'll leave it to you to find that piece. It's pretty similar to using substring in a variety of programming languages. Google or man awk, and you can easily do it.

$ awk '$5 ~ /\.dat$/ { print "Record " ++n; print "Date-Time = " $1 " " $2 " " $3; print "Size = " $4; print "Name = " $5 }' dirlis.txt

All of this command is on one line, but you could also put it into a text file and run it as an awk script. Google 'awk tutorials' if you need more info on how to use the utility.

The input file is specified at the end of the command, in this case dirlis.txt. Hope that helps you broaden your horizons. I use the tool that makes sense to me for the task at hand. I enjoy working in Perl, but I don't shy away from other utilities when they do the job. If you are stuck with Perl, then just disregard these posts.

Cheers,
