I am a complete beginner to Perl, and I am using Strawberry Perl. I keep coming across text files that I need to reformat. I need to take the field data from lines that start with a certain marker and write it to a new file, with all the fields separated by a delimiter of my choosing.

Here is a short example.

; Record 1
@FULLTEXT PAGE
@T R000358
@C ENDDOC# R000358
@C BEGATTACH R000358
@C ENDATTACH R000358
@C MAILSTORE No
@C AUTHOR
@C BCC
@C CC
@C COMMENTS
@C ATTACH
@C DATECREATED 11/23/2010
@C DATELASTMOD 07/18/2010
@C DATELASTPRNT
@C DATERCVD
@C DATESENT
@C FILENAME wrangling.wpd
@C LASTAUTHOR
@C ORGANIZATION
@C REVISION
@C SUBJECT
@C TIMEACCESSED 00:00:00
@C TIMECREATED 15:21:34
@C TIMELASTMOD 09:04:12
@C TIMELASTPRNT
@C TIMERCVD
@C TIMESENT
@C TITLE
@C TO
@C FROM

For each record, '@C' or '@T' is the field marker, followed by a space, then the field name, another space, and then the field data. I need all the field data delimited in one row rather than in a column as shown above. The record delimiter is a semicolon (;).

I am looking to parse the text file and reformat it like this:

"R000358","R000358","R000358","R000358","No",etc, etc. (in one row)

This example is comma-delimited. The delimiter may change later, but I figured I would start there.

Any help would be appreciated. Thanks in advance.

What part of the script gives you trouble? Reading a file? Splitting the data and saving them in an array? Writing a new file?

Beginner tutorials can show you how to read and write files. As for splitting the data, pushing some of it into an array, and outputting the array, you could do something like the following:

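#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # enables say

my @fields;
while (<DATA>){
    # A line starting with ';' begins a new record, so output the previous one
    process_end_of_rec() if m/^;/ and @fields;
    if (m/^\@C/){
        # Split on whitespace: $data[0] is '@C', $data[1] the field name,
        # $data[2] the field data (undefined when the field is empty)
        my @data = split;
        push(@fields, $data[2]);
    }
}

# Output the last record, which has no ';' line after it
process_end_of_rec();

sub process_end_of_rec{
    # Replace missing field data with an empty string
    foreach my $fld(@fields){
        $fld = '' unless defined($fld);
    }
    my $rec = join(',', @fields);
    say $rec;
}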
__DATA__
; Record 1
@FULLTEXT PAGE
@T R000358
@C ENDDOC# R000358
@C BEGATTACH R000358
@C ENDATTACH R000358
@C MAILSTORE No
@C AUTHOR 
@C BCC 
@C CC 
@C COMMENTS 
@C ATTACH 
@C DATECREATED 11/23/2010
@C DATELASTMOD 07/18/2010
@C DATELASTPRNT 
@C DATERCVD 
@C DATESENT 
@C FILENAME wrangling.wpd
@C LASTAUTHOR 
@C ORGANIZATION 
@C REVISION 
@C SUBJECT 
@C TIMEACCESSED 00:00:00
@C TIMECREATED 15:21:34
@C TIMELASTMOD 09:04:12
@C TIMELASTPRNT 
@C TIMERCVD 
@C TIMESENT 
@C TITLE 
@C TO 
@C FROM

Output: R000358,R000358,R000358,No,,,,,,11/23/2010,07/18/2010,,,,wrangling.wpd,,,,,00:00:00,15:21:34,09:04:12,,,,,,
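Note that the script above only collects the @C lines, so the @T value that appears first in the original poster's expected row is not in this output. Here is a minimal sketch of one way to pick up both kinds of line. It assumes that an @T line carries only data, while an @C line carries a field name and then optional data; field_value is a hypothetical helper name, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the data for an @T or @C line,
# or undef for any other line (such as "; Record 1" or "@FULLTEXT PAGE").
sub field_value {
    my ($line) = @_;

    # Assumed layout: @T <data>
    if ($line =~ /^\@T\s+(.*?)\s*$/) {
        return $1;
    }

    # Assumed layout: @C <name> [<data>]
    # The data may be missing, and it may contain spaces.
    if ($line =~ /^\@C\s+\S+(?:\s+(.*?))?\s*$/) {
        return defined $1 ? $1 : '';
    }

    return;
}

# Small demonstration on a few lines from the sample record
for my $line ('@T R000358', '@C MAILSTORE No', '@C AUTHOR', '; Record 1') {
    my $value = field_value($line);
    print defined $value ? "[$value]\n" : "skipped\n";
}
```

Unlike a plain split, the regex keeps everything after the field name, so data containing spaces would stay in one piece.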

Instead of the line

my $rec = join(',', @fields);

this version adds the double quotes around each field:

my $rec = join(',', map("\"$_\"", @fields));
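One thing to watch with the map approach: if a field value ever contains a double quote itself, the quoted row becomes ambiguous. The sketch below doubles embedded quotes, which is the usual CSV convention, and takes the delimiter as a parameter, since the original poster said it may change. The names quote_field and make_row are hypothetical helpers made up for this illustration. For anything beyond simple data, the Text::CSV module on CPAN is probably the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one value in double quotes,
# doubling any double quotes inside it.
sub quote_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Hypothetical helper: build one output row with a delimiter of your choosing.
sub make_row {
    my ($delim, @values) = @_;
    return join $delim, map { quote_field($_) } @values;
}

print make_row(',', 'R000358', 'No', ''), "\n";    # "R000358","No",""
print make_row('|', 'say "hi"'), "\n";             # "say ""hi"""
```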

commented: Good. I forgot the OP wanted quotes. +9

After the say $rec; statement you should reset the @fields array to an empty list so that fields from earlier records don't carry over into later output lines (in case there is more than one output record):

@fields = (); # Empty the array after the record has been output
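Putting the suggestions in this thread together, a rough sketch of a version that reads from a real file instead of __DATA__ and writes to a new file might look like the following. It keeps the thread's approach of collecting only @C lines with split. The sub name convert and the file names in the usage comment are placeholders chosen for this example, so adjust them to your setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read records from $in and write one quoted, delimited row per record to $out.
sub convert {
    my ($in, $out, $delim) = @_;
    my @fields;

    # Output the collected fields as one row, then start over.
    my $flush = sub {
        return unless @fields;
        print {$out} join($delim, map { qq{"$_"} } @fields), "\n";
        @fields = ();    # empty the array so the next record starts fresh
    };

    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^;/) {
            $flush->();    # a ';' line starts a new record
        }
        elsif ($line =~ /^\@C/) {
            my @data = split ' ', $line;
            push @fields, defined $data[2] ? $data[2] : '';
        }
    }
    $flush->();    # the last record has no ';' line after it
}

# Usage (placeholder file names): perl convert.pl input.txt output.csv
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    convert($in, $out, ',');
    close $out or die "Cannot close $out_name: $!";
}
```

Passing filehandles into the sub keeps the parsing separate from the file opening, so the same code can be tried on a small string first by opening a filehandle on a scalar reference.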
