
Parsing Text and Reformatting

I am a complete beginner to Perl, and I am using Strawberry Perl. I have text files that I need to reformat. I need to take the field data from lines that start with a certain marker and write it to a new file, with all the fields delimited by a character of my choosing.

Here is a short example.

; Record 1
@FULLTEXT PAGE
@T R000358
@C ENDDOC# R000358
@C BEGATTACH R000358
@C ENDATTACH R000358
@C MAILSTORE No
@C AUTHOR
@C BCC
@C CC
@C COMMENTS
@C ATTACH
@C DATECREATED 11/23/2010
@C DATELASTMOD 07/18/2010
@C DATELASTPRNT
@C DATERCVD
@C DATESENT
@C FILENAME wrangling.wpd
@C LASTAUTHOR
@C ORGANIZATION
@C REVISION
@C SUBJECT
@C TIMEACCESSED 00:00:00
@C TIMECREATED 15:21:34
@C TIMELASTMOD 09:04:12
@C TIMELASTPRNT
@C TIMERCVD
@C TIMESENT
@C TITLE
@C TO
@C FROM

For each record, '@C' or '@T' is the field marker, followed by a space, then the field name, another space, and then the field data. I need all the field data delimited in one row rather than in a column as shown above. The record delimiter is a semicolon (;).

I am looking to parse the text file and reformat it like this:

"R000358","R000358","R000358","R000358","No",etc, etc. (in one row)

This example is comma-delimited. The delimiter may change later, but I figured I would start there.

Any help would be appreciated. Thanks in advance.

damorph
Newbie Poster
1 post since Nov 2011
Reputation Points: 10
Solved Threads: 0
 

What part of the script gives you trouble? Reading a file? Splitting the data and saving them in an array? Writing a new file?

d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
 

Beginner tutorials can show you how to read and write files. As for splitting the data, pushing some of it into an array, and outputting the array, you could do something like the following:

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

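my @fields;    # values collected for the current record
while (<DATA>){
    # A line starting with ';' begins a new record: output the previous one
    process_end_of_rec() if m/^;/ and @fields;
    if (m/^\@C/){
        # Split on whitespace: $data[0] is '@C', $data[1] is the field name,
        # $data[2] is the first word of the value (undef if there is none)
        my @data = split;
        push(@fields, $data[2]);
    }
}

# Output the last record in the file
process_end_of_rec();

sub process_end_of_rec{
    # Turn undefined values into empty strings so join does not warn
    foreach my $fld(@fields){
        $fld = '' unless defined($fld);
    }
    my $rec = join(',', @fields);
    say $rec;
}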
__DATA__
; Record 1
@FULLTEXT PAGE
@T R000358
@C ENDDOC# R000358
@C BEGATTACH R000358
@C ENDATTACH R000358
@C MAILSTORE No
@C AUTHOR 
@C BCC 
@C CC 
@C COMMENTS 
@C ATTACH 
@C DATECREATED 11/23/2010
@C DATELASTMOD 07/18/2010
@C DATELASTPRNT 
@C DATERCVD 
@C DATESENT 
@C FILENAME wrangling.wpd
@C LASTAUTHOR 
@C ORGANIZATION 
@C REVISION 
@C SUBJECT 
@C TIMEACCESSED 00:00:00
@C TIMECREATED 15:21:34
@C TIMELASTMOD 09:04:12
@C TIMELASTPRNT 
@C TIMERCVD 
@C TIMESENT 
@C TITLE 
@C TO 
@C FROM

Output: R000358,R000358,R000358,No,,,,,,11/23/2010,07/18/2010,,,,wrangling.wpd,,,,,00:00:00,15:21:34,09:04:12,,,,,,
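
One thing to watch with the split approach above: $data[2] is only the first whitespace-separated word of the value, so a value containing spaces would be cut short. A minimal sketch of one way around that, assuming a regex capture is acceptable, is shown here. The helper name parse_field_line and the sample file name "my report.wpd" are made up for illustration and are not from the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): parse one
# "@C NAME value..." line and keep the whole value, spaces included.
# Returns (name, value), or an empty list if the line is not an @C line.
sub parse_field_line {
    my ($line) = @_;
    return unless $line =~ m/^\@C\s+(\S+)(?:\s+(.*\S))?\s*$/;
    my ($name, $value) = ($1, $2);
    $value = '' unless defined $value;    # field present but empty
    return ($name, $value);
}

# Made-up sample line with a space inside the value
my ($name, $value) = parse_field_line("\@C FILENAME my report.wpd\n");
print "$name=$value\n";    # prints: FILENAME=my report.wpd
```

In the loop, push(@fields, $value) would then replace push(@fields, $data[2]). Like the script above, this sketch only looks at @C lines.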

d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
 


Instead of the line

my $rec = join(',', @fields);

use the following, which adds the double quotes around the data:

my $rec = join(',', map("\"$_\"", @fields));
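
If a value could itself contain a double quote, simply wrapping it would produce a broken row. Here is a small sketch, assuming the usual CSV convention of doubling embedded quotes, that also takes the delimiter as a parameter since the original question says it may change. The name quote_join and the last sample value are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: quote every field and join with the given delimiter.
# Embedded double quotes are doubled (common CSV convention, an assumption here).
sub quote_join {
    my ($delim, @fields) = @_;
    return join $delim, map {
        my $v = defined $_ ? $_ : '';    # undefined becomes an empty field
        $v =~ s/"/""/g;                  # escape embedded quotes
        qq{"$v"};
    } @fields;
}

# The last value is made up to show the escaping
print quote_join(',', 'R000358', 'No', '', 'say "hi"'), "\n";
# prints: "R000358","No","","say ""hi"""
```

Calling quote_join('|', @fields) instead would give a pipe-delimited row with the same quoting.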

k_manimuthu
Junior Poster in Training
93 posts since Jun 2009
Reputation Points: 55
Solved Threads: 24
 

After the say $rec; statement, you should reset the @fields array to an empty list so that your output line doesn't keep growing with every record (in case there is more than one output record):

@fields = (); # Empty the array after the record has been output
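
To see the effect of the reset with more than one record, here is a rough sketch that puts the pieces together using filehandles. It uses in-memory handles so it runs on its own. The sub name convert_records and the second record (R000359, Yes) are invented for the demo. For real files you would open the handles on file names of your choosing instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read records from $in and write one delimited
# line per record to $out.
sub convert_records {
    my ($in, $out, $delim) = @_;
    my @fields;
    my $flush = sub {
        return unless @fields;
        print {$out} join($delim, @fields), "\n";
        @fields = ();    # empty the array after the record has been output
    };
    while (my $line = <$in>) {
        if ($line =~ m/^;/) { $flush->(); next; }    # start of a new record
        next unless $line =~ m/^\@C/;
        my @data = split ' ', $line;
        push @fields, defined $data[2] ? $data[2] : '';
    }
    $flush->();    # output the last record
}

# Shortened sample input; the second record is made up
my $input = <<'END';
; Record 1
@C ENDDOC# R000358
@C MAILSTORE No
@C AUTHOR
; Record 2
@C ENDDOC# R000359
@C MAILSTORE Yes
@C AUTHOR
END

open my $in, '<', \$input or die "Cannot read input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot write output: $!";
convert_records($in, $out, ',');
close $out;
print $output;
# prints:
# R000358,No,
# R000359,Yes,
```

Without the reset inside $flush, the second line would start with the first record's values again.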

d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
 
