Hello friends,
I need to parse some data from a file and arrange it in another file. However, the input file is confusing and has so many small irregularities that I am now stuck. Can somebody help?
Thanks
Aj
I am attaching the main part of the input file, which is what is causing me trouble.

I am using this code:

my @file1 =<INFILE>;
foreach $lines(@file1)
{
        if($i==3)
        {
                $i=1;
                print OUTFILE"\\";
                print OUTFILE"\n";
        }

        if ($lines =~m/^\\/)
        {
                $i=$i+1;
                $lines=~s/\\//g;
                print OUTFILE"\n";
                if ($i==2)
                {
                        print OUTFILE"$lines";
                }
        }
        if($lines =~m/^JASSS:|Date:|Title:/)
        {
                print OUTFILE"$lines";
        }
        elsif ($lines =~m/^Author: && ^Address:/)
        {
                print OUTFILE"\n";
                print OUTFILE"$lines";
        }
        elsif ($lines =~m/^Address:/)
        {
                print OUTFILE"\n";
                print OUTFILE"$lines";
        }
        else
        {
                chomp($lines);
                print OUTFILE"$lines";
        }
}

close (INFILE);
close(OUTFILE);
exit;

The output comes out like this:
ID: 1.1.2
Date: 31 Jan 1998
Title: Qualitative Modeling and Simulation of Socio-Economic Phenomena
Author: Giorgio Brajnik
Address: Dipartimento di Matematica e InformaticaUniversit&agrave; di Udine Udine Italy Italy
Author: Marji Lines {The author and address shouldn't run together. They don't in this case, but if I change the code to get the 2nd record right, this record breaks.}
Address: Dipartimento di Scienze StatisticheUniversit&agrave; di Udine Udine Italy 33100 Italy


This paper describes an application of recently developed qualitative reasoning techniques to complex, socio-economic allocation problems.
\
ID: 2.3.3
Date: 30 Jun 1999
Title: Simulating Household Waste Management Behaviours
Author: Peter Tucker
Address: Environmental Initiatives GroupHigh Street PAISLEY PA1 2BE United Kingdom
Author: Andrew Smith
Address: Language Evolution and Computation Research UnitSchool of Philosophy Psychology and Language Sciences University of Edinburgh, {I don't want a new line here, but if I change my code to fix this, the 1st record breaks.}
Adam Ferguson Building, 40 George Square EH8 9LL Edinburgh, United Kingdom

The paper reports the outcome of research to demonstrate the proof of concept.

NOTE: {I would like output that satisfies both criteria.}

All 4 Replies

my @file1 =<INFILE>;
foreach $lines(@file1)
{
        chomp($lines);
        if($i==3)
        {
                $i=1;
                print OUTFILE"\\";
                print OUTFILE"\n";
        }
        if ($lines =~m/^\\/)
        {
                $i=$i+1;
                $lines=~s/\\//g;
                print OUTFILE"\n";             
                if ($i==2) 
                {              
                        print OUTFILE"\n"; 
                        print OUTFILE"$lines";         
                } 
        }
        if($lines =~m/^JASSS:|Date:|Title:/)
        {              
                print OUTFILE"\n"; 
                print OUTFILE"$lines";   
        }
        elsif ($lines =~m/^Author:/)
        {
                print OUTFILE"\n"; 
                print OUTFILE"$lines";
        }
        elsif ($lines =~m/^Address:/)
        {
                print OUTFILE"\n";
                $lines=~s/,$/, /g;
                print OUTFILE"$lines";
        }
        else
        {
                print OUTFILE"$lines";
        }
}

close (INFILE);
close(OUTFILE);
exit;
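
One visible difference between the code above and the code in the question is that the test `m/^Author: && ^Address:/` has become a plain `m/^Author:/`. The following is a minimal sketch of why that probably matters, using invented sample lines (they are assumptions, not taken from the real input file). Inside a pattern, `&&` is just two literal ampersand characters, so the original test can never match an ordinary Author line. The sketch also shows that in `m/^JASSS:|Date:|Title:/` the `^` anchors only the first alternative; the helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Test from the question: '&&' is not an operator inside a pattern,
# it matches two literal '&' characters.
sub matches_literal_and {
    my ($line) = @_;
    return $line =~ m/^Author: && ^Address:/ ? 1 : 0;
}

# Test used in the thread: '^' binds only to 'JASSS:', so 'Date:' and
# 'Title:' may match anywhere in the line.
sub matches_loose {
    my ($line) = @_;
    return $line =~ m/^JASSS:|Date:|Title:/ ? 1 : 0;
}

# Grouping the alternatives anchors all of them to the start of the line.
sub matches_anchored {
    my ($line) = @_;
    return $line =~ m/^(?:JASSS|Date|Title):/ ? 1 : 0;
}

# Hypothetical sample lines, invented for illustration only.
my @samples = (
    "Author: Peter Tucker",
    "Title: Some Title",
    "See Date: inside a sentence",
);

for my $line (@samples) {
    printf "%-30s literal_and=%d loose=%d anchored=%d\n",
        $line,
        matches_literal_and($line),
        matches_loose($line),
        matches_anchored($line);
}
```

Whether the stricter grouped pattern is wanted depends on the real input; if a field name such as `Date:` can appear in the middle of an abstract line, the loose form would treat that line as a field line.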

Sometimes it's simpler to read the entire document into one string variable and then apply a series of global substitute commands. The nice thing about this way is you can add a substitute command to your program, run it, visually inspect the output, then add another substitute command, test again until the output looks right.

I find this more intuitive sometimes because it's similar to what I would have to do if I didn't have time to write a program. I would have to load the document into a good text editor that allows regular expressions for search and replace, and keep running search and replace commands until the document has been tidied up. I tested the following with your ID.txt input file:

#!/usr/bin/perl
#ParseFile.pl
use strict;
use warnings;
my ($f1, $f2) = @ARGV;
die "Usage: $0 infile outfile\n" unless defined $f1 && defined $f2;
open (my $in, '<', $f1) || die "Can't open $f1: $!";
open (my $out, '>', $f2) || die "Can't open $f2: $!";
#With $/ undefined (localized to this block) Perl reads the entire file in one go
my $string = do { local $/; <$in> }; #Read entire file into a string variable
my $stringout = $string;
$stringout =~ s/^\\\\//gm; #Remove double backslashes at start of any line
$stringout =~ s/^(JASSS:|ID:|Date:|Title:|Address:|Author:)(.*)\n/$1$2/gm; #Remove the newline after each field line
$stringout =~ s/\n^Author:/Author:/gm; #Remove extra newline before Author
$stringout =~ s/^ID:/\\\n$&/gm; #Put a single backslash on the line before ID:
#Remove the extra blank lines and single backslash at the start of the document (not global)
$stringout =~ s/^\s*\\//m;
print {$out} $stringout;
close $in;
close $out;
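
The add-one-substitution-then-inspect workflow described above can be tried without any files at all. This is a minimal sketch, assuming invented sample text (the real ID.txt is not shown in the thread) and two illustrative substitutions. The helper names `slurp` and `tidy` are hypothetical, and the in-memory filehandle is only there to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read everything from a filehandle into one string.
sub slurp {
    my ($fh) = @_;
    local $/;              # undefined record separator: one read returns the whole file
    my $text = <$fh>;
    return $text;
}

# Apply substitutions one at a time; add a step, rerun, inspect, repeat.
sub tidy {
    my ($text) = @_;
    $text =~ s/^\\\\//gm;         # drop a double backslash at the start of any line
    $text =~ s/,\n(?=\S)/, /g;    # join a line ending in a comma with the next line
    return $text;
}

# Invented sample data, for illustration only.
my $sample = "\\\\Title: Example\nAddress: Some Building,\n40 Some Square\n";

# Open the string as an in-memory file so slurp() sees an ordinary filehandle.
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";
my $result = tidy(slurp($in));
close $in;

print $result;
```

Because each step is a single line in `tidy`, it is easy to comment one out, rerun, and see which substitution caused an unwanted change.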

Thanks, but I tested it again and your code doesn't work for me. However, I have already solved the problem.
Code like this gives me the output I actually wanted:

my @file1 =<INFILE>;
foreach $lines(@file1)
{
        chomp($lines);
        if ($lines =~m/^\\/)
        {
                $i=$i+1;
                $lines=~s/\\//g;
                print OUTFILE"\n";
                if ($i==2)
                {
                        print OUTFILE"\n";
                        print OUTFILE"$lines";
                }
        }
        if($i==3)
        {
                $i=1;
                print OUTFILE"\\";
                print OUTFILE"\n";
        }
        if($lines =~m/^JASSS:|Date:|Title:/)
        {
                print OUTFILE"\n";
                print OUTFILE"$lines";
        }
        elsif ($lines =~m/^Author:/)
        {
                print OUTFILE"\n";
                print OUTFILE"$lines";
        }
        elsif ($lines =~m/^Address:/)
        {
                print OUTFILE"\n";
                $lines=~s/,\n/, /g;
                print OUTFILE"$lines";
        }
        else
        {
                print OUTFILE"$lines";
        }
}
close (INFILE);
close(OUTFILE);
exit;

Anyway, thanks for your reply. I learnt a new way of dealing with files.

Cheers
Aj

You're welcome. I took a second look at my output today and see that it still isn't quite right. Both ways have their pros and cons, but yours actually worked, so that's what counts.
