How to parse this confusing file

ajay_p5 (Newbie Poster)
#1, Oct 21st, 2009
Hello friends,
I need to parse some data from a file and arrange it in a certain way in an output file. However, the file is so confusing and has so many small quirks that I am stuck. Can somebody help?
Thanks,
Aj
I am attaching the main part of the input file that is causing me trouble.

I am using this code:
my @file1 = <INFILE>;
foreach $lines (@file1)
{
    if ($i == 3)
    {
        $i = 1;
        print OUTFILE "\\";
        print OUTFILE "\n";
    }

    if ($lines =~ m/^\\/)
    {
        $i = $i + 1;
        $lines =~ s/\\//g;
        print OUTFILE "\n";
        if ($i == 2)
        {
            print OUTFILE "$lines";
        }
    }
    if ($lines =~ m/^JASSS:|Date:|Title:/)
    {
        print OUTFILE "$lines";
    }
    elsif ($lines =~ m/^Author: && ^Address:/)
    {
        print OUTFILE "\n";
        print OUTFILE "$lines";
    }
    elsif ($lines =~ m/^Address:/)
    {
        print OUTFILE "\n";
        print OUTFILE "$lines";
    }
    else
    {
        chomp($lines);
        print OUTFILE "$lines";
    }

}

close(INFILE);
close(OUTFILE);
exit;

The output looks like this:
ID: 1.1.2
Date: 31 Jan 1998
Title: Qualitative Modeling and Simulation of Socio-Economic Phenomena
Author: Giorgio Brajnik
Address: Dipartimento di Matematica e InformaticaUniversità di Udine Udine Italy Italy
Author: Marji Lines {The author and address shouldn't run together on one line. They don't in this case, but if I change the code to get the 2nd record right, this record breaks.}
Address: Dipartimento di Scienze StatisticheUniversità di Udine Udine Italy 33100 Italy


This paper describes an application of recently developed qualitative reasoning techniques to complex, socio-economic allocation problems.
\
ID: 2.3.3
Date: 30 Jun 1999
Title: Simulating Household Waste Management Behaviours
Author: Peter Tucker
Address: Environmental Initiatives GroupHigh Street PAISLEY PA1 2BE United Kingdom
Author: Andrew Smith
Address: Language Evolution and Computation Research UnitSchool of Philosophy Psychology and Language Sciences University of Edinburgh, {I don't want a new line here, but if I change my code to fix this, the 1st record breaks.}
Adam Ferguson Building, 40 George Square EH8 9LL Edinburgh, United Kingdom

The paper reports the outcome of research to demonstrate the proof of concept.

NOTE: {I would like to have an output which fulfills both the criteria.}
Attached Files
File Type: txt id.txt (3.1 KB, 6 views)

vbharathi (Newbie Poster)
#2, Oct 22nd, 2009
my @file1 = <INFILE>;
foreach $lines (@file1)
{
    chomp($lines);
    # $i counts lines that start with a backslash; on the third one,
    # reset the counter and write a single backslash as a separator.
    if ($i == 3)
    {
        $i = 1;
        print OUTFILE "\\";
        print OUTFILE "\n";
    }
    if ($lines =~ m/^\\/)
    {
        $i = $i + 1;
        $lines =~ s/\\//g;
        print OUTFILE "\n";
        if ($i == 2)
        {
            print OUTFILE "\n";
            print OUTFILE "$lines";
        }
    }
    if ($lines =~ m/^JASSS:|Date:|Title:/)
    {
        print OUTFILE "\n";
        print OUTFILE "$lines";
    }
    elsif ($lines =~ m/^Author:/)
    {
        print OUTFILE "\n";
        print OUTFILE "$lines";
    }
    elsif ($lines =~ m/^Address:/)
    {
        print OUTFILE "\n";
        $lines =~ s/,$/, /g;    # keep a space after a trailing comma before the next line is appended
        print OUTFILE "$lines";
    }
    else
    {
        print OUTFILE "$lines";
    }
}

close(INFILE);
close(OUTFILE);
exit;
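
The line-by-line idea above (start a new output line only when a field label appears, otherwise append to the previous line) can also be sketched as a small self-contained function. This is only an illustration under stated assumptions: the sample lines, the label list, and the name join_continuations are hypothetical, since the real id.txt attachment is not reproduced in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines; the real id.txt is not shown in the thread.
my @input = (
    "Author: Andrew Smith",
    "Address: Research Unit, University of Edinburgh,",
    "Adam Ferguson Building, 40 George Square",
    "Author: Marji Lines",
    "Address: Udine Italy",
);

# Assumed set of field labels that may start a record line.
my $label = qr/^(?:JASSS|ID|Date|Title|Author|Address):/;

# Append every unlabelled, non-blank line to the line before it,
# so a wrapped Address ends up on a single output line.
sub join_continuations {
    my @lines = @_;
    my @out;
    for my $line (@lines) {
        chomp $line;
        my $starts_new = !@out
            || $line =~ /^\s*$/        # blank lines stay as separators
            || $out[-1] =~ /^\s*$/     # text after a blank line starts fresh
            || $line =~ $label;        # labelled lines always start fresh
        if ($starts_new) {
            push @out, $line;
        }
        else {
            $out[-1] =~ s/\s+$//;      # avoid a doubled space at the join
            $out[-1] .= " $line";
        }
    }
    return @out;
}

print "$_\n" for join_continuations(@input);
```

Because the decision depends only on whether a line carries a label, the same rule keeps Author and Address on separate lines and also joins a wrapped Address, which are the two criteria from the first post.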

d5e5 (Junior Poster in Training)
#3, Oct 23rd, 2009
Sometimes it's simpler to read the entire document into one string variable and then apply a series of global substitution commands. The nice thing about this approach is that you can add one substitution to your program, run it, visually inspect the output, then add another substitution and test again until the output looks right.

I find this more intuitive sometimes because it's similar to what I would do if I didn't have time to write a program: load the document into a good text editor that supports regular expressions for search and replace, and keep running search-and-replace commands until the document has been tidied up. I tested the following with your ID.txt input file:
#!/usr/bin/perl -w
#ParseFile.pl
use strict;
my ($f1, $f2) = @ARGV;
open (INFILE, $f1) || die "Can't open $f1: $!";
open (OUTFILE, ">$f2") || die "Can't open $f2: $!";
undef $/; # When $/ doesn't contain a record-end character, Perl reads the entire file
my $string = <INFILE>; # Read entire file into a string variable
$/ = "\n";
my $stringout = $string;
$stringout =~ s/^\\\\//gm; # Remove double backslashes at start of any line
$stringout =~ s/^(JASSS:|ID:|Date:|Title:|Address:|Author:)(.*)\n/$1$2/gm; # Remove extra newlines
$stringout =~ s/\n^Author:/Author:/gm; # Remove extra newline before Author
$stringout =~ s/^ID:/\\\n$&/gm; # Put a single backslash on the line before ID:
# Remove the extra blank lines and single backslash at the start of the document (not global)
$stringout =~ s/^\s*\\//m;
print OUTFILE $stringout;
close INFILE;
close OUTFILE;
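
The slurp-then-substitute workflow can be tried without any files. The sketch below is a variation rather than the code above: it uses `local $/` inside a `do` block instead of `undef $/` followed by a manual reset, and it reads from an in-memory filehandle. The sample text and the second substitution are hypothetical, chosen only to show how substitutions are added one at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for the contents of the input file.
my $text = "\\\\ID: 1.1.2\nDate: 31 Jan 1998\n\n\nTitle: Example\n";

# Slurp the whole "file" into one string. `local $/` undefines the
# record separator only until the end of this block.
my $string = do {
    open my $in, '<', \$text or die "Can't open in-memory file: $!";
    local $/;
    <$in>;
};

# Add one substitution, run, inspect the output, then add the next.
$string =~ s/^\\\\//gm;      # drop a double backslash at the start of any line
$string =~ s/\n{2,}/\n/g;    # collapse runs of blank lines into one newline

print $string;
```

Scoping the change to `$/` with `local` means the separator is restored automatically, even if the block exits early.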

ajay_p5 (Newbie Poster)
#4, Oct 23rd, 2009
Thanks, but I tested it again and your code doesn't work for me. However, I have already solved the problem.
Code like this gives me the output I actually wanted:
my @file1 = <INFILE>;
foreach $lines (@file1)
{
    chomp($lines);
    if ($lines =~ m/^\\/)
    {
        $i = $i + 1;
        $lines =~ s/\\//g;
        print OUTFILE "\n";
        if ($i == 2)
        {
            print OUTFILE "\n";
            print OUTFILE "$lines";
        }
    }
    if ($i == 3)
    {
        $i = 1;
        print OUTFILE "\\";
        print OUTFILE "\n";
    }
    if ($lines =~ m/^JASSS:|Date:|Title:/)
    {
        print OUTFILE "\n";
        print OUTFILE "$lines";
    }
    elsif ($lines =~ m/^Author:/)
    {
        print OUTFILE "\n";
        print OUTFILE "$lines";
    }
    elsif ($lines =~ m/^Address:/)
    {
        print OUTFILE "\n";
        $lines =~ s/,\n/, /g;
        print OUTFILE "$lines";
    }
    else
    {
        print OUTFILE "$lines";
    }
}
close(INFILE);
close(OUTFILE);
exit;

Anyway, thanks for your reply. I learnt a new way of dealing with files.

Cheers
Aj
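
A side note on two patterns that appear in this thread. In `m/^JASSS:|Date:|Title:/` the `^` anchors only the first alternative, and in `m/^Author: && ^Address:/` the `&&` is matched as literal characters rather than acting as a logical "and". The sketch below illustrates both points; the test lines and variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $loose    = qr/^JASSS:|Date:|Title:/;       # ^ applies to "JASSS:" only
my $anchored = qr/^(?:JASSS|Date|Title):/;     # ^ applies to every alternative
my $never    = qr/^Author: && ^Address:/;      # "&&" is literal text here

# Hypothetical lines, not taken from the real input file.
my $line        = "Address: see Title: page";
my $author_line = "Author: Marji Lines";

printf "loose:    %s\n", ($line =~ $loose           ? "match" : "no match");
printf "anchored: %s\n", ($line =~ $anchored        ? "match" : "no match");
printf "never:    %s\n", ($author_line =~ $never    ? "match" : "no match");
```

The loose pattern matches the Address line because "Title:" occurs in the middle of it, while the anchored pattern does not. Grouping the alternatives with `(?:...)` is the usual way to make the anchor apply to all of them.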

d5e5 (Junior Poster in Training)
#5, Oct 24th, 2009
You're welcome. I took a second look at my output today and see that it still isn't quite right. Both ways have their pros and cons, but yours actually worked, so that's what counts.
©2003 - 2009 DaniWeb® LLC