How to parse this confusing file
Hello friends,
I need to parse some data from a file and arrange it in a certain format. However, the file is so confusing and has such minute issues that it has really confused me now. Can somebody help?
Thanks
Aj
I am attaching the main part of the input file that is causing me trouble.
I am using this code:
my @file1 =<INFILE>;
foreach $lines(@file1)
{
    if($i==3)
    {
        $i=1;
        print OUTFILE"\\";
        print OUTFILE"\n";
    }
    if ($lines =~m/^\\/)
    {
        $i=$i+1;
        $lines=~s/\\//g;
        print OUTFILE"\n";
        if ($i==2)
        {
            print OUTFILE"$lines";
        }
    }
    if($lines =~m/^JASSS:|Date:|Title:/)
    {
        print OUTFILE"$lines";
    }
    elsif ($lines =~m/^Author: && ^Address:/)
    {
        print OUTFILE"\n";
        print OUTFILE"$lines";
    }
    elsif ($lines =~m/^Address:/)
    {
        print OUTFILE"\n";
        print OUTFILE"$lines";
    }
    else
    {
        chomp($lines);
        print OUTFILE"$lines";
    }
}
close (INFILE);
close(OUTFILE);
exit;
OUTPUT comes like this:
ID: 1.1.2
Date: 31 Jan 1998
Title: Qualitative Modeling and Simulation of Socio-Economic Phenomena
Author: Giorgio Brajnik
Address: Dipartimento di Matematica e InformaticaUniversità di Udine Udine Italy Italy
Author: Marji Lines {The author and address shouldn't run together. They don't in this case, but if I change the code to get the 2nd record right, this one gets disturbed.}
Address: Dipartimento di Scienze StatisticheUniversità di Udine Udine Italy 33100 Italy
This paper describes an application of recently developed qualitative reasoning techniques to complex, socio-economic allocation problems.
\
ID: 2.3.3
Date: 30 Jun 1999
Title: Simulating Household Waste Management Behaviours
Author: Peter Tucker
Address: Environmental Initiatives GroupHigh Street PAISLEY PA1 2BE United Kingdom
Author: Andrew Smith
Address: Language Evolution and Computation Research UnitSchool of Philosophy Psychology and Language Sciences University of Edinburgh, {I don't want a new line here, but if I change my code to get this right, the 1st record gets disturbed.}
Adam Ferguson Building, 40 George Square EH8 9LL Edinburgh, United Kingdom
The paper reports the outcome of research to demonstrate the proof of concept.
NOTE: {I would like an output that fulfills both criteria.}
#2 Oct 22nd, 2009
my @file1 =<INFILE>;
foreach $lines(@file1)
{
chomp($lines);
if($i==3)
{
$i=1;
print OUTFILE"\\";
print OUTFILE"\n";
}
if ($lines =~m/^\\/)
{
$i=$i+1;
$lines=~s/\\//g;
print OUTFILE"\n";
if ($i==2)
{
print OUTFILE"\n";
print OUTFILE"$lines";
}
}
if($lines =~m/^JASSS:|Date:|Title:/)
{
print OUTFILE"\n";
print OUTFILE"$lines";
}
elsif ($lines =~m/^Author:/)
{
print OUTFILE"\n";
print OUTFILE"$lines";
}
elsif ($lines =~m/^Address:/)
{
print OUTFILE"\n";
$lines=~s/,$/, /g;
print OUTFILE"$lines";
}
else
{
print OUTFILE"$lines";
}
}
close (INFILE);
close(OUTFILE);
exit;
#3 Oct 23rd, 2009
Sometimes it's simpler to read the entire document into one string variable and then apply a series of global substitute commands. The nice thing about this way is you can add a substitute command to your program, run it, visually inspect the output, then add another substitute command, test again until the output looks right.
I find this more intuitive sometimes because it's similar to what I would have to do if I didn't have time to write a program. I would have to load the document into a good text editor that allows regular expressions for search and replace, and keep running search and replace commands until the document has been tidied up. I tested the following with your ID.txt input file:
#!/usr/bin/perl -w
#ParseFile.pl
use strict;
my ($f1, $f2) = @ARGV;
open (INFILE, $f1) || die "Can't open $f1: $!";
open (OUTFILE, ">$f2") || die "Can't open $f2: $!";
undef $/; #When $/ doesn't contain a record-end character Perl reads entire file
my $string = <INFILE>; #Read entire file into a string variable
$/ = "\n";
my $stringout = $string;
$stringout =~ s/^\\\\//gm; #Remove double backslashes at start of any line
$stringout =~ s/^(JASSS:|ID:|Date:|Title:|Address:|Author:)(.*)\n/$1$2/gm; #Remove extra newlines
$stringout =~ s/\n^Author:/Author:/gm; #Remove extra newline before Author
$stringout =~ s/^ID:/\\\n$&/gm; #Put a single backslash on the line before ID:
#Remove the extra blank lines and single backslash at the start of the document (not global)
$stringout =~ s/^\s*\\//m;
print OUTFILE $stringout;
close INFILE;
close OUTFILE;
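If you want to try the slurp-and-substitute idea on a small sample before pointing it at the real file, something like the following might help. It is only a sketch under assumptions: the text in `$sample` is made up (the real ID.txt is not shown in this thread), `tidy_records` is a hypothetical helper name, and it treats every line that does not start with one of the field names seen in the thread as a wrapped continuation, which may not hold for the abstract lines in the real data. It also uses `local $/` inside a block, so the record separator is restored automatically.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join wrapped continuation lines onto the
# preceding "Field:" line. The field names are the ones seen in the thread.
sub tidy_records {
    my ($text) = @_;
    # Replace a newline with a space unless the next line starts with a
    # known field name or a backslash, is empty, or the string ends there.
    $text =~ s/\n(?!(?:ID|Date|Title|Author|Address):|\\|\n|\z)/ /g;
    return $text;
}

# Made-up sample for illustration, NOT the real ID.txt.
my $sample = <<'END';
ID: 2.3.3
Author: Andrew Smith
Address: University of Edinburgh,
Adam Ferguson Building
\
END

# Slurp the whole "file" (here an in-memory filehandle) into one string.
my $string = do {
    open my $in, '<', \$sample or die "Can't open in-memory file: $!";
    local $/;    # $/ is undef only inside this block
    <$in>;
};

print tidy_records($string);
```

Because the whole document sits in one string, you can add one substitution at a time, rerun, and inspect the output, as described above.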
#4 Oct 23rd, 2009
Thanks, but I tested it again and your code doesn't work for me. However, I have already solved the problem.
Code like this gives me the output I actually wanted:
my @file1 =<INFILE>;
foreach $lines(@file1)
{
    chomp($lines);
    if ($lines =~m/^\\/)
    {
        $i=$i+1;
        $lines=~s/\\//g;
        print OUTFILE"\n";
        if ($i==2)
        {
            print OUTFILE"\n";
            print OUTFILE"$lines";
        }
    }
    if($i==3)
    {
        $i=1;
        print OUTFILE"\\";
        print OUTFILE"\n";
    }
    if($lines =~m/^JASSS:|Date:|Title:/)
    {
        print OUTFILE"\n";
        print OUTFILE"$lines";
    }
    elsif ($lines =~m/^Author:/)
    {
        print OUTFILE"\n";
        print OUTFILE"$lines";
    }
    elsif ($lines =~m/^Address:/)
    {
        print OUTFILE"\n";
        $lines=~s/,\n/, /g;
        print OUTFILE"$lines";
    }
    else
    {
        print OUTFILE"$lines";
    }
}
close (INFILE);
close(OUTFILE);
exit;
Anyway, thanks for your reply. I learnt a new way of dealing with files.
Cheers
Aj
Both ways have their pros and cons, but yours actually worked, so that's what counts.