Hi People

I need to compare a particular line in one file with a particular string in another file. I am attaching both files for your reference.
Here is my code:

open (INFILE, "$input") or die "Couldn't open $input for reading: $!\n";
while(<INFILE>)
{
    my $line= $_;
    $line=~tr/\n//d;
    my($city,$country,$lan,$lat) = (split(/\t/))[0,1,2,3];
    push(@aj1,"$city");
    push(@aj2, "$country");
    push(@aj3,"$lan");
    push(@aj4,"$lat");
}
close(INFILE);
my $arr_size = $#aj1;
open (OUTFILE, ">$output") or die "Couldn't open $output for writing: $!\n";
open (INFILE2, "<$input2") or die "Couldn't open $input2 for reading: $!\n";
while(<INFILE2>)
{
    my $lines=$_;
    #$lines=~tr/\n//d;
    print"$lines";
    my($jass,$author,$add) = (split(/\|/,"$lines"))[0,1,2];
    my $j=0;
    for(my $i=0;$i<=$arr_size;$i++)
    {
        my $country1=$aj2[$i];
        my $city1=$aj1[$i];
        if(($add=~m/$country1/) && ($add=~m/$city1/))
        {
            if($j==0)
            {
                #print"$j";
                $j=$j+1;
                print OUTFILE"$lines|";
                print OUTFILE"$aj1[$i]|";
                print OUTFILE"$aj2[$i]|";
                print OUTFILE"$aj3[$i]|";
                print OUTFILE"$aj4[$i]\n";
            }
        }
    }
    if ($j==0)
    {
        for(my $k=0;$k<=$arr_size;$k++)
        {
            my $countr=$aj2[$k];
            if($add=~m/$countr/ && $j!=1)
            {
                $j=$j+1;
                print OUTFILE"$lines|";
                print OUTFILE"|";
                print OUTFILE"$aj2[$k]|";
                print OUTFILE"0|";
                print OUTFILE"0\n";
            }
        }
    }
}
close(INFILE2);
close(OUTFILE);
exit;

Now what I want in the output file is that it should take the first line of the address file, print it as is, and wherever it finds the matching city and country from the city_lan.txt file, print them after it, something like this:
1.1.2|1. Giorgio Brajnik 2. Marji Lines |1. Dipartimento di Matematica e InformaticaUniversit&agrave; di Udine Udine Italy Italy 2. Dipartimento di Scienze StatisticheUniversit&agrave; di Udine Udine Italy 33100 Italy |Udine|Italy|78|87


The problem is that when I print $lines from input file 2, the program terminates at 2.3.3 with an error like this:
Quantifier follows nothing in regex; marked by <-- HERE in m/? <-- HERE stanbul / at "filename" line 39, <INFILE2> line 46
Why am I not able to read the whole data? What is the problem? Can somebody help?

This type of error appears when the text stored in a scalar variable contains regex metacharacters such as \ | ( ) [ { ^ $ * + ? . and that variable is then interpolated into a pattern match. In your case the value starts with "?", which the regex engine reads as a quantifier with nothing before it. It is better to use the quotemeta function here, which escapes those characters so they are matched literally.

http://www.tutorialspoint.com/perl/perl_quotemeta.htm


Modify the lines below in your code:

my $country1=quotemeta($aj2[$i]);
my $city1=quotemeta($aj1[$i]);
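
As a minimal sketch of the same idea, the snippet below shows the failure and the fix side by side. The sample strings and the helper name contains_literal are made up for illustration; only the leading "?" is taken from the error message above. \Q...\E inside a pattern has the same effect as calling quotemeta on the interpolated value.

```perl
use strict;
use warnings;

# Returns 1 if $needle occurs literally in $text, else 0.
# \Q...\E applies quotemeta to the interpolated value inside the pattern.
sub contains_literal {
    my ($text, $needle) = @_;
    return $text =~ /\Q$needle\E/ ? 1 : 0;
}

# Hypothetical sample values; the leading "?" mimics the name in the error message.
my $city    = '?stanbul';
my $address = 'Some University ?stanbul Turkey';

# Interpolated unescaped, the "?" is parsed as a quantifier and the match dies.
my $ok = eval { my $hit = $address =~ /$city/; 1 };
print "unescaped pattern failed: $@" unless $ok;

# Escaped, the same value is matched as plain text.
print "escaped pattern matched\n" if contains_literal($address, $city);
```

Note that escaping only removes the run-time error. A name whose characters are damaged in one file still has to appear in exactly the same damaged form in the other file to match.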

OK, thanks. That solves the problem to a certain extent, but I am still not able to compare all the cities and countries and print them. Only a few of them match. What could be the problem?

Many of the city and country values in the city_lan.txt have a trailing space character, whereas the city and country values in your address_out.txt input file do not have trailing spaces and so don't match.

Try modifying the first split statement to allow for the possibility of a space preceding the tab that separates the city from the country in your city_lan.txt input file. Add a space followed by ? so that the pattern matches zero or one space followed by the \t. The statement will look like this:

#Modified the following to split on one or no space followed by tab
    my($city,$country,$lan,$lat) = (split(/ ?\t/))[0,1,2,3];

This should eliminate the trailing space from the city and country values which could be one cause for failed matches.
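
If the fields can carry more than a single trailing space (or a stray carriage return from a Windows-edited file), an alternative is to split on the tab alone and then trim every field. This is only a sketch under that assumption; the sub name parse_city_line and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Split one tab-separated line and strip leading/trailing whitespace
# (spaces, "\r", "\n") from each field. Returns the first four fields.
sub parse_city_line {
    my ($line) = @_;
    my @fields = split /\t/, $line;
    for my $field (@fields) {
        $field =~ s/^\s+//;
        $field =~ s/\s+$//;
    }
    return @fields[0 .. 3];
}

# Hypothetical line in the city / country / lan / lat layout, with stray spaces.
my ($city, $country, $lan, $lat) = parse_city_line("Udine \tItaly  \t78\t87\r\n");
print "[$city] [$country] [$lan] [$lat]\n";
```

Trimming after the split also removes the newline from the last field, which the original loop leaves in place because it splits $_ rather than the cleaned-up $line.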

Thanks but this does not work for me.

it should print it afterwards, somewhat like this:
1.1.2|1. Giorgio Brajnik 2. Marji Lines |1. Dipartimento di Matematica e InformaticaUniversit&agrave; di Udine Udine Italy Italy 2. Dipartimento di Scienze StatisticheUniversit&agrave; di Udine Udine Italy 33100 Italy |Udine|Italy|78|87

You want the output for ID #1.1.2 to look like the above, right? What I get in the output file looks as follows:

1.1.2|Giorgio Brajnik|Dipartimento di Matematica e InformaticaUniversit&agrave; di Udine	Udine	Italy	Italy 
|Udine|Italy|78|87

1.1.2|Marji Lines|Dipartimento di Scienze StatisticheUniversit&agrave; di Udine	Udine	Italy	33100	Italy 
|Udine|Italy|78|87

Is that why you say it doesn't work? Apart from that, can you show us another line in the address_out.txt file that should have matches in the city_lan.txt file but is not printed in the output?

Do you still need a program like this? If it's important, we can probably improve the results, but how good the results are depends, of course, on the input data. For example, /to\?ky\?/ will not match "Tokyo". The quotemeta function added the backslashes before the question marks, which avoids the run-time error you were getting, but the resulting pattern tries to match a literal '?' in the string, so it will not match "Tokyo". Also, "United States of America" in the city_lan.txt file will never match "United States" in the address_out.txt file. If you need to process more data like this in the same format, then maybe it would be worth trying to resolve all these inconsistencies in the data with a program. Otherwise, you may just have to yell at the person who created the city_lan.txt file. :)
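
To give a rough idea of what "solving the inconsistencies with a program" could look like, here is a sketch that normalizes names through an alias table before doing a literal substring test. The alias entry, the sub names, and the sample addresses are all assumptions made for illustration, not taken from the actual files. A table like this would have to be built by hand for the real data.

```perl
use strict;
use warnings;

# Hypothetical alias table: spelling in city_lan.txt => spelling in address_out.txt.
my %alias = (
    'United States of America' => 'United States',
);

# Trim whitespace, then replace the name with its alias if one is defined.
sub normalize_name {
    my ($name) = @_;
    $name =~ s/^\s+//;
    $name =~ s/\s+$//;
    return exists $alias{$name} ? $alias{$name} : $name;
}

# Literal substring test; index() never treats "?" or "." as special.
sub address_mentions {
    my ($address, $name) = @_;
    return index($address, normalize_name($name)) >= 0 ? 1 : 0;
}

# Made-up sample addresses.
print address_mentions('Some University Boston United States', 'United States of America '), "\n";  # prints 1
print address_mentions('Some University Tokyo Japan', 'To?ky?'), "\n";                              # prints 0
```

The second call shows the limit of this approach: a name whose characters were damaged in the source file still fails unless it also gets an entry in the alias table.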

Hi
Thanks for the help. I am finally done with matching all the data, which did require some manual curation, and yes, I agree that the input data was the real problem.

Congratulations. That turned out to be harder than it looked at first.

The problem of printing one record consolidating all the authors and addresses for each $jass also turned out to be harder than it looked, at least for me. In fact, I never got that part working quite right. Anyway, you're done now, so that's good.
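
For anyone picking this up later, one possible way to approach that consolidation is to collect authors and addresses per ID in hashes of arrays and number them when printing. This is only a sketch: it assumes each input line has exactly the id|author|address layout and that lines for the same ID should be merged in file order. The sub names and sample lines are hypothetical, and the city/country lookup is left out.

```perl
use strict;
use warnings;

# Turn ('A', 'B') into "1. A 2. B".
sub numbered {
    my @items = @_;
    my @parts;
    for my $i (0 .. $#items) {
        push @parts, ($i + 1) . ". $items[$i]";
    }
    return join ' ', @parts;
}

# Merge id|author|address lines into one record per id, keeping first-seen order.
sub consolidate {
    my @lines = @_;
    my (%authors, %addresses, @order);
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($id, $author, $address) = split /\|/, $copy;
        push @order, $id unless exists $authors{$id};
        push @{ $authors{$id} },   $author;
        push @{ $addresses{$id} }, $address;
    }
    my @out;
    for my $id (@order) {
        my $who   = numbered(@{ $authors{$id} });
        my $where = numbered(@{ $addresses{$id} });
        push @out, "$id|$who |$where |";
    }
    return @out;
}

# Made-up sample lines in the assumed layout.
print "$_\n" for consolidate("1.1.2|Author A|Address X\n", "1.1.2|Author B|Address Y\n");
```

The matched city, country and coordinates would then be appended once per consolidated record instead of once per author line.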