May 13th, 2009

Extract data from a saved file

Hey,

I have used the LWP::Simple module to save the source of a website in a file. I am trying to extract all the data between the <head> tags and pass it to a variable for processing.

So far I can't seem to extract the data properly. Any suggestions?

    use LWP::Simple;

    # Download the page and save it to a local file
    my $data = getstore("http://www.google.com/", "website.txt");
    unless (is_success($data)) {
        die "Could not retrieve website: $data";
    }

    # Read the saved file line by line into an array
    open(PAGE, "website.txt") or die "$!";
    my @info = <PAGE>;
    close(PAGE);

    # Try to match the <head> section on each line
    my @meta;
    my $i = 0;
    my $stuff;
    foreach $stuff (@info) {
        $meta[$i] = ($stuff =~ m/<head(.*?)<\/head>/);
        $i++;
    }
    $i = 0;
    foreach $_ (@meta) {
        #print $meta[$i];
        print $_;
    }

Thanks
Last edited by kenji; May 13th, 2009 at 4:07 pm.
Reputation Points: 11
Solved Threads: 11
Junior Poster
kenji is offline Offline
145 posts
since May 2008
May 13th, 2009

Re: Extract data from a saved file

UPDATE:

I managed to get the data into one string and now I am trying to match with a regular expression.

I am having trouble with the regular expression.

    my $d = ($s =~ m/<head>(.*)<\/head>/);

$s is the scalar holding the whole string. I want to extract the contents of the head tag from $s and assign them to $d.
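
A minimal sketch of why the assignment above does not put the captured text in $d: in scalar context a match returns a success flag, while in list context it returns the captured groups. The sample string and the name $flag are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real $s would hold the saved page source.
my $s = "<html><head><title>Test</title></head><body></body></html>";

# Scalar context: the match returns 1 on success, not the captured text.
my $flag = ($s =~ m/<head>(.*)<\/head>/);

# List context (note the parentheses around $d): the match returns the captures.
my ($d) = $s =~ m/<head>(.*)<\/head>/;

print "scalar context: $flag\n";
print "list context:   $d\n";
```

The only difference between the two assignments is the parentheses on the left-hand side, which put the match in list context.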
kenji
May 13th, 2009

Re: Extract data from a saved file

try:

my ($d) = $s =~ m/<head>(.*?)<\/head>/is;
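
A small sketch, on a made-up page source, of what the /i and /s modifiers and the non-greedy (.*?) each contribute to the line above. The variable names $no_s and $greedy are hypothetical, and the stray </head> in the body is there only to show the greedy case.

```perl
use strict;
use warnings;

# Hypothetical page source: upper-case tags, newlines inside <head>,
# and a stray </head> later in the body.
my $s = "<HTML>\n<HEAD>\n<title>Test</title>\n</HEAD>\n<body>a</head>b</body>\n</HTML>";

# Without /s, "." does not match "\n", so this fails on multi-line input.
my ($no_s) = $s =~ m/<head>(.*?)<\/head>/i;

# With /i and /s the match ignores case and "." also matches newlines.
my ($d) = $s =~ m/<head>(.*?)<\/head>/is;

# A greedy .* runs on to the last </head> in the string.
my ($greedy) = $s =~ m/<head>(.*)<\/head>/is;

print "without /s: ", (defined $no_s ? "matched" : "no match"), "\n";
print "non-greedy: [$d]\n";
print "greedy:     [$greedy]\n";
```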
Last edited by KevinADC; May 13th, 2009 at 7:35 pm.
Reputation Points: 246
Solved Threads: 67
Practically a Posting Shark
KevinADC is offline Offline
898 posts
since Mar 2006
May 13th, 2009

Re: Extract data from a saved file

Thanks that worked great.

What exactly does /is do? One more question: if I try to extract the meta tags and place each one individually in an array, will it work, assuming there may be one or more meta tags inside?

Something like this:
my (@m) = $s =~ m/<meta (.*?)>/is;
Last edited by kenji; May 13th, 2009 at 8:30 pm.
kenji
May 13th, 2009

Re: Extract data from a saved file

This is what I came up with, but it seems to repeat the first match rather than checking for the next meta tag.

    my (@m) = $d =~ m/(<meta (.*?)>){1,5}/is;
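
A rough illustration of the behaviour described above, using made-up <head> contents. The pattern has two capture groups, so a single match returns two values that both come from the first tag, which can look like the first match being repeated.

```perl
use strict;
use warnings;

# Hypothetical <head> contents with two meta tags on separate lines.
my $d = qq{<meta name="a" content="1">\n<meta name="b" content="2">};

# {1,5} can only repeat the group when the tags directly follow each other.
# Here the newline between the tags stops the repetition after one tag.
my @m = $d =~ m/(<meta (.*?)>){1,5}/is;

# @m holds the two capture groups of that single match:
# the whole first tag, then its attribute text.
print "$_\n" for @m;
```

Even when the tags are adjacent, a repeated capture group keeps only its last iteration, so a quantifier on the group is not a way to collect every tag.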
kenji
May 14th, 2009

Re: Extract data from a saved file

my (@m) = $s =~ m/<meta (.*?)>/gis;

You can look up the regexp modifiers in any regexp tutorial.

i - case-insensitive matching
s - treat the string as a single line, so "." also matches newlines and a match can span lines
g - global match, works like grep: finds all matches in the string, and in list context returns every captured group
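
A short sketch of the three modifiers working together on a made-up string: /g in list context collects the capture from every meta tag, /i accepts the upper-case tag, and /s lets a tag span a line break.

```perl
use strict;
use warnings;

# Hypothetical input: one upper-case tag and one tag split across two lines.
my $s = qq{<head>\n<META name="a" content="1">\n<meta name="b"\n content="2">\n</head>};

# /g in list context returns the capture from every match in the string.
my @m = $s =~ m/<meta (.*?)>/gis;

print scalar(@m), " meta tags found\n";
print "[$_]\n" for @m;
```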
Last edited by KevinADC; May 14th, 2009 at 6:02 am.
KevinADC
