Extract data from a saved file

Thread Solved

kenji
  #1
May 13th, 2009
Hey,

I have used the LWP::Simple module and saved the source of a website in a file. I am trying to extract all the data between the <head> tags and pass it to a variable to process.

So far I can't seem to extract the data properly. Any suggestions?

    use LWP::Simple;   # provides getstore() and is_success()

    # Download the page and save it to website.txt; getstore() returns the HTTP status code.
    my $data = getstore("http://www.google.com/", "website.txt");
    unless (is_success($data)) {
        die "Could not retrieve website: $data";
    }
    # Read the saved file into an array, one line per element.
    open(PAGE, "website.txt") or die "$!";
    my @info = <PAGE>;
    close(PAGE);
    my @meta;
    my $i = 0;
    my $stuff;
    # Try to match the <head> section against each line in turn.
    foreach $stuff (@info) {
        $meta[$i] = ($stuff =~ m/<head(.*?)<\/head>/);
        $i++;
    }
    $i = 0;
    foreach $_ (@meta) {
        #print $meta[$i];
        print $_;
    }

Thanks
Last edited by kenji; May 13th, 2009 at 4:07 pm.
And she said "Let there be light" and on the seventh day Windows booted.
And the crowds screamed in terror and cowered in fear for Microsoft had approached.
From the testament of 10011101

kenji
  #2
May 13th, 2009
UPDATE:

I managed to get the data into one string and now I am trying to match with a regular expression.

I am having trouble with the regular expression.

    my $d = ($s =~ m/<head>(.*)<\/head>/);

$s is the scalar holding the whole page as one string. I want to extract the contents of the <head> tags from $s and assign them to $d.
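
For readers following along, the post does not show how the file ended up in one string. One common way is to undefine the input record separator before reading; the sketch below assumes that approach. The helper name slurp_file and the temporary demo file are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read an entire file into one scalar.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                 # no record separator: <$fh> returns the whole file
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Small temporary file standing in for website.txt (an assumption for the demo).
my $demo_path = "slurp_demo_$$.txt";
open(my $out, '>', $demo_path) or die "Cannot write $demo_path: $!";
print $out "<html>\n<head>\n<title>Demo</title>\n</head>\n</html>\n";
close($out);

my $s = slurp_file($demo_path);
unlink $demo_path;

print length($s), " characters read into one string\n";
```

Because $/ is localized inside the helper, line-by-line reads elsewhere in the script keep their usual behaviour.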

KevinADC
  #3
May 13th, 2009
try:

my ($d) = $s =~ m/<head>(.*?)<\/head>/is;
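
A minimal sketch of why this line behaves differently from the earlier attempt, assuming a small made-up page in $s. The parentheses around $d put the match in list context, so it returns the captured text instead of a true/false value, and /s lets the dot cross newlines. The variable names $flag and $no_s are only for this illustration.

```perl
use strict;
use warnings;

# Sample input standing in for the downloaded page (an assumption for illustration).
my $s = "<html>\n<HEAD>\n<title>Demo</title>\n</HEAD>\n<body></body>\n</html>\n";

# Scalar context: the match reports success or failure, not the captured text.
my $flag = ($s =~ m/<head>(.*?)<\/head>/is);

# List context (note the parentheses around $d): the match returns the captures.
my ($d) = $s =~ m/<head>(.*?)<\/head>/is;

# Without /s the dot stops at newlines, so a multi-line <head> block does not match.
my ($no_s) = $s =~ m/<head>(.*?)<\/head>/i;

print "scalar context gives: $flag\n";
print "list context gives: $d";
print "without /s: ", (defined $no_s ? $no_s : "(no match)"), "\n";
```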
Last edited by KevinADC; May 13th, 2009 at 7:35 pm.

kenji
  #4
May 13th, 2009
Thanks, that worked great.

What exactly does /is do? One more question: if I try to extract the meta tags and place each one individually in an array, will it work? Assume there may be one or more meta tags inside.

Something like this:
my (@m) = $s =~ m/<meta (.*?)>/is;
Last edited by kenji; May 13th, 2009 at 8:30 pm.

kenji
  #5
May 13th, 2009
This is what I came up with, but it seems to be repeating the first match rather than checking for the next meta tag.

    my (@m) = $d =~ m/(<meta (.*?)>){1,5}/is;
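
One likely explanation for the repeated output, sketched with made-up sample data: the pattern contains two capture groups, so a single match returns two values (the whole tag and its attributes), and the {1,5} repetition stops as soon as anything other than another tag, such as a newline, follows. Whether this is exactly what happened with the real page is an assumption.

```perl
use strict;
use warnings;

# Sample <head> contents with two meta tags on separate lines (made up for illustration).
my $d = qq{<meta name="a" content="1">\n<meta name="b" content="2">\n};

# Two capture groups, one successful match: the list holds $1 (the whole tag)
# and $2 (its attributes). The newline after the first tag ends the {1,5}
# repetition after one pass, so both values come from the first tag.
my @m = $d =~ m/(<meta (.*?)>){1,5}/is;

print scalar(@m), " values returned\n";
print "$_\n" for @m;
```

Even when tags sit directly next to each other, a quantified group keeps only its last repetition, so this form cannot collect every tag into the array.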

KevinADC
  #6
May 14th, 2009
my (@m) = $s =~ m/<meta (.*?)>/gis;

You can look up the regexp modifiers in any regexp tutorial.

i - case-insensitive matching
s - treat the string as a single line, so . also matches newlines
g - global match, works like grep, finds all matches in a string/line
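
A short sketch of the three modifiers working together, using a made-up <head> block in $s. In list context, /g returns the capture from every match, so each meta tag's attributes land in their own array element.

```perl
use strict;
use warnings;

# Made-up sample: one upper-case tag and one tag split across two lines.
my $s = qq{<head>\n<META name="a" content="1">\n<meta name="b"\n content="2">\n</head>\n};

# /g collects every match, /i accepts META as well as meta,
# /s lets .*? run across the newline inside the second tag.
my @m = $s =~ m/<meta (.*?)>/gis;

print scalar(@m), " meta tags found\n";
print "[$_]\n" for @m;
```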
Last edited by KevinADC; May 14th, 2009 at 6:02 am.

©2003 - 2009 DaniWeb® LLC