Extract data from a saved file

Question

kenji 1 Junior Poster

16 Years Ago

Hey,

I have used the LWP::Simple module and saved the source of a website in a file. I am trying to extract all the data between the <head> tags and pass it to a variable to process.

So far I can't seem to extract the data properly. Any suggestions?

my $data = getstore("http://www.google.com/", "website.txt");
unless(is_success($data)){
	die "Could not retrieve website: $data";
}
open(PAGE, "website.txt") or die "$!";
my @info = <PAGE>;
close(PAGE);
my @meta;
my $i = 0;
my $stuff;
foreach $stuff(@info){
	$meta[$i] = ($stuff =~ m/<head(.*?)<\/head>/);
	$i++;
}
$i = 0;
foreach $_ (@meta){
	#print $meta[$i];
	print $_;
}

Thanks

open-source perl

KevinADC 192 Practically a Posting Shark

16 Years Ago

try:

my ($d) = $s =~ m/<head>(.*?)<\/head>/is;
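A minimal sketch of why the parentheses around $d matter, assuming a made-up sample string in place of the real page source: in scalar context the match only reports success, while in list context it returns the captured text. The variable $matched is hypothetical and exists only for the comparison.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the saved page source.
my $s = "<html><HEAD>\n<title>Example</title>\n</HEAD><body></body></html>";

# Scalar context: the match returns a true/false value, not the captured text.
my $matched = ($s =~ m/<head>(.*?)<\/head>/is);

# List context (note the parentheses around $d): the match returns the captures.
my ($d) = $s =~ m/<head>(.*?)<\/head>/is;

print "matched: $matched\n";
print "captured: $d\n";
```

Here /i lets <head> match the upper-case <HEAD> in the sample, and /s lets .*? run across the newlines inside it.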

kenji 1 Junior Poster · Answer 1 · 2009-05-14T01:53:22+00:00

UPDATE:

I managed to get the data into one string and now I am trying to match with a regular expression.

I am having trouble with the regular expression.

my $d = ($s =~ m/<head>(.*)<\/head>/);

$s is the scalar with the whole string; I want to extract the contents of the head tags from $s and assign them to $d.
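The post does not say how the file ended up in one string, so the following is only one common way to do it, not necessarily what was used here. It is a sketch that writes a small temporary stand-in for website.txt (the content is made up) and then slurps it into $s by locally undefining the input record separator.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in for website.txt so the sketch is self-contained.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "<html>\n<head>\n<title>Example</title>\n</head>\n</html>\n";
close($fh) or die "Could not close $filename: $!";

# Slurp the whole file into one scalar by undefining the input record separator.
my $s = do {
    open(my $in, '<', $filename) or die "Could not open $filename: $!";
    local $/;
    <$in>;
};

print length($s), " characters read\n";
```

With the whole page in a single scalar, a pattern can match a <head> section that spans several lines, which a line-by-line loop cannot do.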

kenji 1 Junior Poster · Answer 2 · 2009-05-14T05:20:50+00:00

Thanks, that worked great.

What exactly does /is do? Also, one more question: if I try to extract the meta tags and place each one individually in an array, will it work? Assuming that there may be one or more meta tags inside.

Something like this:
my (@m) = $s =~ m/<meta (.*?)>/is;

kenji 1 Junior Poster · Answer 3 · 2009-05-14T05:46:35+00:00

This is what I came up with, but it seems to be repeating the first match rather than checking for the next meta tag.

my (@m) = $d =~ m/(<meta (.*?)>){1,5}/is;
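A possible explanation, sketched under the assumption that the meta tags are separated by newlines or other whitespace (the sample string is made up): the pattern has two capture groups and no /g, so list context returns $1 and $2 from a single match. Both come from the first tag, which looks like a repeat.

```perl
use strict;
use warnings;

# Made-up head content; the two tags are separated by a newline.
my $d = qq{<meta name="description" content="demo">\n<META name="keywords" content="perl">};

# Two capture groups, no /g: list context returns ($1, $2) from a single match.
my @m = $d =~ m/(<meta (.*?)>){1,5}/is;

print scalar(@m), " elements\n";
print "[$_]\n" for @m;
```

The {1,5} quantifier only repeats the group when the next tag starts immediately after the previous one, so in this sample it matches once and never reaches the second tag.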

KevinADC 192 Practically a Posting Shark · Answer 4 · 2009-05-14T15:01:26+00:00

my (@m) = $s =~ m/<meta (.*?)>/gis;

You can look up the regexp modifiers in any regexp tutorial.

i - case-insensitive matching
s - treat the string as a single line, so . also matches newlines
g - global match, works like grep, finds all matches in a string (returned as a list in list context)
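To see the effect of /g, here is a small sketch using made-up head content in $d. The arrays @first and @all are hypothetical names used only to compare the two forms.

```perl
use strict;
use warnings;

# Hypothetical head content with two meta tags.
my $d = qq{<meta name="description" content="demo">\n<META name="keywords" content="perl,regex">};

# Without /g: list context returns only the capture from the first match.
my @first = $d =~ m/<meta (.*?)>/is;

# With /g: list context returns the capture from every match.
my @all = $d =~ m/<meta (.*?)>/gis;

print scalar(@first), " without /g\n";
print scalar(@all), " with /g\n";
print "$_\n" for @all;
```

Each element of @all holds the attribute text of one meta tag, so the array can be looped over to process the tags individually.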
