I need a function that picks up, from a specified page, all images larger than 50 kB. The function returns an array containing each image's URL and its size in kilobytes. How do I start? Do I use wget and exec, or is there an easier way? So first I need to download all the images, then analyze them to get each URL and size.


There is a nice function called file_get_contents which gets the contents of a URL and stores it in a variable; you can then place it into a file, process it into a MySQL database, etc. For example:

<?php
//first to specify the url
$url='http://images.daniweb.com/logo.gif';
//now to retrieve it
$imagedata=file_get_contents($url);
//now to save it (file_put_contents needs the data as its second argument)
file_put_contents('image.gif',$imagedata);
//and image.gif will be in the same directory as your php file

And there you go. As simple as that.
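One thing worth checking: file_get_contents() returns false on failure (and fetching URLs at all requires allow_url_fopen to be enabled in php.ini), so a slightly more defensive sketch of the same idea would be:

<?php
//same example URL as above
$url='http://images.daniweb.com/logo.gif';
$imagedata=file_get_contents($url);
//file_get_contents() returns false on failure, so test with ===
if($imagedata===false)
{
	die('could not fetch '.$url);
}
file_put_contents('image.gif',$imagedata);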

You can also query the remote server with a HEAD request and check whether it provides Content-Length, something like:

<?php
$url = 'http://www.website.tld/image01.jpg';
// passing 1 as the second argument makes get_headers() return an
// associative array, which is safer than a hard-coded index like $head[6]
$head = get_headers($url, 1);
$length = isset($head['Content-Length']) ? (int)$head['Content-Length'] : 0;
if($length < 50000)
{
	echo 'too small: ';
}
else
{
	echo 'ok: ';
}
echo $length;
echo "\n";
?>
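Worth knowing: despite the name of the technique, get_headers() actually sends a GET request by default, so the image body may still be transferred. A minimal sketch (assuming PHP 5.3+, where stream_context_set_default() is available) that makes it issue a real HEAD request:

<?php
//switch the default stream context to HEAD so only the headers travel
stream_context_set_default(array('http' => array('method' => 'HEAD')));

$url = 'http://www.website.tld/image01.jpg';
$head = get_headers($url, 1);
$length = isset($head['Content-Length']) ? (int)$head['Content-Length'] : 0;
echo ($length < 50000 ? 'too small: ' : 'ok: ') . $length . "\n";
?>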

And for the array you can do a loop; it's simple:

<?php
$url = array(
	'http://www.website.tld/image01.jpg',
	'http://www.website.tld/image02.jpg',
	'http://www.website.tld/image031.jpg'
	);

$result = array(); // initialise so print_r() below never sees an undefined variable
foreach($url as $key)
{
	$head = get_headers($key, 1);
	$length = isset($head['Content-Length']) ? (int)$head['Content-Length'] : 0;
	if($length >= 50000)
	{
		$result[$key] = $length;
	}
}

print_r($result);
?>

But you still need to grab all the image links from a specific page. Good work.
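One way to do that (just a sketch, and DOMDocument is a different tool than the regex used later in this thread) is to let PHP's DOM extension parse the page and collect the src attribute of every img tag:

<?php
//http://www.website.tld is a placeholder, as in the other examples
$url = 'http://www.website.tld';
$html = file_get_contents($url);

$dom = new DOMDocument();
libxml_use_internal_errors(true); //silence warnings on sloppy markup
$dom->loadHTML($html);
libxml_clear_errors();

$links = array();
foreach ($dom->getElementsByTagName('img') as $img) {
	$links[] = $img->getAttribute('src');
}
print_r($links);
?>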
Bye :)

cwarn23, I know that function, but the problem is that we don't know the name of the image, and we need to download all the images, not only one.

If you want to download more than one image then perhaps a loop might be best. For example:

<?php
//first to specify the urls
$links=array(
'http://images.daniweb.com/1a.jpg',
'http://images.daniweb.com/2c.jpg',
'http://images.daniweb.com/3d.jpg',
'http://images.daniweb.com/4h.jpg',
'http://images.daniweb.com/5f.jpg',
'http://images.daniweb.com/6e.jpg',
'http://images.daniweb.com/7d.jpg');
foreach ($links AS $url) {
	//now to retrieve it
	$imagedata=file_get_contents($url);
	//now to save it under its original filename
	file_put_contents(basename($url),$imagedata);
	//and each image will be in the same directory as your php file
}

@siina
I was looking at cwarn23's code and tried to mix it with mine (hope that's not a problem ^_^ and that it works for you). This will scan the link you set and build an array of the images greater than 50 kB:

<?php
$url = "http://www.website.tld"; # no ending slash
$data = file_get_contents($url);
# search for src attributes pointing at images; the character class stops
# the match at the closing quote instead of running across the whole tag
$pattern = "/src=[\"']?([^\"'>\s]+\.(png|jpg|gif))[\"']?/i";
preg_match_all($pattern, $data, $images);

function valid_url($u)
{
	# note the escaped dot: an unescaped . would match any character
	if(preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $u)) { return true; }
	else { return false; }
}

# print_r($images); # uncomment to check $images array

$result = array();
foreach($images[1] as $key)
{
	$link = $url . $key;
	if(valid_url($link) === true)
	{
		$head = get_headers($link, 1); # associative array, safer than $head[6]
		$length = isset($head['Content-Length']) ? (int)$head['Content-Length'] : 0;
		if($length >= 50000)
		{
			$result[$link] = $length;
		}
	}
}

if(empty($result))
{
	echo 'no data';
}else{
	print_r($result); # array to use for retrieving images
}
?>

This script is not perfect: it will search only for img and object tags, not for images included by CSS, and you still have to account for relative paths, absolute paths, complete links, and external images. Right now this example works only with absolute paths, so <img src="/images/pic01.jpg" /> rather than <img src="../images/pic01.jpg" /> or <img src="http://a-website.tld/images/pic01.jpg" />.
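A rough sketch of how those three cases could be told apart (resolve_src is a made-up helper name, and a real resolver would also need to track the current page's directory for relative paths):

<?php
# hypothetical helper: turn an img src into a complete URL;
# $base is the page URL with no ending slash
function resolve_src($base, $src)
{
	if(preg_match('|^https?://|i', $src))
	{
		return $src; # already a complete link
	}
	if(substr($src, 0, 1) === '/')
	{
		return $base . $src; # absolute path like /images/pic01.jpg
	}
	# relative path like ../images/pic01.jpg: crude fallback, see note above
	return $base . '/' . ltrim($src, './');
}

echo resolve_src('http://www.website.tld', '/images/pic01.jpg') . "\n";
echo resolve_src('http://www.website.tld', '../images/pic01.jpg') . "\n";
echo resolve_src('http://www.website.tld', 'http://a-website.tld/images/pic01.jpg') . "\n";
?>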

siina,
1) First you need to get all the HTML content from the site URL using file_get_contents.
2) Then find all the image tags in the HTML source using preg_match_all.
3) Loop over the images array and use file_get_contents again to grab each image's source and save it in your folder (see the sketch below).
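Putting those three steps together, a minimal sketch (with http://www.website.tld as a placeholder and a deliberately simple regex, so it assumes the src attributes are already complete URLs):

<?php
$url = 'http://www.website.tld';
$html = file_get_contents($url); //step 1: get the html content

//step 2: find all image tags in the html source
preg_match_all('/<img[^>]+src=["\']?([^"\'\s>]+)["\']?/i', $html, $matches);

//step 3: loop over the images array and save each one
foreach ($matches[1] as $src) {
	$imagedata = file_get_contents($src);
	if ($imagedata !== false) {
		file_put_contents(basename($src), $imagedata);
	}
}
?>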
