PHP XML feed, need to extract image URL from content

Question

mk1200 0 Newbie Poster

15 Years Ago

Hey guys, I'm sure this has been done before, but I haven't found the snippet yet. I think I have to use preg_match_all.

Here's my very simple stopping point:

$content = $obj->introtext;
$xml .= "<ImageUrl><![CDATA[$content]]></ImageUrl>";

Here are two examples of the introtext table data:

<img src="images/stories/D-West.jpg" alt="D-West" style="margin: 3px; float: right;" width="200" height="300" />Had the Cavaliers played the rest of the schedule...

<img height="237" width="240" src="images/stories/Crystalball.jpg" alt="Crystalball" style="margin: 3px; float: right;" />Quite an Oscar Night for “The Hurt Locker”, as th...

You can see from the above example that not all image urls are going to start with the <img tag and I want to return just the image location like this:
images/stories/D-West.jpg
images/stories/Crystalball.jpg

I'll add the website url later, but for now I just need to learn how to separate the image location from the content.

Any help is appreciated.

Thanks

image php xml

mk1200 0 Newbie Poster · Answer 1 · 2010-03-14T10:57:34+00:00

$content = $obj->introtext;
preg_match('/src=([\'"])([^"\1]+)\1/i', $content, $match);

$xml .= "<ImageUrl><![CDATA[$match[0]]]></ImageUrl>";

I still need help with the preg_match though. This is what it returns:
src="images/stories/D-West.jpg"

Anyone know how to clean it up, so it's just: images/stories/D-West.jpg
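
One possible cleanup, offered as a sketch rather than the definitive answer: `$match[0]` always holds the whole matched text, while the parenthesised groups land in `$match[1]`, `$match[2]`, and so on, so the bare path is already sitting in `$match[2]`. The pattern below also swaps `[^"\1]+` for a lazy `.*?`, because a backreference such as `\1` is not treated as a backreference inside a character class. The function name `extract_img_src` is made up for illustration.

```php
<?php
// Hypothetical helper: returns the value of the first src attribute, or null.
function extract_img_src($html)
{
    // Group 1: the opening quote (' or ").
    // Group 2: everything up to the next occurrence of that same quote.
    if (preg_match('/src=([\'"])(.*?)\1/i', $html, $match)) {
        return $match[2];
    }
    return null;
}

$content = '<img height="237" width="240" src="images/stories/Crystalball.jpg" alt="Crystalball" />Quite an Oscar Night...';
echo extract_img_src($content), "\n"; // images/stories/Crystalball.jpg
```

Because the closing quote is matched with `\1`, the same pattern should also cope with single-quoted attributes.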

JenniC 0 Newbie Poster · Answer 2 · 2010-04-05T20:51:54+00:00

not all image urls are going to start with the <img tag

But they are enclosed in <img ... > tags.

# Script ImgSrc.txt
var str file, doc, img, src
# Get XML into a string variable.
cat $file > $doc
# Get <img ... > into a string variable.
stex -r -c "^<img&\>^" $doc > $img
# Get src="..." into a string variable.
stex -r -c "^src&=&\"&\"^" $img > $src
# Remove everything up to the first and everything after the
# last double quote.
stex -r -c "^\"^]" $src > null
stex -r -c "[^\"^" $src > null
# The source location is in $src.
echo $src

The script is written in biterscripting. Save it in the file "C:/Scripts/ImgSrc.txt" and call it as

script "C:/Scripts/ImgSrc.txt" file("http://www...../.../file.html")

Check out the documentation for the stex, etc. commands at http://www.biterscripting.com/helppages_editors.html .
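
For anyone staying in PHP, the same two steps (isolate the `<img ...>` tag, then read its `src`) can be sketched with PHP's DOM extension instead of string matching. This is only an illustration: it assumes the `dom` extension is enabled, and `img_sources` is a made-up name, not a library function.

```php
<?php
// Hypothetical helper: returns the src of every <img> in an HTML fragment.
function img_sources($html)
{
    if (trim($html) === '') {
        return array();
    }

    $doc = new DOMDocument();
    // CMS fragments are rarely valid documents, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

print_r(img_sources('<img height="237" width="240" src="images/stories/Crystalball.jpg" alt="Crystalball" />Quite an Oscar Night'));
```

Since the parser looks at attributes by name, the position of `src` inside the tag does not matter.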

You can also download the .jpg files to your computer automatically using an Automatic Internet Session (IS).

# Script DownloadJpg.txt
var str htmlfile, remotejpgfile, localjpgfile
# Start internet session.
isstart a
# Connect session to server
isconnect a "http://www........com"
# Use the previous script to get img src.
script "C:/Scripts/ImgSrc.txt" file("http://www........com/.../file.html") > remotejpgfile
# Get local file name
stex -p "^/^l[" $remotejpgfile > $localjpgfile
# Download remote jpg file in binary format.
isget -b a $remotejpgfile
# Save the downloaded jpg file from session buffer to local file.
issave -b a $localjpgfile
# We are done. Disconnect and end session.
isdiscon a
isend a
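
A rough PHP counterpart to the download steps above, in case it helps to stay in one language. It assumes `allow_url_fopen` is on so that `file_get_contents` can read `http://` URLs, and it takes the local file name to be whatever follows the last slash, which is how I read the `stex -p` line. `local_name_for`, `download_image` and the `example.com` address are placeholders invented for this sketch.

```php
<?php
// Hypothetical helper: the local file name is the last segment of the URL path.
function local_name_for($url)
{
    $path = parse_url($url, PHP_URL_PATH);
    return basename((string) $path);
}

// Hypothetical helper: fetches $url and saves it under $dir.
// Returns the number of bytes written, or false on failure.
function download_image($url, $dir = '.')
{
    $data = file_get_contents($url); // binary-safe read
    if ($data === false) {
        return false;
    }
    return file_put_contents($dir . '/' . local_name_for($url), $data);
}

echo local_name_for('http://www.example.com/images/stories/D-West.jpg'), "\n"; // D-West.jpg
```

Calling `download_image()` with a real image URL would then write the file next to the script.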

mk1200 0 Newbie Poster · Answer 3 · 2010-04-05T20:58:57+00:00

Here's what I came up with:

<?php

// $host, $dbUsername, $dbPassword and $dbDatabase are defined elsewhere.
$dbConnect = mysqli_connect($host, $dbUsername, $dbPassword, $dbDatabase)
	or die("Database connection failed");

// Number of items to return; defaults to 4.
$limit = 4;
if (!empty($_REQUEST['limit'])) {
	$limit = (int)$_REQUEST['limit'];
}

$website_url = 'http://www.xxxxxxx.com';

$fnd_sql = 'SELECT * FROM jos_content WHERE state = 1 AND sectionid = 4 ';
$fnd_sql .= 'ORDER BY publish_up DESC ';

if ($limit) {
	$fnd_sql .= "LIMIT $limit ";
}

$result = mysqli_query($dbConnect, $fnd_sql);
if ($result) {

	$xml = "<?xml version='1.0' encoding='ISO-8859-1'?><Items>";
	while ($obj = mysqli_fetch_object($result)) {
		$xml .= "<Item><Url><![CDATA[$website_url/index.php?option=com_content&view=article&id=$obj->id:$obj->alias&catid=$obj->catid:indians-archive&Itemid=4]]></Url>";
		$xml .= "<Title><![CDATA[$obj->title]]></Title>";

		$content = $obj->introtext;

		// Group 1 captures the opening quote, group 2 the URL between the quotes.
		if (preg_match('/src=([\'"])(.*?)\1/i', $content, $match)) {
			$src = $match[2];
			// Prefix relative paths with the site URL; leave absolute URLs alone.
			if (strpos($src, 'http://') === false) {
				$src = "$website_url/$src";
			}
			$xml .= "<ImageUrl><![CDATA[{$src}]]></ImageUrl>";
		}

		// Strip the first <img ...> tag so the description is text only.
		$content = preg_replace('/<img[^>]*>/i', '', $content, 1);

		$xml .= "<Description><![CDATA[$content]]></Description></Item>";
	}

	$xml .= '</Items>';

	header("Content-type: text/xml");
	echo $xml;
}

mysqli_close($dbConnect);

?>
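
A possible refinement of the `strpos` check, sketched under the assumption that some images may be served over `https://` or stored with a leading slash. `absolute_url` is a hypothetical helper name, not part of PHP or Joomla.

```php
<?php
// Hypothetical helper: prefixes $base unless $src already starts with a scheme.
function absolute_url($src, $base)
{
    // Anchoring at the start avoids false hits on "http://" later in the string.
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    // Trim slashes on both sides so exactly one separates base and path.
    return rtrim($base, '/') . '/' . ltrim($src, '/');
}

echo absolute_url('images/stories/D-West.jpg', 'http://www.xxxxxxx.com'), "\n";
// http://www.xxxxxxx.com/images/stories/D-West.jpg
```

Keeping this in one function means the feed loop only has to call it once per item.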