Pull data from a website

Question

jamesl22 0 Light Poster

13 Years Ago

If I have an external website, how can I pull data from it? I have the following code snippet that I need to pull data from:

<div class="headlinesBox">

	<div class="headline currentHeadline">
		<div class="headlinesClipping">
			<img src="/common/images/thumbnails/source/1320614405d.jpg" style="float: left; width: 262px; height: 236px;"/>
		</div>
		<div class="headlinesText">
			<h3><a href="/details/news/1325223/Template-Assisted_Fabrication_for_Polymer_Solar_Cells.html" title="Template-Assisted Fabrication for Polymer Solar Cells">Template-Assisted Fabrication for Polymer Solar Cells</a></h3>
			<p>
				Bulk heterojunction films with nanostructured donor/acceptor interfaces have been fabricated for photovoltaic devices by means of anodic aluminum oxide (AAO) templates.
				<!-- -->
			</p>
		</div>
	</div>

	<div class="headline additionalHeadline">
		<div class="headlinesClipping">
			<img src="/common/images/thumbnails/source/13215dec9a2.jpg" style="float: left; width: 156px; height: 236px;"/>
		</div>
		<div class="headlinesText">
			<h3><a href="/details/news/1328303/First_Understand_Absorber_Layers_Then_Improve_Solar_Cell_Efficiency.html" title="First Understand Absorber Layers, Then Improve Solar Cell Efficiency">First Understand Absorber Layers, Then Improve Solar Cell Efficiency</a></h3>
			<p>
				A thorough understanding of photovoltaic materials is crucial if thin-film solar cell efficiency is to be improved.
				<!-- -->
			</p>
		</div>
	</div>

I need to get the title, image location, and page location for both articles in that snippet and put them into an array. How can I do this?

Thanks,

James

php

All 7 Replies

diafol

13 Years Ago

If your host allows file_get_contents() on external sites (some don't; check allow_url_fopen in the phpinfo() output), use that to fetch the page, then use substr() or some of the preg functions to pull out the bits you need.
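A minimal sketch of the fetching step described above, assuming PHP 7.1 or later. The helper name fetchPage() and the 10-second timeout are illustrative choices, not something from the thread. It tries file_get_contents() when allow_url_fopen is enabled and falls back to cURL when that extension is available.

```php
<?php
// Hypothetical helper: returns the page body as a string, or null on failure.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    // allow_url_fopen controls whether file_get_contents() may open URLs.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $context = stream_context_create([
            'http' => ['timeout' => $timeoutSeconds],
        ]);
        // @ suppresses the warning on failure; the false return is checked instead.
        $body = @file_get_contents($url, false, $context);
        return $body === false ? null : $body;
    }

    // Fallback: the cURL extension, if the host has it installed.
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true, // follow redirects
            CURLOPT_TIMEOUT        => $timeoutSeconds,
        ]);
        $body = curl_exec($ch);
        return $body === false ? null : $body;
    }

    return null; // neither mechanism is available on this host
}
```

Calling it would look like $html = fetchPage('http://www.example.com/news'); where the URL is a placeholder for the real site. A null result means the request failed or the host permits neither method.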

Perhaps XPath or cURL could also do what you want.
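As a rough illustration of the XPath route, the sketch below uses PHP's bundled DOMDocument and DOMXPath classes on a trimmed copy of the markup from the question. The function name extractHeadlines() and the array keys title, image, and page are assumptions made for this example. On a real page, $html would hold the downloaded source instead of the inline sample.

```php
<?php
// Trimmed copy of the markup from the question (the <p> summaries are omitted).
$html = <<<'HTML'
<div class="headlinesBox">
  <div class="headline currentHeadline">
    <div class="headlinesClipping">
      <img src="/common/images/thumbnails/source/1320614405d.jpg" style="float: left; width: 262px; height: 236px;"/>
    </div>
    <div class="headlinesText">
      <h3><a href="/details/news/1325223/Template-Assisted_Fabrication_for_Polymer_Solar_Cells.html" title="Template-Assisted Fabrication for Polymer Solar Cells">Template-Assisted Fabrication for Polymer Solar Cells</a></h3>
    </div>
  </div>
  <div class="headline additionalHeadline">
    <div class="headlinesClipping">
      <img src="/common/images/thumbnails/source/13215dec9a2.jpg" style="float: left; width: 156px; height: 236px;"/>
    </div>
    <div class="headlinesText">
      <h3><a href="/details/news/1328303/First_Understand_Absorber_Layers_Then_Improve_Solar_Cell_Efficiency.html" title="First Understand Absorber Layers, Then Improve Solar Cell Efficiency">First Understand Absorber Layers, Then Improve Solar Cell Efficiency</a></h3>
    </div>
  </div>
</div>
HTML;

// Hypothetical helper: one array entry per <div class="headline ...">.
function extractHeadlines(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "headline" as a whole class token, so headlinesBox etc. are skipped.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " headline ")]';

    $articles = [];
    foreach ($xpath->query($query) as $headline) {
        $img  = $xpath->query('.//div[@class="headlinesClipping"]//img', $headline)->item(0);
        $link = $xpath->query('.//div[@class="headlinesText"]//h3/a', $headline)->item(0);
        if (!$img instanceof DOMElement || !$link instanceof DOMElement) {
            continue; // skip blocks that lack an image or a link
        }
        $articles[] = [
            'title' => $link->getAttribute('title'),
            'image' => $img->getAttribute('src'),
            'page'  => $link->getAttribute('href'),
        ];
    }
    return $articles;
}

$articles = extractHeadlines($html);
print_r($articles);
```

A DOM parser tends to cope better than string functions when the site changes whitespace or attribute order, at the cost of a little more setup.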

diafol

13 Years Ago

Have a look at this: http://uk.php.net/manual/en/function.preg-match.php

phaedrusGhost 0 Junior Poster in Training · Answer 1 · 2011-09-01T21:56:21+00:00

Look up web scraping or site crawling. There are several ways to accomplish this, so you have to figure out what type of data you are going to scrape from the site(s) and then build something to accomplish your goal.

jamesl22 0 Light Poster · Answer 2 · 2011-09-01T22:47:03+00:00

diafol wrote: "If your host allows file_get_contents() on external sites (some don't; check allow_url_fopen in the phpinfo() output), use that to fetch the page, then use substr() or some of the preg functions to pull out the bits you need. Perhaps XPath or cURL could also do what you want."

I'm a little confused about what I would need to do to use preg_match. Or is that the wrong function?

jamesl22 0 Light Poster · Answer 3 · 2011-09-01T23:05:55+00:00

diafol wrote: "Have a look at this: http://uk.php.net/manual/en/function.preg-match.php"

In this example:

<?php
// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
    "http://www.php.net/index.html", $matches);
$host = $matches[1];

// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>

How would I alter that for my needs? All the brackets and symbols are confusing.
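One possible adaptation, offered as a sketch rather than a definitive answer: preg_match() stops at the first hit, so preg_match_all() is the closer fit when there are two articles. The function name extractHeadlinesWithRegex() and the array keys are made up for this example, and the pattern assumes the markup keeps the attribute order shown in the question (src first on the img, href before title on the link).

```php
<?php
// Trimmed copy of the markup from the question (the <p> summaries are omitted).
$html = <<<'HTML'
<div class="headline currentHeadline">
  <div class="headlinesClipping">
    <img src="/common/images/thumbnails/source/1320614405d.jpg" style="float: left; width: 262px; height: 236px;"/>
  </div>
  <div class="headlinesText">
    <h3><a href="/details/news/1325223/Template-Assisted_Fabrication_for_Polymer_Solar_Cells.html" title="Template-Assisted Fabrication for Polymer Solar Cells">Template-Assisted Fabrication for Polymer Solar Cells</a></h3>
  </div>
</div>
<div class="headline additionalHeadline">
  <div class="headlinesClipping">
    <img src="/common/images/thumbnails/source/13215dec9a2.jpg" style="float: left; width: 156px; height: 236px;"/>
  </div>
  <div class="headlinesText">
    <h3><a href="/details/news/1328303/First_Understand_Absorber_Layers_Then_Improve_Solar_Cell_Efficiency.html" title="First Understand Absorber Layers, Then Improve Solar Cell Efficiency">First Understand Absorber Layers, Then Improve Solar Cell Efficiency</a></h3>
  </div>
</div>
HTML;

// Hypothetical helper: returns one array entry per image/link pair found.
function extractHeadlinesWithRegex(string $html): array
{
    // Group 1: image src, group 2: link href, group 3: link title.
    $pattern = '~<img\s+src="([^"]+)".*?<h3>\s*<a\s+href="([^"]+)"\s+title="([^"]+)"~s';

    // PREG_SET_ORDER gives one $match array per article instead of one per group.
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $articles = [];
    foreach ($matches as $match) {
        $articles[] = [
            'title' => html_entity_decode($match[3], ENT_QUOTES, 'UTF-8'),
            'image' => $match[1],
            'page'  => $match[2],
        ];
    }
    return $articles;
}

$articles = extractHeadlinesWithRegex($html);
print_r($articles);
```

Reading the pattern piece by piece: the ~ characters are delimiters, chosen so the slashes in the HTML need no escaping. \s+ means one or more whitespace characters. [^"]+ means one or more characters that are not a double quote, and the parentheses around it capture that text. .*? skips as little as possible until the next part matches, and the trailing s lets the dot match newlines too. The pattern breaks if the site reorders attributes, which is the usual argument for a DOM parser on anything long-lived.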

chrishea 182 Nearly a Posting Virtuoso · Answer 4 · 2011-09-01T23:45:41+00:00

I have done quite a bit of screen scraping. Some of this has been run on a daily basis to extract data and move it to somewhere else. I decided to capture what I know about this in my help file. You can see it here.

diafol · Answer 5 · 2011-09-02T02:32:10+00:00

If you want a ready made script, you could do worse than look at: http://simplehtmldom.sourceforge.net/
