If I have an external website, how can I pull data from it? This is the HTML snippet I need to pull data from:

<div class="headlinesBox">
  <div class="headline currentHeadline">
    <div class="headlinesClipping">
      <img src="/common/images/thumbnails/source/1320614405d.jpg" style="float: left; width: 262px; height: 236px;"/>
    </div>
    <div class="headlinesText">
      <h3><a href="/details/news/1325223/Template-Assisted_Fabrication_for_Polymer_Solar_Cells.html" title="Template-Assisted Fabrication for Polymer Solar Cells">Template-Assisted Fabrication for Polymer Solar Cells</a></h3>
      <p>
        Bulk heterojunction films with nanostructured donor/acceptor interfaces have been fabricated for photovoltaic devices by means of anodic aluminum oxide (AAO) templates.
        <!--
        -->
      </p>
    </div>
  </div>
  <div class="headline additionalHeadline">
    <div class="headlinesClipping">
      <img src="/common/images/thumbnails/source/13215dec9a2.jpg" style="float: left; width: 156px; height: 236px;"/>
    </div>
    <div class="headlinesText">
      <h3><a href="/details/news/1328303/First_Understand_Absorber_Layers_Then_Improve_Solar_Cell_Efficiency.html" title="First Understand Absorber Layers, Then Improve Solar Cell Efficiency">First Understand Absorber Layers, Then Improve Solar Cell Efficiency</a></h3>
      <p>
        A thorough understanding of photovoltaic materials is crucial if thin-film solar cell efficiency is to be improved.
        <!--
        -->
      </p>
    </div>
  </div>

I need to get the title, image location, and page location for both articles in that snippet and put them into an array. How can I do this?

Thanks,

James

Look up web scraping or site crawling. There are several ways to accomplish this, so first figure out what type of data you are going to be scraping off the site(s), then build something to match that goal.


If your host allows you to use file_get_contents() on external URLs (some don't; check the allow_url_fopen setting in the phpinfo() output), use it to fetch the page source, then use substr() or some of the preg functions to extract the bits you need.

Alternatively, cURL can fetch the page, and XPath (through PHP's DOM extension) can pick out the elements you want.
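
As a rough sketch of the XPath route, assuming the markup looks like the snippet in the question: the function name extractHeadlines() and the array keys are made up for illustration, and $html is a trimmed copy of the posted snippet. On the real site it would come from file_get_contents() or cURL (the URL isn't given in this thread).

```php
<?php
// Trimmed copy of the markup from the question (style attributes and <p> text dropped).
// In practice $html would be fetched from the real page, whose URL is not given here.
$html = <<<'HTML'
<div class="headlinesBox">
  <div class="headline currentHeadline">
    <div class="headlinesClipping">
      <img src="/common/images/thumbnails/source/1320614405d.jpg"/>
    </div>
    <div class="headlinesText">
      <h3><a href="/details/news/1325223/Template-Assisted_Fabrication_for_Polymer_Solar_Cells.html" title="Template-Assisted Fabrication for Polymer Solar Cells">Template-Assisted Fabrication for Polymer Solar Cells</a></h3>
    </div>
  </div>
  <div class="headline additionalHeadline">
    <div class="headlinesClipping">
      <img src="/common/images/thumbnails/source/13215dec9a2.jpg"/>
    </div>
    <div class="headlinesText">
      <h3><a href="/details/news/1328303/First_Understand_Absorber_Layers_Then_Improve_Solar_Cell_Efficiency.html" title="First Understand Absorber Layers, Then Improve Solar Cell Efficiency">First Understand Absorber Layers, Then Improve Solar Cell Efficiency</a></h3>
    </div>
  </div>
</div>
HTML;

// Hypothetical helper: returns one array entry per <div class="headline ...">.
function extractHeadlines(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "headline" as a whole class token, so headlinesBox etc. are skipped.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " headline ")]';

    $articles = [];
    foreach ($xpath->query($query) as $node) {
        $link = $xpath->query('.//h3/a', $node)->item(0);
        $img  = $xpath->query('.//img', $node)->item(0);
        if ($link === null) {
            continue; // no headline link in this block, nothing to record
        }
        $articles[] = [
            'title' => trim($link->textContent),
            'page'  => $link->getAttribute('href'),
            'image' => $img !== null ? $img->getAttribute('src') : null,
        ];
    }
    return $articles;
}

$articles = extractHeadlines($html);
print_r($articles);
```

The image and page values are site-relative paths, so you would still need to prefix them with the site's domain before using them elsewhere.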

I'm a little confused about what I would need to do to use preg_match. Or is that the wrong function?
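
For what it's worth, preg_match() stops at the first hit, so preg_match_all() is probably the closer fit when there are several articles. Below is a minimal sketch, assuming each article has its img tag before its h3 link, as in the snippet from the question. The function name extractHeadlinesWithRegex() and the pattern are illustrative only, and a regex like this will break if the site changes its markup.

```php
<?php
// Trimmed copy of the markup from the question; the real page source would be fetched instead.
$html = <<<'HTML'
<div class="headline currentHeadline">
  <div class="headlinesClipping">
    <img src="/common/images/thumbnails/source/1320614405d.jpg" style="float: left; width: 262px; height: 236px;"/>
  </div>
  <div class="headlinesText">
    <h3><a href="/details/news/1325223/Template-Assisted_Fabrication_for_Polymer_Solar_Cells.html" title="Template-Assisted Fabrication for Polymer Solar Cells">Template-Assisted Fabrication for Polymer Solar Cells</a></h3>
  </div>
</div>
<div class="headline additionalHeadline">
  <div class="headlinesClipping">
    <img src="/common/images/thumbnails/source/13215dec9a2.jpg" style="float: left; width: 156px; height: 236px;"/>
  </div>
  <div class="headlinesText">
    <h3><a href="/details/news/1328303/First_Understand_Absorber_Layers_Then_Improve_Solar_Cell_Efficiency.html" title="First Understand Absorber Layers, Then Improve Solar Cell Efficiency">First Understand Absorber Layers, Then Improve Solar Cell Efficiency</a></h3>
  </div>
</div>
HTML;

// Hypothetical helper: one array entry per img + h3 link pair found in $html.
function extractHeadlinesWithRegex(string $html): array
{
    // Group 1: image src, group 2: link href, group 3: link text.
    // The "s" modifier lets "." match newlines between the img and the h3.
    $pattern = '~<img\s+src="([^"]+)".*?<h3>\s*<a\s+href="([^"]+)"[^>]*>(.*?)</a>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $articles = [];
    foreach ($matches as $m) {
        $articles[] = [
            'title' => trim(strip_tags($m[3])),
            'image' => $m[1],
            'page'  => $m[2],
        ];
    }
    return $articles;
}

$articles = extractHeadlinesWithRegex($html);
print_r($articles);
```

PREG_SET_ORDER groups the captures per match, so each $m holds one article's image, link, and title together.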

I have done quite a bit of screen scraping. Some of this has been run on a daily basis to extract data and move it to somewhere else. I decided to capture what I know about this in my help file. You can see it here.
