Hi all,

I am a newbie who tries different things every day, and I always come here when I get stuck on something.

I want to write a script using cURL and PHP that goes to this link: http://tools.cisco.com/WWChannels/LOCATR/openBasicSearch.do and then goes through each page for each country, capturing a list of every partner in every country and saving it to a database.

I have no idea how the script can select the countries one by one from the select box and get to each country's page, which is the very first thing to do. Once we are on the page, pattern matching comes into play for storing the name and address in the database, which I can manage.

The problem is that before we select any country the URL is: http://tools.cisco.com/WWChannels/LOCATR/openBasicSearch.do
and after we select a country, say 'India', the URL is: http://tools.cisco.com/WWChannels/LOCATR/performBasicSearch.do , so there is no reference in the URL to the country selected.

The idea I had was to traverse the HTML page, put all the countries in an array, and then write a recursive function that calls the page for a specific country. But for that we need something different in the URL for each country, right?
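For the "put all the countries in an array" step, here is a rough sketch of how the option values could be pulled out of a select box with PHP's DOM extension. The function name `extractOptionValues` and the sample markup are made up for illustration; the field name `country` is taken from the POST string in the reply's script, and the real page's markup may well differ.

```php
<?php
// Hypothetical helper: collect the value of every <option> inside the
// <select> whose name attribute equals $selectName.
function extractOptionValues(string $html, string $selectName): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $values = [];
    foreach ($xpath->query('//select[@name="' . $selectName . '"]/option') as $option) {
        $value = trim($option->getAttribute('value'));
        if ($value !== '') { // skip placeholder entries with an empty value
            $values[] = $value;
        }
    }
    return $values;
}

// Made-up sample markup standing in for the real search page.
$sampleHtml = '<form><select name="country">'
    . '<option value="">Select</option>'
    . '<option value="IN">India</option>'
    . '<option value="US">United States</option>'
    . '</select></form>';

print_r(extractOptionValues($sampleHtml, 'country'));
```

With the array in hand, a plain loop over it is enough; recursion is not needed for this.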

Please help.

I received your message and had a crack at the code, but it seems there is some sort of anti-bot protection on the website. The following is the script I used:

<?php
//array('AF','DZ','AS','AD','AO','AI','AQ','AG','AR','AM','AW','AT','AU','AZ','BS','BH','BD','BB','BY','BE','BZ','BJ','BM','BT','BO','BA','BW','BV','BA','IO','BN','BF','BG','BI','KH','CM','CA','CV','KY','CF','TD','CL','CN','CX','CC','CO','KM','CG','CD','CK','CR','CI','HR','CY','CZ','DK','DJ','DM','DO','EC','EG','SV','GQ','EE','ET','FK','FO','FJ','FI','FR','GF','PF','GA','GM','GE','DE','GH','GI','GR','GL','GD','GP','GU','GT','GN','GW','GY','HT','HM','VA','HN','HK','HU','IS','IN','ID','IQ','IE','IL','IT','JM','JP','JO','KZ','KE','KI','KR','KW','KG','LA','LV','LB','LS','LR','LY','LI','LT','LU','MO','MK','MG','MW','MY','MV','ML','MT','MH','MQ','MR','MU','YT','MX','FM','MD','MC','MN','MS','MA','MZ','MM','NA','NR','NP','NL','AN','NC','NZ','NI','NE','NG','NU','NF','MP','NO','OM','PK','PW','PS','PA','PG','PY','PE','PH','PN','PL','PT','PR','QA','RE','RO','RU','RW','SH','KN','LC','PM','VC','WS','SM','ST','SA','SN','CS','SC','SL','SG','SK','SI','SB','SO','ZA','ES','LK','SR','SJ','SZ','SE','CH','TW','TJ','TZ','TH','TL','TG','TK','TO','TT','TN','TR','TM','TC','TV','UG','UA','AE','GB','US','UM','UY','UZ','VU','VE','VN','VG','VI','WF','YE','ZM','ZW');
$country = array('AF');
foreach ($country as $code) {
	$ch = curl_init();
	// Target URL: the address the search form submits to
	curl_setopt($ch, CURLOPT_URL, 'http://tools.cisco.com/WWChannels/LOCATR/performBasicSearch.do');
	// Request as if from Firefox
	curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15');
	// Send the form fields as a POST request; the country travels in the request body, not in the URL
	curl_setopt($ch, CURLOPT_POST, true);
	curl_setopt($ch, CURLOPT_POSTFIELDS, 'state=&latitude=&longitude=&city=&zip=&lonlatRequired=N&smbSort=true&address=&country=' . urlencode($code));
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
	$result = curl_exec($ch);
	if ($result === false) {
		// Report transport-level failures instead of printing nothing
		echo 'cURL error: ' . curl_error($ch);
	} else {
		echo $result;
	}
	curl_close($ch);
}
?>

If you check what it displays, cURL doesn't get through to the third page, which contains the results. It is stuck on the second page, and that may have something to do with JavaScript or AJAX. So although the above script partially works at retrieving the data, it needs extending so that it leads to the start of the results instead of the bot trap.
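One way to investigate the second page, assuming (and I have not verified this) that it contains a form which JavaScript submits automatically, is to pull out that form's action and input fields so they can be posted with a second cURL request. The helper name `extractFirstForm` and the sample markup below, including the `showResults.do` action and the `token` field, are invented for illustration only.

```php
<?php
// Hypothetical helper: return the action URL and all named <input> fields
// of the first <form> in the page, or null if the page has no form.
function extractFirstForm(string $html): ?array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return null;
    }

    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return ['action' => $form->getAttribute('action'), 'fields' => $fields];
}

// Made-up intermediate page; the real one may look nothing like this.
$intermediateHtml = '<html><body><form action="showResults.do" method="post">'
    . '<input type="hidden" name="token" value="abc123">'
    . '<input type="hidden" name="country" value="IN">'
    . '</form></body></html>';

$form = extractFirstForm($intermediateHtml);
print_r($form);
```

If the page does turn out to work this way, the returned fields could be passed through `http_build_query()` and sent as `CURLOPT_POSTFIELDS` to the form's action. Sharing cookies between the two requests with `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` may also be needed if the site tracks a session.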

Also, just as a note, once it all works the commented-out array can replace the array on the line below it, so that the script checks all of the countries.
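When switching to the full list, it may be tidier to build the POST body with `http_build_query()` rather than by string concatenation, and to drop duplicate codes first (the commented list contains 'BA' twice; the second one is possibly meant to be 'BR'). This is only a sketch: the function name `buildSearchBody` is made up, and the field names are copied from the script above.

```php
<?php
// Hypothetical helper: build the URL-encoded POST body for one country,
// using the same field names and values as the script above.
function buildSearchBody(string $countryCode): string
{
    return http_build_query([
        'state'          => '',
        'latitude'       => '',
        'longitude'      => '',
        'city'           => '',
        'zip'            => '',
        'lonlatRequired' => 'N',
        'smbSort'        => 'true',
        'address'        => '',
        'country'        => $countryCode,
    ]);
}

// Short excerpt of the country list, including the repeated 'BA'.
$codes = array_values(array_unique(['AF', 'DZ', 'BA', 'BA']));

foreach ($codes as $code) {
    echo buildSearchBody($code), "\n";
}
```

Each resulting string can be passed straight to `CURLOPT_POSTFIELDS`.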
