
Hi all,
I have run into an issue with a cURL request in PHP.
I need to fetch a document from another host, http://xyz.com/content.html, which I cannot view without login credentials. So I used http://user:pwd@xyz.com/content.html?get1=x&get2=y.

I can send a successful cURL request to that page from the command line on my server:

curl --connect-timeout 30 http://user:pwd@xyz.com/content.html?get1=x\&get2=y

So in my code I wrote (I'm using 'simple_html_dom.php'):
$html = file_get_html("http://user:pwd@xyz.com/content.html?get1=x&get2=y");
This does not return the result, and it does not produce any errors either.

However, I do get the result when I run "php -f myfile.php" on my server host.

What am I missing here?
Any help will be much appreciated.

Thanks
Sugumaran

Last Post by sugumarclick

Check the headers you get with that link:

<?php
$url = 'http://user:pwd@xyz.com/content.html?get1=x&get2=y';
print_r(get_headers($url,1));
?>

And then try to use cURL:

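<?php
$url       = 'http://xyz.com/content.html?get1=x&get2=y'; # no credentials in the URL
$username  = 'user';
$password  = 'pwd';
$useragent = "Mozilla Firefox ..."; # set a valid user agent string
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     # return the body instead of printing it
curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); # HTTP Basic authentication
$output = curl_exec($ch);
if ($output === false) {
    echo 'cURL error: ' . curl_error($ch);          # e.g. DNS or connection failures
}
$info = curl_getinfo($ch);                          # status code, timings, etc.
curl_close($ch);
?>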

The server may filter out requests with invalid user agents. Bye :)
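After curl_exec() has run, the HTTP status code in $info tells you whether the credentials were accepted (401, for instance, means they were rejected or never sent). The helper below is a hypothetical sketch, not part of cURL or simple_html_dom; it only interprets the values the snippet above already collects.

```php
<?php
// Hypothetical helper: turn a curl_exec()/curl_getinfo() result into a short diagnosis.
function describe_curl_result($output, array $info, string $error = ''): string
{
    if ($output === false) {
        // The request never completed: DNS failure, refused connection, timeout...
        return "transport error: $error";
    }
    $code = (int) ($info['http_code'] ?? 0);
    if ($code === 401) {
        return 'HTTP 401: credentials rejected or not sent';
    }
    if ($code >= 300 && $code < 400) {
        // curl_getinfo() reports the redirect target under 'redirect_url'
        return "HTTP $code: redirect to " . ($info['redirect_url'] ?? '(unknown)');
    }
    return "HTTP $code";
}
```

Call it before curl_close(), because curl_error() needs the open handle, e.g. echo describe_curl_result($output, $info, curl_error($ch));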


Hi cereal, thanks for the response.

Warning: file_get_contents() [function.file-get-contents]: php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution in /var/www/html/reg_post/simple_html_dom.php on line 70

Warning: file_get_contents(http://...content.html) [function.file-get-contents]: failed to open stream: Connection refused in /var/www/html/reg_post/simple_html_dom.php on line 70
GET HEADERS

Warning: get_headers() [function.get-headers]: php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution in /var/www/html/reg_post/scrapper.php on line 16

Warning: get_headers(http://...content.html) [function.get-headers]: failed to open stream: Connection refused in /var/www/html/reg_post/scrapper.php on line 16

These are the errors I am getting now.

When I execute it from the command line, I get:
GET HEADERS
Array
(
[0] => HTTP/1.1 200 OK
[Date] => Mon, 19 Dec 2011 14:30:03 GMT
[Server] => Apache/2.2.14 (Unix) mod_jk/1.2.28
[Set-Cookie] => JSESSIONID=ECC3B5BFE6F0B97ED6EC459A99BE764C; Path=/server
[Connection] => close
[Content-Type] => text/html
)


Any thoughts on this, cereal?


The warnings you get are related to connection problems: "getaddrinfo failed: Temporary failure in name resolution" means the host name could not be resolved to an IP address. Can you try to get the contents of a non-protected page on that website?
In your array response the status is 200 and there is no redirect; otherwise you would see 302 (or 301) instead of 200, plus a Location header with the landing URL. What seems strange to me is the cookie path /server, because it means the cookie is valid only for pages inside http://xyz.com/server/. So does your request link also contain a /server/ path?
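Both points can be checked from PHP itself. This is a diagnostic sketch, assuming you run it twice (once with php -f, once through the web server) and compare the output; host_resolves() and summarize_headers() are hypothetical helper names. It relies on gethostbyname() returning the host name unchanged when resolution fails, and on get_headers($url, 1) putting the first status line at index 0.

```php
<?php
// Does this PHP process (CLI or web server) resolve the host?
function host_resolves(string $host): bool
{
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true; // already an IP address, nothing to resolve
    }
    // gethostbyname() returns its argument unchanged on failure
    return gethostbyname($host) !== $host;
}

// Pull the status code and Location header out of get_headers($url, 1).
function summarize_headers($headers): array
{
    if ($headers === false) {
        // get_headers() returns false when the request itself failed
        return ['status' => null, 'location' => null];
    }
    // Index 0 is the first status line, e.g. "HTTP/1.1 200 OK"
    $status = null;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0] ?? '', $m)) {
        $status = (int) $m[1];
    }
    // Header names keep the server's spelling, so compare case-insensitively;
    // after several redirects Location becomes an array, so keep the last entry.
    $location = null;
    foreach ($headers as $name => $value) {
        if (is_string($name) && strcasecmp($name, 'Location') === 0) {
            $location = is_array($value) ? end($value) : $value;
        }
    }
    return ['status' => $status, 'location' => $location];
}
```

For example: var_dump(host_resolves('xyz.com')); print_r(summarize_headers(@get_headers($url, 1)));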

More important: is the authentication handled by Apache web authentication, or is there a login form? If it is a login form, you first need to get an authentication cookie, save it with cURL, and use it to access the protected page. Something like:

# send the request to the URL the form's action attribute points to, and save the cookie to cookie.txt
curl -c cookie.txt -d "user=your_user&password=01234567" http://xyz.com/login/validate

# use cookie to get protected contents
curl -b cookie.txt http://xyz.com/restricted_area/content.html
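The same two steps can be done from PHP with the cURL extension. This is a sketch assuming a form login; login_and_fetch() is a hypothetical helper, and the URLs and field names (user, password) are the placeholders from the commands above, so check the real form's action and input names first.

```php
<?php
// Hypothetical helper: log in through a form, keep the session cookie,
// then fetch a protected page with the same cURL handle.
function login_and_fetch(string $loginUrl, array $fields, string $protectedUrl, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // like curl -c: save cookies when the handle is released
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // like curl -b: enable the cookie engine and send cookies

    // Step 1: POST the login form (like curl -d "user=...&password=...").
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    if (curl_exec($ch) === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("login request failed: $error");
    }

    // Step 2: switch back to GET; cookies received in step 1 are sent automatically.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $protectedUrl);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body; // page contents as a string, or false on failure
}
```

Usage, with the placeholder values: $html = login_and_fetch('http://xyz.com/login/validate', ['user' => 'your_user', 'password' => '01234567'], 'http://xyz.com/restricted_area/content.html', '/tmp/cookie.txt');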



Hi cereal,
I am sending the request to http://$username:$pwd@my_client_host/server/earms?screen=somevalue&op=somevalue&selectedRequestId=somevalue.

When I enter the URL http://my_client_host/server/arms?screen=somevalue&op=somevalue&selectedRequestId=somevalue (without the user and pwd), it asks me to enter a username and password through JavaScript:

A username and password are being requested by http://myhost. The site says: "(my-host) Authentication"

Otherwise I get a 401 Authorization Required page. The strange thing is that when I execute the PHP file with "php -f myfile.php", it returns the expected result without any errors, but I get the errors stated above when I send a cURL request to localhost/html/myfile.php?reqid=somevalue. I am running out of ideas.


If you are sending the user and password this way, I think the popup requesting them is not JavaScript; it's the Apache web authentication system. simple_html_dom uses file_get_contents() to get the document; line 70 is:

$dom->load(call_user_func_array('file_get_contents', $args), true);

The only thing I can think of is urlencode():

<?php
echo file_get_contents('http://user:pass@client_host/server/arms?...');
echo file_get_contents(urlencode('http://user:pass@client_host/server/arms?...'));
?>

I hope someone else can give you a better answer. Bye :)
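Another option, if the server uses HTTP Basic authentication, is to send the credentials as an explicit Authorization header through a stream context, since file_get_html() forwards its $context argument to file_get_contents(). This is a sketch: basic_auth_context() is a hypothetical helper and user/pwd are placeholders. It will not fix the name-resolution warnings, because the host still has to be resolved before any header is sent.

```php
<?php
// Hypothetical helper: send HTTP Basic credentials as an explicit header
// instead of embedding "user:pwd@" in the URL.
function basic_auth_context(string $user, string $pass, int $timeout = 30)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            // Basic auth is base64("user:password") in the Authorization header
            'header'  => 'Authorization: Basic ' . base64_encode("$user:$pass") . "\r\n",
            // read timeout in seconds for the http:// wrapper
            'timeout' => $timeout,
        ],
    ]);
}
```

Assuming the file_get_html() signature shown in this thread: $html = file_get_html('http://xyz.com/content.html?get1=x&get2=y', false, basic_auth_context('user', 'pwd'));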


Below is the function that gets the HTML contents in simple_html_dom:

function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT)
{
    // We DO force the tags to be terminated.
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $defaultBRText);
    // For sourceforge users: uncomment the next line and comment the retrieve_url_contents line 2 lines down if it is not already done.
    $contents = file_get_contents($url, $use_include_path, $context, $offset);
    // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
//    $contents = retrieve_url_contents($url);
    if (empty($contents))
    {
        return false;
    }
    // The second parameter can force the selectors to all be lowercase.
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}

and line 70 in the file is:

$contents = file_get_contents($url, $use_include_path, $context, $offset);

And when I apply urlencode() to the URL, it throws this:

Warning: file_get_contents(http%3A%2F%2Facereg%3Aacereg%40earms-app.cisco.com%2Fserver%2Fearms%3Fscreen%3Dcom.cisco.earms.ui.TestResult%26op%3DshowDetail%26selectedRequestId%3D200354385) [function.file-get-contents]: failed to open stream: No such file or directory in /var/www/html/reg_post/simple_html_dom.php on line 70

Fatal error: Call to a member function find() on a non-object in ....

I am sure urlencode is not going to solve this issue.
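The warning above shows why: urlencode() on the whole URL turns "http://" into "http%3A%2F%2F", so PHP no longer sees a URL scheme and looks for a local file instead ("No such file or directory"). If encoding is needed at all, for example when the password contains @ or :, it belongs on the user name and password only. A minimal sketch, with url_with_credentials() as a hypothetical helper:

```php
<?php
// Hypothetical helper: percent-encode only the user name and password,
// leaving the scheme, host, path and query string untouched.
function url_with_credentials(string $scheme, string $user, string $pass, string $hostAndPath): string
{
    // rawurlencode() escapes characters such as '@', ':' and '/' that
    // would otherwise break the user:pass@host part of the URL
    return $scheme . '://' . rawurlencode($user) . ':' . rawurlencode($pass) . '@' . $hostAndPath;
}
```

For credentials without special characters this produces the same URL as before, so it only helps if the password contains such characters.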
