One of my websites recently had problems with the PHP news parser.
For many years it worked fine, but a while back it started acting up. See the attached screenshot.

Because the PHP parser is the first block of code on the page, the whole page stopped working when the parser failed.

Now I am trying to narrow down the problem. I am not good with PHP.
The hosting company's Tech Support guy told me the code is bad, even though I made no changes. (Why would it work fine for years and then randomly stop?!?!)
Somebody else suggested the problem might be with FeedBurner, because the feed is coming through FeedBurner.

The error message is, "XML error: not well-formed (invalid token) at line 5"

Does this information indicate the problem is with the PHP code, or the incoming feed?

Any direction/advice/guidance is appreciated.

Attachment: Image_of_RS_Fail.PNG (5.47 KB)

Does this information indicate the problem is with the PHP code, or the incoming feed?

It seems related to the XML document, not to PHP; otherwise you would get a PHP error rather than an XML parser error. Could you share the XML?
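
One way to check this yourself is to run the downloaded document through the same expat-based xml_parse() that the script uses, outside the page. A minimal sketch, assuming PHP 7.1+; xml_is_well_formed() is a hypothetical helper name, not something from the original script:

```php
<?php
// Sketch: report whether a string is well-formed XML, using the same
// xml_parse() parser as the news script. xml_is_well_formed() is hypothetical.
function xml_is_well_formed(string $xml, ?string &$error = null): bool
{
    $parser = xml_parser_create();
    $ok = (bool) xml_parse($parser, $xml, true);  // true = this is the final chunk
    if (!$ok) {
        // Same message format as the die() call in the original script.
        $error = sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
    }
    return $ok;
}

// Usage: save the server's response to a file and check it, e.g.
// var_dump(xml_is_well_formed(file_get_contents("feed.xml"), $err), $err);
```

If the check fails on what the server actually returned, the problem is in the response (for example an HTML page instead of the feed), not in the parsing code.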

The XML looks fine to me; it validates and I can parse it in PHP.
They probably made a change to the structure of their XML at some point and it caused your code to break.

You will probably need to share your PHP that you're using to get much further.

It is the Space Exploration feed from Science Daily:

No, I was asking for your RSS feed, not for the source. But yes, at this point, as pixelsoul suggested, please also share your PHP code.

Okay, attached is a text file that includes the section of PHP code for the parser. It has worked fine for years, but if you have any suggestions for making it fail gracefully, it would be appreciated. That way, the whole page wouldn't fail just because the RSS feed is failing.

Attachments
<?php

function startElement($parser, $name, $attrs) {
    global $rss_channel, $currently_writing, $main;
    switch($name) {
        case "RSS":
        case "RDF:RDF":
        case "ITEMS":
            $currently_writing = "";
            break;
        case "CHANNEL":
            $main = "CHANNEL";
            break;
        case "IMAGE":
            $main = "IMAGE";
            $rss_channel["IMAGE"] = array();
            break;
        case "ITEM":
            $main = "ITEMS";
            break;
        default:
            $currently_writing = $name;
            break;
    }
}

function endElement($parser, $name) {
    global $rss_channel, $currently_writing, $item_counter;
    $currently_writing = "";
    if ($name == "ITEM") {
        $item_counter++;
    }
}

function characterData($parser, $data) {
    global $rss_channel, $currently_writing, $main, $item_counter;
    if ($currently_writing != "") {
        switch($main) {
            case "CHANNEL":
                if (isset($rss_channel[$currently_writing])) {
                    $rss_channel[$currently_writing] .= $data;
                } else {
                    $rss_channel[$currently_writing] = $data;
                }
                break;
            case "IMAGE":
                if (isset($rss_channel[$main][$currently_writing])) {
                    $rss_channel[$main][$currently_writing] .= $data;
                } else {
                    $rss_channel[$main][$currently_writing] = $data;
                }
                break;
            case "ITEMS":
                if (isset($rss_channel[$main][$item_counter][$currently_writing])) {
                    $rss_channel[$main][$item_counter][$currently_writing] .= $data;
                } else {
                    $rss_channel[$main][$item_counter][$currently_writing] = $data;
                }
                break;
        }
    }
}

set_time_limit(0);  // Specify no time limit for execution of script

// Specify the input source, the stream to be parsed
$file = "http://www.sciencedaily.com/rss/space_time/space_exploration.xml";

$rss_channel = array(); // The array for storing the input strings
$currently_writing = "";
$main = "";
$item_counter = 0;
$numItems;  // The number of headlines streamed in
$numOut = 20;  // The maximum number of headlines to output.
$xml_parser = xml_parser_create(); // Create an xml parser

xml_set_element_handler($xml_parser, "startElement", "endElement");  // Set up start and end elements
xml_set_character_data_handler($xml_parser, "characterData");  // Set up character data handler

if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);  // Free the xml parser

$numItems = count($rss_channel["ITEMS"]);
// print "numItems = " .$numItems;

if ($numItems < $numOut)
    $numOut = $numItems;

// output HTML
// print ("<div class=\"channelname\">" . $rss_channel["TITLE"] . "</div>");  // Output the name of the channel source. i.e. - moreover science space

if (isset($rss_channel["ITEMS"])) {
    if ($numItems > 0) {
        for($i = 0; $i < $numOut; $i++) {

            // Trim the length of the title text for each headline to 44 characters and replace excess with an ellipsis

            if (strlen($rss_channel["ITEMS"][$i]["TITLE"]) > 47) {
                $rss_channel["ITEMS"][$i]["TITLE"] = substr($rss_channel["ITEMS"][$i]["TITLE"], 0, 44). ". . .";
            } // End if

            if (isset($rss_channel["ITEMS"][$i]["LINK"])) {
                print ("\n<div class=\"newslink\">&nbsp; &#149;<a href=\"" . $rss_channel["ITEMS"][$i]["LINK"] . "\">" . $rss_channel["ITEMS"][$i]["TITLE"] . "</a></div>");
            } else {
                print ("\n <div class=\"newslink\">".$rss_channel["ITEMS"][$i]["TITLE"] . "</div>");
            }
            // print ("<div class=\"itemdescription\">" . $rss_channel["ITEMS"][$i]["DESCRIPTION"] . "</div><br />");
        }
    } else {
        print ("<b>There are no articles in this feed.</b>");
    }
}

?>

Heh, I just came to the same solution; the correct link is:

The problem happens because the old link expects a session cookie, which your request does not send; without it, the server returns an HTML page instead of the XML. You can test this with file_get_contents():

$file = "http://www.sciencedaily.com/rss/space_time/space_exploration.xml";
print_r(file_get_contents($file));
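
If the server can answer with an HTML page instead of the feed, the script could check the response before parsing it. A minimal sketch, assuming the headers come from PHP's $http_response_header variable (filled in after file_get_contents() or fopen() on an http:// URL); response_is_xml() is a hypothetical helper, and it is only a heuristic because it trusts the Content-Type the server sends:

```php
<?php
// Sketch: guess whether an HTTP response is XML from its Content-Type header.
// $headers uses the format of PHP's $http_response_header array.
// response_is_xml() is a hypothetical helper, not part of the original script.
function response_is_xml(array $headers): bool
{
    $contentType = "";
    foreach ($headers as $line) {
        // After redirects the array holds several responses; the last
        // Content-Type seen belongs to the final response.
        if (stripos($line, "Content-Type:") === 0) {
            $contentType = strtolower(trim(substr($line, strlen("Content-Type:"))));
        }
    }
    // Matches text/xml, application/xml, application/rss+xml, ...
    // (and, as a known gap, application/xhtml+xml).
    return strpos($contentType, "xml") !== false;
}

// Usage sketch (needs network access, so left as comments):
// $body = file_get_contents($file);
// if ($body === false || !response_is_xml($http_response_header)) {
//     echo "News is temporarily unavailable.";
// }
```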

Also, at line 73 you declare $numItems; without any default value, so the variable is still undefined, and PHP will raise an "Undefined variable" notice (a warning in PHP 8) if you use it before assigning it, for example:

$a;
echo $a++;
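
A related spot: $numItems = count($rss_channel["ITEMS"]); runs before the isset($rss_channel["ITEMS"]) check, so if a document parses but contains no <item> elements, that line already misbehaves (a notice and a warning on PHP 7, a TypeError on PHP 8, where count(null) is no longer allowed). A minimal sketch of a guarded count, with count_items() as a hypothetical helper and $rss_channel assumed to have the same shape as in the original script:

```php
<?php
// Sketch: always give $numItems a defined integer value.
// count_items() is hypothetical; $rss_channel mirrors the original script's array.
function count_items(array $rss_channel): int
{
    // Reading a missing key warns, and on PHP 8 count(null) throws a TypeError,
    // so check that "ITEMS" exists and is an array first.
    return (isset($rss_channel["ITEMS"]) && is_array($rss_channel["ITEMS"]))
        ? count($rss_channel["ITEMS"])
        : 0;
}

// In the script: $numItems = count_items($rss_channel);
```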

Bye!

Edited by cereal

Thanks for the advice.

I will clean that up ASAP.

I am also curious about line 64: set_time_limit(0);
Is this okay?
If it were changed to another number, would failures be handled better (i.e., would the rest of the page render instead of waiting forever)?

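Changing set_time_limit(0) won't change the way PHP handles errors; it will, however, stop the script with a fatal error if it keeps running longer than the number of seconds you set. Note that on non-Windows systems this limit does not count time spent waiting on stream operations such as fopen() or fread(), so on its own it won't stop a slow feed from holding up the page.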

This is the line that causes your page to stop loading when it runs into an error:

die(sprintf("XML error: %s at line %d",
    xml_error_string(xml_get_error_code($xml_parser)),
    xml_get_current_line_number($xml_parser)));

The die() function basically "kills" any further processing of code on the page.
It would be better to use PHP Exceptions with a try/catch, or you could create your own error handler.
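
A minimal sketch of the try/catch approach, assuming PHP 7.1+: the parsing code throws a RuntimeException where the original called die(), and only the news box falls back to a message. parse_rss_titles() and render_news() are hypothetical names, and this parser collects item titles only, a simplified stand-in for the original startElement()/characterData() handlers:

```php
<?php
// Sketch: throw on XML errors instead of calling die(), so the caller decides
// what happens. parse_rss_titles() and render_news() are hypothetical names.
function parse_rss_titles(string $xml): array
{
    $titles = [];
    $inItem = false;
    $current = "";                  // element whose text is being collected
    $parser = xml_parser_create();  // case folding is on: names arrive uppercased

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inItem, &$current, &$titles) {
            if ($name === "ITEM") {
                $inItem = true;
                $titles[] = "";     // start an empty title for this item
            }
            $current = $name;
        },
        function ($p, $name) use (&$inItem, &$current) {
            if ($name === "ITEM") {
                $inItem = false;
            }
            $current = "";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$inItem, &$current, &$titles) {
            // Text can arrive in several chunks, so append rather than assign.
            if ($inItem && $current === "TITLE") {
                $titles[count($titles) - 1] .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
    return $titles;
}

function render_news(string $xml): string
{
    try {
        $html = "";
        foreach (parse_rss_titles($xml) as $title) {
            $html .= '<div class="newslink">' . htmlspecialchars($title) . "</div>\n";
        }
        return $html;
    } catch (RuntimeException $e) {
        error_log($e->getMessage());  // keep the details in the server log
        return "<b>News is temporarily unavailable.</b>";
    }
}
```

With this shape, a broken feed costs you the news box but the rest of the page still renders.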

Exceptions
http://php.net/manual/en/language.exceptions.php
http://code.tutsplus.com/tutorials/php-exceptions--net-22274

Error handler
http://php.net/manual/en/function.set-error-handler.php
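
For the error-handler route, one common pattern (sketched here, assuming PHP 7+; with_warnings_as_exceptions() is a hypothetical helper) converts warnings such as a failed fopen() into an ErrorException while one piece of code runs. Note that set_error_handler() cannot intercept fatal errors, so it won't catch the script hitting its time limit:

```php
<?php
// Sketch: temporarily turn PHP warnings into ErrorException so they can be
// caught with try/catch. with_warnings_as_exceptions() is hypothetical.
function with_warnings_as_exceptions(callable $fn)
{
    set_error_handler(function ($severity, $message, $file, $line) {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });
    try {
        return $fn();
    } finally {
        restore_error_handler();  // put the previous handler back even if $fn threw
    }
}

// Usage sketch with the script's $file variable:
// try {
//     $fp = with_warnings_as_exceptions(function () use ($file) {
//         return fopen($file, "r");
//     });
// } catch (ErrorException $e) {
//     echo "News is temporarily unavailable.";
// }
```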

In addition, you could add a timeout to fopen() so that, if the resource is not reachable, it gives up after a defined number of seconds and emits an E_WARNING:

$options["http"] = array(
    "method"  => "GET",
    "header"  => "Content-Type: text/xml; charset=UTF-8\r\n".
                 "Connection: close\r\n",
    "timeout" => 15,  // read timeout in seconds
);
$context = stream_context_create($options);

if( ! ($fp = fopen($file, "r", FALSE, $context))) {
    die("could not open XML input");  // same handling as the original script
}
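
Putting the timeout together with graceful failure, here is a sketch (assuming PHP 7.1+; fetch_feed() is a hypothetical helper) that reads the whole feed with file_get_contents() and returns null on failure instead of stopping the page:

```php
<?php
// Sketch: fetch a feed with a read timeout and report failure through the
// return value instead of die(). fetch_feed() is a hypothetical helper.
function fetch_feed(string $url, float $timeout = 15.0): ?string
{
    $context = stream_context_create([
        "http" => [
            "method"  => "GET",
            "timeout" => $timeout,  // read timeout in seconds (http:// and https:// only)
        ],
    ]);
    // "@" hides the E_WARNING that file_get_contents() emits on failure;
    // the caller sees null instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage sketch:
// $xml = fetch_feed($file);
// if ($xml === null) {
//     echo "News is temporarily unavailable.";
// }
```

The caller prints a fallback when fetch_feed() returns null and hands the string to the parser otherwise.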

Thank you both very much for the advice. That seems to have solved the problem.
Will now mark the thread "Solved."
