I have this feed of classifieds that I need to convert to an .xml doc. I explode each string into an array (separated by a "|"). Everything works just fine, except that the string gets cut off if there are any line breaks in the 5th section: everything after the break is skipped and I lose that information.

I've tried things like preg_replace and str_replace (looking for things like \n, \r, chr(10), chr(13), \t, \0, \x0B), strip_tags, and nl2br. I can't find these breaks and destroy them.

I'm attaching part of the text file that I'm using. I've had it open in Notepad and seen small squares where the breaks are, but that's the only place I've seen that. The first line in the text file is a complete string with no breaks and shows how the ideal feed would look; the lines after it haven't been adjusted.

If anyone has come across anything like this before please lend me some aid.

Thank you.

All 14 Replies

This is a bit of a tricky one but you could do something along these lines:

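$somefile = file('blah'); // one array element per physical line, line ending included

$rows = array();
$row_counter = -1; // index of the record currently being built
foreach ($somefile as $content) {
  if ($row_counter < 0 || preg_match('/^[0-9]+\|/', $content)) {
    // line starts with a numeric key followed by "|": begin a new record
    $rows[++$row_counter] = $content;
  } else {
    // anything else is a continuation of the previous record
    $rows[$row_counter] .= $content;
  }
}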

$rows should contain your corrected array

Forgive me if I am reading this code wrong, but won't that just split up the whole string into an array? I think I'm doing that already with explode.

Maybe it will make more sense if I just say I need a way to kill all carriage returns/line breaks in a string because the ways I've done it before are not working with this feed and I don't know why.

The code reads the file into an array, with each index being one line. The loop then goes through each line and checks whether it starts with your first key, which is a series of digits followed by |, for example 3335| or 67341|. If a line doesn't match this pattern, it is assumed to be part of the previous row, so it is appended to that row, and the loop continues from there.
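To make that concrete, here is a minimal sketch of the same merge run on the sample ad quoted in this thread. It is an illustration only: the helper name merge_continuation_lines and the third record are made up for this example, and it assumes that every real record starts with digits followed by a pipe and that no continuation line does.

```php
<?php
// Hypothetical helper, not from the original post. Assumes every record
// starts with digits followed by "|" and that no continuation line does.
function merge_continuation_lines(array $lines): array
{
    $rows = [];
    foreach ($lines as $content) {
        $content = rtrim($content, "\r\n"); // drop the physical line break
        if ($rows === [] || preg_match('/^[0-9]+\|/', $content)) {
            $rows[] = $content;                         // start a new record
        } else {
            $rows[count($rows) - 1] .= ' ' . $content;  // continuation line
        }
    }
    return $rows;
}

$lines = [
    "1080|1284898|50629|2778 WHITETAIL AVE.||LOST: Yellow Lab,in rural Fairbank/Oran area. Camouflage collar with tag, name Otis.\r\n",
    "319-638-7931 Reward|\r\n",
    "67341|0000000|00000|EXAMPLE ST.||Made-up second ad.|\r\n", // invented record
];

$rows = merge_continuation_lines($lines);
$fields = explode('|', $rows[0]);
echo count($rows), "\n"; // number of logical records
echo $fields[5], "\n";   // full ad text of the first record
```

With real data, file($path) would supply $lines. One caveat with this approach: a continuation line that happens to begin with digits and a pipe would be mistaken for a new record.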

Well, that makes sense. Thanks, I'm trying to incorporate it into the rest of my script.

It was a new approach, but I'm getting the same results.

If the string was:
1080|1284898|50629|2778 WHITETAIL AVE.||LOST: Yellow Lab,in rural Fairbank/Oran area. Camouflage collar with tag, name Otis.
319-638-7931 Reward|

It is exporting the .xml like this:

<ad adID="1284898"
renewedID=""
newspaperID="433"
email=""
address=""
city=""
state=""
catID="1080"
loczip="54601"
zip="50629"
startDate="20090607"
stopDate="20090608"
kill=""
doNotUpsell=""
>
<title>
<![CDATA[LOST: Yellow Lab,in rural Fairbank/Oran area. Camo]]>
</title>
<description>
<![CDATA[LOST: Yellow Lab,in rural Fairbank/Oran area. Camouflage collar with tag, name Otis.]]>
</description>
<adImages>

You can see where the ad text is getting cut off: the description stops after "name Otis." and the "319-638-7931 Reward" line is missing. There is still some kind of break in there that I'm not catching.

I just tested it again and it definitely works. What exactly is your code?

There is a lot of it, but I think that this should have everything relevant to what I'm trying to do in it. This is with your code put in there.

if (isset($_POST["submit"])) {

	$linerdate = $_POST["linerdate"];

	$feeddate = date("Y-m-d",$linerdate) . " 19:00:00";
	$startdate = date("Ymd",$linerdate);
	$killdate = date("Ymd",($linerdate + 86400)); //kill the ads after 1 day, or 86400 seconds

	$verifyfeed = "incoming_data/TBT" . date("mdy",$linerdate) . ".txt";
	
	if (!file_exists($verifyfeed)) {
		echo "The file, \"<FONT COLOR=\"#990000\">" . $verifyfeed . "</FONT>,\" does not exist in the folder.<BR>\nPlease export the data file from APT first before running this process.";
		exit;
	} 

echo "<FONT COLOR=\"#000099\">Processing ... </FONT><BR>";

$file_handle = file($verifyfeed, "rb"); 

$adcopy = array();
$row_counter = 0;
foreach($file_handle as $line => $content) {
  if(!preg_match('/^[0-9]+?\|/', $content)) {
    $adcopy[$row_counter] .= $content;
  } else {
    $adcopy[++$row_counter] = $content;
  }
  $row_counter++;
}

$convertedfile = "<" . "?" . "xml version=\"1.0\" encoding=\"IS" . "O-8859-1\"?" . ">\n<export>\n<feedPublishDate>" . $feeddate . "</feedPublishDate>\n";
 
  
$killReturn = array('/\r\n|\n|\r|chr(13)|\t|\0|\x0B/');
$killReturn2 = array("\r\n", "\n", "\r", "chr(13)", "chr(10)", "\t", "\0", "\x0B", "<br />", "\xc2", "\xa0", "\x00..\x1F");

for ($i=0; $i<($row_counter-1); $i++) {

	$tempinfo = explode("|",$adcopy[$i]);
	$catnum = $tempinfo[0];
	$adnum = $tempinfo[1];
	$zipnum = $tempinfo[2];
	$adtext = rtrim($tempinfo[5]);
	
	
	
	if (strlen($adtext) > 0) { // Make sure there's ad copy
		$pos = strpos($adtext,"<IMG SRC"); // Is there an image with the ad?
		if ($pos === false) { // If not, insert the Tribune logo image
			$imgnum = "/art/courierMarketplaceLogo.jpg";
			/*
			$pos = 99999;
			*/
		} else if ($pos !== false) { // Otherwise, pull the image from the ad text and save it as $imgnum
			$posstart = ($pos + 10);
			$pos = strpos($adtext,".jpg\">");
			$posend = ($pos + 3);
			$imgnum = substr($adtext,$posstart,(($posend - $posstart) + 1));
			$replacethis = substr($adtext,$posstart-10,(($posend - $posstart) + 3));
			$withthis = "";
			$adtext = str_replace($replacethis, $withthis, $adtext);
		
			/* if the 9th position of $adtext is >, delete 0 - 9 characters */
			$poscheck = strpos($adtext,">\"<BR>");
			if ($poscheck = 9) {
				$adtext = substr($adtext, 14);
			}
		}
	
	// Delete a number of HTML tags for bold, italics, underline
	
	$adtext = str_replace("<B>","",$adtext);
	$adtext = str_replace("</B>","",$adtext);
	$adtext = str_replace("<I>","",$adtext);
	$adtext = str_replace("</I>","",$adtext);
	$adtext = str_replace("<U>","",$adtext);
	$adtext = str_replace("</U>","",$adtext);
	$adtext = str_replace("\r\n", "",$adtext);
	$adtext = str_replace("\n", "", $adtext);
	$adtext = str_replace(chr(10),"",$adtext);
	$adtext = str_replace(chr(13),"",$adtext);
	$adtext = str_replace('""', "", $adtext);
	$adtext = str_replace($killReturn, " ",$adtext);
	
	/*
	Now build the xml coding using the collected variables, store it in a temp file and then save the file as a text file for uploading
	*/
	
	$convertedfile .= "<ad adID=\"" . $adnum . "\"\n";
	$convertedfile .= "renewedID=\"\"\n";
	$convertedfile .= "paperID=\"403\"\n";
	$convertedfile .= "email=\"\"\n";
	$convertedfile .= "address=\"\"\n";
	$convertedfile .= "city=\"\"\n";
	$convertedfile .= "state=\"\"\n";
	$convertedfile .= "catID=\"" . $catnum . "\"\n";
	$convertedfile .= "loczip=\"54601\"\n";
	$convertedfile .= "zip=\"" . $zipnum . "\"\n";
	$convertedfile .= "startDate=\"" . $startdate . "\"\n";
	$convertedfile .= "stopDate=\"" . $killdate . "\"\n";
	$convertedfile .= "kill=\"\"\n";
	$convertedfile .= "doNotUpsell=\"\"\n";
	$convertedfile .= ">\n";
	$convertedfile .= "<title>\n";
	$convertedfile .= "<![CDATA[" . substr($adtext, 0, 50) . "]]>\n";
	$convertedfile .= "</title>\n";
	$convertedfile .= "<description>\n";
	$convertedfile .= "<![CDATA[";
	$convertedfile .= $adtext . "]]>\n";
	$convertedfile .= "</description>\n";
	$convertedfile .= "<adImages>\n";
	$convertedfile .= "<images>\n";
	$convertedfile .= "<image width=\"\">\n";
	$convertedfile .= "<![CDATA[" . $imgnum . "]]>\n";
	$convertedfile .= "</image>\n";
	$convertedfile .= "<image width=\"\">\n";
	$convertedfile .= "<![CDATA[]]>\n";
	$convertedfile .= "</image>\n";
	$convertedfile .= "</images>\n";
	$convertedfile .= "</adImages>\n";
	$convertedfile .= "</ad>\n";
	
	/*
	echo $i . ": Category: " . $catnum . "<BR>\nAd Number: " . $adnum . "<BR>\nZip: " . $zipnum . "<BR>\nAd Text: " . $adtext . "<BR>\nImage: " . $imgnum . "\n<HR>\n";
	*/

	$imgnum = "";

	/*
	$catnum = "";
	$adnum = "";
	$zipnum = "";
	$adtext = "";
	$imgnum = "";
	$adcopy = "";
	$pos = "";
	$posstart = "";
	$posend = "";

... and then it goes on with some file shuffling from here

So... the content you gave me is obviously nothing like the content you're using, because there were no HTML tags in the example file you gave. Don't put a newline after every attribute; it's wasted space. Only put a newline after the tag, if even then. And paste your output.

Oh, that's just for a makeshift title, we display the first 50 characters or whatever. A couple lines down you'll see $adtext and that's where the whole thing gets displayed (or would if I could get it to work properly).

yeah, I noticed that 3 seconds after I posted it, hence the edit :)

So... the content you gave me is obviously nothing like the content you're using, because there were no HTML tags in the example file you gave. Don't put a newline after every attribute; it's wasted space. Only put a newline after the tag, if even then. And paste your output.

I'm not sure what you mean. The text file I had attached is the file that I'm trying to convert to an xml. What I pasted in here last was the code that I'm running the file through so that I can create that xml. I thought that's what you had asked for.

That's good to know about the newline, but for where it's going, they asked to have it laid out that way.

The code that I posted worked against the file you gave, so there's something wonky you're doing with the adtext variable in between there. As a start, get rid of all those damn str_replaces for newlines; you don't need them, since you're inserting into a CDATA field. I'm not about to decode what the heck those other str_replaces/strpos's are doing. I'll leave that to you.
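One difference between the loop as originally suggested and the loop in the pasted script is the extra $row_counter++ at the end of each pass. This is offered as a guess rather than a confirmed diagnosis, but with that second increment a continuation line would land in a fresh array slot instead of being appended to its ad. A small sketch that replays both variants on the sample ad (the function merge_lines and its flag are invented for this comparison; it needs PHP 7 or later):

```php
<?php
// Sketch only: replays the merge loop on the two physical lines of the
// sample ad, once with the extra increment and once without it.
function merge_lines(array $lines, bool $extraIncrement): array
{
    $rows = [];
    $counter = 0;
    foreach ($lines as $content) {
        if (!preg_match('/^[0-9]+\|/', $content)) {
            // continuation: append to whatever slot the counter points at
            $rows[$counter] = ($rows[$counter] ?? '') . $content;
        } else {
            $rows[++$counter] = $content;
        }
        if ($extraIncrement) {
            $counter++; // the second increment from the pasted script
        }
    }
    return $rows;
}

$lines = [
    "1080|1284898|50629|2778 WHITETAIL AVE.||LOST: Yellow Lab,in rural Fairbank/Oran area. Camouflage collar with tag, name Otis.\n",
    "319-638-7931 Reward|\n",
];

$withExtra = merge_lines($lines, true);  // the ad stays split over two slots
$without   = merge_lines($lines, false); // the ad is merged into one slot
echo count($withExtra), ' vs ', count($without), "\n";
```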

Yeah, sorry about that. I was doing a lot of trial and error (mostly error) and was kind of grasping at air towards the end. I was hoping that one of them would find the breaks and kill them. I'll clean it up.
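For anyone retracing those attempts, two of them could never have matched. Inside quotes, "chr(13)" is the literal text chr(13) rather than a carriage return, and str_replace treats a '/.../' pattern as plain text, not as a regular expression. If the breaks still need to go once the record is in one piece, a single preg_replace is the usual route. A minimal sketch, assuming the stray characters are ordinary ASCII control characters (the helper name flatten_whitespace is made up here):

```php
<?php
// Hypothetical helper: collapse every run of ASCII control characters and
// spaces (CR, LF, tab, NUL, vertical tab, ...) into one space.
function flatten_whitespace(string $text): string
{
    $flat = preg_replace('/[\x00-\x20]+/', ' ', $text);
    return trim($flat);
}

$adtext = "name Otis.\r\n319-638-7931 Reward\t";
$clean = flatten_whitespace($adtext);
echo $clean, "\n";

// To see which bytes are really in a field, dump them as hex.
$hex = bin2hex("\r\n");
echo $hex, "\n"; // 0d0a
```

Note that this only helps when the whole ad is already in one string. If the file was read line by line, the text after the break sits in a different array element and no replacement on $adtext can bring it back.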

Hey. I thought I'd post the solution I eventually used so I could get this thread shut down. Your code was still losing me some of the info I needed. Thanks for your help, though. I really appreciated it.

$fp = fopen($verifyfeed, "r");
   // real line-break characters (a quoted "chr(13)" would only match that literal text)
   $killReturn2 = array("\r\n", "\n", "\r");
   $adcopy = array();
   $line = '';
   while (($chunk = fgets($fp)) !== false) {
        $line .= $chunk;    // read in a line and add it to the current ad
        // we know we're done with ad text if there is
        // a pipe followed by a return
        if (substr(rtrim($line, "\r\n"), -1) === '|') { // then we are done with this ad
          // break it out by the delimiter (split() was removed in PHP 7, so use explode)
          $ar = explode('|', rtrim($line, "\r\n"));
          // store the ad only once it is complete, with inner breaks turned into spaces
          $adcopy[] = implode('|', str_replace($killReturn2, ' ', $ar));
          $line = '';     // done with ad - clear it out
        }
   }
   fclose($fp);