I have a feed of classifieds that I need to convert to an .xml doc. I explode each string into an array (separated by a "|"). Everything works just fine, except that the string cuts off if there are any line breaks in the 5th section: everything after the break is skipped over and I lose that information.

I've tried things like preg_replace and str_replace (looking for things like \n, \r, chr(10), chr(13), \t, \0, \x0B), strip_tags, and nl2br. I can't find these breaks and destroy them.

I'm attaching part of the text file that I'm using. I've had it open in Notepad and seen small squares where the breaks are, but that's the only place I've seen them. The first line in the text file is a complete string with no breaks, which is how the ideal feed would look; the rest of the file hasn't been adjusted.

If anyone has come across anything like this before please lend me some aid.

Thank you.

Attachments
3100|1261539|50613|0AK-3 LI/$100,000||OAKRIDGE REALTORS 277-5005   OakridgeRealtors.net|
3100|1175294|50701|C13 EOD 3 MOS||Jack Page Real Estate 
233-4515|
3100|1256892|50701|4DV3 2 LI 12,000 DVC||Prudential One Realty   233-5293
www.prudentialone.com |
3100|1260801|50701|LOC-3LI/$60,000||LockardOnline.com
Lockard Realty  277-8000|
3100|1280226|50701|CMAX 3 LI/$120,000 YEARLY||RE/MAX Home Group232-7100
266-7100 openhousebymouse.net|
3100|1280718|50701|C13 EOD 3 MOS||REDUCED OVER $5,000
Nice 2 BR, new floor coverings, GAR, $39,500.
Northstar Realtors 233-5260|
3100|1282218|50702|4DV3   3 LI/$12,000||Harbaugh Winninger Realtors      waterlooiowarealty.com    234-4402|
3100|1284559|50701|4DV3 $4,500/2 LI DAILY||Bill Ramsey Realtors 
235-6205|
3170|5497|50704|CLASSIFICATION HEADINGS||FOR RENT
|
3170|1279937|50703|4DV3- 3 LI DAILY 12 MOS||OFFICE RENT: 6 mos. free rent in River Plaza and Black's Buildings. 235-1521  www.blacksbuilding.com|
3170|5498|50704|CLASSIFICATION HEADINGS||FOR SALE/LEASE
|
3170|1276716|50613|4DV3 3 LI DAILY YEARLY||RETAIL and office spaces, 
$200 - $3,500 319-234-0535 days, 984-5726 eves|
3170|1284668|50701|4DV3 $6,000 DV/3 LI DAILY||3000 +/- Sq. Ft. Office on Falls $140,000 Wolfe Realtors 433-0300|
3200|1284859|50701|4DVC $4,500 DOLLAR VOL||Super Sharp and Clean Timberline condos with 3BR, 2 BA, full basement and awesome patio area and private corner unit. A must see. Call Gary Rankin at Young Development    415-6343 or 235-5346
|
3200|1221629|50701|4DV3-3 LI DAILY 12 MOS+HT||LOFT: Winterbottom-14' ceiling, hrdwd floor, parking/storage inside,
tax abatement for 6 yrs.    235-1521
|
4400|1283676|50704|C/O CATHY KNUDSVIG||Brentwood Estates
1 BR, Security building Thunder Ridge area, C.F.  234-8712 |
4400|1283768|50677|202 CEDAR AVENUE||1-4 BR 10 min. N. C.F. pool, $400-$700. 352-5555 hildebrandrentals.com|
4400|1283902|55903|227 7TH ST. NW HORIZON TO||1 BR apt. Cedar Falls
Seniors/Disabled Rental Assistance avail. Free electric, heat, water, trash. Great atmosphere and location   277-1441
Equal Housing Opportunity|
4400|1284104|50648|1350 12TH ST||GREAT C.F. location! 2 BR, laundry, off street parking, free cable. Avail now.No Sec. 8.   319-961-1219|
4400|1284106|50703|4046 LOGAN AVE.||2 BR apts. 1 block from College Hill
Free cable. $550/mo.+dep. 2515 Olive St. Call for appt. 319-269-0128|
4400|1284203|50613|5715 HUDSON RD||C.F. 1 BR, $425 Incl. heat water garage. No dogs. 266-7154|
4400|1284478|50613|808 E. 18TH ST.||CEDAR FALLS 1 BR, No Sec. 8
319-240-7967|
4400|1280309|50651|4DV3 3 LI DLY YR NO INT.H||COBBLESTONE CIR. 2 BR, GAR, $525+dep. lease, no pets  240-5233|
4400|1261457|50707|C09 6 MOS 3 LI DAILY||NEW construction, condo style apt, 2 BR, appls, Gar, $785.   232-3715|
4400|1260291|52406|4DV3  3 LI DAILY YRLY||PARK Tower Apts. for handicapped, 
eldery or disabled. 235-7754  EHO|
4400|1282807|50701|4DV3-3 LI DAILY 12 MOS+HT||DOWNTOWN loft for rent, 2 BR, 2 BA, wood floors, skylight, $800 mo. + utilities. 235-1521|
4400|1284671|50707|1019 GILBERT ST.||first mo. free. East, 117 Courtland, 2 BR, $400+ dep. 232-4777|
4400|1284717|50621|4DV3 3 LI/$4,500||2 BR - 1425 Franklin #3, 2 BR in Tama.   319-226-4663|
4400|1283753|50701|1141 LANTERN SQ.||South W'loo, lg 2 story, 4 BR on nearly 2 A, free use of adjacent pool, $1,500/yr lease. 319-493-1382|
4400|1283690|50702|7055 SOPPE FARM RD||2 BR, 734 W. 1st St., W'loo, GAR, no pets, June 1, $550. 319-215-5889|
4400|1247649|50648|C15 3 MOS 3 LI DAILY||Quail Valley Apts.
West 4th and Pinehurst     
Spacious Studios and 2 BRs. 
GARs avail.   Great place to live   
Call 215-7711 for app't|
4450|1284399|50707|1747 ENID ST.||315 E. 10th, Waterloo-Nice clean 2 BR, GAR, laundry $530 + dep. Avail. now. 233-5205 or 415-3750|
4450|1284296|50622|2596 MIDWAY AVE.||1 BR up duplex, W'loo, washer/ dryer, $350 + dep. 319-984-6404|
4450|1284959|50702|3623 PEARL LANE||245 BROOKERIDGE, spacious 2 BR, basement, $585 233-5513|
4500|1284798|50613|1107 LILAC LANE||1222 ACKERMANT Rd., W'loo: 1 BR, lg. BR/BA, full BSMT, new flooring/carpet, remodeled BA, $500 mo. 319-239-2914|
4500|1284675|50613|2822 WEST 3RD ST||C.F. 4 BR ranch (3 up, 1 down), AC, finished BSMT, attached GAR, 1731 Brookside Dr., near UNI avail. now. $1,200/mo. 319-231-2173|
4500|1284905|50613|1809 GIBSON ST.||Very Nice 3 BR, C.F.
3716 Knoll Ridge, 2 Bath, Jacuzzi, Dishwasher, AC, Deck, GAR. $1,100 + dep.             319-231-0219|
4500|1284917|50613|228 MARYHILL DR.||1 BR, 1222 bertchwood, C.F.  GAR. no pets, $450+dep. 266-2688|
4500|1284716|50621|4DV3 3 LI/$4,500||2 BR - 808 Barclay, 4 BR - 1011 Logan/ 1205 Randolph. 319-226-4663|
4500|1284398|50707|100 BRENNEN BLVD||3 BR, 1 BA, 749 Kern, $700 mo., avail July 1st, Sec. 8 ok. 215-9616|
4500|1284670|50647|2676 DAKOTA AVE.||Rent to own. 2 story, 3 BR, 2.5 detached GAR, appls. $925/mo. 
319-404-1111|
4500|1284991|50665|4DV3-3LI DAILY YRLY||427 Ricker 3 BR 225 Mobile 3 BR
Sec. 8 ok.         319-240-6715|
4500|1284596|50676|2010 MIDWAY AVE.||403 Sunnyside Ave., W'loo, 3 BR, AC, $650 mo., Sec. 8 and pets (w/dep) ok.  Rachel 515-297-2126. |
4500|1284966|50669|101 E. SPRUCE ST.||3-4 BR, 2.5 BA, 3133 Tulip, lease, no smoking/pets, $1,200. 415-3470|
4500|1284365|50703|4DV3-3 LI DAILY 12 MOS|| 409 Wellington, 3 BR, $595. 
400 Elm St., 3 BR, $595
319-234-5958 |
4500|1284731|50626|405 CARROLL BLVD||2 BR, all appls., washer/dryer, hrdwd. floors, GAR. $650/mo.1541 Forest Ave, W'loo.  319-215-5578|
4500|1284835|50622|1991 260TH ST||2 BR Townhouses on West Mullan. Clean! Freshly painted. Sec. 8 ok. Call Mark 319-269-2897|
4500|1284939|50703|2245 BURTON AVENUE||821 W. 6TH, 2 BR, $575. No section 8, no pets. 319-231-9925|
4500|1284941|50703|2245 BURTON AVENUE||3173 BURTON Ave., $675, 3 BR, no section, no pets. 319-231-9925|
4500|1284943|50703|2245 BURTON AVENUE||620 W. 6th, $700. 3 BR, no pets, no section 8. 319-231-9925|
4500|1284944|50703|2245 BURTON AVENUE||1038 MINNESOTA, $695. 3 BR, no pets, no section 8. 319-231-9925|
4500|1284108|50703|4046 LOGAN AVE.||2 BR MOBILE home w/ appls. $350/mo.+dep. No pets. Gaslight Mobile Home Park.  319-269-0128|
4500|1284487|50613|30834 120TH ST.||3 BR, W. of C.F.   No  smoking/pets. No Sec. 8  $625  240-7057|
4720|1283656|50613|1708 GIBSON ST.||Someone to share house. Nice area, close to bike trails. 231-5371|
5715|1284390|50634|516 17TH AVE.||NF 620 JD tractor with #45 loader with trip bucket and 6/8 snow bucket, all in working order.    296-3430  |
5715|1284850|50635|1392 G AVE||'93 Volvo 60 series Detroit, 10 speed, excellent tires and paint, $9,500/best offer. 319-239-8590
|
5715|1284852|50635|1392 G AVE||JD 4520 OS, low hrs, excellent cond., $12,500/offer. 319-239-8590 |
5745|1284262|50651|11508 S. BLACK HAWK-||MFT 6 yr old mare, black and white Tobinao, very pretty, trail broke, $2,500.                         319-474-2467|
6020|1283869|50613|P.O. BOX 188||Used Amana refrig and Whirlpool gas stove, good cond., $250 for set.                                 319-240-8005|
6120|1276811|50706|C07  EOD 6 MOS.||Real Estate / Personal Property
Cornbelt Auction 233-9258|
6220|1285085|50701|3619 PHEASANT LANE||Elliptical from HORIZON
E700 $800 or best offer
319-464-5824|
6230|1284719|50659|1751 NEWELL AVE||USED '06 Starcraft Run About Boat 18', I/O, 4.3, like new.
Call 641-330-3664|
6230|1284780|50677|1509 HORTON RD.||GLASSTRON V-hull I/O, brand new motor, $2,500 or best offer. Call for more info. 319-404-2677|
6230|1283728|50651|216 VALLEY DR.||2004 Tahoe Fish  and Ski
Boat And Trailer 
Merc cruiser V6 IO
bimini, travel, and new
snapdown covers.
Yellow and black. $7,200
319-269-9425|
6230|1284896|50701|857  LYNKAYLEE DR.||1998 MOOMBA boomerang ski boat, 150 hrs., 310 h.p. inboard, 2007 trailer, $12,500.  319-240-4900|
6230|1284936|53821|C09 6 MOS DAILY||OUTBOARDS Mercury, Yamaha All Sizes. Immediate Delivery. Yamaha 25 HP $2,399. FISH BOATS, 200 Must Go. We Trade, Easy Finance. INSTANT CASH. Check Phone 608-326-2478.
Starks Prairie du Chien, WI. 
Open Sunday |
6230|1284045|50702|1658 EASTON||1989 Bayliner Capri Force 50 h.p. motor, depth finder, trolling motor, $2,000/best offer. 234-6599|
6230|1284206|50630|2731 270TH ST.||16 FT. aluminum 30 horse Johnson and extras. Excellent shape. $900. 563-237-6371|
6230|1285044|50619|17021 ROYAL AVE||'93 2250 Bayliner boat, new 5.7 L chev. engine, new int., complete w/ 4 wheel trailer $7,500 319-278-4923|
6360|1252291|52310|C01  EOD YEARLY +HT+INT.||USED Office Furniture, Shelving, Racks, Warehouse Equip. Welter Storage Equip.          319-393-4043|
6360|1284751|50662|416 FIRST AVE NE||Glass 6' counter, $250. Shelving, $20/section. Clothes racks, $20. 319-238-1616|
6450|1284128|50702|4859 TEXAS ST.||2 trailers:  2 Axel steel bed, 13 ft., new paint. 4 tire single Axel, 16 ft., flat bed, $600 ea. firm.
319-296-2106|
6590|1283886|50701|107 HOME PARK BLVD.||Moving Sale: lg. armoire, $250, wing back chair, $35, sm. desk, $5, computer desk and hutch, $25. Sectional couch w/recliners, $150, 2 end tables, $35 ea., glass top coffee table, $25, glass top end table, $20, dinning table w/4 chairs and leaf, $250.                   319-233-0296|
6590|1284728|50613|3013 HILLCREST DR.||COUCH, $75. Recliner, $50. Both  good cond. 319-277-0132 after 4.|
6590|1285108|50630|502 CLOVER CIRCLE||Large lighted cherry hutch and server, $450. 641-226-7790|
6590|1281452|50613|4DVC $4,500||NEW Mattress and Box Sets- from $109. Best prices on new mattress sets. Bring in this ad for an additional discount of $25 on King set, $20 on Queen or Full, $10 on Twin (excludes clearance) 
Factory Direct Mattress Warehouse 702 Ansborough Ave. W'loo  319-504-7676|
6590|1284913|50613|4DVC $4,500||NEW Queen Euro Mattress and box set $199 with ad. Other sizes also avaiable. 
Factory Direct Mattress Warehouse 
702 Ansborough Ave. W'loo  319-504-7676|
6590|1283689|50669|411 BLACKHAWK ST||BEAUTIFUL red sofa, 3 cushioned versatile style- 82"x33"x28" high. Virtually new, non-smokers.
$350                            319-788-2885|

This is a bit of a tricky one but you could do something along these lines:

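$somefile = file('blah'); // one array element per physical line of the feed

$rows = array();
$row_counter = 0;
foreach($somefile as $line => $content) {
  if(!preg_match('/^[0-9]+?\|/', $content)) {
    // no leading "digits|" key, so this line continues the previous row
    $rows[$row_counter] .= $content;
  } else {
    // a new row starts here
    $rows[++$row_counter] = $content;
  }
}

$rows should then contain your corrected array, with one complete record per element.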

Forgive me if I am reading this code wrong, but won't that just split up the whole string into an array? I think I'm doing that already with explode.

Maybe it will make more sense if I just say I need a way to kill all carriage returns/line breaks in a string because the ways I've done it before are not working with this feed and I don't know why.
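
For the narrower request of removing line breaks from a string that is already in memory, a minimal sketch might look like the following. The helper name collapse_line_breaks is made up for illustration, and the sample text only mimics the shape of the feed. Note that this cleans a string that already holds the whole record; it cannot bring back text that was dropped because the record was read one physical line at a time.

```php
<?php
// Hypothetical helper, not from the thread: collapse every run of CR/LF
// characters (plus any spaces or tabs around it) into a single space.
function collapse_line_breaks($text)
{
    return preg_replace('/[ \t]*[\r\n]+[ \t]*/', ' ', $text);
}

// Sample text shaped like the ad copy field of the feed.
$adtext = "Jack Page Real Estate \r\n233-4515";
$clean  = collapse_line_breaks($adtext);

// Pitfall: "chr(13)" inside quotes is the seven literal characters
// c-h-r-(-1-3-), not a carriage return, so this call changes nothing.
$unchanged = str_replace("chr(13)", "", $adtext);
```

The escape sequences "\r" and "\n" only mean carriage return and line feed inside double quotes; in single quotes they are a backslash followed by a letter.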

What the code does is read the file into an array, with each index being a line. The loop then goes through each line and checks whether it starts with your first key, which is a series of numbers followed by |, e.g. 3335| or 67341|. If a line doesn't match this pattern, it is assumed to be part of the previous row, so it is appended to that row, and the loop continues from there.
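
As a self-contained illustration of that idea, here is a hedged variant of the loop that runs on an in-memory sample instead of a file. The function name merge_feed_lines is hypothetical, the rows are zero-indexed (unlike the snippet above), and each physical line ending is swapped for a single space while merging. It assumes that no continuation line happens to start with digits followed directly by a pipe.

```php
<?php
// Hypothetical helper: glue continuation lines onto the record they belong to.
function merge_feed_lines(array $lines)
{
    $rows = array();
    foreach ($lines as $content) {
        $content = rtrim($content);  // drop the line ending and trailing spaces
        if (preg_match('/^[0-9]+\|/', $content) || count($rows) === 0) {
            $rows[] = $content;      // line starts with "digits|": new record
        } else {
            // anything else continues the most recent record
            $rows[count($rows) - 1] .= ' ' . $content;
        }
    }
    return $rows;
}

// Physical lines copied from the attachment, as file() would return them.
$lines = array(
    "3100|1175294|50701|C13 EOD 3 MOS||Jack Page Real Estate \n",
    "233-4515|\n",
    "3170|5497|50704|CLASSIFICATION HEADINGS||FOR RENT\n",
    "|\n",
    "4400|1284478|50613|808 E. 18TH ST.||CEDAR FALLS 1 BR, No Sec. 8\n",
    "319-240-7967|\n",
);

$rows   = merge_feed_lines($lines);
$fields = explode('|', $rows[0]);   // $fields[5] now holds the full ad copy
```

Because the merge happens before explode, the ad copy field arrives in one piece and no later newline hunting is needed.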

Well, that makes sense. Thanks, I'm trying to incorporate it into the rest of my script.

It was a new approach, but I'm getting the same results.

If the string was:
1080|1284898|50629|2778 WHITETAIL AVE.||LOST: Yellow Lab,in rural Fairbank/Oran area. Camouflage collar with tag, name Otis.
319-638-7931 Reward|

It is exporting the .xml like this:

<ad adID="1284898"
renewedID=""
newspaperID="433"
email=""
address=""
city=""
state=""
catID="1080"
loczip="54601"
zip="50629"
startDate="20090607"
stopDate="20090608"
kill=""
doNotUpsell=""
>
<title>
<![CDATA[LOST: Yellow Lab,in rural Fairbank/Oran area. Camo]]>
</title>
<description>
<![CDATA[LOST: Yellow Lab,in rural Fairbank/Oran area. Camouflage collar with tag, name Otis.]]>
</description>
<adImages>

You can see where the ad text is getting cut off (I highlighted it in red). There is still some kind of a break in there that I'm not catching.

There is a lot of it, but I think that this should have everything relevant to what I'm trying to do in it. This is with your code put in there.

if (isset($_POST["submit"])) {

	$linerdate = $_POST["linerdate"];

	$feeddate = date("Y-m-d",$linerdate) . " 19:00:00";
	$startdate = date("Ymd",$linerdate);
	$killdate = date("Ymd",($linerdate + 86400)); //kill the ads after 1 day, or 86400 seconds

	$verifyfeed = "incoming_data/TBT" . date("mdy",$linerdate) . ".txt";
	
	if (!file_exists($verifyfeed)) {
		echo "The file, \"<FONT COLOR=\"#990000\">" . $verifyfeed . "</FONT>,\" does not exist in the folder.<BR>\nPlease export the data file from APT first before running this process.";
		exit;
	} 

echo "<FONT COLOR=\"#000099\">Processing ... </FONT><BR>";

$file_handle = file($verifyfeed, "rb"); 

$adcopy = array();
$row_counter = 0;
foreach($file_handle as $line => $content) {
  if(!preg_match('/^[0-9]+?\|/', $content)) {
    $adcopy[$row_counter] .= $content;
  } else {
    $adcopy[++$row_counter] = $content;
  }
  $row_counter++;
}

$convertedfile = "<" . "?" . "xml version=\"1.0\" encoding=\"IS" . "O-8859-1\"?" . ">\n<export>\n<feedPublishDate>" . $feeddate . "</feedPublishDate>\n";
 
  
$killReturn = array('/\r\n|\n|\r|chr(13)|\t|\0|\x0B/');
$killReturn2 = array("\r\n", "\n", "\r", "chr(13)", "chr(10)", "\t", "\0", "\x0B", "<br />", "\xc2", "\xa0", "\x00..\x1F");

for ($i=0; $i<($row_counter-1); $i++) {

	$tempinfo = explode("|",$adcopy[$i]);
	$catnum = $tempinfo[0];
	$adnum = $tempinfo[1];
	$zipnum = $tempinfo[2];
	$adtext = rtrim($tempinfo[5]);
	
	
	
	if (strlen($adtext) > 0) { // Make sure there's ad copy
		$pos = strpos($adtext,"<IMG SRC"); // Is there an image with the ad?
		if ($pos === false) { // If not, insert the Tribune logo image
			$imgnum = "/art/courierMarketplaceLogo.jpg";
			/*
			$pos = 99999;
			*/
		} else if ($pos !== false) { // Otherwise, pull the image from the ad text and save it as $imgnum
			$posstart = ($pos + 10);
			$pos = strpos($adtext,".jpg\">");
			$posend = ($pos + 3);
			$imgnum = substr($adtext,$posstart,(($posend - $posstart) + 1));
			$replacethis = substr($adtext,$posstart-10,(($posend - $posstart) + 3));
			$withthis = "";
			$adtext = str_replace($replacethis, $withthis, $adtext);
		
			/* if the 9th position of $adtext is >, delete 0 - 9 characters */
			$poscheck = strpos($adtext,">\"<BR>");
			if ($poscheck = 9) {
				$adtext = substr($adtext, 14);
			}
		}
	
	// Delete a number of HTML tags for bold, italics, underline
	
	$adtext = str_replace("<B>","",$adtext);
	$adtext = str_replace("</B>","",$adtext);
	$adtext = str_replace("<I>","",$adtext);
	$adtext = str_replace("</I>","",$adtext);
	$adtext = str_replace("<U>","",$adtext);
	$adtext = str_replace("</U>","",$adtext);
	$adtext = str_replace("\r\n", "",$adtext);
	$adtext = str_replace("\n", "", $adtext);
	$adtext = str_replace(chr(10),"",$adtext);
	$adtext = str_replace(chr(13),"",$adtext);
	$adtext = str_replace('""', "", $adtext);
	$adtext = str_replace($killReturn, " ",$adtext);
	
	/*
	Now build the xml coding using the collected variables, store it in a temp file and then save the file as a text file for uploading
	*/
	
	$convertedfile .= "<ad adID=\"" . $adnum . "\"\n";
	$convertedfile .= "renewedID=\"\"\n";
	$convertedfile .= "paperID=\"403\"\n";
	$convertedfile .= "email=\"\"\n";
	$convertedfile .= "address=\"\"\n";
	$convertedfile .= "city=\"\"\n";
	$convertedfile .= "state=\"\"\n";
	$convertedfile .= "catID=\"" . $catnum . "\"\n";
	$convertedfile .= "loczip=\"54601\"\n";
	$convertedfile .= "zip=\"" . $zipnum . "\"\n";
	$convertedfile .= "startDate=\"" . $startdate . "\"\n";
	$convertedfile .= "stopDate=\"" . $killdate . "\"\n";
	$convertedfile .= "kill=\"\"\n";
	$convertedfile .= "doNotUpsell=\"\"\n";
	$convertedfile .= ">\n";
	$convertedfile .= "<title>\n";
	$convertedfile .= "<![CDATA[" . substr($adtext, 0, 50) . "]]>\n";
	$convertedfile .= "</title>\n";
	$convertedfile .= "<description>\n";
	$convertedfile .= "<![CDATA[";
	$convertedfile .= $adtext . "]]>\n";
	$convertedfile .= "</description>\n";
	$convertedfile .= "<adImages>\n";
	$convertedfile .= "<images>\n";
	$convertedfile .= "<image width=\"\">\n";
	$convertedfile .= "<![CDATA[" . $imgnum . "]]>\n";
	$convertedfile .= "</image>\n";
	$convertedfile .= "<image width=\"\">\n";
	$convertedfile .= "<![CDATA[]]>\n";
	$convertedfile .= "</image>\n";
	$convertedfile .= "</images>\n";
	$convertedfile .= "</adImages>\n";
	$convertedfile .= "</ad>\n";
	
	/*
	echo $i . ": Category: " . $catnum . "<BR>\nAd Number: " . $adnum . "<BR>\nZip: " . $zipnum . "<BR>\nAd Text: " . $adtext . "<BR>\nImage: " . $imgnum . "\n<HR>\n";
	*/

	$imgnum = "";

	/*
	$catnum = "";
	$adnum = "";
	$zipnum = "";
	$adtext = "";
	$imgnum = "";
	$adcopy = "";
	$pos = "";
	$posstart = "";
	$posend = "";

... and then it goes on with some file shuffling from here
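
Judging only from the script as posted, one candidate for the lost text is the extra $row_counter++ at the end of the foreach body: the suggested loop advances the counter only when a new record starts. The sketch below is a hedged reproduction on the "Otis" sample, not a confirmed diagnosis; group_lines and its $extraIncrement switch are hypothetical names used to compare the two behaviours.

```php
<?php
// The "Otis" ad from the thread, split across two physical lines.
$lines = array(
    "1080|1284898|50629|2778 WHITETAIL AVE.||LOST: Yellow Lab,in rural Fairbank/Oran area. Camouflage collar with tag, name Otis.\n",
    "319-638-7931 Reward|\n",
);

// Hypothetical helper: $extraIncrement = true mimics the stray $row_counter++.
function group_lines(array $lines, $extraIncrement)
{
    $rows = array();
    $n = 0;
    foreach ($lines as $content) {
        if (!preg_match('/^[0-9]+\|/', $content)) {
            if (!isset($rows[$n])) {
                $rows[$n] = '';          // avoid an undefined-index notice
            }
            $rows[$n] .= $content;       // continuation of row $n
        } else {
            $rows[++$n] = $content;      // new record
        }
        if ($extraIncrement) {
            $n++;                        // the increment in question
        }
    }
    return $rows;
}

$good = group_lines($lines, false); // one row holding the whole ad
$bad  = group_lines($lines, true);  // two rows; the phone number is orphaned
```

With the extra increment, the continuation line lands in its own element, which has no sixth field after explode, so the phone number never reaches the XML. Two smaller things may also be worth checking: file() takes integer flags as its second argument rather than a mode string such as "rb", and the rows built by ++$row_counter start at index 1 while the for loop starts at 0, so iterating with foreach might be safer.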

So... the content you gave me is obviously nothing like the content you're using, because there were no HTML tags in the example file you gave. Don't put a newline after every attribute; it's wasted space. Put a newline only after the tag, if even then. And paste your output.

Oh, that's just for a makeshift title, we display the first 50 characters or whatever. A couple lines down you'll see $adtext and that's where the whole thing gets displayed (or would if I could get it to work properly).

yeah, I noticed that 3 seconds after I posted it, hence the edit :)

I'm not sure what you mean. The text file I had attached is the file that I'm trying to convert to an xml. What I pasted in here last was the code that I'm running the file through so that I can create that xml. I thought that's what you had asked for.

That's good to know about the newline, but for where it's going, they asked to have it laid out that way.

The code that I posted worked against the file you gave, so there's something wonky you're doing with the adtext variable in between there. As a start, get rid of all those damn str_replaces for newlines; you don't need them because you're inserting into a CDATA field. I'm not about to decode what the heck those other str_replaces/strpos's are doing; I'll leave that to you.
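
To illustrate the CDATA point, here is a small sketch with a hypothetical cdata() helper. It assumes the consumer of the feed accepts line breaks inside the description, which the thread does not confirm. Line breaks can stay as they are inside a CDATA section; the one sequence that cannot appear there is "]]>", which the helper splits across two sections.

```php
<?php
// Hypothetical helper: wrap text in a CDATA section.
// A literal "]]>" would end the section early, so it is split in two.
function cdata($text)
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

// Ad copy with its original line break left in place.
$description = "LOST: Yellow Lab, name Otis.\n319-638-7931 Reward";
$xml = "<description>\n" . cdata($description) . "\n</description>\n";
```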

Yeah, sorry about that. I was doing a lot of trial and error (mostly error) and was kind of grasping at air towards the end. I was hoping that one of them would find the breaks and kill them. I'll clean it up.

Hey. I thought I'd post the solution I eventually used so I could get this thread shut down. Your code was still losing me some of the info I needed. Thanks though for your help. I really appreciated it.

$fp = fopen($verifyfeed, "r");
   // escape sequences need double quotes; "chr(13)" in quotes would be literal text
   $killReturn2 = array("\r\n", "\n", "\r");
   $adcopy = array();
   $numbered = 0;
   $line = '';
   while (!feof($fp)) {
        $line .= fgets($fp);    // read in a line
        // we know we're done with ad text if there is
        // a pipe followed by a return
        if (substr(rtrim($line, "\r\n"), -1) == '|') { // then we are done with this ad
          $ar = explode('|', $line);  // break it out by the delimiter (split() is deprecated)
          // swap any line breaks inside the fields for spaces and store the ad
          $adcopy[$numbered] = implode('|', str_replace($killReturn2, ' ', $ar));
          $numbered++;
          $line = '';     // done with ad - clear it out
        }
   }
   fclose($fp);