Is there a way to handle standard XML entities with the event-driven XML parser in PHP? I'm currently using an XML file to build a collection of objects and display a page based on those objects. I'd like my CData handler to handle text that includes standard entities, but as it stands, it handles the text on either side of an entity separately. I could compensate for this if I could define an event handler for when a standard entity is encountered, but so far, my research hasn't turned up a way. I've checked the PHP manual, the book Programming PHP (second edition), and various online tutorials with no luck.

Can you post an example of the XML, and also what you're currently trying to do with it? I have done extensive work with XML, so perhaps I can answer your question.

<?xml version="1.0" encoding="UTF-8" ?>
<press_releases>
	<press_release>
		<headline>This Is a Headline</headline>
		<date>7/4/76</date>
		<image src="image.png" align="right" />
		<body>
			<paragraph>
				Lorem ipsum &quot; Latin Latin Latin &quot;
			</paragraph>
			<paragraph>
				pi sigma epsilon omega Greek Math Physics
			</paragraph>
		</body>
	</press_release>
</press_releases>

I'm using PHP's event-driven parser to build a PressRelease object from the XML. It has fields for a headline, date, image source, image alignment, and body text. The body is actually an array of strings, each representing a paragraph in the XML.

The problem is that the cdata handler fires for text up to an entity, it fires for the entity, and it fires again for text after the entity. I would rather it fire only once for such a sequence, but I would be happy if I could handle the entity separately from the cdata.
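To make the behavior concrete, here is a minimal sketch that records every chunk the character data handler receives for one paragraph. The variable names are made up for illustration, and the number of chunks is not something I would rely on, since it depends on the underlying parser library. What should hold is that the chunks arrive in order, so joining them gives back the full decoded text.

```php
<?php
// Hypothetical demo: collect every chunk passed to the character data handler.
$chunks = [];

$parser = xml_parser_create('UTF-8');
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    // Called once per chunk of text; a single text run may arrive in pieces.
    $chunks[] = $data;
});

$xml = '<paragraph>Lorem ipsum &quot; Latin Latin Latin &quot;</paragraph>';
xml_parse($parser, $xml, true);
unset($parser);

// The chunk count varies with the parser library, so it is only printed here.
var_dump(count($chunks));

// Joined in arrival order, the chunks form the complete text with entities decoded.
$joined = implode('', $chunks);
echo $joined, "\n";
```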

My parser originally worked by keeping track of the current element and handling text differently depending on which element was open. For instance, if headline was the current element, it set $currPressRelease->headline to $text when the cdata handler was called. If there's an entity in the text, a problem arises. Take the string "Lorem Ipsum &quot; Latin Ipsum". First, headline is set to "Lorem Ipsum"; then it's set to "\""; then it's set to "Latin Ipsum". The end result is "Latin Ipsum" when it should be the entire string.

I worked around this by simply appending $text to $headline (or whichever field is encountered) when cdata is fired, but it gets messier for more complex fields like $body. It would be much simpler and easier if I could handle the string including the entity as one block of text, or even if I could handle the entity separately from the cdata. Is this possible with PHP's event-driven parser?
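One way to keep the appending workaround tidy for fields like $body is to append into a single buffer and only assign the text when the closing tag arrives. The sketch below assumes PHP 7.4 or later. PressReleaseBuilder and its array keys are hypothetical stand-ins for the PressRelease class described above, not the actual code. It also assumes no element mixes text with child elements, because the buffer is reset at every start tag.

```php
<?php
// Sketch only: PressReleaseBuilder and its array keys are hypothetical names.
final class PressReleaseBuilder
{
    public array $releases = [];   // finished press releases, as plain arrays
    private array $current = [];   // the release currently being built
    private string $buffer = '';   // text collected for the open element

    public function startElement($parser, string $name, array $attrs): void
    {
        $this->buffer = '';        // drop whitespace that sat between tags
        if ($name === 'press_release') {
            $this->current = ['headline' => '', 'date' => '', 'body' => []];
        } elseif ($name === 'image') {
            $this->current['image_src'] = $attrs['src'] ?? null;
            $this->current['image_align'] = $attrs['align'] ?? null;
        }
    }

    public function characterData($parser, string $data): void
    {
        $this->buffer .= $data;    // may run several times for one text run
    }

    public function endElement($parser, string $name): void
    {
        $text = trim($this->buffer);   // the whole text, entities already decoded
        $this->buffer = '';
        switch ($name) {
            case 'headline':
            case 'date':
                $this->current[$name] = $text;
                break;
            case 'paragraph':
                $this->current['body'][] = $text;
                break;
            case 'press_release':
                $this->releases[] = $this->current;
                break;
        }
    }
}

$builder = new PressReleaseBuilder();
$parser = xml_parser_create('UTF-8');
// Case folding is on by default and would upper-case the tag names.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, [$builder, 'startElement'], [$builder, 'endElement']);
xml_set_character_data_handler($parser, [$builder, 'characterData']);

$xml = '<press_releases><press_release>'
     . '<headline>Lorem Ipsum &quot; Latin Ipsum</headline>'
     . '<image src="image.png" align="right" />'
     . '<body><paragraph>Lorem ipsum &quot; Latin &quot;</paragraph></body>'
     . '</press_release></press_releases>';
xml_parse($parser, $xml, true);

echo $builder->releases[0]['headline'], "\n";
```

With this arrangement the character data handler never needs to know which field it is filling. Only the end-element handler decides where the finished text goes, so each new field costs one extra case.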
