As mentioned in this thread, I intend to create a collection of functions that use only core functions (such as the String functions) for maximum compatibility. (See the website - anyone is invited to participate.)

The first task will be to parse a string (containing the XML data), break it down and store it in an array. (Why an array?)

My first idea is to use nested strtok (or maybe explode) functions to break down the XML data into the entities.

Anyone here with a better idea?

I think you should look into existing parsers whose code you could rework.

That's what I'm trying to do - and this includes this forum.

Most parsers intended for XML data rely on special modules or functions. PHP4 had an XSLT set of functions that relied on Sablotron, which in turn required the Expat XML parser and other applications. PHP5 uses SimpleXML and the XSL set of functions, which require libxslt.

Any of these modules may be omitted when installing PHP. In my case, on a project I'm working on, the web host offers either PHP4 with XSLT (which I can't install on my test server) or PHP5 with SimpleXML but without libxslt (which I do have on my test server). I intend the work I'm attempting here to achieve maximum compatibility, with only core features involved.

If you know of a parser that does not rely on PHP version-specific modules or external applications let me know!

My first idea is to use nested strtok (or maybe explode) functions to break down the XML data into the entities.

I don't think explode will be a valid option, because it removes the token you are exploding on. When parsing your XML tags, I think they need to stay intact. I think strtok poses the same problem.

One of the regex functions may be more suitable. It may be more difficult to create a correct pattern, but the result can contain the full, intact tags.
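To make the difference concrete, here is a minimal sketch comparing `explode` with `preg_split`. The sample string is hypothetical, and the pattern is an assumption that only holds for simple markup (no comments, CDATA sections, or `>` characters inside attribute values). It also assumes the PCRE functions are available on the target install.

```php
<?php
// Hypothetical sample input; any simple well-formed fragment would do.
$xml = '<note><to>Tove</to></note>';

// explode() drops the delimiter, so every tag loses its opening '<'.
$exploded = explode('<', $xml);
// $exploded is: '', 'note>', 'to>Tove', '/to>', '/note>'

// preg_split() with PREG_SPLIT_DELIM_CAPTURE returns the captured tags
// as part of the result; PREG_SPLIT_NO_EMPTY discards the empty strings
// that appear between directly adjacent tags.
$pieces = preg_split(
    '/(<[^>]*>)/',
    $xml,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
// $pieces is: '<note>', '<to>', 'Tove', '</to>', '</note>'
```

With the tags kept whole, a later step can tell markup from text simply by checking whether a piece starts with `<`.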

If you're thinking of implementing your own XML parser in PHP, you might want to forget that idea. PHP already has built-in XML parsers in SimpleXML and the XMLParser library, which are language functions written in C. Writing your own parser is possible but will be orders of magnitude slower than the built-in libraries.

My dear Shawn,

I am aware that both PHP4 and PHP5 (and likely the others) have XML functions, but as I have already explained, they always depend on your install and are thus neither portable nor controllable. My solution will be much slower, but as it will consist merely of some PHP files, it will work everywhere.

As for regex vs. strtok/explode

For now I'm not including any validation. Only some well-formedness constraints will be checked (where needed for the function). For the time being I expect users to check the XML data for well-formedness and validity (if they need it) before feeding it to the function.

As for strtok, I was thinking of using < and > as delimiters to break it down. But regular expressions might be useful. I'm still working on the best structure for a standard XML array.
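As a rough illustration of the strtok idea, the sketch below tokenizes a hypothetical sample string with `<` and `>` as delimiters. It is only meant to show what comes back, not a recommended design.

```php
<?php
// Hypothetical sample input.
$xml = '<note><to>Tove</to></note>';

$tokens = array();
// The first call takes the string; later calls continue where it left off.
$tok = strtok($xml, '<>');
while ($tok !== false) {
    $tokens[] = $tok;
    $tok = strtok('<>');
}
// $tokens is: 'note', 'to', 'Tove', '/to', '/note'
```

Note that the element name `to` and the text `Tove` come back in the same form, so the token list alone no longer says which pieces were tags. That is the problem raised earlier about the delimiters being removed.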

No, they aren't dependent on your install: "These functions are enabled by default, using the bundled expat library." I'm just trying to save you the time of reinventing the wheel when you don't need to.

If you're determined to write your own then you have a few ways to go about it.
* Write a lexer to parse the XML then an FSM to iterate over the tokens to determine well-formedness
* Use regular expressions (NO)
* Use strtok (slightly less enthusiastic NO)
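The first option might look roughly like the sketch below, written without classes and with core string functions only. The function names `xml_lex` and `xml_is_balanced` are hypothetical, and the sketch assumes simple markup: no comments, CDATA sections, processing instructions, or `>` characters inside attribute values. The nesting check uses a stack rather than a pure finite-state machine, since matching open and close tags requires remembering which elements are still open.

```php
<?php
// Hypothetical lexer: returns a list of tokens, or false on an unterminated tag.
// Each token is array('type' => 'open'|'close'|'empty'|'text', 'value' => string).
function xml_lex($xml)
{
    $tokens = array();
    $pos = 0;
    $len = strlen($xml);
    while ($pos < $len) {
        $lt = strpos($xml, '<', $pos);
        if ($lt === false) {
            // Trailing text after the last tag.
            $tokens[] = array('type' => 'text', 'value' => substr($xml, $pos));
            break;
        }
        if ($lt > $pos) {
            // Text between the previous tag and this one.
            $tokens[] = array('type' => 'text', 'value' => substr($xml, $pos, $lt - $pos));
        }
        $gt = strpos($xml, '>', $lt);
        if ($gt === false) {
            return false;
        }
        $inner = substr($xml, $lt + 1, $gt - $lt - 1);
        if (substr($inner, 0, 1) === '/') {
            $tokens[] = array('type' => 'close', 'value' => trim(substr($inner, 1)));
        } elseif (substr($inner, -1) === '/') {
            $tokens[] = array('type' => 'empty', 'value' => trim(substr($inner, 0, -1)));
        } else {
            $tokens[] = array('type' => 'open', 'value' => trim($inner));
        }
        $pos = $gt + 1;
    }
    return $tokens;
}

// Hypothetical check: every close tag must match the most recent open tag.
function xml_is_balanced($tokens)
{
    if ($tokens === false) {
        return false;
    }
    $stack = array();
    foreach ($tokens as $t) {
        if ($t['type'] === 'open') {
            // Keep only the element name, dropping any attributes.
            $stack[] = substr($t['value'], 0, strcspn($t['value'], " \t\r\n"));
        } elseif ($t['type'] === 'close') {
            if (count($stack) === 0 || array_pop($stack) !== $t['value']) {
                return false;
            }
        }
    }
    return count($stack) === 0;
}

// Usage with a hypothetical sample:
$tokens = xml_lex('<a><b x="1">hi</b><c/></a>');
$ok = xml_is_balanced($tokens);                      // true
$bad = xml_is_balanced(xml_lex('<a><b></a></b>'));   // false
```

This covers only tag nesting. Other well-formedness rules, such as a single root element or valid attribute syntax, would need further checks.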

I'm sorry, but although they may be bundled by default, not every company providing server space includes them (as described above). What I can do at home won't work on my client's web server.

As for your suggestion, I looked into your first option. I looked at a Python implementation of an XML lexer, but it uses a class construct, which was added to PHP only as recently as PHP5. So it is out of the question.

But thanks for the idea of a lexical analyzer.

Really? You're going to support PHP4? Please stop; it's a decade old, full of security holes and damn-near useless. As for not knowing what version of PHP will be on the client's server, it's your responsibility as a developer to make sure that they have a stable, secure environment, which almost always means the most recent version of PHP. Not a single reputable shared host offers ONLY PHP4. And the way you should be releasing to clients, dedicated hosting, is customized to your needs.

It's an association that uses a free provider to host their website. They don't control the server, and I can't spend my time trying to change their mind on hosting. (They're happy with what they have.) If I excluded PHP4 I'd still be in trouble. The provider supports PHP5 but still doesn't have all XML-related functions activated (see note* below). And it appears that this is a recurring problem that most professional developers can fix (since they work for companies that can spend the money) but that not every PHP user can.

Excluding PHP4 would not meet the backward compatibility I'm aiming for. This will not be a mass-market application; it is for those cases where the developer (professional or not) cannot control what is installed on the server.

I assume that your reaction indicates that you believe developing an XML lexer is impossible?

---

*They have PHP5 to be used with the extension .php5 but for XSL transformation they have activated only PHP4's XSLT module.

As for that analogy, I commend and understand you. I would have preferred to work on PHP5 (or PHP7 for that matter), but as long as PHP4 is out there I have to consider it. (It's the same for web development using HTML, JavaScript and Flash, or even AJAX - one has to consider that some users won't have Flash installed or JS activated, or will use some ancient browser, say IE 8 or Netscape 4. If you're not sure, try to find a way to make it work for them.)

By the way, do you mind if I use your code as a template?
---
Edit: I see now that you use a class. That won't work for PHP4 - I think. But might it work without using a class? I'll have to check on that!

A) PHP7 doesn't exist; PHP6 doesn't exist yet, for that matter. B) IE8 is the _newest_ IE browser, and even Google is halting support for browsers older than IE7 in March. C) Feel free, but you should know that the particular code I linked to won't work in PHP4, even remotely.
