Parsing XML with Regex and preg_match_all

Question

Gloak 0 Light Poster

7 Years Ago

I am not a computer professional; I just like to develop my own toys.

I am using Globiflow to get the info I need from the XML file. Everything works except for one case, where the tag has no closing tag.

Regular node with closing tag:

<proper price>$4</price>

Below is the code used for all data extraction.

preg_match_all_gf("/<city>(.*?)<\/city>/ism", [(variable)token],4)

This would get the 4th occurrence.
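
For readers outside Globiflow, a rough plain-PHP equivalent of the call above might look like the sketch below. It assumes that preg_match_all_gf returns the text captured by the first group of the Nth match (counting from 1), which this thread implies but does not confirm. The helper name nth_capture and the sample $xml are hypothetical.

```php
<?php
// Hypothetical helper: return capture group 1 of the Nth match (1-based),
// or null when there are fewer than $n matches.
function nth_capture(string $pattern, string $subject, int $n): ?string
{
    preg_match_all($pattern, $subject, $matches);
    return $matches[1][$n - 1] ?? null;
}

// Made-up sample data, not the poster's real file.
$xml = '<city>Austin</city><city>Boston</city><city>Chicago</city><city>Denver</city>';

echo nth_capture('/<city>(.*?)<\/city>/ism', $xml, 4), PHP_EOL; // Denver
```

Asking for an occurrence that does not exist returns null here; how Globiflow behaves in that case is not stated in the thread.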

And this is what I currently need and it does not have the closing tag:

<comp score:"9.0">

I managed to get the first digit (9 in the example above) using a regex, as can be seen here.
The code used is:

preg_match_all_gf('/(?<=score=")[^"]*(?=.0")/', [(Variable) xml],2)

If you remove "*" it gets only the first occurrence:

preg_match_all_gf('/(?<=score=")[^"](?=.0")/', [(Variable) xml],2)
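
The two patterns above can be tried in plain PHP to see what each one actually matches. This is only a sketch on made-up data: it assumes the attribute is written score="..." (as the patterns expect) and that scores always end in .0, and it uses the standard preg_match_all rather than Globiflow's wrapper.

```php
<?php
// Made-up sample; assumes the attribute is written score="..." as the patterns expect.
$xml = '<comp score="9.0"><comp score="10.0"><comp score="7.0">';

// With "*": [^"]* grabs as much as it can, then backs off until .0" follows.
preg_match_all('/(?<=score=")[^"]*(?=.0")/', $xml, $withStar);

// Without "*": exactly one character must sit between score=" and .0".
preg_match_all('/(?<=score=")[^"](?=.0")/', $xml, $withoutStar);

print_r($withStar[0]);    // 9, 10, 7
print_r($withoutStar[0]); // 9, 7 (the two-digit 10 is skipped)
```

On this sample, removing "*" changes how many characters each match may contain, not how many occurrences are found, so two-digit scores would be dropped rather than later ones.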

So now the problem: the iteration can go up to 25.

It takes ALL the correct numbers (score) I need. Either altogether, or just the first one.

Final goal:

city1, address1, state1, price1, etc.1, score1 (current problem).

city2, address2, state2, price2, etc.2, score2
And so forth.

I am not able to pull one score at a time in the right order, because it does not seem to accept the preg_match syntax, more specifically the offset.

Any idea? Thank you so much for any help!

php regex xml



AndrisP 193 Posting Pro in Training

7 Years Ago

Why don't you use the loadXML function and then handle it through the DOM object? http://php.net/manual/en/domdocument.loadxml.php
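
Where full PHP is available, the DOM approach suggested here could look roughly like the sketch below. The sample document and the element names listings and listing are assumptions made for illustration, and the input must be well-formed XML for loadXML to succeed.

```php
<?php
// Made-up, well-formed sample; the element names are assumptions.
$xml = <<<XML
<listings>
  <listing><city>Austin</city><comp score="9.0"/></listing>
  <listing><city>Boston</city><comp score="10.0"/></listing>
</listings>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Collect one [city, score] pair per listing, so the order is always preserved.
$rows = [];
foreach ($doc->getElementsByTagName('listing') as $listing) {
    $city  = $listing->getElementsByTagName('city')->item(0)->textContent;
    $score = $listing->getElementsByTagName('comp')->item(0)->getAttribute('score');
    $rows[] = [$city, $score];
    echo $city, ': ', $score, PHP_EOL;
}
```

Because each score is read from inside its own record, there is no need to count occurrences across the whole file.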


Joris Claassen 0 Newbie Poster

7 Years Ago

Are you sure about <proper price>$4</price>? To me it seems to violate the basic XML rules. A well-formed tag would be <proper_price>$4</proper_price> or <price type="proper">$4</price>. Also, when an element has no closing tag, that should be made explicit by writing <comp score="9.0"/> (an empty-element, or self-closing, tag) instead of <comp score:"9.0">. Note that I use = instead of :. (Perhaps I have missed a new version in which the XML rules were extended so that attributes can be assigned using either = or :?)
How on earth could a parser know that the next tag isn't nested inside an unclosed one? It would have to keep a whole list of tags that violate the XML rules by omitting their closing tag. In other words, if you want to keep using this incorrect XML, you have to catch all those violations yourself. If you switch to correct XML, then most likely any XML parser could be used.
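
The point about well-formedness can be checked with a short sketch, assuming PHP's DOM extension is available. It feeds the first snippet from the question to DOMDocument::loadXML and prints whatever errors the parser reports; the exact error messages depend on the libxml version, so none are shown here.

```php
<?php
// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$ok  = $doc->loadXML('<proper price>$4</price>');

var_dump($ok); // bool(false): the snippet is not well-formed XML
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
```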


Gloak 0 Light Poster · Answer 1 · 2017-09-04T13:45:48+00:00

Globiflow supports only a limited set of PHP functions. Here is the list.
I believe it has to be done with preg_match or preg_match_all.
I have heard that this is not the best way to parse a file, but that is what we have today.

Gloak 0 Light Poster · Answer 2 · 2017-09-05T15:52:55+00:00

I am so sorry, the full code for the nodes with closing tag is:

preg_match_all_gf("/<amount currency=\"USD\">(.*?)<\/amount>/ism",[(Variable) token], 4)

This would get the 4th occurrence.

Here is a partial copy of the xml file.

pty 882 Posting Pro · Answer 3 · 2017-09-05T16:46:01+00:00

Here's a comprehensive answer on why parsing XML with regular expressions is a bad idea. @AndrisP put you on the right track.

Gloak 0 Light Poster · Answer 4 · 2017-09-05T17:14:10+00:00

Thank you, @pty, I had read that. Globiflow is what we have today. This is to push data onto Podio, using Globiflow, which has its own set of PHP functions.
I have no idea how to do this using regular code (not the code per se, but how to operate Podio CRM using regular PHP).

Gloak 0 Light Poster · Answer 5 · 2017-09-06T16:08:40+00:00

Here is the solution that worked:

preg_match_all ('/score="([0-9]{1,2})./ism', [(Variable) token],5)

Thank you all.
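
For reference, the idea behind this solution can be sketched in plain PHP as follows. The sample data is made up, the dot is escaped so that it matches only a literal ".", and pairing by position assumes every record contains exactly one city and one score. None of this is confirmed for the poster's real file.

```php
<?php
// Made-up sample in the shape described in the question (comp has no closing tag).
$xml = '<city>Austin</city><comp score="9.0">'
     . '<city>Boston</city><comp score="10.0">';

preg_match_all('/<city>(.*?)<\/city>/is', $xml, $cities);
// Same idea as the pattern above, with the dot escaped to match a literal ".".
preg_match_all('/score="([0-9]{1,2})\./i', $xml, $scores);

// Pair the Nth city with the Nth score.
$rows = array_map(null, $cities[1], $scores[1]);
foreach ($rows as [$city, $score]) {
    echo $city, ', ', $score, PHP_EOL;
}
```

The capture group returns only the digits before the decimal point, so one- and two-digit scores are handled by the same pattern.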
