I am not a computer professional; I just like to develop my own toys.

I am using GlobiFlow to get the info I need from an XML file. Everything is done except for one case where the node has no closing tag.

Regular node with closing tag:

<proper price>$4</price>

Below is the code used for all data extraction.

preg_match_all_gf("/<city>(.*?)<\/city>/ism", [(variable)token],4)

This would get the 4th occurrence.
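For readers without GlobiFlow at hand, the same idea can be sketched in plain PHP. This assumes that preg_match_all_gf returns the Nth (1-based) match of the first capture group, which is how the post describes it; nth_match is a hypothetical helper name and the sample XML is made up.

```php
<?php
// Hypothetical stand-in for GlobiFlow's preg_match_all_gf, assuming it
// returns the Nth (1-based) match of the first capture group.
function nth_match(string $pattern, string $subject, int $n): ?string
{
    preg_match_all($pattern, $subject, $matches);
    // $matches[1] holds the first capture group of every match, in order.
    return $matches[1][$n - 1] ?? null;
}

// Made-up sample data for illustration only.
$xml = '<city>Austin</city><city>Boston</city><city>Denver</city><city>Miami</city>';

echo nth_match('/<city>(.*?)<\/city>/ism', $xml, 4), PHP_EOL; // Miami
```

Asking for an occurrence that does not exist returns null here; what GlobiFlow itself returns in that case is not stated in the thread.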

And this is what I currently need and it does not have the closing tag:

<comp score:"9.0">

I managed to get the first digit (9 in the example above) using a regex, as can be seen here.
The code used is:

  preg_match_all_gf('/(?<=score=")[^"]*(?=.0")/', [(Variable) xml],2)

If you remove the "*", it gets only the first occurrence:

preg_match_all_gf('/(?<=score=")[^"](?=.0")/', [(Variable) xml],2)
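As a side note on what the "*" changes inside the pattern itself, here is a small sketch in plain PHP. The sample input is made up and uses = in the attribute, as the patterns expect; the patterns are written with a closing / delimiter, which plain PHP requires.

```php
<?php
// Made-up sample input with a one-digit and a two-digit score.
$xml = '<comp score="9.0"><comp score="10.0">';

// With *: any run of non-quote characters that is followed by ".0"
// (the dot is left unescaped, as in the post, so it matches any character).
preg_match_all('/(?<=score=")[^"]*(?=.0")/', $xml, $withStar);

// Without *: exactly one character, so a two-digit score no longer matches.
preg_match_all('/(?<=score=")[^"](?=.0")/', $xml, $withoutStar);

print_r($withStar[0]);    // 9 and 10
print_r($withoutStar[0]); // 9 only
```

So in plain PHP the version without "*" does not pick "the first occurrence"; it restricts each match to a single character and silently skips scores such as 10.0.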

So now the problem: the iteration can go up to 25.

It returns all the correct scores I need, but either all together or just the first one.

Final goal:

city1, address1, state1, price1, etc.1, score1 (current problem).

city2, address2, state2, price2, etc.2, score2
And so forth.

I am not able to pull one score at a time in the right order, because it does not seem to accept the preg_match syntax, more specifically the offset.

Any idea? Thank you so much for any help!

Re: Parsing XML with Regex and preg_match_all

Why don't you use the loadXML function and then handle it through the DOM object? http://php.net/manual/en/domdocument.loadxml.php

Re: Parsing XML with Regex and preg_match_all

GlobiFlow offers only a limited set of PHP functions. Here is the list.
I believe it has to be done with preg_match or preg_match_all.
I have heard that this is not the best way to parse a file, but that is what we have today.

Re: Parsing XML with Regex and preg_match_all

Are you sure about <proper price>$4</price>? To me it seems to violate the basic XML rules. A 'decent XML' tag could be <proper_price>$4</proper_price>, or <price type="proper">$4</price>. Also, when a closing tag is missing, that should be made 'known' by using <comp score="9.0"/> (a self-closing, or "empty-element", tag) instead of <comp score:"9.0">. Also note that I use = instead of :. (Perhaps I have missed that in a new version the XML rules have been extended and attributes can now be assigned using either = or :?)
How on earth could a parser know that the next tag isn't embedded without closing the tag? Unless it kept a whole list of tags that violate the XML rules by missing a closing tag. (In other words, if you want to continue to use this 'incorrect XML', then you have to 'catch' all those violations. If you decide to switch to correct XML, then likely any XML parser could be used.)
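If the feed can be made well-formed as described above, a standard parser handles it without any regex. Here is a minimal sketch in plain PHP using DOMDocument::loadXML; it is not something GlobiFlow is known to offer, and the listings/listing element names and the data are made up for illustration.

```php
<?php
// Made-up, well-formed sample; <comp .../> is an empty-element tag.
$xml = <<<XML
<listings>
  <listing><city>Austin</city><comp score="9.0"/></listing>
  <listing><city>Boston</city><comp score="10.0"/></listing>
</listings>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Collect one [city, score] pair per <listing>, in document order.
$rows = [];
foreach ($doc->getElementsByTagName('listing') as $listing) {
    $city  = $listing->getElementsByTagName('city')->item(0)->textContent;
    $score = $listing->getElementsByTagName('comp')->item(0)->getAttribute('score');
    $rows[] = [$city, $score];
}

print_r($rows);
```

Because each score is read from inside its own record, the city/score pairing cannot drift out of order the way independent Nth-match lookups can.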

Re: Parsing XML with Regex and preg_match_all

I am so sorry, the full code for the nodes with closing tag is:

preg_match_all_gf("/<amount currency=\"USD\">(.*?)<\/amount>/ism",[(Variable) token], 4)

This would get the 4th occurrence.

Here is a partial copy of the xml file.

Re: Parsing XML with Regex and preg_match_all

Here's a comprehensive answer why parsing XML with regular expressions is a bad idea. @AndrisP put you on the right track.

Re: Parsing XML with Regex and preg_match_all

Thank you, @pty, I had read that. GlobiFlow is what we have today. This is to push data onto Podio using GlobiFlow, which has its own set of PHP functions.
I have no idea how to do this using regular code (not the code per se, but how to operate Podio CRM using regular PHP).

Re: Parsing XML with Regex and preg_match_all

Here is the solution that worked:

preg_match_all ('/score="([0-9]{1,2})./ism', [(Variable) token],5)

Thank you all.
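For anyone landing here later, this is roughly how that pattern lines up with the other fields, sketched in plain PHP rather than GlobiFlow. The sample data is made up, and the pairing assumes every record contains exactly one city and one score.

```php
<?php
// Made-up sample; the unclosed <comp> tags mimic the feed described in the thread.
$xml = '<city>Austin</city><comp score="9.0">'
     . '<city>Boston</city><comp score="10.0">';

// Same pattern as the solution above: one or two digits, then one more character.
preg_match_all('/score="([0-9]{1,2})./ism', $xml, $scores);
preg_match_all('/<city>(.*?)<\/city>/ism', $xml, $cities);

// Pair the Nth city with the Nth score.
$lines = [];
foreach ($cities[1] as $i => $city) {
    $lines[] = $city . ', ' . ($scores[1][$i] ?? '?');
}

echo implode(PHP_EOL, $lines), PHP_EOL;
// Austin, 9
// Boston, 10
```

The capture group ([0-9]{1,2}) is what makes the offset usable: each match yields just the integer part of one score, so the Nth match corresponds to the Nth record.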
