Hello everyone. I have a string that looks like this

(Mango, fruits, and), (Maize, cereals, and), (Mango juice, beverages, and)

I would like to convert the above string using php to something similar to this:

(Mango[fruits]) AND (Maize[cereals]) AND (Mango juice[beverages])

How can I achieve this in PHP? I have looked at PHP string replace and substr functions but cannot figure out how to use them to solve my problem. Thank you in advance.

Are you missing a ] in the converted version, or is that deliberate?

It was a typo, not deliberate. I have corrected it.

You could use a strategy like
Remove all the blanks. Remove the first and last paren.
Split on ),( to give you each block of 3 words
For each block of 3 words: split on comma
Now you have just the words, grouped in threes, so it’s easy to concatenate them in the right order with the desired brackets etc
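The strategy above can be sketched in PHP roughly as follows. This is only a sketch, assuming the input always follows the exact pattern shown; the function name splitGroups is made up for illustration. It trims each word instead of removing every blank, because stripping all blanks would also turn "Mango juice" into "Mangojuice".

```php
<?php
// Hypothetical helper: turns "(a, b, c), (d, e, f)" into [[a, b, c], [d, e, f]].
// Assumes the exact pattern from the question: no nested parens, no commas inside a word.
function splitGroups(string $input): array
{
    // Remove the first and last paren.
    $inner = trim(trim($input), '()');

    // Split on "), (" to get each block of 3 words.
    $blocks = explode('), (', $inner);

    $groups = [];
    foreach ($blocks as $block) {
        // Split each block on commas and trim the blanks around each word.
        $groups[] = array_map('trim', explode(',', $block));
    }
    return $groups;
}

$groups = splitGroups('(Mango, fruits, and), (Maize, cereals, and), (Mango juice, beverages, and)');
foreach ($groups as [$item, $category, $operator]) {
    echo $item . ' | ' . $category . ' | ' . $operator . "\n";
}
```

Each element of $groups is now a trio of item, category and operator, so putting them back together in a different order is plain concatenation.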

Alternatively there’s bound to be a single regex that does it, if you like long incomprehensible undebuggable strings of bizarre character sequences.

This may be difficult to predict. Your 'and' in the original string - could this be anything else e.g. 'or'? Will there always be a space after the comma outside the brackets? If strings always have the exact same pattern, then a simple function or series of simple functions could do it. However if there are variations, then regex (preg functions) will need to be used.

//Edit

On second thoughts, I don't think you need regex

@alan.davies Yes. My 'and' could be anything else, e.g. 'or'. Yes, there will always be a space after the bracket, and the strings will always have the same exact pattern.

I have tried to solve the problem as suggested by @JamesCherrill above, and here is what I have come up with:

$var1 = str_replace(' ', '', $mystring);
$var2 = trim($var1, '()');
$var3 = explode('),(', $var2);

I get stuck here. I cannot figure out how to manipulate the data when it is in groups of 3. If I run print_r on $var3, I get an array that looks like so:

Array
(
    [0] => Mango, fruits, and
    [1] => Maize, cereals, and
    [2] => Mango juice, beverages, and
)

How do I loop over this array to get the output below?

 (Mango[fruits]) AND (Maize[cereals]) AND (Mango juice[beverages])

Sorry if the solution to this is obvious, but somehow I cannot get my head around it.

You need to loop through that array processing one line at a time. Each line can be exploded into the 3 words so you can put them all back together in the desired order with the desired punctuation.

I quite enjoy puzzles like this. Of course, I suspect that your problem will get much harder once you start adding stuff like (Banana, fruits, or) to the equation, as then you'll need to worry about brackets and precedence.

However, for this simple version here's how I tackled it in Ruby. You can pretty much translate this to PHP but it won't be as succinct or elegant.

puts "(Mango, fruits, and), (Maize, cereals, and), (Mango juice, beverages, and)"
  .scan(/\((.*?)\)/)                                                    # grab all the text that appears inside brackets
  .flatten                                                              # scan yields an array of arrays, flatten it
  .map{|chunk| chunk.split(",")}                                        # split each trio into a word array
  .map{|fruit, category, operation| "(#{fruit}[#{category.strip}])" }   # build the keywords into strings in the desired format
  .join(" AND ")                                                        # join the built strings with 'AND'

(Mango[fruits]) AND (Maize[cereals]) AND (Mango juice[beverages])

Thank you all for taking the time to try and solve this problem. Based on suggestions by @JamesCherrill, this is how I solved the problem:

$var1 = str_replace(' ', '', $mystring);
$var2 = trim($var1, '()');
$var3 = explode('),(', $var2);
foreach ($var3 as $var4) {
    $var5 = explode(',', $var4);
    echo '(' . $var5[0] . '[' . $var5[1] . ']' . ')' . "&nbsp" . $var5[2] . "&nbsp";
}

This returns a string that looks like so:

(Mango[fruits]) AND (Maize[cereals]) AND (Mango juice[beverages])

Well done. However, it doesn't solve your hangover, that is, the additional AND at the end of the string.

the additional AND at the end of the string.

Yes, I was curious about that as well. The input string has n boolean ops but the output string has n-1. How does that work? (I mean "what's the logic?" not "how do you code it?".)

True observation @alan.davies. I welcome any suggestions on how to go about getting rid of the last boolean.

Just to be sure... is it the last boolean you want to drop or is it the first boolean?

To drop the last boolean you could use strrpos to find the position of the last space (ie the character before the last boolean) and substr to extract all the string up to that last space.

commented: Just saw this after I posted :) +2
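The strrpos/substr idea above could look something like this. It is a minimal sketch, assuming the terms and operators are separated by plain spaces (not &nbsp) and that the string ends with the hanging operator; the function name dropLastBoolean is hypothetical.

```php
<?php
// Hypothetical helper: removes the last space-separated word (the hanging operator).
// Assumes the string ends with " <operator>", possibly followed by blanks.
function dropLastBoolean(string $query): string
{
    // Ignore trailing blanks so the last space found is the one before the operator.
    $query = rtrim($query);

    $lastSpace = strrpos($query, ' ');
    if ($lastSpace === false) {
        // No space at all: nothing to drop.
        return $query;
    }

    // Keep everything up to (not including) the last space.
    return substr($query, 0, $lastSpace);
}

echo dropLastBoolean('(Mango[fruits]) AND (Maize[cereals]) AND (Mango juice[beverages]) AND ');
// prints: (Mango[fruits]) AND (Maize[cereals]) AND (Mango juice[beverages])
```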

Like I said in my post, this is deceptively easy. We're assuming they're all and, and under that assumption you may as well just join.

Once you start adding or into the query, the whole thing becomes much more difficult.

We're assuming they're all and

Not at all. The algorithm he has used copies whatever is in the boolean position in the input: and, or, xor(?) etc (with maybe a conversion to upper case)

I'd do this:

function convertString( $string )
{
    $dirty = preg_replace_callback('/\(([a-z A-Z0-9]+), (\w+), (\w+)\),*/', function ($m) {
        return $m[1] . '[' . $m[2] . '] ' . strtoupper($m[3]);
    }, $string);
    return substr($dirty, 0, strrpos($dirty, ' '));
}

$str = '(Mango, fruits, and), (Maize, cereals, and), (Mango juice, beverages, and)';
echo convertString($str);

//Mango[fruits] AND Maize[cereals] AND Mango juice[beverages]

After a bit of thinking, I thought it would be easier to go regex (which I hate btw - coz I don't really understand it well enough). The logic for getting rid of the last boolean is simply search for the last space from the end of the string and truncate the whole string to that position. Not very elegant but works.

However - preg_* functions are notoriously slow - so even a one-liner can be slower than 4 or 5 "regular" string functions. Do some tests if you think this may be an issue.

The callback function is preferable to /e in preg (as /e is dangerous and deprecated). Notice I used an anonymous function in the callback - this is just a preference - you could create a separate function.

BTW - not sure if you noticed that "Mango juice" is 2 words - should this be allowed if you are using what seems to be an array? Anyhow, I allowed for this with the expanded regex of [a-z A-Z0-9]+ instead of just \w+.

//EDIT: Heh heh just read James' post on strrpos! Yep, I agree - I think the most convenient way
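On the point about two-word items, here is a small check of why \w+ alone is not enough for the first term. The two patterns are cut-down, single-group versions of the regex above, written only for this illustration.

```php
<?php
$group = '(Mango juice, beverages, and)';

// \w+ matches letters, digits and underscore only, so it stops at the blank.
$wordOnly = '/\((\w+), (\w+), (\w+)\)/';

// Adding a blank to the character class lets the first term span two words.
$withSpaces = '/\(([a-z A-Z0-9]+), (\w+), (\w+)\)/';

var_dump(preg_match($wordOnly, $group));   // int(0)
var_dump(preg_match($withSpaces, $group)); // int(1)
```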

Not at all. The algorithm he has used copies whatever is in the boolean position in the input: and, or, xor(?) etc (with maybe a conversion to upper case)

Yes I see that but once you start adding or statements to a query you have to take precedence and additional brackets into account. It can get very difficult very fast.

Source, I wrote a query builder a few years ago, thought it'd take a couple of days but ended up being more than a week. Should have just taught the users SQL!

Agree with pty on the complexity. An SQL parser, if that's what it is, can be ridiculously complicated. You only have to look at 'eloquent' packages in laravel. Even that gives up the ghost after a while and says, 'sod it, this is too complicated, just type in your raw sql'. Anyhow, it's difficult to see how these 3-term items work. As mentioned by somebody earlier, shouldn't the first boolean be dropped instead of the last one? Nesting or setting precedence is another rabbit hole.

Aren't we having fun trying to guess what the real scope and spec of the OP's problem really is!
Anyway, my 2p's worth:
I don't worry about OR. I'm happy to guess that both the input and output formats follow the usual precedence rule (AND higher than OR), so simply copying is OK.
I would be very worried about the possibility of extra bracketing. There isn't any visible or implied in the OP's posts, but if it were possible then the current solution is a non-starter and he will need a proper parser.
I too have no idea why the input format seems to have a redundant boolean operator, and it worries me. Why would someone design a syntax like that? It's one of those loose ends that when pulled can unravel the whole thing.

I guess the fundamental mistake is in the design of the input form. I have tried to get rid of the boolean AND at the end of the form with no success. In case you are wondering what the input form looks like, here it is:

<select name = "choice2[]" class="btn btn-outline-secondary dropdown-toggle" type="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false" >
                <div class="dropdown-menu">
                    <option value="and" name="and">AND</option>
                    <option value="or" name="or">OR</option>
                    <option value="not" name="not">NOT</option>
                </div>
            </select>
        </div>

        <div class="input-group-prepend">

            <select name="choice1[]" class="btn btn-outline-secondary dropdown-toggle" type="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                <div class="dropdown-menu">
                    <option value = "fruits" name = "fruits">Fruits</option>
                    <option value = "cereals" name = "cereals">Cereals</option>
                    <option value = "beverages" name = "beverages">Beverages</option>   
                </div>
            </select>
            </div>
    <input type="text" name="item_name[]" class="form-control" aria-label="Text input with dropdown button">

I guess I have to think long and hard about how best to design this form to avoid the problem of the hanging boolean.

If you use the regular expression

\((.+?),\s*(.+?),\s*(and|or)\),\s*?\((.+?),\s*(.+?),\s*(and|or)\),\s*?\((.+?),\s*(.+?),.*

and a replacement string of

\($1[$2]\) $3 \($4[$5]\) $6 \($7[$8]\)

then you get what you want except that and & or will be in the original case rather than upper case.

Are you quite sure it's
\((.+?),\s*(.+?),\s*(and|or)\),\s*?\((.+?),\s*(.+?),\s*(and|or)\),\s*?\((.+?),\s*(.+?),.*
and not
\((.+?),\s*(.+?),\s*(and|or)\),\s*?\((.+?),\s*(.+?),\s*(and|or\)),\s*?\((.+?),\s*(.+?),.*
?

(just kidding, but it's a great example of why regex syntax is someone's attempt at a joke that backfired with terrible long-term consequences)

commented: Hah! +15
commented: Evil. +1. +15

Actually it can be shortened (yeah, right) to

\((.+?),\s*(.+?),\s*(and|or)\),\s*\((.+?),\s*(.+?),\s*(and|or)\),\s*\((.+?),\s*(.+?),.*

which breaks down to

 \(         opening `(`
 (.+?),     shortest string up to `,` (group $1)
 \s*        0 or more spaces 
 (.+?),     shortest string up to `,` (group $2)
 \s*        0 or more spaces
 (and|or)   logical operator (group $3)
 \)         closing `)`
 ,\s*       `,` followed by 0 or more spaces
 \(         opening `(`
 (.+?),     shortest string up to `,` (group $4)
 \s*        0 or more spaces 
 (.+?),     shortest string up to `,` (group $5)
 \s*        0 or more spaces
 (and|or)   logical operator (group $6)
 \)         closing `)`
 ,\s*       `,` followed by 0 or more spaces
 \(         opening `(`
 (.+?),     shortest string up to `,` (group $7)
 \s*        0 or more spaces 
 (.+?),     shortest string up to `,` (group $8)
 .*         remainder of string

It looks hideous but it's mostly three simple patterns repeated. If you punch it into rexexpr you will see it as a graphic. It's too wide to insert here.

commented: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski +9

Heh heh. Well done RJ. From the form snippet included, it appears that the OP expects 3 sets of data. If only two sets are included, is there a regex for this? That is, for a variable number of sets of data. That's where I was going with my 'replace each occurrence'. Genuinely interested. Am a complete duffer when it comes to regex.
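For what it's worth, one way to cope with a variable number of sets is to match a single group pattern repeatedly instead of spelling the pattern out three times. This is a minimal sketch using preg_match_all with PREG_SET_ORDER, assuming the same (item, category, operator) layout with no nested brackets; the function name convertGroups is made up. It copies each operator in upper case between the terms and ignores the last one.

```php
<?php
// Hypothetical helper: converts any number of "(item, category, operator)" groups.
function convertGroups(string $input): string
{
    // One group: "(" item "," category "," operator ")".
    // Item and category may contain blanks but no commas or brackets.
    $pattern = '/\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*(\w+)\s*\)/';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $output = '';
    $last = count($matches) - 1;
    foreach ($matches as $i => $m) {
        $output .= '(' . $m[1] . '[' . $m[2] . '])';
        // The operator of the final group has nothing to join, so skip it.
        if ($i < $last) {
            $output .= ' ' . strtoupper($m[3]) . ' ';
        }
    }
    return $output;
}

echo convertGroups('(Mango, fruits, or), (Maize, cereals, and)');
// prints: (Mango[fruits]) OR (Maize[cereals])
```

Like the other suggestions in this thread, it does nothing about precedence or extra bracketing.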

Am a complete duffer when it comes to regex.

So was I until about a week ago. By coincidence I had just finished working through Beginning Regular Expressions by Andrew Watt and this seemed like a good opportunity to show off before I forget it all ^_^