Anyone have an idea what the regex would be for preg_split to split a string at a semicolon ( ; ), but ignore any quoted (single or double) parts as well as ignore escaped ( \; ) semicolons?

I have tried to work this one out and could not (regex is not my strong point - YET).

Here is a function I wrote that does what I want.

//split string into characters and process...
function split_at($input, $splitAt=";"){
	$quote="";   //quote character that opened the current quoted part, "" when outside quotes
	$j=0;
	$output=array("");
	$len=strlen($input);
	for($i=0; $i<$len; $i++) {
		$c=$input[$i];
		//when a quote is reached, remember it and ignore the quoted part
		//until the same quote character is reached again
		if($c=='"' || $c=="'"){
			if($quote=="") {$quote=$c;} elseif($quote==$c) {$quote="";}
		}
		//start a new part when the separator is reached,
		//unless it is inside a quoted part or escaped with a backslash
		//($i==0 is checked first so a leading separator never looks at $input[-1])
		if($c==$splitAt && $quote=="" && ($i==0 || $input[$i-1]!="\\")){
			$output[++$j]="";
			continue;
		}
		$output[$j].=$c;
	}
	// array of split parts of input
	return $output;
}

Any help would be appreciated!

You may want to use preg_replace_callback() for this. For the sake of clarity, let's say you have the following two separate, independent input strings:

a;b\";\"c;d';'ef\;g

a;b\";'\"c;d';'ef\;g

What results do you expect? Do you have a sample of an actual, realistic input string?

This would be to parse email headers (a nearly impossible task ;-)) from raw email data.

The regex should be able to split the following strings:
multipart/alternative; boundary="001636284f500b21f90494114b4d"
multipart/alternative; boundary='001636284f500b21f90494114b4d'
multipart/alternative; boundary="001636284f;500b21f90494114b4d"
foo; fa="001636284f5"\; fy="00b21f90494114b4d"

It should split only on ';' characters (one or several per line), except where the ';' is inside quotes or has been escaped.

Thanks for the feedback.

try:

$str=<<<STR
multipart/alternative; boundary="001636284f500b21f90494114b4d"
multipart/alternative; boundary='f01636284f500b;21f90494114b4f'
multipart/alternative; boundary="001636284f;500b21f90494114b4d"
foo; fa="001636284f5"\; fy="00b21f90494114b4d"
STR;
$str=preg_replace('#(\x22|\x27)([^;]*)(?<!\x5C)(;)(.*?)\1#','$1$2'.chr(7).chr(92).';$4$1',$str);
$m=preg_split('#(?<![\x5C]);#',$str);
foreach($m as $i=>$v)
{
	$m[$i]=str_replace(chr(7).chr(92).';',';',$v);
}
//this shows the result
print_r($m);
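
The three steps above (protect, split, restore) are easier to follow when traced on a single line. This is only a sketch of the same calls with the intermediate value kept; the variable names $line, $marker, $protected and $parts are made up for the illustration:

```php
<?php
// Trace of the protect / split / restore steps on one header line.
$line = 'multipart/alternative; boundary="001636284f;500b21f90494114b4d"';

// Placeholder that replaces a quoted ';': BEL (chr 7) + backslash (chr 92) + ';'
$marker = chr(7) . chr(92) . ';';

// Step 1: inside a quoted run, swap the first unescaped ';' for the placeholder.
// \x22 is ", \x27 is ', \x5C is \, and \1 must be the same quote that opened the run.
$protected = preg_replace(
    '#(\x22|\x27)([^;]*)(?<!\x5C)(;)(.*?)\1#',
    '$1$2' . $marker . '$4$1',
    $line
);

// Step 2: split on every ';' that is not directly preceded by a backslash.
// The protected ';' now has a backslash in front of it, so it is left alone.
$parts = preg_split('#(?<![\x5C]);#', $protected);

// Step 3: turn the placeholder back into a plain ';' in each part.
foreach ($parts as $i => $v) {
    $parts[$i] = str_replace($marker, ';', $v);
}

print_r($parts);
```

For this line the result has two parts: the media type, and the boundary parameter with its quoted semicolon intact.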

Thanks hielo,
Works 100% for what I need!
This works even on nested quotes, NICE!

Just a note for future reference: the code should be used on one line at a time, like this:

$str=<<<STR
multipart/alternative; boundary="001636284f500b21f90494114b4d"
STR;
$str=preg_replace('#(\x22|\x27)([^;]*)(?<!\x5C)(;)(.*?)\1#','$1$2'.chr(7).chr(92).';$4$1',$str);
$m=preg_split('#(?<![\x5C]);#',$str);
foreach($m as $i=>$v)
{
	$m[$i]=str_replace(chr(7).chr(92).';',';',$v);
}
//this shows the result
print_r($m);
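
If the headers arrive as one block of text, the same steps could be wrapped in a loop that handles each line separately. A rough sketch follows; split_header_lines() is a hypothetical helper name, and splitting lines with \R is an assumption about how the raw data is delimited:

```php
<?php
// Hypothetical helper: runs the protect / split / restore steps on each line separately.
function split_header_lines($raw)
{
    $marker = chr(7) . chr(92) . ';';   // BEL + backslash + ';' placeholder
    $result = array();
    // \R matches \n, \r\n or \r, so CRLF line endings are handled as well
    foreach (preg_split('/\R/', $raw) as $line) {
        $line = preg_replace('#(\x22|\x27)([^;]*)(?<!\x5C)(;)(.*?)\1#', '$1$2' . $marker . '$4$1', $line);
        $parts = preg_split('#(?<![\x5C]);#', $line);
        foreach ($parts as $i => $v) {
            $parts[$i] = str_replace($marker, ';', $v);
        }
        $result[] = $parts;   // one array of parts per input line
    }
    return $result;
}

$raw = "multipart/alternative; boundary=\"001636284f500b21f90494114b4d\"\n"
     . "multipart/alternative; boundary='f01636284f500b;21f90494114b4f'";
print_r(split_header_lines($raw));
```

Each input line gets its own array of parts, so a quote on one line can no longer pair up with text on the next.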

Now I have to figure out how it works (only way to learn!)
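
Working through the pattern turns up two cases to watch for: the protect step replaces only the first semicolon inside a quoted part, and when a line has two quoted values (for example a; b="x"; c="y") it can pair the first opening quote with the second one, so the semicolon between the values does not split. For comparison, here is a sketch of a single preg_split() call that uses the PCRE verbs (*SKIP)(*FAIL) to step over quoted parts and escaped characters. split_unquoted() is a hypothetical name and this is not the code from the reply above:

```php
<?php
// Hypothetical helper, not from the thread: one preg_split() call, no placeholder step.
function split_unquoted($input)
{
    // At each position the alternatives are tried left to right:
    //   "..." or '...'  a complete quoted run: matched, then thrown away by (*SKIP)(*FAIL)
    //   \x              a backslash plus one character: thrown away the same way
    //   ;               only reached outside quotes and escapes: this is the split point
    $pattern = '/(?:"[^"]*"|\'[^\']*\'|\\\\.)(*SKIP)(*FAIL)|;/s';
    return preg_split($pattern, $input);
}

print_r(split_unquoted('foo; fa="001636284f5"\; fy="00b21f90494114b4d"'));
print_r(split_unquoted('a; b="x;y;z"; c=\'1;2\''));
```

The first call gives two parts (the escaped semicolon stays inside the second part) and the second call gives three. An opening quote with no matching close is treated as ordinary text here, which may or may not be what real mail data needs.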

Thanks again!
