Soooo ... quick question :)

I need to do some HTML parsing with regex :)

I currently use $output = preg_replace('/>\s+</', "> <", $output) to collapse the whitespace between any two HTML tags into a single space. How can I strip whitespace only between paragraph tags, i.e. only between a </p> and the next <p>?
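One way to narrow the original pattern, as a rough sketch: anchor it on a closing </p> and look ahead for the next opening <p, so whitespace between any other tags is left alone. The helper name collapseParagraphGaps is hypothetical. The [\s>] after <p keeps it from matching <pre> or <param>, and it also allows attributes on the opening tag.

```php
<?php
// Hypothetical helper: remove whitespace only between </p> and the next <p ...>.
function collapseParagraphGaps(string $html): string
{
    // (?=<p[\s>]) is a lookahead, so the opening tag itself is not consumed.
    // [\s>] rejects tags that merely start with "p", such as <pre>.
    return preg_replace('#</p>\s+(?=<p[\s>])#i', '</p>', $html);
}

echo collapseParagraphGaps("<p>one</p>\n\n<p class=\"x\">two</p>\n<div> keep </div>");
```

To collapse the gap to one space like the original pattern does, instead of removing it entirely, swap the replacement for '</p> '.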


OK ... Strike all that. I'm going to use PHP's DOMDocument class to manipulate the HTML instead. Soooo here's what I need ... I need to loop through all the HTML and run nl2br() on everything inside a <p> tag. Any takers?


Sorry Dani, I'm a bit of a noob at regex...
But my DOMDocument is even worse...

$content = "<p>this is your html

with a
few things</p>";

function breaker($m){
    return nl2br($m[1]);
}
echo preg_replace_callback('/(<p>[^<]+<\/p>)/s','breaker',$content);

Change the regex to allow attributes on the <p> tag if your paragraphs use them.
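A sketch of that change, assuming paragraphs still contain only plain text (the [^<]+ part already requires that, so a <p> with a nested <b> won't match). <p\b[^>]*> accepts attributes such as class while still rejecting <pre>. The anonymous function in place of the named breaker callback is just a stylistic choice.

```php
<?php
$content = "<p class=\"post\">this is your html\n\nwith a\nfew things</p><pre>keep\nme</pre>";

// Match a <p> (with or without attributes) whose body is plain text,
// then run nl2br() on just that matched paragraph.
$result = preg_replace_callback(
    '/<p\b[^>]*>[^<]+<\/p>/i',
    function (array $m): string {
        return nl2br($m[0]);
    },
    $content
);

echo $result;
```

The <pre> block comes through unchanged because \b fails between the "p" and "r" of <pre.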

Haha, no worries. You made up for it by figuring out the culprit with the %02 bug! :)

In any case, I just got it. Here's my code:

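$output = '';

$dom = new DOMDocument();
$dom->loadHTML($post); // $post: the forum post's HTML snippet (variable name assumed)
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $tag) {
    if ($tag->localName == 'p')
        $output .= nl2br($dom->saveHTML($tag)); // paragraphs get <br /> for newlines
    else
        $output .= $dom->saveHTML($tag);        // everything else is copied as-is
}

echo $output;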

I did the whole getElementsByTagName('body') thing because DOMDocument kept turning my little HTML snippets (for individual forum posts) into full HTML documents, complete with a doctype and <html> and <body> wrappers. Saving only the children of <body> gives me back just the snippet.
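Put together, the same approach might look like this as a reusable function. This is a sketch: the name nl2brParagraphs is hypothetical, and the libxml_use_internal_errors() wrapping (to keep libxml from emitting warnings about markup it doesn't recognise) is an addition not present in the code above. Like the original loop, it only touches <p> elements that are direct children of <body>.

```php
<?php
// Hypothetical helper: run nl2br() on each top-level <p> of an HTML snippet
// and return the snippet without the doctype/<html>/<body> wrappers that
// DOMDocument::loadHTML() adds.
function nl2brParagraphs(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $output = '';
    foreach ($body->childNodes as $node) {
        $fragment = $dom->saveHTML($node); // serialize just this child node
        // Only paragraphs get their newlines turned into <br />.
        $output .= ($node->localName === 'p') ? nl2br($fragment) : $fragment;
    }
    return $output;
}

echo nl2brParagraphs("<p>first line\nsecond line</p><div>not\ntouched</div>");
```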

Now I'm pleased to say that the parsing bug for converting BBCode to Markdown is finally fixed! :)
