Hi,
How can I use a regex or string replace to add missing "p" tags to sentences that have no tags?
I tried matching* and splitting the whole string on the "h" and "pre" tags first, but I don't know how to merge the pieces back together.

*let regexRule = /<pre>(.|\n|\r\n)[\s\S]*?<\/pre>/g;

Example - input

let someVariable = `Basket
<h1>Fruits</h1>
<pre>Apple
Juice</pre>
<pre>Kiwi</pre>
PVC thing
Trash
<h1>...</h1>`;

How can I add a "p" tag to Basket, PVC thing and Trash so that the output is:

someVariable = `<p>Basket</p>
<h1>Fruits</h1>
<pre>Apple
Juice</pre>
<pre>Kiwi</pre>
<p>PVC thing</p>
<p>Trash</p>
<h1>...</h1>`;

Thank you

All 5 Replies

I'm not too good with regex, but if you're using jQuery, there's a built-in utility: you pass it an HTML string, and it converts that string to valid DOM nodes that you can then manipulate however you want. Hope this helps with whatever it is you're trying to do.

commented: Thanks for the reply. Instead of regex, how can I achieve my goal using cheerio, node-html-parser, or jQuery?

I used cheerio in Node.js:

const cheerio = require('cheerio');

let someVariable = "Basket<h1>Fruits</h1><pre>AppleJuice</pre><pre>Kiwi</pre>PVC thing Trash<h1>...</h1>";
// cheerio.load() parses the fragment as a full document and adds the html/head/body wrappers
const $ = cheerio.load(someVariable);

$('body').contents().each(function (i, el) {
    if (el.type === 'text') { // in jQuery the equivalent check is nodeType === 3
        // untagged text node: wrap it in a <p> tag
        $(el).wrap('<p></p>');
    }
});

// $('body').html() returns only the markup inside <body>,
// so the wrappers do not have to be cut off with substr()
let returnResult = $('body').html();
console.log(returnResult); // got: <p>Basket</p><h1>Fruits</h1><pre>AppleJuice</pre><pre>Kiwi</pre><p>PVC thing Trash</p><h1>...</h1>

So what I was suggesting was something like this:

// Random valid or invalid HTML string (in this case, missing a closing </p>)
var html = '<strong><p>Foo</strong>';

// Create a jQuery DOM element whose contents are the HTML string
// The DOM element will manipulate the invalid HTML and make it valid
var element = $(html);

// Inject the contents of the DOM element into a jQuery selector
// The element with ID 'selector' will now contain Foo in a bold paragraph with valid HTML
$('#selector').html(element);

I don't think a regex will do what you want. I don't see how you can define a pattern, based on your example, that could distinguish the things you want to <p></p>ify from the things you do not. How would you determine that you want to modify Basket, PVC thing, and Trash, but not let or someVariable? Also, removing the word let is something else entirely.
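That said, for the narrow input shown in the question, the "split first, then merge" idea can be made to work. Here is a minimal sketch, not a general HTML solution. It assumes the only tagged blocks are non-nested pre and h1-h6 elements without attributes, and that every untagged line is its own paragraph. The names wrapUntaggedLines and blockRule are made up for illustration. The trick is that split() with a capturing group keeps the matched blocks in the result array, so merging is just a join().

```javascript
// Sketch: wrap untagged lines in <p>, leaving heading and <pre> blocks untouched.
// Assumption: blocks are not nested and carry no attributes.
function wrapUntaggedLines(html) {
    // Because of the capturing group, split() keeps the matched blocks:
    // untagged text lands at even indexes, tagged blocks at odd indexes.
    const blockRule = /(<pre>[\s\S]*?<\/pre>|<h[1-6]>[\s\S]*?<\/h[1-6]>)/g;

    return html
        .split(blockRule)
        .map((part, index) => {
            if (index % 2 === 1) {
                return part; // a matched <pre> or heading block: keep as-is
            }
            // Untagged text: wrap each non-empty line, keep the line breaks.
            return part.replace(/^[ \t]*(\S.*?)[ \t]*$/gm, '<p>$1</p>');
        })
        .join('');
}

const someVariable = `Basket
<h1>Fruits</h1>
<pre>Apple
Juice</pre>
<pre>Kiwi</pre>
PVC thing
Trash
<h1>...</h1>`;

console.log(wrapUntaggedLines(someVariable));
```

Anything beyond this shape (attributes, nested tags, inline tags inside the untagged text) is where a real parser such as cheerio or the DOM becomes the safer choice.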

commented: I had a problem I wanted to solve with a regex... now I have two problems.

Is the HTML in a string or part of the DOM? If it is a string, I think the best method would be to convert it to nodes by assigning it to .innerHTML on a temporary parent element (like a div). Then you can extract the nodes through .firstChild.

var tmpDiv = document.createElement('div');
tmpDiv.innerHTML= '<div><a href="#"></a><span></span></div>';
var HTML = tmpDiv.firstChild;

A nodeType of 3 identifies a Text node: Reference (Mozilla)

Sorry, just remembered about jQuery .contents() - I really think this is all you need.
