Hi,

My objective is to wrap every word that matches a particular regex in some HTML tags, like so.

Example html:
<p>This is some text where the word text will get surrounded by something else.</p>

After regex:
<p>This is some <span class='mydiv'>text</span> where the word <span class='mydiv'>text</span> will get surrounded by something else.</p>

Note that this is a simplified example to illustrate my problem, so I cannot simply do this:

$(document).html($(document).html().replace(/text/g,"<span class='mydiv'>text</span>")); 

The original problem involves a large regular expression that accepts a wide range of inputs, and the text is allowed to contain spaces. So text, t E x T and other variations will all match. I don't want to swap them for a static replacement word "text"; I'd like the original wording, casing, etc. to be kept inside the replacement.

Anyone know how to do this?

This may work for you. This will accept a text file (which I called keywords.txt) and throw it into a string to match against. The words/phrases are delimited by "|" because I wanted to be able to match on commas and figured that was sort of an obscure character people wouldn't use too much.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv=X-UA-Compatible content="IE=Edge;chrome=1">

    <title>highlight words</title>

    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.5.2/jquery.min.js"></script>
    <script type="text/javascript">

    $(document).ready(function() {

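        function highlight_words(keywords, element) {
            if (keywords) {
                // Keywords are delimited by "|"; trim stray whitespace/newlines from the file.
                var terms = $.trim(keywords).split("|");
                $.each(terms, function(i, term) {
                    if (!term) { return; } // skip empty entries, e.g. from a trailing "|"
                    // Re-query the text nodes for each term, since earlier replacements change the DOM.
                    var textNodes = $(element).contents().filter(function() { return this.nodeType === 3; });
                    textNodes.each(function() {
                        var content = $(this).text();
                        var regex = new RegExp(term, "gi");
                        // "$&" inserts the matched text itself, so the original casing is kept
                        // (match() returns an array, which would be joined with commas here).
                        content = content.replace(regex, '<span class="highlight">$&</span>');
                        $(this).replaceWith(content);
                    });
                });
            }
        }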

        jQuery.get('keywords.txt', function(data) {
                var element = ".myText";
                var keywords = data;
                highlight_words(keywords, element);
        });

    });

    </script>

    <style>
    .highlight {
        font-weight: bold;
        background:red;
    }
    </style>

</head>
<body>

<p class="myText">I like apples, oranges and bananas!</p>

</body>
</html>

Then you would just format your keywords.txt like

apples, oranges|bananas

Note: this matches substrings, so if your keyword is "apple" and the text contains "apples", it will highlight "apple" and leave the "s" untouched. So some more improvements could possibly be made. You can also change the delimiter in the function if you want. Or, if you figure out a way to have it delimit on commas but still be able to match commas, you could do that. It was not super important to me, so I just never took it that far.
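If partial matches like "apple" inside "apples" are not wanted, one option is to wrap each term in \b word boundaries. This is only a rough sketch on a plain string; escapeRegex and wholeWordRegex are hypothetical helpers I made up for illustration, not part of the page above:

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a case-insensitive regex that only matches whole words.
function wholeWordRegex(term) {
    return new RegExp('\\b' + escapeRegex(term) + '\\b', 'gi');
}

var sentence = 'I like apples, but one apple is enough.';
var result = sentence.replace(wholeWordRegex('apple'), '<span class="highlight">$&</span>');
// Only the standalone "apple" is wrapped; "apples" is left untouched.
```

Keep in mind that \b only makes sense next to letters, digits or underscores, so a keyword that starts or ends with punctuation may not match the way you expect.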

Something just crossed my mind: you could always just make it a list in the text file, one keyword per line, and then split on the newline character "\n":

var str = keywords.split("\n");

I have to take off and can't test it right now, but that may be a more ideal split for you. I am not sure, though, whether it would decrease performance, since it would have to iterate down each line rather than over a single straight string.
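For the one-keyword-per-line idea, something along these lines might be a bit more forgiving than a bare split("\n"), since files saved on Windows end lines with "\r\n" and often have a blank line at the end. parseKeywords is a hypothetical helper name, and the sample string just stands in for the contents of keywords.txt:

```javascript
// Hypothetical helper: turn the raw contents of keywords.txt into a clean list of terms.
function parseKeywords(data) {
    var lines = data.split(/\r?\n/); // handles both "\n" and "\r\n" line endings
    var terms = [];
    for (var i = 0; i < lines.length; i++) {
        // Trim without relying on String.prototype.trim, which old browsers lack.
        var line = lines[i].replace(/^\s+|\s+$/g, '');
        if (line) {
            terms.push(line); // keep only non-empty lines
        }
    }
    return terms;
}

var terms = parseKeywords('apples, oranges\r\nbananas\n\n');
// terms now holds two entries: "apples, oranges" and "bananas"
```

Splitting is a single pass over the string whichever delimiter you pick, so I would not expect a noticeable performance difference, but I have not measured it.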

That sounds like the right thing, or you can also use the .match() function with the g flag to get an array of all matches.
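To illustrate the array-of-matches idea, here is a small sketch on a made-up sample string. With the g flag, match() returns every matched substring exactly as it appears in the text, or null when nothing matches:

```javascript
var content = 'Text, t e x t and TEXT all count.';

// Each " ?" allows one optional space between the letters.
var matches = content.match(/t ?e ?x ?t/gi);

// match() returns null (not an empty array) when there is no match, so guard for that.
var found = matches || [];
// found holds the three variations in their original casing and spacing
```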

Well yeah, I guess it could be modified to work off of an array. I was thinking more along the lines of maintaining a list of common words and storing them. If you had another source you wanted to get the strings from and inject them from an array that could work. And just an FYI, I did test splitting off of "\n" for the return in a text file and it worked great.
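If the keywords do come from an array, one way to go might be to join them into a single alternation pattern and do the replacement in one pass instead of looping per term. This is just a sketch on a plain string; highlightFromArray and escapeRegex are hypothetical names, and the escaping is my own addition so that keywords are matched literally:

```javascript
// Hypothetical helper: escape regex metacharacters so keywords are matched literally.
function escapeRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every keyword from the array in a highlight span, in one pass.
function highlightFromArray(text, keywords) {
    if (!keywords.length) {
        return text; // an empty pattern would match at every position
    }
    var escaped = [];
    for (var i = 0; i < keywords.length; i++) {
        escaped.push(escapeRegex(keywords[i]));
    }
    var regex = new RegExp(escaped.join('|'), 'gi');
    // "$&" is the matched text, so the original casing is preserved.
    return text.replace(regex, '<span class="highlight">$&</span>');
}

var highlighted = highlightFromArray('I like apples, oranges and bananas!', ['apples, oranges', 'bananas']);
```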

var string = '<p>This is some text where the word T eXt will get surrounded by something else.</p>';
string.replace(/(t(\ ?)e(\ ?)x(\ ?)t)/ig, '<span class="mydiv">$1</span>');

So for your example:

// Use the body element here: $(document).html() returns nothing, since document has no innerHTML.
var html = $('body').html(),
    newHtml = html.replace(/(t(\ ?)e(\ ?)x(\ ?)t)/ig, '<span class="mydiv">$1</span>');

$('body').html(newHtml);
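One thing to watch out for when running a replace over a whole HTML string is that the pattern can also match inside tags and attributes (for example type="text"). A rough way around that, assuming reasonably simple markup, is to split the string on tags and only touch the pieces in between. replaceOutsideTags is a hypothetical helper, and it does not try to handle script/style contents, comments, or a ">" inside an attribute value:

```javascript
// Hypothetical helper: apply a regex replacement to the text between tags only.
function replaceOutsideTags(html, regex, replacement) {
    // The capturing group makes split() keep the tags, so the result alternates
    // text, tag, text, tag, ... with the tags always at odd indexes.
    var pieces = html.split(/(<[^>]*>)/);
    for (var i = 0; i < pieces.length; i += 2) {
        pieces[i] = pieces[i].replace(regex, replacement);
    }
    return pieces.join('');
}

var source = '<p class="text">Some T eXt here.</p>';
var wrapped = replaceOutsideTags(source, /(t ?e ?x ?t)/gi, '<span class="mydiv">$1</span>');
// The class attribute is left alone; only the visible "T eXt" is wrapped.
```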

Or a re-usable method:

String.prototype.replaceWordVariations = function(word, replacement) {
  var getRegex = function(match) {
    var r = '',
        m = match.split(''),
        len = m.length,
        i;

    for(i=0; i < len; i++) {
      r += m[i];
      if (i < (len-1)) {
        r +=  '(\\ ?)';
      }
    }
    return '(' + r + ')';
  }, 
  regex = new RegExp(getRegex(word), 'ig');

  return this.replace(regex, replacement);
};

// Use the body element here: $(document).html() returns nothing, since document has no innerHTML.
var html = $('body').html(),
    newHtml = html.replaceWordVariations('text', '<span class="mydiv">$1</span>');

$('body').html(newHtml);

In fact, I suddenly just thought... The following would actually be more efficient so that the getRegex() method isn't created every time you call .replaceWordVariations():

String.prototype.replaceWordVariations = (function() {
  var getRegex = function(match) {
    var r = '',
        m = match.split(''),
        len = m.length,
        i;

    for(i=0; i < len; i++) {
      r += m[i];
      if (i < (len-1)) {
        r +=  '(\\ ?)';
      }
    }
    return '(' + r + ')';
  };

  return function(word, replacement) {
    var regex = new RegExp(getRegex(word), 'ig');
    return this.replace(regex, replacement);
  };
}());
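One caveat with building the pattern character by character is that a word containing regex metacharacters (say "c++" or "a.b") would produce a broken or overly loose pattern. Below is a sketch of a standalone variant that escapes each character first. wrapWordVariations is a hypothetical name, and it uses $& (the whole match) in the replacement instead of the $1 capture group used above:

```javascript
// Hypothetical standalone variant: same idea, but each character is escaped first
// so that metacharacters in the word are matched literally.
function wrapWordVariations(input, word, replacement) {
    var parts = [];
    for (var i = 0; i < word.length; i++) {
        parts.push(word.charAt(i).replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
    }
    // Allow one optional space between characters, case-insensitively.
    var regex = new RegExp(parts.join(' ?'), 'gi');
    return input.replace(regex, replacement);
}

var out = wrapWordVariations('Some text and T eXt here.', 'text', '<span class="mydiv">$&</span>');
// Each match keeps its original casing and spacing inside the span.
```

Keeping it as a plain function also avoids extending String.prototype, which is a matter of taste but can clash with other scripts on the page.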

Now you're just showing off... ;-)

Haha! That was fun ;)
