I'm writing a jQuery script that is meant to run on a large number of websites, all written in different ways.

My intention is to wrap certain bits of text within the page in HTML tags. It sounds simple, but it isn't.

One approach I've tried is this:

$('body').ready(function() {
    var bodyHTML = $('body').html();
    var matches = bodyHTML.match(/stringtomatch/g); // array of all matches (note: the actual regex is much more complicated and matches a large variety of strings, not just this)
    var match;
    for (var i = 0; i < matches.length; i++) {
        match = matches.pop();
        bodyHTML = bodyHTML.replace(match, '<span id="blah">' + match + '</span>'); // replace each in body
    }

    $('body').html(bodyHTML); // set new body back to DOM
});

This replaces the entire content of the body, which is inefficient in itself because a lot of unchanged markup has to be written back to the DOM. It also fails outright on a large number of sites: for some reason, when the HTML is set back, there is a lot less HTML than there is supposed to be. I'm not sure why.
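As a side note, the loop above can be reproduced on a plain string, without any DOM. The sketch below uses a made-up piece of markup (the `<a title=...>` line is purely an assumption for illustration) and shows two things: `pop()` shrinks the array while `i` grows, so not every match gets a round, and a string pattern passed to `replace` only hits the first occurrence, which here sits inside an attribute. Whether this is what actually happens on the failing sites is a guess.

```javascript
// Hypothetical markup: the search string occurs once in an attribute
// and twice in visible text.
var html = '<a title="stringtomatch">stringtomatch</a> and stringtomatch again';
var matches = html.match(/stringtomatch/g); // 3 matches

var rounds = 0;
for (var i = 0; i < matches.length; i++) {
    var match = matches.pop(); // the array gets shorter on every pass
    // A string pattern replaces only the FIRST occurrence, wherever it is.
    html = html.replace(match, '<span id="blah">' + match + '</span>');
    rounds++;
}

// rounds is 2, not 3, and both replacements landed inside the title
// attribute, so the quotes of id="blah" now cut the attribute short.
console.log(rounds);
console.log(html);
```

Markup damaged like this, once handed back to `.html()`, is parsed again by the browser, and that is one plausible way for content to go missing.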

It would be great if there were some way to just find the text within the entire DOM and add tags wherever a string matching my regular expression is found, but after several hours of research I haven't found anything.

Any ideas?

I created a plugin a long time ago to do just that. Perhaps it's a good starting point. A demo is here, and the plugin itself (hp-highlight) is also on GitHub. Feel free to use or change it at will. If you have interesting changes, please let me know or send a pull request.

I was looking for some code, perhaps accompanied by an explanation. I'm not quite sure how jQuery plugins are developed or how objects work in jQuery, so that code does not make much sense to me.

Basically, you just need the highlight part of the plugin:

var options = {
    highlightWords: ['stringtomatch', 'anotherstring'],
    wholeWordsOnly: true,
    cssClass: "highlighted"
};

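// $& in the replacement string stands for the matched text
var target = "<span class=\"" + options["cssClass"] + "\">$&<\/span>";
// Always group the alternation so the lookaheads apply to every word
var boundary_left = "(";
var boundary_right = ")";
if (options["wholeWordsOnly"]) {
    boundary_left = "\\b(";
    boundary_right = ")\\b";
}
// The trailing lookahead skips matches that appear to sit inside a tag (e.g. in an attribute)
var source = new RegExp("(?!<[^>]*?)" + boundary_left + options["highlightWords"].join("|") + boundary_right + "(?!([^<]*?>))", "ig");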

$('body').each(function() {
    var obj = $(this);
    obj.html(obj.html().replace(source, target));
});

That is the solution I tried with my own code, yet it doesn't work on certain websites, though I don't know why. Perhaps it has something to do with the fact that they're Ajax-heavy. I really need a solution to this, though. Is there anything else I could try?

Anything else?
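
One more thing that might be worth trying is to leave the HTML string alone and walk the text nodes instead, wrapping matches node by node. The following is only a sketch under some assumptions: `splitByMatches` and `wrapMatches` are names made up here (they are not part of jQuery or of the plugin above), the regex must carry the `g` flag, and text inside `script`, `style` and `textarea` is skipped. It uses the standard `document.createTreeWalker` API rather than jQuery for the traversal.

```javascript
// Split a string into plain and matching pieces. No DOM needed.
// `regex` must have the g flag so that exec() advances through the text.
function splitByMatches(text, regex) {
    if (!regex.global) {
        throw new Error('regex needs the g flag');
    }
    var parts = [];
    var last = 0;
    var m;
    regex.lastIndex = 0;
    while ((m = regex.exec(text)) !== null) {
        if (m[0] === '') { regex.lastIndex++; continue; } // skip empty matches
        if (m.index > last) {
            parts.push({ text: text.slice(last, m.index), hit: false });
        }
        parts.push({ text: m[0], hit: true });
        last = m.index + m[0].length;
    }
    if (last < text.length) {
        parts.push({ text: text.slice(last), hit: false });
    }
    return parts;
}

// Wrap every match found in the text nodes below `root` in a <span>.
// Only text nodes are touched; elements and attributes stay as they are.
function wrapMatches(root, regex, className) {
    var walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, null);
    var nodes = [];
    while (walker.nextNode()) {
        nodes.push(walker.currentNode); // collect first, modify afterwards
    }
    nodes.forEach(function (node) {
        if (/^(script|style|textarea)$/i.test(node.parentNode.nodeName)) {
            return;
        }
        var parts = splitByMatches(node.nodeValue, regex);
        var hasHit = parts.some(function (p) { return p.hit; });
        if (!hasHit) {
            return;
        }
        var frag = document.createDocumentFragment();
        parts.forEach(function (p) {
            if (p.hit) {
                var span = document.createElement('span');
                span.className = className;
                span.appendChild(document.createTextNode(p.text));
                frag.appendChild(span);
            } else {
                frag.appendChild(document.createTextNode(p.text));
            }
        });
        node.parentNode.replaceChild(frag, node);
    });
}

// Usage in a page (not run here):
// $(document).ready(function () {
//     wrapMatches(document.body, /stringtomatch/g, 'highlighted');
// });
```

Because existing elements are never re-parsed, event handlers and script-generated content should survive. Two limitations to keep in mind: a match that is split across elements (for example `string<b>to</b>match`) will not be found, and content that arrives later via Ajax would need another call on the newly inserted nodes.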