I have written the code below to highlight words in quite large documents. There are two problems:

1. It needs to be case-insensitive, i.e. "dog" or "Dog" should both match "dog".
2. It currently matches words inside other words.

Can anyone help?

Many thanks

void highlightTerms(vector<string> finalWords, string *source){
	for (int i = 0; i != finalWords.size(); ++i) {
		string::size_type spos = 0;
		string what = finalWords[i];
		string startTag = "<font color=\"red\">";
		string endTag = "</font>";
		string del = " .?;|!\n\""; /* delimiter characters that end the tagging */
		
		while((spos = source->find(what,spos)) != string::npos){
			bool is_word = true;
			if(is_word){
				source->insert(spos,startTag);
				spos += startTag.length();
				spos = source->find_first_of(del,spos);
				if(spos != string::npos){source->insert(spos,endTag);}
				if((spos += endTag.length()) >= source->length()) break;
			}
		}
		
	}
}

1) You have several options. Here are a few:

  • Create your own case-insensitive traits class and apply that to your strings.
  • Convert all of your strings to one case before doing comparisons.
  • Forget about the string-specific search member functions and use STL-based searching where you can provide a predicate.

2) You need to check the characters on both sides of the match and make sure each is one that you would classify as a word boundary (space, punctuation, the start or end of the text, etc.). But don't forget about hyphenated words.
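
To illustrate the convert-to-one-case option together with the boundary check from 2), here is a rough sketch. The names to_lower_copy, is_word_char and is_whole_word are made up for illustration, and treating '-' as part of a word is only an assumption about how hyphenated words should behave.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a lower-cased copy of s (byte by byte).
std::string to_lower_copy(std::string s) {
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Hypothetical helper: true if c can be part of a word.
// Assumption: letters, digits and '-' are word characters, so "dog"
// inside "dog-eared" does not count as a whole word.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '-';
}

// True if text[pos, pos + len) has no word character directly before or after it.
bool is_whole_word(const std::string& text, std::size_t pos, std::size_t len) {
    const std::size_t end = pos + len;
    const bool left_ok = (pos == 0) || !is_word_char(text[pos - 1]);
    const bool right_ok = (end >= text.size()) || !is_word_char(text[end]);
    return left_ok && right_ok;
}
```

The idea would be to lower-case a copy of the document and the search word, call find on the copy, and then run is_whole_word on each hit. Because the lowered copy has the same length as the original, a position found in the copy can be used directly in the original text. std::tolower works per byte, so this only covers single-byte character sets.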

Comments
Really helpful - thank you

Thanks for that

I have seen traits and looked at this code

class ichar_traits : public std::char_traits<char> {
public:
    static bool eq(char, char);
    static bool lt(char, char);
    static int compare(const char *, const char *, size_t n);
    static const char * find(const char *, size_t n, char);
};

Is there any way to have it apply only to my one function? The reason is that it is part of a large code base that uses case-sensitive search in other functions.

What is "STL-based searching where you can provide a predicate"?

PS: I'm a C++ noob so sorry about that.

Hi again

I now know what you meant by predicate and have found some code. This looks like the answer for me. The trouble is that it requires a const string, and I need to use a string pointer (string *source).

Any ideas?

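#include <algorithm>
#include <cctype>
#include <string>
using namespace std;

// Compare two characters, ignoring case.
bool ci_equal(char ch1, char ch2){
    return toupper((unsigned char)ch1) == toupper((unsigned char)ch2);
}

// Case-insensitive find: index of the first match of str2 in str1, or string::npos.
size_t ci_find(const string& str1, const string& str2){
    // str1 is const, so begin() and end() return const_iterator
    string::const_iterator pos = search(str1.begin(), str1.end(),
                                        str2.begin(), str2.end(), ci_equal);
    if (pos == str1.end())
        return string::npos;
    else
        return pos - str1.begin();
}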

>Is there any way to have it only apply to my one function.
No. The traits class is part of the string's type, so any object that uses the case-insensitive traits is stuck with them for its whole lifetime.
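
To make that concrete, here is a minimal sketch that fills in the ichar_traits declaration quoted above. The helper up and the alias ci_string are my own names, not part of the original snippet, and the comparison is only as good as std::toupper (so effectively single-byte characters).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of a case-insensitive traits class built on std::toupper.
struct ichar_traits : public std::char_traits<char> {
    // Hypothetical helper: upper-case one character safely.
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    // Three-way comparison of the first n characters, ignoring case.
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    // Pointer to the first character in s[0, n) equal to c (ignoring case), or null.
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], c)) return s + i;
        }
        return nullptr;
    }
};

// Hypothetical alias: a different type from std::string, not a mode of it.
typedef std::basic_string<char, ichar_traits> ci_string;
```

Since ci_string and std::string are unrelated types, using this inside one function would mean copying the data into a ci_string first (for example ci_string(source->c_str())), which may be expensive for large documents.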

>what is "STL-based searching where you can provide a predicate"
In the <algorithm> header, look for std::search. That's probably the one you'll want.
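
A small sketch of std::search with a predicate, in case it helps. The function name ci_find_from and its start parameter are assumptions on my part; the start offset is there because a highlighting loop needs to resume the search after each hit.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Predicate for std::search: compares two characters, ignoring case.
bool ci_equal_char(char a, char b) {
    return std::toupper(static_cast<unsigned char>(a)) ==
           std::toupper(static_cast<unsigned char>(b));
}

// Hypothetical helper: index of the first case-insensitive match of needle
// in haystack at or after start, or std::string::npos if there is none.
std::size_t ci_find_from(const std::string& haystack,
                         const std::string& needle,
                         std::size_t start) {
    if (start > haystack.size()) return std::string::npos;
    std::string::const_iterator first = haystack.begin() + start;
    std::string::const_iterator it =
        std::search(first, haystack.end(), needle.begin(), needle.end(),
                    ci_equal_char);
    if (it == haystack.end()) return std::string::npos;
    return static_cast<std::size_t>(it - haystack.begin());
}
```

Unlike the traits approach, this leaves the string type alone, so the rest of the code base keeps its case-sensitive searches.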

>Trouble is it requires a const string and I need to use a string pointer (string *source)
Dereference the pointer when you pass it...perhaps?
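
Something along these lines, as a sketch only. It reuses the highlightTerms signature from the question (with the vector taken by const reference), passes *source wherever a const string& is expected, and assumes that letters, digits and '-' are word characters. The ci_find here takes an extra start argument, which is my addition.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Predicate for std::search: compares two characters, ignoring case.
bool ci_equal(char a, char b) {
    return std::toupper(static_cast<unsigned char>(a)) ==
           std::toupper(static_cast<unsigned char>(b));
}

// Takes a const reference; a caller holding a std::string* passes *ptr.
std::size_t ci_find(const std::string& text, const std::string& word,
                    std::size_t start) {
    if (start >= text.size()) return std::string::npos;
    std::string::const_iterator it = std::search(
        text.begin() + start, text.end(), word.begin(), word.end(), ci_equal);
    return it == text.end() ? std::string::npos
                            : static_cast<std::size_t>(it - text.begin());
}

// Assumption: letters, digits and '-' count as word characters.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '-';
}

void highlightTerms(const std::vector<std::string>& finalWords,
                    std::string* source) {
    const std::string startTag = "<font color=\"red\">";
    const std::string endTag = "</font>";
    for (std::size_t i = 0; i != finalWords.size(); ++i) {
        const std::string& what = finalWords[i];
        if (what.empty()) continue;  // an empty term would match everywhere
        std::size_t pos = 0;
        // *source dereferences the pointer, giving ci_find its const string&.
        while ((pos = ci_find(*source, what, pos)) != std::string::npos) {
            const std::size_t end = pos + what.size();
            const bool left_ok = pos == 0 || !is_word_char((*source)[pos - 1]);
            const bool right_ok =
                end == source->size() || !is_word_char((*source)[end]);
            if (left_ok && right_ok) {
                // Insert the closing tag first so pos is still valid afterwards.
                source->insert(end, endTag);
                source->insert(pos, startTag);
                pos = end + startTag.size() + endTag.size();
            } else {
                ++pos;  // part of a longer word: keep looking
            }
        }
    }
}
```

One known limitation of this sketch: a later search term such as "font" or "red" would also match inside tags inserted for an earlier term, so it is only safe when the terms cannot occur in the markup.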

#include <cctype>
#include <string>
#include <iostream>
using namespace std;

// Returns true if x and y are equal, ignoring case.
bool lol(const string& x, const string& y) {
	if (x.size() != y.size())
		return false;
	for (size_t i = 0; i < x.size(); ++i) {
		// tolower is safer than adding 32, which would also "match"
		// unrelated characters such as '0' and 'P'
		if (tolower((unsigned char)x[i]) != tolower((unsigned char)y[i]))
			return false;
	}
	return true;
}

int main(int argc, char **argv) {
	string loli = "dsadasdas";
	string loli1 = "xxl";
	string loli2 = "xxl";
	cout << lol(loli, loli1) << endl;
	cout << lol(loli1, loli2) << endl;
	return 0;
}