I have created a profanity list and am using it to filter text submitted to a site. There is also a preview step (the code below) that tells the user a word is not accepted and gives them the option to change the word or to continue submitting the post with the word blotted out (****).

The list is huge: probably well over 225 words in the array. Is there a better way of doing this, maybe using a StringBuilder instead of repeatedly concatenating the string?

Public Function CheckCensoredList(ByVal txt As String) As String
	If Len(txt) > 0 Then
		Dim regx As Regex
		Dim counts As Integer = 0
		Dim arr() As String = {........}

		For Each word In arr
			regx = New Regex(word)
			counts += regx.Matches(txt).Count
			txt = txt.Replace(word, "<span style=""color:red;""><em>" & word & "</em></span>")
		Next

		Return (counts & ";" & txt)
	End If
	Return (txt)
End Function

Essentially, the txt variable is recreated each time the regex and replace are performed. I don't know if a StringBuilder would be a better option. I know it is faster, especially on large texts. Keep in mind that txt can be up to 7,000 characters long (but in most cases it is 500 or fewer).

I guess I could search with IndexOfAny and replace that way. The problem is that it would keep finding the same word over and over again and never move on.

I could use IndexOf with an incrementing "i" up to the UBound of the array, but that amounts to the same thing with more code, more memory usage, more processing, etc.

Would there be a better way? Comments please.
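
One possible direction, offered as a sketch rather than a definitive answer: join the whole list into a single alternation pattern, build that Regex once, and let one Regex.Replace call with a MatchEvaluator both count and wrap the hits. The two words in the array are placeholders for the elided list. Whole-word matching (\b) and case-insensitivity are assumptions about the desired behaviour, not something the original code does. A single pass also avoids re-scanning markup inserted by earlier replacements, which matters if the list ever contains something like "red" or "span".

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module CensorSketch
    ' Placeholder entries; the real list is elided in the post.
    Private ReadOnly Words As String() = {"darn", "heck"}

    ' One alternation pattern, built once. Regex.Escape guards against entries
    ' that contain regex metacharacters; \b limits matches to whole words
    ' (an assumption, and it presumes every entry starts and ends with a letter or digit).
    Private ReadOnly Pattern As New Regex(
        "\b(?:" & String.Join("|", Words.Select(Function(w) Regex.Escape(w))) & ")\b",
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    Public Function CheckCensoredList(ByVal txt As String) As String
        If String.IsNullOrEmpty(txt) Then Return txt

        Dim counts As Integer = 0
        ' Regex.Replace scans txt once; the evaluator runs for every match.
        Dim marked As String = Pattern.Replace(txt,
            Function(m As Match)
                counts += 1
                Return "<span style=""color:red;""><em>" & m.Value & "</em></span>"
            End Function)

        ' Same "count;text" shape the original function returns.
        Return counts & ";" & marked
    End Function

    Sub Main()
        Console.WriteLine(CheckCensoredList("Well darn, what the HECK."))
    End Sub
End Module
```

With this shape the cost is one scan of txt per post instead of two scans per list entry, and only one new string is allocated, so a StringBuilder is probably not needed.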

All 9 Replies

You know, I was thinking: since my array is probably almost 250 elements long, it would be very time-consuming, and hard on the server, to loop through every single element and then scan up to 7,000 characters for each one. This could be an expensive process.

Would it work to use IndexOfAny to see whether any of the elements occur within the submitted string? Of course, but then how do I find out WHICH element triggered it? I could go the long way around: take the index of the first character it found that exists within the array, find the next space in the string after that index, and use Substring to pull out the word. Then I could loop through the elements to see which one matches that word, do a replace throughout the whole string, and remove that array element. Then call IndexOfAny again and repeat the process.

This could be faster or slower. Does anyone have any suggestions?
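
A note on the idea above, with a hedged sketch: String.IndexOfAny only accepts a Char array, so it cannot search for whole words. A Regex match already reports both where it matched and what it matched, which answers the "which element triggered this" question without the substring-and-loop detour. The words and sample text here are placeholders, and whole-word, case-insensitive matching is an assumption.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module FindHitsSketch
    Sub Main()
        ' Placeholder entries standing in for the real list.
        Dim words As String() = {"darn", "heck"}
        Dim pattern As String =
            "\b(?:" & String.Join("|", words.Select(Function(w) Regex.Escape(w))) & ")\b"

        Dim txt As String = "Oh heck, darn it. Heck!"

        ' Each Match carries its position (Index) and the text that matched (Value),
        ' so there is no need to search again to learn which entry was hit.
        For Each m As Match In Regex.Matches(txt, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Index & ": " & m.Value)
        Next
    End Sub
End Module
```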

Could you store the words in a SQL database?

Yes, but that would be pointless hits against the database.

So far I have a revised function:

Public Function CheckCensoredList(ByVal txt As String) As String
        If Len(txt) > 0 Then
            Dim str As StringBuilder
            Dim regx As Regex
            Dim counts As Integer = 0
            Dim arr() As String = {...............}

            ' String.IndexOfAny only accepts a Char array, so test each word with Contains instead.
            If Array.Exists(arr, Function(w) txt.Contains(w)) Then
                For Each word In arr
                    regx = New Regex(word)
                    counts += regx.Matches(txt).Count
                    str = New StringBuilder()
                    str.Append("<span style=""color:red;""><em>")
                    str.Append(word)
                    str.Append("</em></span>")
                    txt = txt.Replace(word, (str).ToString())
                Next
            End If

            Return (counts & ";" & txt)
        End If
        Return (txt)
    End Function

Could you not keep the curses in the DB and, if a hit occurs, disallow the word? Would that not work?

Yes, it would, but that requires 250 hits to the database, which is excessive :)

Yeah, I see your point! Would performance suffer greatly, though?

Through those hits? Yes, because this function will be called roughly 300-1200 times a day (depending on the day of the week).

It's fine how it is, I was just hoping there was a better way. At least this way

Fair enough, I see what you mean about the performance now with that many hits.
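
On the database idea discussed above, a sketch under stated assumptions: if the words did live in a table, they could be read with one query and cached, rather than queried word by word on every post. LoadWords below is a hypothetical stand-in for that single query (no table or connection details appear in the thread, so it returns fixed data), and ContainsCensoredWord is an illustrative name, not part of the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module CachedListSketch
    ' Hypothetical loader: stands in for a single SELECT against a word table.
    ' The real storage is not described in the thread, so fixed data is returned.
    Private Function LoadWords() As List(Of String)
        Return New List(Of String) From {"darn", "heck"}
    End Function

    ' Lazy(Of T) runs the factory on first use only and is thread-safe by default,
    ' so the list is read once per application lifetime rather than once per post.
    Private ReadOnly Censor As New Lazy(Of Regex)(
        Function()
            Dim alternatives = LoadWords().Select(Function(w) Regex.Escape(w))
            Return New Regex("\b(?:" & String.Join("|", alternatives) & ")\b",
                             RegexOptions.IgnoreCase Or RegexOptions.Compiled)
        End Function)

    Public Function ContainsCensoredWord(ByVal txt As String) As Boolean
        Return Not String.IsNullOrEmpty(txt) AndAlso Censor.Value.IsMatch(txt)
    End Function

    Sub Main()
        Console.WriteLine(ContainsCensoredWord("what the heck"))
        Console.WriteLine(ContainsCensoredWord("all clean"))
    End Sub
End Module
```

The trade-off is that edits to the stored list are not picked up until the cached Regex is rebuilt, for example on application restart.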
