I have a SQL table that contains spam or naughty word phrases. I'm trying to decide how to take some text and determine if any of the values in the table exist in my text.

In the past I have selected all of the phrases out of the database and looped over them, comparing looking to see if any phrase is in my passed text. Is this the best way? Is there a better way?

Recommended Answers

All 8 Replies

I have a SQL table that contains spam or naughty word phrases. I'm trying to decide how to take some text and determine if any of the values in the table exist in my text.

In the past I have selected all of the phrases out of the database and looped over them, comparing looking to see if any phrase is in my passed text. Is this the best way? Is there a better way?

can you post you code where you did it as a loop

This is how I did it in ColdFusion. Now I need to do the same, perhaps more efficiently, in .NET 4.

<!--- check black list --->
	  <cfquery name="getEmailAbusers" datasource="#variables.dsn2#">
	    SELECT	phrase
	    FROM		emailAbusers
	  </cfquery>

	  <!--- email staff if on black list --->
	  <cfloop query="getEmailAbusers">
	    <cfif arguments.body CONTAINS #phrase#>
	      <cfmail	from="webserver@xyz.org"
	              to="#Application.webDeveloper1#"
	              subject="Form Abuse"
	              type="html">

	        <p>The form is being abused with "#phrase#" in the body.</p>

	        <p>#arguments.body#</p>

	        <cfdump var="#cgi#" label="CGI">
	      </cfmail>

	      <cfreturn true />
	    </cfif>
	  </cfloop>

I'm basically trying to create a bad words filter, or a perform a reverse contains.

Here is what I have so far. It's failing if my phrase is more than one word.

public static bool isOnBlackList(string body)
		{
			SqlConnection con = new SqlConnection(ConnectionString);
			string sqlSelectCommand = "SELECT phrase FROM emailAbusers";
			SqlCommand cmd = new SqlCommand(sqlSelectCommand, con);

			bool result = false;
			body = " " + body + " ";

			using (con)
			{
				con.Open();
				SqlDataReader Phrases = cmd.ExecuteReader();

				while (Phrases.Read())
				{
					foreach (string phrase in Phrases)
					{
						string aBadPhrase = " " + phrase + " ";

						if (body.ToUpper().Contains(aBadPhrase.ToUpper()))
						{
							result = true;
							break;
						}
					}
				}
				Phrases.Close();
				return result;
			}
		}

I'm basically trying to create a bad words filter, or a perform a reverse contains.

Why are you not passing Where Criteria in your Select query ? I mean in below way :
SELECT phrase FROM emailAbusers Where phrase like '%body%'

I don;t have more idea about your requirement but i believe above query modification will work for you.

Try by it and let us know..

Here is what I have so far. It's failing if my phrase is more than one word.

public static bool isOnBlackList(string body)
		{
			SqlConnection con = new SqlConnection(ConnectionString);
			string sqlSelectCommand = "SELECT phrase FROM emailAbusers";
			SqlCommand cmd = new SqlCommand(sqlSelectCommand, con);

			bool result = false;
			body = " " + body + " ";

			using (con)
			{
				con.Open();
				SqlDataReader Phrases = cmd.ExecuteReader();

				while (Phrases.Read())
				{
					foreach (string phrase in Phrases)
					{
						string aBadPhrase = " " + phrase + " ";

						if (body.ToUpper().Contains(aBadPhrase.ToUpper()))
						{
							result = true;
							break;
						}
					}
				}
				Phrases.Close();
				return result;
			}
		}

@rohand, I agree; rather than retrieving the list of words then having to iterate through them. Pass the body of the message to the database and use a LIKE condition. The database will be able to perform the match this way much quicker and more efficiently than any loop you write :)

However, your syntax is checking for entries in the phrase field which contain the whole of the email body. You would need something like:

SELECT phrases
  FROM emailAbusers
  WHERE (LOWER(@body) LIKE '% ' + LOWER(LTRIM(RTRIM(phrases))) + ' %')
  OR (LOWER(@body) LIKE '% ' + LOWER(LTRIM(RTRIM(phrases)))) 
  OR (LOWER(@body) LIKE LOWER(LTRIM(RTRIM(phrases))) + ' %')

The OP's code suggests they want to ignore case so I have converted both the body and the phrase to lower case before comparing. I also left and right trimmed the phrase to ensure there are no white spaces to disrupt the match and added a space after the first % and before the last as the OP's code suggests they only wish to match whole words.
Finally, the '%' at the start of the match requires there to be at least one character BEFORE the match and the '%' at the end requires at least one character after it so it wont match the first or last word in the string. I have added an extra two LIKE clauses to check for those matches as well.

The above query will return a collection of all the bad phrases that are in the body. Hope this helps :)

Your query actually works perfectly so it appears. I never knew you could swap the values, i.e. Where 'value' LIKE column. Learned something new.

Thanks.

@rohand, I agree; rather than retrieving the list of words then having to iterate through them. Pass the body of the message to the database and use a LIKE condition. The database will be able to perform the match this way much quicker and more efficiently than any loop you write :)

However, your syntax is checking for entries in the phrase field which contain the whole of the email body. You would need something like:

SELECT phrases
  FROM emailAbusers
  WHERE (LOWER(@body) LIKE '% ' + LOWER(LTRIM(RTRIM(phrases))) + ' %')
  OR (LOWER(@body) LIKE '% ' + LOWER(LTRIM(RTRIM(phrases)))) 
  OR (LOWER(@body) LIKE LOWER(LTRIM(RTRIM(phrases))) + ' %')

The OP's code suggests they want to ignore case so I have converted both the body and the phrase to lower case before comparing. I also left and right trimmed the phrase to ensure there are no white spaces to disrupt the match and added a space after the first % and before the last as the OP's code suggests they only wish to match whole words.
Finally, the '%' at the start of the match requires there to be at least one character BEFORE the match and the '%' at the end requires at least one character after it so it wont match the first or last word in the string. I have added an extra two LIKE clauses to check for those matches as well.

The above query will return a collection of all the bad phrases that are in the body. Hope this helps :)

No problem :) Thats what daniweb is all about.
Remember to mark the thread as solved if your question has been answered

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.