PHP Badword Filter (Intermediate)
Join Date: Jul 2007
Posts: 1
Yes, there are many posts out there about badword filters, and most seem to fall short of something you'd want to turn loose on a corporate website. I've created a fairly elegant badword solution, and I wish to share it with the development community. I'd like to optimize it further: the script runs one regular-expression match per bad word, so processing time grows with the length of the list and could get pretty ugly.
A little more explanation about this particular script: one of my clients has a fairly complex comment form on their website that lets visitors comment, sign up for the newsletter, and so on, and then emails the details to the person who handles such things. Recently, the emails were coming fast and furious with spam for lewd websites, prescription medications, and the like. Of course, this needed to stop.
The site has no message board, and the sender gets no confirmation that the message was actually sent or received (just a static thank-you splash page). So there was no need to replace bad words with characters or to warn the offending user or bot that anything was amiss; flagged messages just needed to go to the circular bin. Because a flagged message is discarded entirely, some care was needed to keep partial matches from flagging legitimate messages. This is the first feature of my script that most others lack: many filters match substrings and will display a word like class as cl***. Since such a message would be flagged as containing a bad word and thrown away, that approach wouldn't work here.
So, a little wrangling with regular expressions later, I have a script that only matches whole words from my badwords list. The occasional swear isn't going to matter, so if a couple of masked words or swears next to tags make it through, that's OK. If you need to filter for those cases too, then by all means replace tags with whitespace before doing the string comparison.
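To make the whole-word point concrete, here is a minimal sketch of the difference between a substring hit and a word-boundary match. The helper name containsWholeWord() and the sample strings are made up for illustration and are not part of the script posted here; preg_quote() is added as an assumption so that a word containing regex metacharacters cannot break the pattern.

```php
<?php
// Hypothetical helper (not from the original script): returns true when
// $word occurs in $text as a whole word, ignoring case.
function containsWholeWord(string $text, string $word): bool
{
    // \b anchors the match at word boundaries, so the word is not found
    // inside a longer word; preg_quote() escapes regex metacharacters.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

// "class" contains the letters of the listed word but is not flagged.
var_dump(containsWholeWord('Sign up for the class today', 'ass'));  // bool(false)
// The same word standing alone is flagged, regardless of case.
var_dump(containsWholeWord('What an ASS', 'ass'));                  // bool(true)
```

Note that \b treats letters, digits, and the underscore as word characters, so a listed word glued to digits or underscores would not be matched.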
Other features that are in this script that may or may not be included in other examples bouncing around the web:
* bad words are loaded from a text file
* additional block for [url] tags implemented
* result is transparent - offender doesn't know he's blocked.
And here's the script. If you have ideas on how to tidy things up, I'll happily give them a shot. I'm currently considering switching from preg_match() to eregi(), and also building a single badwords expression from the entire badwords file, joining the words with the | operator in a loop. That way only one preg_match() call is needed instead of one per word. I'm also sure some of my control expressions could be a little more elegant, but I believe this is a good first crack. If I make any major changes, I'll post them as replies.
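The single-expression idea could look roughly like the sketch below. It is only an illustration under a few assumptions: buildBadWordsPattern() is a hypothetical name, the words are trimmed and passed through preg_quote() (neither of which the posted script does), and the list is given inline rather than read from badwords.txt. It stays with preg_match(), since eregi() was deprecated in PHP 5.3 and removed in PHP 7.0.

```php
<?php
// Hypothetical helper: turn a list of words into ONE case-insensitive,
// whole-word pattern, or null when the list has no usable entries.
function buildBadWordsPattern(array $words): ?string
{
    $escaped = [];
    foreach ($words as $word) {
        $word = trim($word);                  // drop stray "\r" and spaces
        if ($word === '') {
            continue;                         // skip blank lines
        }
        $escaped[] = preg_quote($word, '/');  // escape regex metacharacters
    }
    if ($escaped === []) {
        return null;                          // nothing to match against
    }
    // (?:a|b|c) groups the alternatives without capturing them.
    return '/\b(?:' . implode('|', $escaped) . ')\b/i';
}

// In practice the list might come from file('badwords.txt', FILE_IGNORE_NEW_LINES).
$words   = ["word1\r", 'word2', '', 'word3'];
$pattern = buildBadWordsPattern($words);

// One preg_match() call covers the whole list.
$isBad = $pattern !== null && preg_match($pattern, 'this has WORD2 in it') === 1;
var_dump($isBad);  // bool(true)
```

With a very long word list the combined pattern gets large, so it may be worth measuring both variants before settling on one.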
```php
// Filtering function: returns 1 if $str contains a listed word or a [url tag, otherwise 0
function filterBadWords($str, $badWordsFile) {
    $badFlag = 0;
    if (!is_file($badWordsFile)) {
        echo "ERROR: file missing: " . $badWordsFile;
        exit;
    } else {
        // Read the whole list, one word per line
        $badWordsFH = fopen($badWordsFile, "r");
        $badWordsArray = explode("\n", fread($badWordsFH, filesize($badWordsFile)));
        fclose($badWordsFH);
    }
    foreach ($badWordsArray as $badWord) {
        if (!$badWord) continue;   // skip empty lines
        else {
            // \b on both sides restricts the match to whole words; /i ignores case
            $regexp = "/\b" . $badWord . "\b/i";
            if (preg_match($regexp, $str)) $badFlag = 1;
        }
    }
    // Additional block for [url tags
    if (preg_match("/\[url/", $str)) $badFlag = 1;
    return $badFlag;
}
```
```php
// Function call/usage
if (filterBadWords($message, "badwords.txt") == 0) {
    mail("mail@destination.com", $subject, $message, $from);
}
header("Location: http://www.siteurl.com/index.php?p=Thank_You");
```
badwords.txt (one word per line):

```text
word1
word2
word3
```
A better solution for you might be to just mask the bad words (unless you really don't want the email sent at all).
Just use preg_replace to swap each bad word for *@!>&^ (or for as many characters as the bad word has).
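As a rough sketch of that masking idea: the version below uses preg_replace_callback() rather than plain preg_replace(), because the callback makes it easy to emit exactly as many mask characters as the matched word has. maskBadWords() is a hypothetical name, and asterisks stand in for the *@!>&^ characters.

```php
<?php
// Hypothetical helper: replace each listed whole word with asterisks of
// the same length, leaving everything else untouched.
function maskBadWords(string $text, array $words): string
{
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');   // escape regex metacharacters
    }, $words);
    if ($escaped === []) {
        return $text;                 // empty list: nothing to mask
    }
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    $masked = preg_replace_callback($pattern, function (array $m) {
        // strlen() counts bytes; multibyte words would need mb_strlen()
        return str_repeat('*', strlen($m[0]));
    }, $text);
    return $masked ?? $text;          // fall back to the input on a regex error
}

echo maskBadWords('word1 in class with Word2', ['word1', 'word2']), "\n";
// prints: ***** in class with *****
```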
There's nothing much wrong with your code, though; a few logic changes might help.
You are not checking the result of your mail() call either. I suggest wrapping it in an if statement, or using the @ operator to suppress its warnings, e.g. @mail(..);
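A minimal sketch of the if-statement variant follows. mail() returns true when the message was accepted for delivery and false otherwise; acceptance does not guarantee that it arrives. The wrapper sendComment() and its optional $send parameter are hypothetical, added only so the logic can be tried without sending real mail.

```php
<?php
// Hypothetical wrapper: $send defaults to PHP's mail(); passing another
// callable lets you exercise the logic without sending anything.
function sendComment(string $to, string $subject, string $message,
                     string $headers, ?callable $send = null): bool
{
    $send = $send ?? 'mail';
    $accepted = (bool) $send($to, $subject, $message, $headers);
    if (!$accepted) {
        // Log quietly; the visitor still only sees the thank-you page.
        error_log('mail() did not accept the message for ' . $to);
    }
    return $accepted;
}

// Stand-in sender that always refuses, to show the failure branch.
$refuse = function () { return false; };
var_dump(sendComment('mail@destination.com', 'Hi', 'Body', 'From: a@example.com', $refuse));
// bool(false)
```

The @ operator only hides the warning mail() may raise; it does not tell you whether the call succeeded, so checking the return value is the more informative of the two suggestions.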
Cheers
GardCMS :: Open Source CMS :: Gardcms.org
Join Date: Jun 2009
Posts: 1
I'm using the script from the first post; the additional code I put in front of it is this:
```php
$badWordsFile = "badwords.txt";
```
Last edited by jiawei456123; Jun 21st, 2009 at 2:09 am. Reason: Spelling error