Hi there,

I am making a spell checker in PHP, with the words stored in MySQL. I use it to check Unicode languages. With similar_text() and a threshold above 82 percent I don't get good suggestions, but if I lower the threshold I get many useless words. How can I make it better?

Here is part of my code:

while ($row = mysql_fetch_assoc($words)) {
    $word = $row['word'];
    similar_text($word, $str, $percent);
    if ($percent > 82) {
        echo "<span class=\"sugg\">$word </span>";
    }
}
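One likely cause of the poor suggestions: similar_text() works on bytes, not characters. In UTF-8, letters outside ASCII take two to four bytes, and letters of the same script often share their leading bytes, so the percentage can partly credit two different letters as equal. Below is a minimal sketch of a character-level measure. It swaps in Levenshtein edit distance (a different algorithm from similar_text()), the function names are made up for this example, and it assumes PHP 7+ and valid UTF-8 input.

```php
<?php
// Sketch: character-level similarity for UTF-8 words (assumes PHP 7+
// and valid UTF-8). The function names are hypothetical.

/** Split a UTF-8 string into an array of characters (code points). */
function utf8_chars(string $s): array
{
    // The /u flag makes the empty pattern split between characters, not bytes.
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

/** Levenshtein distance counted in characters instead of bytes. */
function char_levenshtein(string $a, string $b): int
{
    $x = utf8_chars($a);
    $y = utf8_chars($b);
    $n = count($y);
    $prev = range(0, $n); // distances from the empty prefix of $a
    foreach ($x as $i => $cx) {
        $cur = [$i + 1];
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($cx === $y[$j - 1]) ? 0 : 1;
            $cur[$j] = min(
                $prev[$j] + 1,        // delete $cx
                $cur[$j - 1] + 1,     // insert $y[$j - 1]
                $prev[$j - 1] + $cost // substitute (free if equal)
            );
        }
        $prev = $cur;
    }
    return $prev[$n];
}

/** Similarity in percent: 100 = identical, 0 = nothing in common. */
function char_similarity(string $a, string $b): float
{
    $len = max(count(utf8_chars($a)), count(utf8_chars($b)));
    if ($len === 0) {
        return 100.0; // two empty strings
    }
    return 100.0 * ($len - char_levenshtein($a, $b)) / $len;
}
```

In the loop above you would replace similar_text($word, $str, $percent) with $percent = char_similarity($word, $str). For example, char_similarity('swrd', 'sword') is 80.0 (one inserted character out of five). The two scores are on different scales, so the threshold of 82 would have to be tuned again.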

@cereal
Is there any tutorial to do so?

You can do it like this:

<?php

$word = 'swrd';
$pspell_config = pspell_config_create("en");

# Available modes:
#
# PSPELL_FAST
# PSPELL_NORMAL
# PSPELL_BAD_SPELLERS
#
pspell_config_mode($pspell_config, PSPELL_BAD_SPELLERS);
$pspell_link = pspell_new_config($pspell_config);

$result = array();
if (!pspell_check($pspell_link, $word)) {
    $suggestions = pspell_suggest($pspell_link, $word);

    foreach ($suggestions as $suggestion) {

        similar_text($word, $suggestion, $percent);
        if (round($percent) > 70) {
            $result[] = $suggestion;
        }
    }
}
$result = array_unique($result); # no duplicates
print_r($result);
?>

In this example, with a threshold of 70 you get:

Array
(
    [0] => surd
    [1] => sward
    [2] => sword
    [3] => seaward
    [4] => sward's
    [5] => swards
    [6] => sword's
    [7] => swords
    [8] => ward
    [9] => word
)

With 80, just:

Array
(
    [0] => sward
    [1] => sword
)
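Instead of relying only on a fixed cut-off, it can also help to sort the candidates by score so the closest match comes first. Here is a sketch of the filtering step above as a separate function (rank_suggestions is a made-up name; PHP 7+ assumed) that also removes duplicates:

```php
<?php
// Sketch (PHP 7+): filter pspell/Enchant suggestions by similar_text()
// and return them best match first. rank_suggestions is a hypothetical name.
function rank_suggestions(string $word, array $suggestions, float $threshold): array
{
    $ranked = [];
    foreach (array_unique($suggestions) as $suggestion) {
        similar_text($word, $suggestion, $percent);
        if (round($percent) > $threshold) {
            $ranked[] = [$suggestion, $percent];
        }
    }
    // Highest percentage first.
    usort($ranked, function ($a, $b) {
        return $b[1] <=> $a[1];
    });
    return array_column($ranked, 0);
}
```

With pspell the call would be rank_suggestions($word, pspell_suggest($pspell_link, $word), 70). For instance, rank_suggestions('swrd', ['seaward', 'surd', 'sword'], 70) returns ['sword', 'surd', 'seaward'] (about 89, 75 and 73 percent). Sorting makes a lower threshold cheaper: the weaker candidates end up at the bottom of the list instead of being mixed in with the good ones.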

Hope this is useful, bye!

Pspell doesn't work in newer versions of PHP. Is there any other way?

Yes, you can use Enchant with a PWL file. A PWL (personal word list) file contains one word per line; if your words are in the database, just loop over them and write them into a file. Then load it with enchant_broker_request_pwl_dict(), like this:

<?php
$r = enchant_broker_init();

# PWL dictionary path/file
$mydict = 'dict.pwl';

# word to check
$word = 'swrd';

# load custom dictionary
$d = enchant_broker_request_pwl_dict($r,$mydict);

# load standard dictionary instead of a custom
//$d = enchant_broker_request_dict($r,'en_GB');

# details about the loaded dictionary
echo "dictionary provides:\n";
print_r(enchant_dict_describe($d));

$wordcorrect = enchant_dict_check($d, $word);

# initialize here, so $result exists even when the word is correct
$result = array();
if (!$wordcorrect) {
    $suggestions = enchant_dict_suggest($d, $word);
    foreach ($suggestions as $suggestion) {
        similar_text($word, $suggestion, $percent);
        if (round($percent) > 80) {
            $result[] = $suggestion;
        }
    }
}

# no duplicates, with Enchant I think this can be removed
$result = array_unique($result);
print_r($result);

# release the dictionary and the broker
enchant_broker_free_dict($d);
enchant_broker_free($r);
?>

Read here for more info:
* http://www.php.net/manual/en/enchant.installation.php
* http://www.php.net/manual/en/function.enchant-broker-request-pwl-dict.php
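To build the PWL file from the words in the database, a few lines are enough. A sketch, assuming PHP 7.1+ and UTF-8 words; write_pwl is a hypothetical helper name, and the connection details in the comment are placeholders:

```php
<?php
// Sketch (PHP 7.1+): write an Enchant PWL file, one UTF-8 word per line.
// write_pwl is a hypothetical helper name.
function write_pwl(iterable $words, string $path): int
{
    $list = [];
    foreach ($words as $word) {
        $word = trim((string) $word);
        if ($word !== '') {
            $list[] = $word; // skip empty rows
        }
    }
    $list = array_values(array_unique($list)); // no duplicates

    $data = $list ? implode("\n", $list) . "\n" : '';
    if (file_put_contents($path, $data) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return count($list);
}

// Example with the `unicode` table from this thread; the DSN, user and
// password are placeholders:
// $db = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
// $words = $db->query('SELECT word FROM unicode')->fetchAll(PDO::FETCH_COLUMN);
// write_pwl($words, 'dict.pwl');
```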

Can you give me the procedure to install it, please? I am new to all these PHP installations.

Which platform and which PHP version are you using? On Debian/Ubuntu, for example, you need to run sudo apt-get install php5-enchant (this will probably reload Apache). For Windows, follow these instructions:

http://php.net/manual/en/install.windows.extensions.php
http://www.php.net/manual/en/install.pecl.windows.php

In short: download the extension, place the DLL in PHP's extensions directory, enable the extension in the configuration file (php.ini), and restart Apache; otherwise the extension will not be loaded.

NOTE: at the moment, on Windows you would have to compile the extension yourself, because pecl4win.php.net is still unavailable, but you can use the builds provided at http://downloads.php.net/pierre/; just follow the instructions in the downloaded package.

Reference: http://stackoverflow.com/questions/2048309/pecl-extesions-for-windows

I've done everything, but it isn't working...

I need more information to help you. What do you get if you run:

<?php
echo extension_loaded('enchant') ? 'working':'not loaded';
?>

And which platform/PHP version are you using?

PHP 5.3.5.
I get "not loaded" with the code above, but I tried everything. I am using WampServer, and the extension is ticked in the WampServer tray menu.

OK, that means the extension is not loaded. Try restarting Windows; if that doesn't help, please explain what you have done, because the DLL may be in the wrong directory.
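To see where the running PHP actually looks for extensions, a small check page may help. This is a sketch; extension_diagnostics is a made-up name. Open it through Apache in the browser, because command-line PHP may read a different php.ini. If loaded_ini is not the file you edited, the extension line is in the wrong php.ini.

```php
<?php
// Sketch: report the settings that decide whether an extension can load.
// extension_diagnostics is a hypothetical name; open the page through Apache,
// because command-line PHP may read a different php.ini.
function extension_diagnostics(string $extension): array
{
    return [
        'php_version'   => PHP_VERSION,
        'sapi'          => PHP_SAPI,              // e.g. "apache2handler" or "cli"
        'loaded_ini'    => php_ini_loaded_file(), // false if no php.ini was read
        'extension_dir' => ini_get('extension_dir'),
        'loaded'        => extension_loaded($extension),
    ];
}

echo '<pre>';
print_r(extension_diagnostics('enchant'));
echo '</pre>';
```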

In WampServer, the extensions directory is ext, as I read in php.ini. I placed all the enchant DLLs in the ext directory and added the extension line to php.ini, but it still isn't working.

One more thing: how would I spell check a big string? I tried splitting the string with explode() and then looping over the words with foreach, but it takes a very long time for a long string. Is that the only way?

This is the code I used with MySQL:

    function checkspell($string) {
        $counter = 0;
        $arr = explode(" ", $string);
        foreach ($arr as $str) {
            $text = mb_substr($str, 0, 3, 'UTF-8'); // first 3 characters; substr() would count bytes
            $exists = mysql_query("SELECT COUNT(word) FROM unicode WHERE word = '" . mysql_real_escape_string($str) . "'") or die(mysql_error());
            $words = mysql_query("SELECT * FROM unicode WHERE word LIKE '" . mysql_real_escape_string($text) . "%' ORDER BY word") or die(mysql_error());
            $sugges = 0;
            if (mysql_result($exists, 0) == 0) {
                $counter++;
                echo "<span class=\"error\" sug=\"$counter\">$str </span><div class=\"suggestions $counter\">";
                while ($row = mysql_fetch_assoc($words)) {
                    $word = $row['word'];
                    similar_text($word, $str, $percent);
                    if ($percent > 84) {
                        echo "<span class=\"sugg\" sugges=\"$word \" idt=\"$counter\">$word</span>";
                        $sugges++;
                    }
                }
                if ($sugges == 0) {
                    echo "<span class=\"nosug\">No Suggestions Found...</span>";
                }
                echo "<hr size=\"1\" color=\"#ccc\"><span class=\"as\">Synonyms</span><span class=\"as\">Antonyms</span></div>";
            } else {
                echo "<span class=\"whps\">$str</span>";
            }
        }
    }

Leave enchant.dll in the c:\php\extensions directory, move the other DLLs to the parent directory (c:\php) or to c:\php\dlls, and restart the server, as explained here:

Some of the extensions need extra DLLs to work. Couple of them can be found in the distribution package, in the C:\php\dlls\ folder in PHP 4 or in the main folder in PHP 5, but some, for example Oracle (php_oci8.dll) require DLLs which are not bundled with the distribution package. If you are installing PHP 4, copy the bundled DLLs from C:\php\dlls folder to the main C:\php folder. Don't forget to include C:\php in the system PATH (this process is explained in a separate FAQ entry).

source: http://php.net/manual/en/install.windows.extensions.php

The FAQ entry http://www.php.net/manual/en/faq.installation.php#faq.installation.addtopath explains how to add c:\php to the system PATH; otherwise the system will not find the additional DLLs.

About your script

Checking a string can be slow. Your code runs two queries for each word; you can speed up the first one by creating an index and searching for a hash instead of the word itself. You will need to add a varchar field to your table:

ALTER TABLE unicode ADD word_hash varchar(32) NOT NULL;

Then run a query to fill the word_hash field with the MD5 hash of each word:

UPDATE unicode SET word_hash = MD5(word);

Then create an index containing word_hash, word and id (if present):

CREATE INDEX word_index on unicode (word_hash,word,id);

Then you can change your first query to:

SELECT count(id) FROM unicode WHERE word_hash = md5('$str');

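The index will also be used by the second query. You can verify this by running both queries with EXPLAIN in the mysql command-line client (\G is a client command that prints each row vertically; leave it out in phpMyAdmin):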

EXPLAIN SELECT count(id) FROM unicode WHERE word_hash = md5('hello')\G
EXPLAIN SELECT * FROM unicode WHERE word LIKE 'hello%' ORDER BY word\G

EXPLAIN shows how MySQL retrieves the results: if the Extra field says Using index, the query is answered from the index alone, so the index is working. Note that type: index, as in the example below, means a scan of the whole index rather than a direct lookup. Example:

mysql> explain select count(id) from users where fname = 'jane'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: users
         type: index
possible_keys: NULL
          key: user_index
      key_len: 254
          ref: NULL
         rows: 21
        Extra: Using where; Using index
1 row in set (0.00 sec)

Also, I would move the second query inside the IF statement at line 9, so the search query runs only when the first query returns zero.
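A sketch of that restructuring, assuming PHP 7.1+ and the table layout above (word plus word_hash). It swaps the old mysql_* functions (removed in PHP 7) for PDO with prepared statements and computes the MD5 in PHP instead of in SQL; check_word is a hypothetical name.

```php
<?php
// Sketch (PHP 7.1+): the hash lookup first, the LIKE search only for
// unknown words. check_word is a hypothetical name.
function check_word(PDO $db, string $str, float $threshold = 84.0): ?array
{
    // 1) Exact lookup through the indexed word_hash column.
    //    md5() in PHP matches MySQL's MD5() if the word is sent as the same
    //    UTF-8 bytes (assumes a utf8/utf8mb4 connection).
    $exists = $db->prepare('SELECT COUNT(*) FROM unicode WHERE word_hash = ?');
    $exists->execute([md5($str)]);
    if ((int) $exists->fetchColumn() > 0) {
        return null; // known word: skip the second query
    }

    // 2) Candidates sharing the first three characters. The /u flag makes
    //    "." match one UTF-8 character; substr() would cut after 3 bytes.
    $prefix = preg_match('/^.{1,3}/us', $str, $m) ? $m[0] : '';
    $candidates = $db->prepare('SELECT word FROM unicode WHERE word LIKE ? ORDER BY word');
    $candidates->execute([$prefix . '%']); // % and _ in $prefix act as wildcards

    $suggestions = [];
    foreach ($candidates->fetchAll(PDO::FETCH_COLUMN) as $word) {
        similar_text($word, $str, $percent);
        if ($percent > $threshold) {
            $suggestions[] = $word;
        }
    }
    return $suggestions;
}
```

It returns null for a known word and an array (possibly empty) of suggestions otherwise, so the HTML output can stay in checkspell().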

I think Enchant can give you some extra speed; at the very least, you avoid running several database queries for every word.
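For the big-string question, here is a sketch (PHP 7+) of checking a long text in one pass. It extracts words with a Unicode-aware regular expression instead of explode(" "), so punctuation and repeated spaces don't produce bogus words, and it checks each distinct word only once. find_misspelled is a hypothetical name; the checker is passed in as a callable, so the function itself doesn't depend on Enchant.

```php
<?php
// Sketch (PHP 7+): distinct misspelled words of $text, in order of first
// appearance. find_misspelled is a hypothetical name.
function find_misspelled(string $text, callable $isCorrect): array
{
    // A "word" is a run of letters plus combining marks (\p{M}), so vowel
    // signs in scripts such as Devanagari stay attached to their letter.
    preg_match_all('/[\p{L}\p{M}]+/u', $text, $matches);

    $checked = [];    // word => bool, so each distinct word is checked once
    $misspelled = [];
    foreach ($matches[0] as $word) {
        if (!array_key_exists($word, $checked)) {
            $checked[$word] = (bool) $isCorrect($word);
            if (!$checked[$word]) {
                $misspelled[] = $word;
            }
        }
    }
    return $misspelled;
}

// Usage with the Enchant dictionary $d from the earlier example:
// $bad = find_misspelled($text, function ($w) use ($d) {
//     return enchant_dict_check($d, $w);
// });
```

Suggestions then only need to be computed for the words in the returned list, which for a typical text is much shorter than the full word sequence.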

Thanks, cereal! Your script looks awesome. I'll try it. Thanks for all your help!

Hey! Have you got a new idea?

What?
