0

I am making a basic craigslist type website with a search function. There is a field where you can enter what you think the name of the product might be and my search function is supposed to find names of products that match.

so for example a user can enter:
"squat rack"
"squatrack"
"sqaut rack"
"squat rac"
"scuat rack"
"squart rac" (you know, in case of a typo)

and hopefull what will show up in the serach results is:
"squat rack"
"some other kind of rack"
"squat machine"

and things that are similar.

I used "/<name of product>/" as my search criteria and it pulls up everything. Hopefully the user would put other search criteria to narrow it down but right now I'm looking for a regex that can do "close enough" type matches. Myabe even use a for loop or split it up into sections and do multiple searches I'd be OK with. I'm very new to regex and I find it to be difficult, so bear with me.

7
Contributors
25
Replies
32
Views
4 Years
Discussion Span
Last Post by diafol
Featured Replies
  • 1

    you can try using the LIKE operator in your mysql statement. SELECT fields FROM database WHERE field LIKE 'variable' ex SELECT lastName, firstName FROM employees WHERE firstName LIKE 'a%' Read More

  • 1

    Dont know much PHP but logic for regex is similar; use \w+ - for one or more whitespaces (q|c) - letter q or c so squat/scuat/scaut/sqaut with ending whitespace will be. you get the idea. s(q|c)(a|u)t(\w+) Read More

  • 1
    diafol 3,720   4 Years Ago

    How about knocking up something with soundex()? That *should* match many similar sounding spelling mistakes. I tried out the following and it worked really well on single words: SELECT job_id, message, SOUNDEX('squat') AS search, SOUNDEX(message) AS msg, 'squat' SOUNDS LIKE message AS result FROM mytable Not so good with multiple … Read More

  • 1
    diafol 3,720   4 Years Ago

    OK, just out of interest, here's some code to work with a soundex column in a mysqlDB. I've used vanilla mysql here, but mysqli or PDO would be a far better solution. Also I went overboard in the soundIt function, but that can be tidied up easily enough: //JUst soundexes … Read More

  • 1
    diafol 3,720   4 Years Ago

    Perhaps metaphone() would be better. It's similar to soundex() but seems more accurate. The problem with soundex is that it only goes so far (4 chars). Here's the script with metaphone instead: //JUst metaphones function soundIt($str,$return_as_array=false){ preg_match_all('/\b([\w]*)\b/',$str,$m); $arr = array_filter(array_map("metaphone",$m[1])); return ($return_as_array) ? $arr : implode(" ", $arr); } //create … Read More

1

you can try using the LIKE operator in your mysql statement.

SELECT fields FROM database WHERE field LIKE 'variable'

ex
SELECT lastName, firstName FROM employees WHERE firstName LIKE 'a%'
1

Dont know much PHP but logic for regex is similar; use
\w+ - for one or more whitespaces
(q|c) - letter q or c
so squat/scuat/scaut/sqaut with ending whitespace will be. you get the idea.
s(q|c)(a|u)t(\w+)

0

@dancks

I used "/<name of product>/" as my search criteria and it pulls up everything. Hopefully the user would put other search criteria to narrow it down but right now I'm looking for a regex that can do "close enough" type matches. Myabe even use a for loop or split it up into sections and do multiple searches I'd be OK with. I'm very new to regex and I find it to be difficult, so bear with me.

Here is couple of links that has regex examples that you can test it out and learn from it:

http://www.iamcal.com/publish/articles/php/search/

http://www.askaboutphp.com/25/regex-extract-content.html

0

WHat you propose is fine with regard to the odd example, but how sustainable is this for every word and its possible variants that could be placed into a search bar?

0

&fobos: Will try the like operator and post back results. Thanks.

Dont know much PHP but logic for regex is similar; use
\w+ - for one or more whitespaces
(q|c) - letter q or c
so squat/scuat/scaut/sqaut with ending whitespace will be. you get the idea.
s(q|c)(a|u)t(\w+)

Looks good but what if the person doesn't know how to spell the word/phrase or even know how many characters?

so for squat/scuat/scaut/sqaut/sqa/squ maybe:
s([<lowestchar>-<highestchar>])(\w+) -> s([a-u])(\w+) ?
and if a phrase: (e.g "squat rack")
s([<lowestchar>-<highestchar>])(\w+) + r([<lowestchar>-<highestchar>])(\w+) -> s([a-u])(\w)+r([a-k])(\w+) ?

wait, wait a minute, how about for each word: (for "squat")
s([q|u|a|t]?)*(\w+) ???

if that makes any sense? I'm going by what I see here:
http://weblogtoolscollection.com/regex/regex.php?page=3

Edited by dancks: idea

1

How about knocking up something with soundex()? That should match many similar sounding spelling mistakes. I tried out the following and it worked really well on single words:

SELECT job_id, message, SOUNDEX('squat') AS search, SOUNDEX(message) AS msg, 'squat' SOUNDS LIKE message AS result FROM mytable

Not so good with multiple words though!

Squat successfully matched with: sgwhat, skwott, sqwatt

It should be noted that SOUNDEX() works with English words and A-Z/a-z. Non-English and extended chars provide uncertain results.
You could expect it to throw up nonsense from time to time too.

Edited by diafol

0

Update: Messed with like, doesn't come anywhere close to the accuracy I desire. It literally can only make a match only if the case of a character was changed.

@Diafol I'm sorry but I'm having a hard time understanding your suggestion. I never heard of SOUNDEX before and your sample SQL statement doesn't make sense to me. What does variable msg do/contain? As far as multiple words I can always break them apart and do separate searches and simply include only those results which popped up in every search (a bit time consuming perhaps)

I tried the following:

mysql> SELECT ItemID,item_name, SOUNDEX('squat') AS search, SOUNDEX(item_name) as msg, 'squat' SOUNDS LIKE item_name AS result FROM Item;
+--------+-------------------------------------+--------+-------------------+--------+
| ItemID | item_name                           | search | msg               | result |
+--------+-------------------------------------+--------+-------------------+--------+
|      1 | Boom Mike                           | S300   | B520              |      0 | 
|      2 | C Answer Book                       | S300   | C52612            |      0 | 
|      3 | Hammer                              | S300   | H560              |      0 | 
|      4 | Linux 6th Ed The Complete Reference | S300   | L5232514361652    |      0 | 
|      5 | Rascal                              | S300   | R240              |      0 | 
|      6 | Remington 20ga 1100                 | S300   | R52352            |      0 | 
|      7 | Sopranos Entire Series DVD Box Set  | S300   | S1652536262313123 |      0 | 
|      8 | Squat Rack                          | S300   | S362              |      0 | 
|      9 | 1980s Ford Stanza                   | S300   | S1632352          |      0 | 
|     10 | The Wire Entire Series DVD Box Set  | S300   | T6536262313123    |      0 | 
|     11 | Tooth Brush                         | S300   | T162              |      0 | 
|     12 | Tooth Paste                         | S300   | T123              |      0 | 
+--------+-------------------------------------+--------+-------------------+--------+
12 rows in set (0.00 sec)

for reference:

table:

mysql> describe Item;
+-------------+--------------+------+-----+---------------------+----------------+
| Field       | Type         | Null | Key | Default             | Extra          |
+-------------+--------------+------+-----+---------------------+----------------+
| item_name   | varchar(100) | NO   |     |                     |                | 
| ItemID      | mediumint(9) | NO   | PRI | NULL                | auto_increment | 
| cat_name    | varchar(45)  | NO   |     |                     |                | 
| userID      | mediumint(9) | NO   |     |                     |                | 
| descr       | text         | NO   |     |                     |                | 
| image       | tinytext     | YES  |     | NULL                |                | 
| date        | timestamp    | NO   |     | CURRENT_TIMESTAMP   |                | 
| highest_bid | decimal(6,2) | NO   |     | 0.00                |                | 
| time_expire | timestamp    | NO   |     | 0000-00-00 00:00:00 |                | 
+-------------+--------------+------+-----+---------------------+----------------+
9 rows in set (0.00 sec)

And some sample data contained in table:

mysql> select item_name,ItemID from Item;
+-------------------------------------+--------+
| item_name                           | ItemID |
+-------------------------------------+--------+
| Boom Mike                           |      1 | 
| C Answer Book                       |      2 | 
| Hammer                              |      3 | 
| Linux 6th Ed The Complete Reference |      4 | 
| Rascal                              |      5 | 
| Remington 20ga 1100                 |      6 | 
| Sopranos Entire Series DVD Box Set  |      7 | 
| Squat Rack                          |      8 | 
| 1980s Ford Stanza                   |      9 | 
| The Wire Entire Series DVD Box Set  |     10 | 
| Tooth Brush                         |     11 | 
| Tooth Paste                         |     12 | 
+-------------------------------------+--------+
12 rows in set (0.01 sec)
0

It was just a thought. I've never used this to search before, but if you use multiple words, they have to be broken, otherwise you get

S362 for Squat Rack, where it would be better to get S300 (squat) R200 (rack)

On second thoughts, perhaps this wasn't my best idea :)

The msg was just there to show the soundex() of the column to search. The 'sounds like' can work without it. If there's a match, 'result' will show 1.

Edited by diafol

0

wait. I think I get it. I run the SOUNDEX as a mysql_query (4.3.5!) in PHP msg for squat = S563 and search is S300 ah ok. just whatever hash gets closest. Which then I'll need a string comparison like strcmp or trim... which I probably could've done in the first place... or just use preg_match or something... oh boy.

0

It was a good suggestion. I think I'll use it But now I seriously just need a method of comparing the hash values. Just match first letter and size perhaps.

0

@dancks

I think I'll use it But now I seriously just need a method of comparing the hash values. Just match first letter and size perhaps.

Can your provide a code? It will be much easier to know where to put the hashes on the code. Right now, I don't see any code at all.

I only see a query.

0

Erm, I'm not sure if I should be encouraging this as I can't see how to scale it up for multiple words. There's no such thing as str_split in MYSQL, so creating a separate soundex() column containing soundexes for all words in the rows for the table would be tricky. Of course it could be done really easily with php. MYSQL although it has regexp support, it can't really replace stuff, so it's difficult.

0

if you just want to learn regular expression have try for "expresso regex" it just a small program that will teach you on regex

1

OK, just out of interest, here's some code to work with a soundex column in a mysqlDB. I've used vanilla mysql here, but mysqli or PDO would be a far better solution. Also I went overboard in the soundIt function, but that can be tidied up easily enough:

//JUst soundexes
function soundIt($str,$return_as_array=false){
    preg_match_all('/\b([\w]*)\b/',$str,$m);
    $arr = array_filter(array_map("soundex",$m[1]));
    return ($return_as_array) ? $arr : implode(" ", $arr);
}

//create sql for search via soundex
function search($term,$limit=20){
    if(trim($term) == '')return false;
    $search_terms = soundIt($term, true);
    foreach($search_terms as $s) $search_string[] = "IF (soundexcol LIKE '%$s%',1,0)";  
    $search_clause = implode(" + ", $search_string) . " AS score FROM table HAVING score > 0 ORDER BY score DESC LIMIT $limit";
    return "SELECT description, $search_clause";
}

//USAGE
$input = mysql_real_escape_string($_POST['input']);
$res = mysql_query(search($input,10));

//SORT OUT EXISTING ROWS WITHOUT SOUNDEX COL DATA
$result = mysql_query("SELECT description FROM table WHERE soundexcol = ''");
while($row = mysql_fetch_assoc($result)){
    $string = $row['description'];
    $input = soundIt($string);
    mysql_query("UPDATE table SET soundexcol = '$input'");
}

For this a 'soundexcol' field needs to be in the table. I'm assuming varchar/255.

Edited by diafol

1

Perhaps metaphone() would be better. It's similar to soundex() but seems more accurate. The problem with soundex is that it only goes so far (4 chars). Here's the script with metaphone instead:

//JUst metaphones
function soundIt($str,$return_as_array=false){
    preg_match_all('/\b([\w]*)\b/',$str,$m);
    $arr = array_filter(array_map("metaphone",$m[1]));
    return ($return_as_array) ? $arr : implode(" ", $arr);
}

//create sql for search via metaphone
function search($term,$limit=20){
    if(trim($term) == '')return false;
    $search_terms = soundIt($term, true);
    foreach($search_terms as $s) $search_string[] = "IF (metaphonecol LIKE '%$s%',1,0)";    
    $search_clause = implode(" + ", $search_string) . " AS score FROM table HAVING score > 0 ORDER BY score DESC LIMIT 20";
    return "SELECT description, $search_clause";
}

//USAGE
$input = mysql_real_escape_string($_POST['input']);
$res = mysql_query(search($input,10));

//SORT OUT EXISTING ROWS WITHOUT SOUNDEX COL DATA
$result = mysql_query("SELECT description FROM table WHERE metaphonecol = ''");
while($row = mysql_fetch_assoc($result)){
    $string = $row['description'];
    $input = soundIt($string);
    mysql_query("UPDATE table SET metaphonecol = '$input'");
}

//to get the just the SQL
echo search("help me find a programmer",10);
0

Diafol Thats perfect. Although there is a lot of php you have that I haven't heard of or messed with before. Its perfect. Thanks.

0

OK screw it I was gonna whip up a regex tester page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Untitled Document</title>
</head>
<body>
<form method="post" action="testregex.php">
<p>Enter php regex: <input type="text" name="1" class="regex" id="regex" /></p>
<p>Enter text to screw with: <textarea rows=5 cols=80 name="2"></textarea></p>
<p>Function:
<select name="0" class="func" id="func">
<option value="preg_match" selected="selected">preg_match</option>
<option value="preg_match_all">preg_match_all</option>
<option value="preg_replace">preg_replace</option>
<option value="preg_split">preg_split</option>
<option value="preg_grep">preg_grep</option>
<option value="preg_quote">preg_quote</option>
<option value="ereg">ereg</option>
</select>
</p>
<input type="submit" />
</form>
</body>
</html>

and

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Test PHP regex</title>
</head>
<body>
<?php
    $_POST['3'] = array();
    $str = 0;
    if(isset($_POST['0']) && isset($_POST['1']) && isset($_POST['2']))
    {
        try
        {
            $str = call_user_func($_POST['0'],$_POST['1'],$_POST['2']);
            echo "<p>".$str."</p>\n";
        }
        catch(exception $e)
        {
            try
            {
                call_user_func($_POST['0'],$_POST['1'],$_POST['2'],$_POST['3']);
                $a = implode($_POST['3']);
                echo "<p>".$a."</p>\n";
            }
            catch(exception $e)
            {
                echo "<p>I suck at this\n
                ".$_POST['2']."</p>\n";
            }
        }
    }
    else
    {
        echo "<p>The required fields weren\'t filled out</p>\n";
    }
?>
</body>
</html>

Parse error on line 14. Will call_user_func() work in this way?

0

AFAIK, it should work if the function exists. In truth, I've not used it with php native functions. I assumed that it worked just worked with user-defined ones.

//EDIT with testing it seems to work with natives too:

function s2u($str){
    echo strtoupper($str);  
}
call_user_func("s2u",'help');
echo "<br />";
echo call_user_func("strtoupper",'help');

Output:

HELP
HELP

Perhaps you need to enclose your variable within quotes, e.g.

$post = $_POST[0];
$x = call_user_func("$post",...);

But I doubt it. Can't see that would make much of a difference.
Also, remember that the various preg_* functions take different parameters at different positions. If your regex is dicey, then I could imagine getting parse issues.

Edited by diafol

0

revised testregex.php:

<?php
    session_start();
    $_SESSION = array();
    try
    {
        if(isset($_POST['regex']) && isset($_POST['string']))
        {
            $_SESSION['result'] = array();
            preg_match($_POST['regex'],$_POST['string'],$_SESSION['result']);
            $_SESSION['regex'] = $_POST['regex'];
            $_SESSION['string'] = $_POST['string'];
            $_SESSION['result'] = "Matched: ".implode(", ",$_SESSION['result']);
        }
    }
    catch(Exception $e)
    {
        $_SESSION['error'] = $e->getMessage();
    }
    header('Location:http://cs4.sunyocc.edu/~j.d.dancks/testphp/enterregex2.php');
?>

Parse error: parse error, unexpected '{' in /u/students/j/j.d.dancks/public_html/testphp/testregex2.php on line 5

So I guess its the try catch or something. IDK. Am I supposed to import a library or soemthing to use try/catch? What version was try/catch released in?

1

I once ran into Felipe Ribeiro's spelling corrector class. It can do spelling suggestions like "did you mean... ?". I think it can be useful in this case. You can find it here. It's a pretty nifty piece of work :).

Edited by minitauros

0

I'm using 4.3.5 I took the try/catch out and it worked. It would be nice if there was another way I could catch the error so it wouldn't crash. Oh well. It works. Anyways thanks again.

Edited by dancks

0

It would be nice if there was another way I could catch the error so it wouldn't crash.

Why are you using such an old version of php? It was released on 26 March 2004. Over eight and a half years ago. :( Seriously serious.

0

I know. And I have a server at home I just need to get up and running but I'm forced to use the school server for assignments.

0

Why not use a portable option? I'm pretty sure you can run XAMPP and EasyPHP off a memory stick. Unless you school doesn't allow sticks! Either that or ask the school IT admin to update php. Using software that old should be a crime. :(

Edited by diafol

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.