I am in the process of developing a web application with a "question - answer" session in PHP and MySQL. This is more like a forum, where the user asks questions and a dedicated expert answers the questions. I am trying to implement the following feature:

  • When a user asks a new question, the system checks the database to see if a similar question already exists.
  • If it finds a similar question, it shows a link to that (or those) question(s).
  • If not, the new question is inserted into the database.

I have tried several methods of comparing the strings; the most recent one is as follows:

The 'questions' table

CREATE TABLE IF NOT EXISTS `questions` (
  `q_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `question` text NOT NULL,
  `cat_id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`q_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

The PHP code:

$str = $_POST['question']; // The question submitted by the user
$q_ids = array();          // IDs of existing questions that look similar

$sql = "SELECT q_id, question FROM questions";
if ($rs = mysqli_query($link, $sql)) {
    while ($row = mysqli_fetch_assoc($rs)) {
        // An edit distance of 10 or less is treated as "similar"
        if (levenshtein($row['question'], $str) <= 10) {
            $q_ids[] = $row['q_id'];
        }
    }
    mysqli_free_result($rs);
}
// implode(",", $q_ids) gives the comma-separated list without a trailing comma

But I have not been able to make the system:

  1. differentiate between "2+2=?" and "2X2=?"
  2. treat "SUM OF 2 and 3" and "WHAT IS 2+3=?" as the same question
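
To make the first case concrete, here is a minimal sketch that only calls PHP's built-in levenshtein() on that pair and applies the same threshold of 10 used in the loop above. The variable names are just for illustration.

```php
<?php
// The pair that should be treated as two different questions.
$a = "2+2=?";
$b = "2X2=?";

// The strings differ by a single substitution ("+" -> "X").
$distance = levenshtein($a, $b);

// Same cut-off as in the loop above.
$threshold = 10;
$flaggedAsDuplicate = $distance <= $threshold;

echo "distance: $distance\n";
echo "flagged as duplicate: " . ($flaggedAsDuplicate ? "yes" : "no") . "\n";
```

The distance here is 1, so any threshold loose enough to catch reworded questions will also match this pair.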

I would be grateful to anyone who can guide me toward a solution.


Are you serious about the examples you give? Why not ask question posters to use keyword tags? You can then match up the question with others having the same (or very similar) tags.
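
A rough sketch of what the tag matching could look like, assuming each question carries a list of tags. The function name tagSimilarity and the sample tags are hypothetical, and Jaccard overlap is only one possible way to score "very similar" tags.

```php
<?php
/**
 * Jaccard similarity of two tag lists: shared tags / all distinct tags.
 * Hypothetical helper, not part of the original code.
 */
function tagSimilarity(array $tagsA, array $tagsB): float
{
    // Normalize case and whitespace, then drop duplicate tags.
    $norm = fn(array $tags) => array_unique(
        array_map(fn($t) => strtolower(trim($t)), $tags)
    );
    $a = $norm($tagsA);
    $b = $norm($tagsB);

    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 0.0; // neither question has tags
    }
    return count(array_intersect($a, $b)) / $union;
}

// Example tags (made up): 2 shared tags out of 3 distinct ones.
$new      = ['arithmetic', 'addition'];
$existing = ['Addition', 'arithmetic', 'integers'];
echo tagSimilarity($new, $existing), "\n";
```

Questions whose score is above some cut-off (which would need tuning on real data) could then be shown as possible duplicates.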

Votes + Comments
Yes Diafol, that is an option. I just want to make the duplication check for the questions more watertight.

I have never done it before, but could Bayesian filtering do something like this for you? Basically, you count the occurrences of tokens in a sentence, and based on those counts you get a strong probability that two questions are or are not the same, or similar. You would probably need to build a hash map of some kind to get it working right.
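
A minimal sketch of the token counting part, using a PHP array as the hash map. Note that this is not Bayesian filtering itself: it compares the two count maps with cosine similarity, which is a simpler stand-in. The function names and the tokenizer rule are assumptions.

```php
<?php
/** Count how often each token occurs; the PHP array acts as the hash map. */
function tokenCounts(string $text): array
{
    // Assumed tokenizer: split on anything that is not a letter or digit.
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($tokens);
}

/** Cosine similarity between two token-count maps, between 0 and 1. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0;
    foreach ($a as $token => $count) {
        $dot += $count * ($b[$token] ?? 0); // tokens missing from $b contribute 0
    }
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $a)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $b)));
    if ($normA == 0 || $normB == 0) {
        return 0.0; // at least one text had no tokens
    }
    return $dot / ($normA * $normB);
}

$q1 = tokenCounts("How do I reset my password");
$q2 = tokenCounts("reset my password how");
echo cosineSimilarity($q1, $q2), "\n";
```

Because this tokenizer throws away symbols such as "+", it would not help with purely arithmetic questions unless operators were kept as tokens too.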

Votes + Comments
Thanks overwraith, but that did not solve my problem either. Still, like Diafol, you led me a step further. Thank you.