I am developing a web application with a question-and-answer session in PHP and MySQL. It works much like a forum: a user asks questions and a dedicated expert answers them. I am trying to implement the following feature:

  • When a user asks a new question, the system checks the database to see if a similar question already exists.
  • If it finds a similar question, it shows a link to that (or those) question(s).
  • If not, the new question is inserted into the database.

I have tried several methods of comparing the strings. The most recent attempt is as follows:

The 'questions' table

CREATE TABLE IF NOT EXISTS `questions` (
  `q_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `question` text NOT NULL,
  `cat_id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`q_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

The PHP code:

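$str = $_POST['question'] ?? ''; // The user-submitted question
$q_ids = []; // IDs of stored questions that look similar

// Fetch only the columns needed for the comparison
$sql = "SELECT q_id, question FROM questions";
if ($rs = mysqli_query($link, $sql)) {
    while ($row = mysqli_fetch_assoc($rs)) {
        // Treat questions within an edit distance of 10 as similar
        if (levenshtein($row['question'], $str) <= 10) {
            $q_ids[] = $row['q_id'];
        }
    }
    mysqli_free_result($rs);
}
$q_ids = implode(",", $q_ids); // Comma-separated list, no trailing comma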

But I have failed to get the system:

  1. to differentiate between "2+2=?" and "2X2=?"
  2. NOT to differentiate between "SUM OF 2 and 3" and "WHAT IS 2+3=?"

I would be grateful to anyone who could guide me toward a solution.


Are you serious about the examples you give? Why not ask question posters to use keyword tags? You can then match up the question with others having the same (or very similar) tags.
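To make the tag idea concrete, here is a minimal sketch of how two tag lists might be compared. It assumes each question carries a list of tags, which the `questions` table shown above does not have yet. The function name `tag_similarity` and the choice of Jaccard similarity (shared tags divided by all distinct tags) are illustrative assumptions, not something prescribed in this thread. Arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical helper: Jaccard similarity between two tag lists (0.0 to 1.0).
function tag_similarity(array $tagsA, array $tagsB): float
{
    // Normalise case and whitespace, then drop empty and duplicate tags.
    $normalise = function (array $tags): array {
        $clean = array_map(fn($t) => strtolower(trim($t)), $tags);
        return array_unique(array_filter($clean, 'strlen'));
    };
    $a = $normalise($tagsA);
    $b = $normalise($tagsB);

    // Number of distinct tags across both questions.
    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 0.0; // No tags on either side: treat as not similar
    }

    // Number of tags the two questions have in common.
    $shared = count(array_intersect($a, $b));
    return $shared / $union;
}
```

A score close to 1 would suggest a likely duplicate. The cut-off to use (for example 0.5) is a judgment call that would need tuning against real questions.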

commented: Yes Diafol, that is an option. I just want to make the duplication check for the questions more watertight.

I have never done it before, but could Bayesian filtering do something like this for you? Basically, you count the occurrences of tokens in a sentence, and from those counts you estimate how likely it is that two questions are the same or similar. You would probably need to build a hash map of some kind to get it working right.
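As a rough illustration of the token-counting part, the sketch below builds a hash map of token counts for each question and compares two maps with cosine similarity. This is a simpler bag-of-words comparison, not a full Bayesian classifier, and the function names `token_counts` and `cosine_similarity` are made up for this example. Note that the tokenizer drops operators such as "+", so on its own it would not separate "2+2=?" from other arithmetic questions.

```php
<?php
// Hypothetical helper: count how often each token occurs in a question.
function token_counts(string $text): array
{
    // Split on anything that is not a letter or digit. Operators such as
    // "+" are discarded here, so arithmetic questions need extra handling.
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($tokens);
}

// Cosine similarity between two token-count maps: 0.0 means no shared
// tokens, values near 1.0 mean nearly identical token usage.
function cosine_similarity(array $a, array $b): float
{
    $dot = 0;
    foreach ($a as $token => $count) {
        $dot += $count * ($b[$token] ?? 0);
    }

    // Euclidean length of a count vector.
    $norm = fn(array $v) => sqrt(array_sum(array_map(fn($c) => $c * $c, $v)));

    $denominator = $norm($a) * $norm($b);
    return $denominator > 0 ? $dot / $denominator : 0.0;
}
```

In use, the submitted question would be tokenized once and compared against each stored question, keeping those whose score exceeds a threshold chosen by experiment.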

commented: Thanks overwraith, but that did not solve my problem either. But like Diafol, you led me a step further. Thank you.