Hi all,
I am developing a search engine as project in PHP and MySQL as backend . I have implemented crawler , now i am facing problems in searcher . When a user enters any query by default i want to accept it as OR Query i.e. I want to display results containing all keywords in the beginning ( [assuming 4 keywords search] desc order of weight) then after that next results to be displayed would be links containing any of three keywords searched , same in desc order of weights and so on and so forth . till i don't display single keywords records ..
I am tired by thinking logic of this problem , for first displaying all keywords i have solved it but next problem i m unable to think .. can you give some ideas? please help!

Recommended Answers

All 25 Replies

Can you perhaps give an example of what you have, and of what you want ?

If I understand correctly, you just want a way to order your results on relevancy.

pritaeas You got it correctly I will give an example.
Suppose user enters query addition of two numbers in servlet , now as a search engine , I am not considering of in so I am left with 4 keywords.
Addition Two Numbers Servlet
Next step is I will scan my database and find out the results. I have 15 tables, i do MD5 and save keywords , table names are from kwd0 to kwdf along with Keyword id and weight in a link. Keyword is another table which consists of actual Keywords.
Now What i want is !
First run a query to get all sites containing all the 4 searched keywords. So when i find all those sites, I will add all the weights of respective keywords , one with highest weight will be on TOP .
But this is Like Simple AND search . Now As i want to implement OR , next thing i have to consider 3 keywords , any3 keywords in site, find weights, arrange them . Same with 2 keywords , 1 keyword and so on and arrange them.
And then finally my complete result will be there which I want to display to user!! So now How should i solve this problem? can you give me some ideas !! Well number of records will be atleast in 1000s so I want to implement this thing in Optimum way so that it will be fast and giving good results!
Thanks in advance.

Is it not possible to do an OR search with your query, let that calculate the weight in the result for each keyword, and then group it by article and summing the weight. Then you can order by the summed weight descending, without having to run X queries.

What I know from weighted keywords, is that it is not always true that having all keywords will outrank any other (depending on your weighting system).

Could you show me the query for a search on a single table ?

while (($words < count($wordarray)) && $possible_to_find == 1) { // Checking whether //we are still having words left and getting results
$wordmd5 = substr(md5($searchword), 0, 1);

			$query1 = "SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword$wordmd5, ".$mysql_table_prefix."keywords where ".$mysql_table_prefix."link_keyword$wordmd5.keyword_id= ".$mysql_table_prefix."keywords.keyword_id and keyword='$searchword' $domain_qry order by weight desc";
	$result = mysql_query($query1);

			$num_rows = mysql_num_rows($result);

			if ($num_rows == 0) {

				if ($type != "or") {

					$possible_to_find = 0;

					break;

				}

			}
$indx = $words; // FOR AND SEARCH
	while ($row = mysql_fetch_row($result)) {	

				$linklist[$indx]['id'][] = $row[0];

				$domains[$row[0]] = $row[2];

				$linklist[$indx]['weight'][$row[0]] = $row[1];

				
			}

			$words++;
}

$j= 1;

				$min = 0;

				while ($j < $words) {

				if (count($linklist[$min]['id']) > count($linklist[$j]['id'])) {

						$min = $j;

					}

					$j++;

				}

			
				$j = 0;
//ABOVE Code for finding Least number results from a query to build AND search
$temp_array = $linklist[$min]['id'];
$count = 0;

				while ($j < count($temp_array)) {

					$k = 0; //and word counter

					$n = 0; //not word counter

					$o = 0; //phrase word counter

					$weight = 1;

					$break = 0;

					while ($k < $words && $break== 0) {

						if ($linklist[$k]['weight'][$temp_array[$j]] > 0) {

							

							$weight = $weight + $linklist[$k]['weight'][$temp_array[$j]];

							//echo $linklist[$k]['weight'][$temp_array[$j]] ." ". $weight . "<br>";

						} else {

							$break = 1;

						}

						$k++;

					}

					while ($n < $not_words && $break== 0) {

						if ($notlist[$n]['id'][$temp_array[$j]] > 0) {

							$break = 1;

						}

						$n++;

					}				



					while ($o < $phrase_words && $break== 0) {

						if ($phraselist[$n]['id'][$temp_array[$j]] != 1) {

							$break = 1;

						}

						$o++;

					}

					if ($break== 0 && $category > 0 && $category_list[$temp_array[$j]] != 1) {

						$break = 1;

					}



					if ($break == 0) {

						$result_array_full[$temp_array[$j]] = $weight;

						$count ++;

					}

					$j++;

				}

			}

		}


$query1 = "SELECT distinct link_id, url, title, description,  $fulltxt, size FROM ".$mysql_table_prefix."links WHERE link_id in ($inlist)";



		$result = mysql_query($query1);

		echo mysql_error();



		$i = 0;

		while ($row = mysql_fetch_row($result)) {

			$res[$i]['title'] = $row[2];

			$res[$i]['url'] = $row[1];

			if ($row[3] != null && $show_meta_description == 1)

				$res[$i]['fulltxt'] = $row[3];

			else 

				$res[$i]['fulltxt'] = $row[4];

			$res[$i]['size'] = $row[5];

			$res[$i]['weight'] = $result_array[$row[0]];

			$dom_result = mysql_query("select domain from ".$mysql_table_prefix."domains where domain_id='".$domains[$row[0]]."'");

			$dom_row = mysql_fetch_row($dom_result);

			$res[$i]['domain'] = $dom_row[0];

			$i++;

		}







		if ($merge_site_results  && $domain_qry == "") {

			sort_with_domains($res);

		} else {

			usort($res, "cmp"); 	

		}

		echo mysql_error();

		$res['maxweight'] = $maxweight;

		$res['results'] = $results;

		return $res;

Something like this .. i am using for Returning AND results! I know I couldn't paste everything from code.. i tried to post as much was possible and essential ..

You should be able to do an OR search, and then GROUP BY link_id and SUM(weight).

SELECT link_id, SUM(weight) AS weight, domain 
FROM yourtables
WHERE your keywords filter with OR instead of AND
GROUP BY link_id

Perhaps you can try it in PhpMyAdmin (or another tool you use), to see if it gets results. Anything you can move from code to query, will speed things up.

commented: Thank you bro +1

Thank you bro.. I will try it now.. and let you know how it works!! :) thanks a lot!! I will update this post as soon as i feel i got success ( cause you !!! )

This kind of queries I am using currently

SELECT distinct link_id, weight, domain from link_keywordb, keywords where link_keywordb.keyword_id= keywords.keyword_id and keyword='two' order by weight desc

If I try to use

SELECT link_id, SUM(weight) AS weight
FROM  link_keyword3,link_keywordb,keywords
WHERE ((link_keyword3.keyword_id= keywords.keyword_id and keyword='add')and(link_keywordb.keyword_id= keywords.keyword_id and keyword='two')) 
GROUP BY link_id

something like above , i get error of unambiguity . I am totally fed up and unable to think :( ! since past 10 days trying to solve this problem .. If you could help me further I will be really thankful

I am posting schema of my db , so that it can be useful

link_keyword0 to link_keywordf        link_id | keyword_id | weight | domain
links                                 link_id|site_id|url_title
keywords                              keyword_id|keyword

Thanks in advance!

You get that because link_id appears in more tables. So in the SELECT and GROUP BY you need to specify which one you mean, just like you did with keyword_id in the WHERE.

I have one doubt.. now suppose I find a all keywords in same link, then how can I add it here if I use Above queries? will it be possible

Not sure what you mean. The GROUP BY will merge the results to one link_id, just as your DISTINCT did earlier, but it will SUM all the weights for that link_id.

I tried to run many queries but none is working as i want .. I am posting one of example how I tried

SELECT link_id, SUM(weight) AS weight, domain 
FROM link_keyword3,keywords,link_keywordb
WHERE ((SELECT distinct link_id, weight, domain from link_keyword3, keywords where link_keyword3.keyword_id= keywords.keyword_id and keyword='add' order by weight desc) and (SELECT distinct link_id, weight, domain from link_keywordb, keywords where link_keywordb.keyword_id= keywords.keyword_id and keyword='two' order by weight desc ))
GROUP BY link_id

Error
#1052 - Column 'link_id' in field list is ambiguous
I am getting this kind of error ! Please help me to fix it!! I am really in huge trouble and out of mind!! tried a lot :'(

Put the tablename and a dot in front of it. link_keyword3.link_id for example.

Am i doing it right? .. but putting that tablename also not working :'(

Perhaps you forgot one.

What should i write for 1st one.. because what i want is result of

WHERE ((SELECT distinct link_id, weight, domain from link_keyword3, keywords where link_keyword3.keyword_id= keywords.keyword_id and keyword='add' order by weight desc) and (SELECT distinct link_id, weight, domain from link_keywordb, keywords where link_keywordb.keyword_id= keywords.keyword_id and keyword='two' order by weight desc ))

these should be used by select query! now how do that :| i am totally confused!

I want to use result of the queries in WHERE clause in Select query!

Pooh, this is getting difficult to do from the top of my head now. I don't think the WHERE should contain a SELECT like that.

did u understand my schema? so can you show me an example query on that schema .. for this kind of situation? so that i might get a much better idea then :S
I mean try to use AND for that kind of situation .. I shown you queries which i m using currently .. I want to convert them totally into query way .. into single query so that i don't need to do coding as such !

SELECT distinct link_id, weight, domain from link_keyword3, keywords where link_keyword3.keyword_id= keywords.keyword_id and keyword='add' order by weight desc

SELECT distinct link_id, weight, domain from link_keywordb, keywords where link_keywordb.keyword_id= keywords.keyword_id and keyword='two' order by weight desc

these 2 are example queries.. How do i merge them to get results from them as AND directly through a query!
I hope it becomes simpler for you now..

I could use some example data with that, just a small example of what you have (5 records or so). Then I will try tonight.

can you give me your contact info email or something? I will forward you my DATABASE on that.. how about that?

Start experimenting with this.

SELECT *, SUM(weight) AS totalweight
FROM keywords k, link_keyword7 lk7, links l
WHERE k.keyword = 'broadband'
AND lk7.keyword_id = k.keyword_id
AND l.link_id = lk7.link_id
GROUP BY l.link_id
ORDER BY totalweight DESC

Also your link_keyword tables do not have a primary key, or unique index. It will slow things down.

Hi pritaeas I have reached upto something like follows

select link_id,domain,sum(weight) from (
(SELECT distinct link_id, weight, domain from link_keywordb, keywords where link_keywordb.keyword_id= keywords.keyword_id and keyword='two' order by weight desc) 
Union
(select distinct link_id, weight, domain from link_keywordb , keywords where link_keywordb.keyword_id= keywords.keyword_id and keyword='number' order by weight desc)
)
as t1 group by link_id

but it's displaying me same results again and again.. will you a have a look at it and tell what can be issue ?

Those two queries are the same, just use one with both keywords:

... AND (keyword = 'first' OR keyword = 'second) ORDER ...

Nah in this case actually i had both keywords in same table, otherwise if both keywords are in different tables !

for other query

select distinct link_id,sum(t1.weight) as weight , t1.domain from (
(SELECT distinct link_id, weight, domain from link_keyword3, keywords where link_keyword3.keyword_id= keywords.keyword_id and keyword='add' order by weight desc)
Union
(SELECT distinct link_id, weight, domain from link_keyworda, keywords where link_keyworda.keyword_id= keywords.keyword_id and keyword='servlet' order by weight desc))
as t1
group by link_id

I can have this way also
but problem is when i try to execute these type of query in PHPMYADMIN i m like getting 130~ results etc. which are same and repeated on all pages! can you check out if you still have that database which i had mailed u?

I get 140 results from this query, all with a different link_id, so no, they are not the same, and not repeated.

Let's approach it a different way...

SELECT id, keyword, weight ... WHERE keyword = 'aaa' 
UNION ALL 
SELECT id, keyword, weight ... WHERE keyword = 'bbb'

That gives you the information, now to further digest it.

SELECT id, COUNT(*) as ct, SUM(weight) as tot_weight 
FROM ( <the above query> ) as x 
GROUP BY id

The rows with ct=2 will have both keywords, and tot_weight will be the value you need. So...

SELECT id, tot_weight 
FROM ( <the second query> ) as y 
WHERE ct = 2

Then JOIN that to whatever (perhaps even a self-join) to get the rest of the fields.

SELECT id, tot_weight 
FROM ( <the second query> ) as y 
JOIN ... ON ... 
WHERE ct = 2 
ORDER BY tot_weight DESC 
LIMIT 5;

You could do as I describe it (subqueries) or each SELECT could be put into a temp table for use by the next step.

This solution i got it from MYSQL forums by Rick James . This helped and works as i wanted :) It is able to perform both AND and OR search :) Anyways pritaeas you also helped me a lot :) thanks
My problem is solved!! Just implementation is left ;)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.