Simply put, I need a better query. I have a table that contains 85000 rows, each row has a TITLE and an ABSTRACT (they are academic papers).

What I need is to do a realistic SMART search in my query.
Right now I'm doing

SELECT * FROM PsPapers WHERE Title LIKE '%MySearchString%'

which is, as you can see, a fairly retarded search system. This has several problems:

  1. If the search is "othello essays" then my query only gives you ones where the title contains "othello essays" even though something like "essays on othello" I would also want to be returned because the title is OBVIOUSLY on point.
  2. My query doesn't use the abstracts right now, the reason is that the abstracts contain alot of crap and should be WEIGHTED MUCH LESS than titles. I would want title matches to list HIGHER than abstract matches.
  3. There is no real cleverness or weighting here. The older cgi system I'm trying to replace uses code taken from an actual search engine so I need to at least ATTEMPT to do a smart search.

Any thoughts on a better query?

Recommended Answers

All 2 Replies

Given the search term "othello essays" on the application front end you need to split up the search terms. I think full text indexing offers more options for weighting search results. You may want to take a look at it. Here is one way to help make it better, but only slightly:

IF OBJECT_ID('tempdb..#PsPapers', 'U') IS NOT NULL DROP TABLE #PsPapers
IF OBJECT_ID('tempdb..#Results', 'U') IS NOT NULL DROP TABLE #Results

Create Table #PsPapers
(
  PaperId int identity(1000, 1) PRIMARY KEY,
  Title varchar(100),
  Body varchar(8000)
)



Declare @Search varchar(100), @s1 varchar(100), @s2 varchar(100)
Set @Search = 'othello essays'
Set @s1 = '%othello%'
Set @s2 = '%essays%'

Insert Into #PsPapers (Title, Body) Values ('othello essays', '')
Insert Into #PsPapers (Title, Body) Values ('essays on othello', '')
Insert Into #PsPapers (Title, Body) Values ('essays on XXothello', '')
Insert Into #PsPapers (Title, Body) Values ('notes on othello', '')
Insert Into #PsPapers (Title, Body) Values ('essays on chocolate', '')

Select *, Cast(1 as int) as Relevance
Into #Results
From #PsPapers
Where Title = @Search

Union All

Select *, Cast(2 as int)
From #PsPapers
Where Title Like @s1 and Title Like @s2

Union All

Select *, Cast(3 as int)
From #PsPapers
Where Title Like @s1 

Union All

Select *, Cast(3 as int)
From #PsPapers
Where Title Like @s2

Select PaperId, Title, Body, Min(Relevance) As Relevance
From #Results
Group By PaperId, Title, Body
Order By Relevance Asc

Results in:

PaperId     Title                     Body       Relevance
----------- ------------------------- ---------- -----------
1000        othello essays                       1
1001        essays on othello                    2
1002        essays on XXothello                  2
1003        notes on othello                     3
1004        essays on chocolate                  3

You will need to break the search input string on word boundaries and search for each word and weight the results.

Hi mmxbass, do not forget to make a backup of database before your experiments :) if it is not possible, try the ...table repair application

commented: spammer -1
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.