The project we're working on is a document retrieval system
I have 3 tables as follows. Table 1 contains words from a example doc. Table 2 contains words from documents in a database. Table3 contain only one column ‘DocID’.
I would like to search the database to find those docs similar to the example doc by calculating similarity score between the example doc and each doc in the database. Specifically, the simi score is calculated by adding up the sum of word frequencies of all matched word. Additionally, I am only interested in the those documents whose DocID appear in table3.
Table 1 Word Freq Book 2 Desk 3 Pen 3 Board 3 Table2 DocID Word Freq 1 Book 3 1 English 2 1 Math 1 2 Desk 2 2 Machine 5 2 Power 2 3 Desk 3 3 Teacher 3 3 Class 2 4 Building 1 4 Tower 2 Table3 DocID 1 2
I am not sure the following statement is correct or not
SELECT DocID, table1.Freq+ table2.Freq FROM table1, table2, table3 WHERE table1.Word = table2.Word AND table2.DocID = table3.DocID;