Hi All!

I'm setting up a very small DB to be used for customer based product reviews. There are two tables:

product_tbl: 
id int unsigned auto_increment 
reviews int unsigned // number of times product reviewed
product varchar(45)
review_tbl: 
id unsigned int 
product_id int unsigned // id from product_tbl 
price int 
quality int 
features int 
email varchar(45) 
pros longtext 
cons longtext

price, quality, features are ratings based on a scale of 1-10 - for each int (price, quality, features) I provide a clickable array of 10 dhtml/css stars that update a hidden field and that field gets stored in its relative db field).

What I want to do is query based on the highest ranked product and ranking is determined by number of average of all values in price, quality, features.

So if toasters A & B each have 'x' reviews (in this example its three):
Toaster A: Price=1,5,7 Quality=3,4,6 Features=1,1,1
Toaster B: Price=3,3,3 Quality=6,7,8 Features=5,8,2

Then Toaster B would be the higer rated because:
Toaster A = (4, 4, 1)/3 = 3
Toaster B = (3, 7, 5)/3 = 5

How's the best way to construct this query in SQL knowing that I may have 100's of toasters and 10's of thousands of reviews. Some toasters (top 5 brands) may get 80% of the reviews while the remaining 20% of the reviews may be spread across the remaining toasters.

I would appreciate any suggestions!

Pete

Recommended Answers

All 10 Replies

The AVG function will be valuable to you. Basically, you are after an average of the average of your three rating values, so your query will look something like this:

select product_id, avg((price + quality + features)/3) as average_rating from review_tbl group by product_id

You can join with the other table as necessary, or limit your result set to a list of products etc. Have a fiddle with that query and see what you can come up with.

The AVG function will be valuable to you. Basically, you are after an average of the average of your three rating values, so your query will look something like this:

select product_id, avg((price + quality + features)/3) as average_rating from review_tbl group by product_id

You can join with the other table as necessary, or limit your result set to a list of products etc. Have a fiddle with that query and see what you can come up with.

OK, that makes some sense...

in raw terms this seems to mostly work:

select product, avg((price + quality + features)/3) as average_rating from review_tbl GROUP product_id ORDER BY average_rating

The issue is the over all rating would be affected by the 'count' in the product_tbl or is MySQL going to average together all of the fields for all of the records with the same product_id?

Also, I was playing with it a little further and was getting a little hung up trying to actually get back only the information I need...

I think, among other things my JOINS are all messed up.

For the display page I want to rank the toasters by: overall ranking based on average of price/quality/features but it wouldnt be fair to have a product with one review of all 10's outranking a product with 1000's of reviews.

Also, on the initial ranking page I really only need: product_tbl.product, average_rating, review_tbl.review_date pros[45] cons[45]. If the user wants more details they can click on the product and get the average of each of the individual rating fields and a history of pros/cons.

So, in typing may way through this I think I confused myself a little more... my query is:

select product, review_date, avg((price + quality + features)/3) as average_rating from review_tbl GROUP product_id ORDER BY average_rating
product_tbl: 
id INT UNSIGNED AUTO_INCREMENT reviews INT UNSIGNED // number of times product reviewedproduct VARCHAR(45)product_tbl: 
id int unsigned auto_increment 
reviews int unsigned // number of times product reviewed
product varchar(45)
review_tbl: 
id UNSIGNED INT product_id INT UNSIGNED // id FROM product_tbl price INT quality INT features INT email VARCHAR(45) pros LONGTEXT cons LONGTEXTreview_tbl: 
id unsigned int 
product_id int unsigned // id from product_tbl 
price int 
quality int 
features int
[b]review_date timestamp[/b] 
email varchar(45) 
pros longtext 
cons longtext
SELECT *, AVG((price + quality + features) / 3) 
AS average_rating
FROM product_tbl, review_tbl
WHERE product_tbl.id = review_tbl.product_id
GROUP BY review_tbl.product_id
ORDER BY average_rating
LIMIT 0, 30

ALMOST!!!
This returns virtually everything I need EXCEPT:
it returns the first pro/con entered and not the last AND it does NOT order by average_rating but is by date_reviewed

Thanks to Peter Brawley I did get the following to semi-work...

SELECT 
  p.product, 
  r.pros, 
  r.cons,
  a.AVG, 
  a.Latest
FROM product_tbl AS p
JOIN (
  SELECT 
    product, AVG((price+quality+features)/3) AS AVG,
    MAX(review_date) AS Latest
  FROM review_tbl
  GROUP BY product
) AS a ON p.id=a.product
JOIN review_tbl AS r ON p.id=r.product
ORDER BY a.AVG DESC;

OK, I did get this to run and its nearly PERFECT...

What I got back was:

Toaster A, 4.95336667, 2010-01-11 11:40:45
Toaster A, 4.95336667, 2010-01-11 11:40:45
Toaster A, 4.95336667, 2010-01-11 11:40:45
Toaster A, 4.95336667, 2010-01-11 11:40:45
Toaster B, 4.92860000, 2010-01-11 11:40:16
Toaster B, 4.92860000, 2010-01-11 11:40:16
Toaster B, 4.92860000, 2010-01-11 11:40:16
Toaster B, 4.92860000, 2010-01-11 11:40:16

When what I really should have gotten was:
Toaster A, 4.95336667, 2010-01-11 11:40:45
Toaster B, 4.92860000, 2010-01-11 11:40:16

How do I fix that?

The problem here is that your pros and cons, there can be one of each for each and every record, and because they are text they will rarely be the same as another review.

I would do this in two queries, the first returning the average rating and latest review date, the second returning the list of pros and cons, and then deal with the entire set of data as you need.

What I dont get is that I get 4 responses for each when there are only 5 records in the table...

And its that duplicate query I was trying to avoid... LOL

You could try the DISTINCT key word, but I'm not sure if it will help because you are selecting the columns pros and cons which will undoubtedly not be unique.

SELECT DISTINCT
      p.product,
      r.pros,
      r.cons,
      a.AVG,
      a.Latest
      FROM product_tbl AS p
      JOIN (
      SELECT
      product, AVG((price+quality+features)/3) AS AVG,
      MAX(review_date) AS Latest
      FROM review_tbl
      GROUP BY product
      ) AS a ON p.id=a.product
      JOIN review_tbl AS r ON p.id=r.product
      ORDER BY a.AVG DESC;

You could try the DISTINCT key word, but I'm not sure if it will help because you are selecting the columns pros and cons which will undoubtedly not be unique.

yeah, I had already tried that... got no where.

I also tried:

SELECT DISTINCT
p.product,
a.PROS,
a.CONS,
a.AVG,
a.Latest
FROM product_tbl AS p
JOIN (
      SELECT
      product, AVG((price+quality+features)/3) AS AVG,
      pros AS PROS,
      cons AS CONS,
      MAX(review_date) AS Latest
      FROM review_tbl
      GROUP BY product
) AS a ON p.id=a.product
JOIN review_tbl AS r ON p.id=r.product
ORDER BY a.AVG DESC;

That got me closer, it returned:

Toaster B, test 1, test 1, 5, 2010-01-11 11:40:16
Toaster A, test 1, test 1, 3, 2010-01-11 11:40:45

My tables are defined as:

CREATE TABLE `product` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`count` int(10) unsigned NOT NULL,
`since` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`product` varchar(45) COLLATE latin1_general_ci NOT NULL
PRIMARY KEY(`id`)
)ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci

CREATE TABLE `review` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`product` int(10) unsigned NOT NULL AUTO_INCREMENT,
`price` int(10) unsigned NOT NULL,
`quality` int(10) unsigned NOT NULL,
`features` int(10) unsigned NOT NULL,
`submitted` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`pros` longtext COLLATE latin1_general_ci NOT NULL
`cons` longtext COLLATE latin1_general_ci NOT NULL
PRIMARY KEY(`id`)
)ENGINE=MyISAM AUTO_INCREMENT=6 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci

The product table contains:

id  count  since                 product
1   0      2011-01-07 10:05:03   Toaster A
2   0      2011-01-07 10:08:03   Toaster B

The review table contains:

id  product  price  quality  features  review_date   pros     cons  
1   1        3      1        3         11:38:55      test 1   test 1
2   2        2      2        2         11:39:25      test 2   test 2
3   1        3      8        7         11:39:51      test 3   test 3
4   2        3      3        3         11:40:16      test 4   test 4
5   1        5      9        6         11:40:45      test 5   test 5

What I should be getting back (or what I want to get back) is this:

Toaster B, test 4, test 4, 5, 2010-01-11 11:40:16
Toaster A, test 5, test 5, 3, 2010-01-11 11:40:45

Try ordering your sub-query by review_date desc. This will then return the last row first and should be picked up by your outer query.

SELECT 
  p.product,
  a.AVG, 
  a.Latest,
  r.pros,
  r.cons
FROM products AS p
JOIN (
  SELECT 
    product, AVG((price+quality+features)/3) AS AVG,
    MAX(submitted) AS Latest
  FROM reviews
  GROUP BY product
) AS a ON p.id=a.product
JOIN reviews AS r ON p.id=r.product AND r.submitted=a.latest
ORDER BY a.AVG DESC;

Works perfectly... and by this time next year I hope to actually understand more than 50% of this! LOL

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.