0

How to speed up the Group By Clause for a large 3GB database.

I am using Group by clause for a large database having 148 columns and 5 million rows with approx 3GB of size.
We need to apply the Group by clause on approx 1,00,000 rows at a time without using LIMIT.
We can’t use LIMIT as we need all of the entries from a category to be show in the filters section.

We have a Dedicated Linux server with 4GB RAM and latest Configuration with 2 processors.

I tried all different my.cnf configuration settings to optimize the mysql speed but nothing works.

Here is Query that I am using to fetch the data:

SELECT e.product_id,
 e.name,
 e.description,
 e.manufacturer,
 e.imageurl,
 e.warranty,
 e.colour,
 e.collection,
 e.saleprice,
 e.price,
 e.ages,
 e.size,
 e.size_h,
 e.size_w,
 e.size_d,
 e.size_unit,
 e.wifi_ready,			 
 e.bundled_deals_packages,
 e.service_provider,
 e.how_many_seats,
 e.characters, 
 e.publishercategory,
 e.clean_modelno
 
 MAX(price) as max_price,
 MIN(price) as min_price,
 count(distinct(advertiserid)) as total
 FROM elec_products as e
 
 WHERE status = 1
 AND (subcategory2 = 3115) 
 GROUP BY clean_modelno, publishercategory
 ORDER BY total DESC

I have index on following columns


  • product_id PRIMARY KEY
  • Group_by(clean_modelno, publishercategory) BTREE
  • subcategory1 BTREE
  • subcategory2 BTREE
  • subcategory3 BTREE
  • subcategory4 BTREE
  • subcategory5 BTREE
  • status BTREE

Table Type is "MyISAM".


All major My.cnf configurations:

  • skip-locking
  • key_buffer_size = 512M
  • max_allowed_packet = 128M
  • table_open_cache = 512
  • sort_buffer_size = 128M
  • read_buffer_size = 128M
  • read_rnd_buffer_size = 128M
  • myisam_sort_buffer_size = 128M
  • thread_cache_size = 8
  • query_cache_size = 128M
  • max_heap_table_size=256M
  • tmp_table_size=256M
  • join_buffer_size = 2M

I can see lots of other similar Price Comparison website which has excellent pageload speed.
Please help me out from this and let me know if I am missing anything.

2
Contributors
6
Replies
7
Views
6 Years
Discussion Span
Last Post by smantscheff
0

How can you have an index on "Group_by(clean_modelno, publishercategory) BTREE[/*]" - an index on a group by clause? This is possible?
What are the results of "EXPLAIN your_query" ?

0

Thank you for taking part in this.

That is not actually index on a group by clause but we need to make group together on both columns as per their priority in GroupBy clause.

Here is Explain of query:
id: 1
select_type: SIMPLE
table: e
type: ref
possible_keys: subcategory2,status
key: subcategory2
key_len: 2
ref: const
rows: 92713
Extra: Using where; Using temporary; Using filesort

I am getting the expected results that i want but it takes huge time to make group of 100000 products by their clean_modelno, publishercategory


You know code green, You can compare this websites with other similar websites like Pricegrabber[dot]com OR kelkoo[dot]co[dot] uk
We need update the database almost everyday so it is not possible to create the another table for each category specially when we have 1500 categories.
It take approx. 2-3 minutes to fetch data from database if that particular category contain 100000 row which would be consider as huge pageload time and visitor will then close the browser and wont come again.

Here is Index structure:
[IMG]http://phpganesha.powweb.com/scud/r_forums/index.gif[/IMG]


Any recommendations will be appreciated.

0

I never dived into table partitioning. But as I understand it, with partitions you split tables so that they reside on several machines, and you set the condition on which the data are split. Thus by partitioning on product categories it might be possible to spread the workload to several machines while keeping the higher level database structure intact.

Edited by smantscheff: n/a

0

On second thoughts your query seems to be flawed.
You are mixing grouped with non-grouped fields. The result of the non-grouped fields is indeterminate. See http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html There is says:

When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.

What do you expect e.g. as the result for product_id in the same row with min(price) and max(price)? Your query assumes that all fields except the grouped fields have the same content, which is obviously not true. So I wonder how you get the expected results with this query.
Consider this example which will show the abitrarily chosen product_ids:

drop table if exists products;
create table products (
	id integer primary key auto_increment,
	price integer,
	category integer,
	model_no integer
);

insert into products values 
(NULL,1,1,1),
(NULL,1,2,2),
(NULL,1,3,3),
(NULL,2,1,4),
(NULL,2,2,1),
(NULL,2,3,2),
(NULL,3,1,3),
(NULL,3,2,4),
(NULL,3,3,1),
(NULL,1,1,2),
(NULL,1,2,3),
(NULL,1,3,4),
(NULL,2,1,1),
(NULL,2,2,2),
(NULL,2,3,3),
(NULL,3,1,4),
(NULL,3,2,1),
(NULL,3,3,2),
(NULL,1,1,3),
(NULL,1,2,4),
(NULL,1,3,1),
(NULL,2,1,2),
(NULL,2,2,3),
(NULL,2,3,4),
(NULL,3,1,1),
(NULL,3,2,2),
(NULL,3,3,3);

select id,min(price) as mip, max(price) as map, model_no, category from products where category = 1 group by model_no;

Try first to retrieve the minimum/maximum price for each category (use only aggregate functions), then retrieve in a second query the products which match those prices.

0

It still really takes 16seconds to Group 90000 rows if i remove MAX, MIN, COUNT and ORDER BY.


May be i need to change something with the INDEX and Database.

Here is Result for "SHOW CREATE TABLE elec_products;"

CREATE TABLE `elec_products` (
 `product_id` int(11) NOT NULL AUTO_INCREMENT,
 `programname` varchar(100) CHARACTER SET latin1 NOT NULL,
 `advertiserid` smallint(6) NOT NULL,
 `programurl` text CHARACTER SET latin1 NOT NULL,
 `catalogname` text CHARACTER SET latin1 NOT NULL,
 `advertiserstatus` varchar(300) CHARACTER SET latin1 NOT NULL,
 `productstatus` int(2) NOT NULL,
 `lastupdated` varchar(20) CHARACTER SET latin1 NOT NULL,
 `name` varchar(300) CHARACTER SET latin1 NOT NULL,
 `productid` varchar(200) CHARACTER SET latin1 NOT NULL,
 `advertiserproductid` int(50) NOT NULL DEFAULT '0',
 `keywords` varchar(300) CHARACTER SET latin1 NOT NULL,
 `description` varchar(2000) CHARACTER SET latin1 NOT NULL,
 `specifications` varchar(5000) CHARACTER SET latin1 NOT NULL,
 `ourproductdescription` varchar(5000) CHARACTER SET latin1 NOT NULL,
 `bestproductdescription` text CHARACTER SET latin1 NOT NULL,
 `language` varchar(5) CHARACTER SET latin1 NOT NULL DEFAULT 'EN',
 `manufacturer` varchar(160) CHARACTER SET latin1 NOT NULL,
 `manufacturerid` varchar(64) CHARACTER SET latin1 NOT NULL,
 `modelno` varchar(20) CHARACTER SET latin1 NOT NULL,
 `buyurl` varchar(500) CHARACTER SET latin1 NOT NULL,
 `impressionurl` varchar(500) CHARACTER SET latin1 NOT NULL,
 `imageurl` varchar(200) NOT NULL,
 `thumburl` varchar(500) CHARACTER SET latin1 NOT NULL,
 `mpn` varchar(20) CHARACTER SET latin1 NOT NULL,
 `upc` varchar(100) CHARACTER SET latin1 NOT NULL,
 `isbn` varchar(64) CHARACTER SET latin1 NOT NULL,
 `ean` varchar(100) CHARACTER SET latin1 NOT NULL,
 `author` varchar(200) CHARACTER SET latin1 NOT NULL,
 `artist` varchar(200) CHARACTER SET latin1 NOT NULL,
 `title` varchar(200) CHARACTER SET latin1 NOT NULL,
 `publisher` varchar(200) CHARACTER SET latin1 NOT NULL,
 `label` varchar(200) CHARACTER SET latin1 NOT NULL,
 `format` varchar(200) CHARACTER SET latin1 NOT NULL,
 `warranty` varchar(200) CHARACTER SET latin1 NOT NULL,
 `moneybackexclusion` varchar(200) CHARACTER SET latin1 NOT NULL,
 `condition1` varchar(200) CHARACTER SET latin1 NOT NULL,
 `producttype` varchar(200) CHARACTER SET latin1 NOT NULL DEFAULT '1',
 `adultcontent` varchar(200) CHARACTER SET latin1 NOT NULL,
 `buyingguidename` varchar(200) CHARACTER SET latin1 NOT NULL,
 `buyingguideurl` varchar(200) CHARACTER SET latin1 NOT NULL,
 `colour` varchar(200) CHARACTER SET latin1 NOT NULL,
 `dupcolour` varchar(100) CHARACTER SET latin1 NOT NULL,
 `style` varchar(200) CHARACTER SET latin1 NOT NULL,
 `collection` varchar(200) CHARACTER SET latin1 NOT NULL,
 `parentproductid` varchar(10) CHARACTER SET latin1 NOT NULL,
 `categoryname` varchar(100) CHARACTER SET latin1 NOT NULL,
 `categoryid` varchar(11) CHARACTER SET latin1 NOT NULL,
 `advertisercategory` varchar(300) CHARACTER SET latin1 NOT NULL,
 `categorylist` varchar(200) CHARACTER SET latin1 NOT NULL,
 `thirdpartyid` varchar(64) CHARACTER SET latin1 NOT NULL,
 `thirdpartycategory` varchar(300) CHARACTER SET latin1 NOT NULL,
 `publisherid` int(11) NOT NULL,
 `publishercategory` smallint(6) NOT NULL,
 `publishercategory_name` varchar(200) CHARACTER SET latin1 NOT NULL,
 `subcategory1` smallint(6) NOT NULL,
 `subcategory1_name` varchar(200) CHARACTER SET latin1 NOT NULL,
 `subcategory2` smallint(6) NOT NULL,
 `subcategory2_name` varchar(200) CHARACTER SET latin1 NOT NULL,
 `subcategory3` smallint(6) NOT NULL,
 `subcategory3_name` varchar(200) CHARACTER SET latin1 NOT NULL,
 `subcategory4` smallint(6) NOT NULL,
 `subcategory4_name` varchar(200) CHARACTER SET latin1 NOT NULL,
 `subcategory5` smallint(6) NOT NULL,
 `subcategory5_name` varchar(200) CHARACTER SET latin1 NOT NULL,
 `commissiongroup` varchar(200) CHARACTER SET latin1 NOT NULL,
 `commission_amount` varchar(200) CHARACTER SET latin1 NOT NULL,
 `promotionaltext` varchar(300) CHARACTER SET latin1 NOT NULL,
 `preorder` varchar(200) CHARACTER SET latin1 NOT NULL,
 `hotpick` varchar(200) CHARACTER SET latin1 NOT NULL,
 `new` varchar(200) CHARACTER SET latin1 NOT NULL,
 `special` varchar(200) CHARACTER SET latin1 NOT NULL,
 `gift` varchar(200) CHARACTER SET latin1 NOT NULL,
 `startdate` varchar(30) CHARACTER SET latin1 NOT NULL,
 `enddate` varchar(30) CHARACTER SET latin1 NOT NULL,
 `offline` varchar(3) CHARACTER SET latin1 NOT NULL,
 `online` varchar(3) CHARACTER SET latin1 NOT NULL,
 `instock` varchar(3) CHARACTER SET latin1 NOT NULL,
 `stockquantity` int(11) NOT NULL,
 `deliveryavailble` varchar(10) CHARACTER SET latin1 NOT NULL,
 `deliveryDetails` varchar(500) CHARACTER SET latin1 NOT NULL,
 `deliverytime` varchar(50) CHARACTER SET latin1 NOT NULL,
 `storepickupavailable` varchar(10) CHARACTER SET latin1 NOT NULL,
 `storepickupdetails` varchar(50) CHARACTER SET latin1 NOT NULL,
 `internetonly` varchar(10) CHARACTER SET latin1 NOT NULL,
 `creditoffer` varchar(10) CHARACTER SET latin1 NOT NULL,
 `isforsale` varchar(5) CHARACTER SET latin1 NOT NULL,
 `currency` varchar(50) CHARACTER SET latin1 NOT NULL,
 `saleprice` double(10,2) NOT NULL,
 `price` float(8,2) NOT NULL,
 `retailprice` double(10,2) NOT NULL,
 `storeprice` double(10,2) NOT NULL,
 `fromprice` varchar(3) CHARACTER SET latin1 NOT NULL,
 `vat` varchar(10) CHARACTER SET latin1 NOT NULL,
 `priceexvat` varchar(10) CHARACTER SET latin1 NOT NULL,
 `priceincvat` varchar(10) CHARACTER SET latin1 NOT NULL,
 `pricevaliduntil` varchar(100) CHARACTER SET latin1 NOT NULL,
 `standardshippingcost` double(10,2) NOT NULL,
 `ship2mecost` varchar(10) CHARACTER SET latin1 NOT NULL,
 `totaldeliverycost` varchar(10) CHARACTER SET latin1 NOT NULL,
 `totalproductcost` varchar(10) CHARACTER SET latin1 NOT NULL,
 `retailerdescription` varchar(500) CHARACTER SET latin1 NOT NULL,
 `localoronline` varchar(10) CHARACTER SET latin1 NOT NULL,
 `vatinfo` int(3) NOT NULL,
 `vatrate` varchar(10) CHARACTER SET latin1 NOT NULL,
 `shippingwhere` varchar(100) CHARACTER SET latin1 NOT NULL,
 `shippinginfo` varchar(100) CHARACTER SET latin1 NOT NULL,
 `freeshipping` varchar(200) CHARACTER SET latin1 NOT NULL,
 `retailerreviews` varchar(250) CHARACTER SET latin1 NOT NULL,
 `retailerrating` int(11) NOT NULL,
 `expertreviews` varchar(200) CHARACTER SET latin1 NOT NULL,
 `expertratings` int(6) NOT NULL,
 `videoreview` varchar(200) CHARACTER SET latin1 NOT NULL,
 `videoreviewtext` varchar(500) CHARACTER SET latin1 NOT NULL,
 `productreviews` varchar(250) CHARACTER SET latin1 NOT NULL,
 `productrating` int(11) NOT NULL,
 `thirdpartyuserreviews` varchar(20) CHARACTER SET latin1 NOT NULL,
 `thirdpartyuserratings` varchar(20) CHARACTER SET latin1 NOT NULL,
 `numberofreviews` varchar(20) CHARACTER SET latin1 NOT NULL,
 `overallcustomerrating` varchar(20) CHARACTER SET latin1 NOT NULL,
 `filters` smallint(1) NOT NULL,
 `ages` varchar(20) CHARACTER SET latin1 NOT NULL,
 `size` varchar(500) CHARACTER SET latin1 NOT NULL,
 `size_h` varchar(50) CHARACTER SET latin1 NOT NULL,
 `size_w` varchar(50) CHARACTER SET latin1 NOT NULL,
 `size_d` varchar(50) CHARACTER SET latin1 NOT NULL,
 `size_unit` varchar(50) CHARACTER SET latin1 NOT NULL,
 `wifi_ready` varchar(200) CHARACTER SET latin1 NOT NULL,
 `bundled_deals_packages` varchar(10) CHARACTER SET latin1 NOT NULL,
 `service_provider` varchar(500) CHARACTER SET latin1 NOT NULL,
 `how_many_seats` varchar(100) CHARACTER SET latin1 NOT NULL,
 `characters` varchar(500) CHARACTER SET latin1 NOT NULL,
 `popular` int(2) NOT NULL DEFAULT '0',
 `match_modelno` int(11) NOT NULL,
 `aff_name` varchar(100) CHARACTER SET latin1 NOT NULL,
 `clean_modelno` varchar(200) CHARACTER SET latin1 NOT NULL,
 `pa_time_stamp` date NOT NULL,
 `status` smallint(1) NOT NULL,
 `14_july_modelno` varchar(100) CHARACTER SET latin1 NOT NULL,
 `datentry_modelno` varchar(200) CHARACTER SET latin1 NOT NULL,
 `new_old` varchar(100) CHARACTER SET latin1 NOT NULL,
 `last_cat_update` varchar(100) CHARACTER SET latin1 NOT NULL,
 `clean_modelno_dataentry` varchar(100) CHARACTER SET latin1 NOT NULL,
 `pa_categorization` smallint(1) NOT NULL,
 `seo_title` varchar(500) NOT NULL,
 `seo_desc` varchar(500) NOT NULL,
 PRIMARY KEY (`product_id`),
 KEY `subcategory1` (`subcategory1`),
 KEY `subcategory2` (`subcategory2`),
 KEY `subcategory3` (`subcategory3`),
 KEY `subcategory4` (`subcategory4`),
 KEY `subcategory5` (`subcategory5`),
 KEY `price` (`price`),
 KEY `status` (`status`),
 KEY `advertiserid` (`advertiserid`),
 KEY `pa_categorization` (`pa_categorization`),
 KEY `pa_time_stamp` (`pa_time_stamp`),
 KEY `Group_by` (`clean_modelno`,`publishercategory`),
 FULLTEXT KEY `name` (`name`,`description`,`keywords`)
) ENGINE=MyISAM AUTO_INCREMENT=3689200 DEFAULT CHARSET=utf8

Result for EXPLAIN SELECT...

id:	1
select_type:	SIMPLE
table:	e
type:	ref
possible_keys:	subcategory2,status
key:	subcategory2
key_len:	2
ref:	const
rows:	92713
Extra:Using where; Using filesort

Result for "SHOW TABLE STATUS LIKE 'elec_products'":

Name: 	elec_products
Engine: 	MyISAM
Version: 	10
Row_format: 	Dynamic
Rows: 	3689199
Avg_row_length: 	838
Data_length: 	3126241800
Max_data_length: 	2.81475E+14
Index_length: 	1209323520
Data_free: 	0
Auto_increment: 	3689200
Create_time :	12/30/2010 17:54
Update_time :	12/31/2010 17:32
Check_time: 	12/30/2010 18:01
Collation: 	utf8_general_ci
Checksum :	NULL
Create_options: 	
Comment	:

Please help me to findout the bug.

0

Well, I'm not sure if there is a bug. Big queries take some time. I fear that eventually you will have to implement some cache mechanism.
How does the response time change if you select only the ID and the grouped fields? I don't know how MySQL handles this, but I assume there is a lot of data copying going on in the memory when you select all those fields.

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.