Hi,
I am working on a finance project where each transaction table has more than 4 million records for just 3 months. User and branch information is stored in other tables.

I wrote a query joining around 5 tables to summarize a report by branch, time, and transaction type. When I run it, it takes a very long time and sometimes never responds; the servlet sometimes times out. I have indexes on all the necessary columns.

I am using JSP/servlets, and the JBoss server runs on an Amazon EC2 small instance.


I have been struggling with this for more than two weeks. Is there any way to improve the query performance?

I was advised to load the database into memory, but I do not know how to achieve this. Does anybody have an idea how to do this in Java? Please help me.

Regards,
pani

All 7 Replies

Did you EXPLAIN your query? What does it say?

Here is my EXPLAIN output. I am new to join queries, so I would be grateful for any suggestions.

mysql> explain
select L1.prov,
       (100 * count(S.curprincipal)) /
           (select count(*) from tblsendout
            where S.dtfiled between '2008-01-01 00:00' and '2008-01-30 23:59'),
       ROUND(avg(S.curprincipal)), ROUND(max(S.curprincipal)),
       ROUND(min(S.curprincipal)),
       ROUND(max(S.curprincipal) - min(S.curprincipal)),
       ROUND(sum(S.curprincipal)) as dispValue,
       count(*)
from tblsendout S, tblpayout P, tblbranch B1, tblbranch B2,
     branch_locations L1, branch_locations L2, tbldistance D
where S.objid = P.strsendoutid
  and S.strbranchid = B1.objid
  and P.strbranchid = B2.objid
  and B1.parentid = L1.id
  and B2.parentid = L2.id
  and L1.id = D.sourcebranch
  and L2.id = D.destinationbranch
  and S.dtfiled between '2008-01-01 00:00' and '2008-01-30 23:59'
group by L1.prov
order by dispValue desc
limit 0, 10;
+----+--------------------+------------+--------+----------------------------------------------------------------+------------------------+---------+---------------------------------------------------------+---------+---------------------------------+
| id | select_type        | table      | type   | possible_keys                                                  | key                    | key_len | ref                                                     | rows    | Extra                           |
+----+--------------------+------------+--------+----------------------------------------------------------------+------------------------+---------+---------------------------------------------------------+---------+---------------------------------+
|  1 | PRIMARY            | B2         | ALL    | PRIMARY,parentid                                               | NULL                   | NULL    | NULL                                                    |    1116 | Using temporary; Using filesort |
|  1 | PRIMARY            | L2         | eq_ref | PRIMARY                                                        | PRIMARY                | 4       | mlkp_statistics.B2.parentid                             |       1 | Using index                     |
|  1 | PRIMARY            | P          | ref    | strsendoutid,idx_tblpayout_branchid,strbranchid                | idx_tblpayout_branchid | 5       | mlkp_statistics.B2.objid                                |    3849 | Using where                     |
|  1 | PRIMARY            | S          | eq_ref | PRIMARY,idx_tblsendout_dtfiled,idx_tblsendout_branchid,ix_dbus | PRIMARY                | 8       | mlkp_statistics.P.strsendoutid                          |       1 | Using where                     |
|  1 | PRIMARY            | B1         | eq_ref | PRIMARY,parentid                                               | PRIMARY                | 4       | mlkp_statistics.S.strbranchid                           |       1 |                                 |
|  1 | PRIMARY            | D          | eq_ref | PRIMARY,sourcebranch,destinationbranch                         | PRIMARY                | 8       | mlkp_statistics.B1.parentid,mlkp_statistics.B2.parentid |       1 | Using index                     |
|  1 | PRIMARY            | L1         | eq_ref | PRIMARY                                                        | PRIMARY                | 4       | mlkp_statistics.B1.parentid                             |       1 |                                 |
|  2 | DEPENDENT SUBQUERY | tblsendout | index  | NULL                                                           | idx_tblsendout_state   | 3       | NULL                                                    | 4140800 | Using where; Using index        |
+----+--------------------+------------+--------+----------------------------------------------------------------+------------------------+---------+---------------------------------------------------------+---------+---------------------------------+
8 rows in set (0.36 sec)
select count(*) from tblsendout
    where S.dtfiled between '2008-01-01 00:00' and '2008-01-30 23:59'

The subquery above is the one scanning the larger number of rows. Because it refers to the outer alias S instead of a column of its own tblsendout row, MySQL treats it as a DEPENDENT SUBQUERY, which is why EXPLAIN shows it scanning about 4.1 million index entries.
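One way to avoid that cost (a sketch, assuming the intent was a plain count over the date window rather than a true correlation with the outer row) is to give the inner table its own alias so the subquery filters on its own column:

```sql
-- Hypothetical rewrite: alias the inner table as T so the subquery
-- no longer references the outer alias S and becomes uncorrelated.
(select count(*)
 from tblsendout T
 where T.dtfiled between '2008-01-01 00:00' and '2008-01-30 23:59')
```

With no outer reference, EXPLAIN should report it as a plain SUBQUERY that is evaluated once, instead of a DEPENDENT SUBQUERY.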

Is the column S.dtfiled indexed?

If possible, go for a stored procedure so you can run the inner query separately, store its output in a variable, and reuse it. The actual query will then be faster.
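The same idea works even without a full stored procedure; here is a sketch using a MySQL user variable (table and column names taken from the query above):

```sql
-- Run the expensive count once and keep the result:
select count(*) into @total
from tblsendout
where dtfiled between '2008-01-01 00:00' and '2008-01-30 23:59';

-- Then reuse it in the main query in place of the subquery:
select L1.prov,
       (100 * count(S.curprincipal)) / @total
       -- (remaining columns, joins, GROUP BY and ORDER BY as in the
       --  original query)
```

Since the count is computed once per report run instead of inside the query, the dependent-subquery scan disappears entirely.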

Thanks varma. Yes, S.dtfiled is indexed. But even without that subquery, the query takes a long time.

I am using the default MySQL system variable configuration. Is there another way to increase the heap/cache memory that would help me improve performance?

Regards,
pani

Yes, there are some variables that can speed up MySQL; which ones matter depends on the type of storage engine you are using.

MyISAM can be up to 12 times faster than InnoDB, but heavy inserts and deletes may corrupt a MyISAM table.

MyISAM caches only the indexes in RAM, whereas InnoDB caches both the data and the indexes, so you can usefully allocate more memory to InnoDB than to MyISAM.

You can check the thread cache, sort buffer, table cache, and read-rnd buffer, and set them to suit your environment.
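For a read-only MyISAM reporting workload, those settings live in my.cnf. The variable names below are the MySQL ones of that era, but the values are illustrative guesses that should be sized against the instance's RAM:

```ini
[mysqld]
key_buffer_size      = 512M  # MyISAM index cache -- the key setting for MyISAM reads
sort_buffer_size     = 4M    # per-connection buffer for GROUP BY / ORDER BY sorts
read_rnd_buffer_size = 4M    # per-connection buffer for reading sorted rows
table_cache          = 512   # number of open table handles kept around
thread_cache_size    = 8     # threads kept alive for reuse between connections
```

key_buffer_size matters most here because MyISAM only caches indexes, not data; the per-connection buffers should stay small since they are allocated per session.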

There are many things that can be done, but overall the queries themselves are the first place to start.

If possible, post the CREATE TABLE statements and the EXPLAIN output of the query (after the subquery is removed).

Try this link; it has good articles about MySQL tuning:
http://www.mysqlperformanceblog.com/

Hi,
Thanks for your response. I will try changing the cache/heap-related variables and report back with the results.

I am using the MyISAM storage engine. I use this database only for scheduled report generation, not for inserts, updates, or deletes.

I will try the above and see if there is any performance improvement.

Regards,
pani

Thanks varma. I have one more question for you. Please suggest the best option.


We have the choice of the following instance configurations:

7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2
Compute Units each), 1690 GB of instance storage, 64-bit platform

OR

15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2
Compute Units each), 1690 GB of instance storage, 64-bit platform

Please let me know which one is best: more CPU or more memory?

Regards,
pani
