But you're not comparing apples to apples. Your WHERE query and your HAVING query perform different calculations and return different results. In one, you're plucking out all the rows with a price greater than $4, and then calculating an average price for each product. In the other, you're calculating an average price for each product across all rows, and then discarding the products whose average price is less than or equal to $4. I'm not sure what your database looks like, but if there are lots of rows with products across all different price ranges, you're not going to end up with the same result set.
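To make the distinction concrete, here is a minimal sketch of the two non-equivalent query shapes being described, using the table and column names that appear later in this thread and the $4 threshold from the discussion (the exact queries being benchmarked weren't posted, so this is an assumption):

-- WHERE filters individual rows before grouping: only rows priced
-- above $4 contribute to each product's average.
SELECT ProductName, AVG(Price) AS AvgPrice
FROM electronics.products
WHERE Price > 4
GROUP BY ProductName;

-- HAVING filters after grouping: every row contributes to the average,
-- and then entire products averaging $4 or less are discarded.
SELECT ProductName, AVG(Price) AS AvgPrice
FROM electronics.products
GROUP BY ProductName
HAVING AVG(Price) > 4;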
toneewa
Correct. It wasn't about displaying the results, but about measuring the different clauses. I'm not impressed by only 95 INSERTs/sec and a maximum write speed of 175 KB/s when importing the data, though (see the batching sketch after the sample INSERT below). Increasing to 50K rows shows WHERE to be faster.
50K rows:
WHERE Query execution time: 0.0599129 seconds
HAVING Query execution time: 0.0621748 seconds
HAVING Query execution time: 0.0629348 seconds
WHERE Query execution time: 0.0627212 seconds
5 rows:
WHERE Query execution time: 0.0002878 seconds
HAVING Query execution time: 0.0002313 seconds
HAVING Query execution time: 0.0002674 seconds
WHERE Query execution time: 0.0004905 seconds
INSERT INTO electronics.products(ProductID, ProductName, Price)
VALUES
('1','capacitors', 2.50),
('2','resistors', 4.50),
('3','rectifiers', 7.50),
('4','diodes', 10.00),
('5','ICs', 25.00),
...
('50000','...', ...);
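On the slow import: a common way to speed up bulk INSERTs is to batch many rows per statement and commit once per batch, so the engine doesn't flush to disk for every statement. A minimal sketch reusing the table above (batch size and values are hypothetical):

-- commit once per batch instead of once per statement
START TRANSACTION;
INSERT INTO electronics.products (ProductID, ProductName, Price) VALUES
('1', 'capacitors', 2.50),
('2', 'resistors', 4.50);
-- ... thousands more rows per batch ...
COMMIT;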
If you have a test case you want to measure, please do share.
Dani
I think you might be missing my point. If you're fetching different data, there's no way of saying whether WHERE or HAVING is faster. You would need to write queries that use each, but retrieve the same data, to see which is faster for a given use case. Even then, it totally depends on the data in the database itself. The same query can be fast for some datasets and slow for others.
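For reference, an apples-to-apples pair might look like the sketch below. It leans on a documented MySQL extension that lets HAVING refer to columns in the SELECT list even without aggregation, so both statements return exactly the same rows (the 49 threshold is just an example value):

-- same rows either way; only the filtering stage differs
SELECT ProductName, Price
FROM electronics.products
WHERE Price > 49;

-- MySQL extension: HAVING may reference a selected column without GROUP BY
SELECT ProductName, Price
FROM electronics.products
HAVING Price > 49;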
toneewa
I know that when you don't use aggregate calculations, you should use WHERE. Producing the same results can be done, though. I added another test with MAX(Price); see the sketch after these results. With the database grown past 305K rows (3 columns), MySQL Workbench was unstable importing, and it took over 2 hours. WHERE is faster, and should be used, when no aggregate functions are needed.
WHERE Query execution time: 0.160881 seconds
HAVING Query execution time: 0.245288 seconds
Same results:
Query execution time: 0.302986 seconds
(100 matching rows returned, prices ranging from 49.01 to 50.00)
WHERE Query execution time: 0.160881 seconds
(same 100 rows as above)
HAVING Query execution time: 0.245288 seconds
Diode 24.1361
Capacitor 24.1737
Transistor 24.0247
Resistor 24.1304
Inductor 24.1018
HAVING Query execution time: 0.436416 seconds
(same 5 rows as above)
WHERE Query execution time: 0.432615 seconds
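For the aggregate case referenced above: when the filter itself uses an aggregate such as MAX() or AVG(), it has to live in HAVING, because WHERE is evaluated before grouping and cannot see aggregate values. A sketch with a hypothetical threshold:

-- a filter on an aggregate must use HAVING; WHERE cannot reference MAX()/AVG()
SELECT ProductName, MAX(Price) AS MaxPrice
FROM electronics.products
GROUP BY ProductName
HAVING MAX(Price) > 24;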
Biiim
INSERT INTO electronics.products(ProductID, ProductName, Price)
VALUES
('1','capacitors', 2.50),
('2','resistors', 4.50),
('3','rectifiers', 7.50),
('4','diodes', 10.00),
('5','ICs', 25.00),
...
('50000','...', ...);
I don't see any mention of an index on it, and your benchmark indicates you don't have one:
ALTER TABLE `products`
ADD PRIMARY KEY(`ProductID`),
ADD INDEX `Price` (`Price`);
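Once the index is added, it's worth confirming the optimizer actually uses it. A quick check (the threshold is just an example):

-- the 'key' column of the output should show the Price index
EXPLAIN SELECT ProductName, Price
FROM electronics.products
WHERE Price > 49;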
You should also specify your hardware for a benchmark, as it makes quite an impact on the speed.
You can also fine-tune the MySQL system variables to better handle the queries you want to run. The query cache, for instance, will cache the most common queries so they don't have to be re-run when nothing has changed. You generally don't need to bother with this, though, as it should run fast enough for most purposes as it is.
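If you do want to look at the query cache, note that it only exists in MySQL 5.7 and earlier (it was removed in MySQL 8.0). A quick way to inspect it there:

-- MySQL 5.7 and earlier only
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';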
There are also different table storage engines. InnoDB is transactional, so it is very robust; if you want something faster but are not too worried about corruption, you can use MyISAM, which has less rollback functionality but is faster because of it. I was using that for an email marketing DB ages ago (logging each sent email, failed, read, clicked, etc.); it handled over 400,000 rows a day on an old HDD in a pretty average Linux server.
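Switching engines is a one-line change, sketched below; note that rebuilding a large table this way can take a while, and with MyISAM you give up transactions and row-level locking:

-- rebuilds the table under the new engine
ALTER TABLE electronics.products ENGINE = MyISAM;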
There is definitely something very wrong with your setup, because I import 10 million rows of 6 columns every week in less than 10 minutes, and that is slow as it is, being a LOAD DATA INFILE from a CSV (that's about 17,000 inserts a second). I even have PHP running about 30 queries with various joins and criteria across hundreds of tables in less than a second; one table is 10 million rows big, and at least 4 more that I can think of are over 300,000. One of them even has 132 columns and 310,000 rows. The index cuts out the excess rows in 0.000s if you get the index right, and if you don't specify needing a column, it gets dropped to save time and memory usage; the PHP query uses a specific ID, so the rows are cut down to very few.
A smart trick I started using is to include a time condition on queries when you don't need old data, as after a couple of years old rows start slowing down the query immensely:
AND tbl.create_time > NOW() - INTERVAL 2 WEEK
(with an index on create_time; see the sketch below)
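A sketch of both of those, with a hypothetical CSV path, reusing the tbl name from the snippet above; the FIELDS/LINES clauses depend on how the file is actually formatted:

-- bulk-load a CSV; adjust delimiters to match the file
LOAD DATA INFILE '/tmp/products.csv'
INTO TABLE electronics.products
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(ProductID, ProductName, Price);

-- index so the two-week window above can discard old rows cheaply
ALTER TABLE tbl ADD INDEX idx_create_time (create_time);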
You've inspired me to try to set up a benchmark on my nodejs/react setup to see what it does, as I was amazed at how spammed my terminal got when I mistakenly made an infinite loop of queries a couple of weeks ago.
toneewa
I added the ALTER TABLE index you mentioned. It took 4.375 secs in MySQL Workbench, and it did speed up the results. I'll have to try another method of importing to see if I can improve this. I also omitted the overhead of converting variables for display output from the measurements, because it was included in the C++ times. Comparing the two, it is as follows:
(MWB times are shown as Duration / Fetch)
Full query:
C++: 0.30402 sec
MWB: 0.000 sec / 0.297 sec
WHERE:
C++: 0.018953 sec
MWB: 0.000 sec / 0.000 sec
HAVING:
C++: 0.0626546 sec
MWB: 0.062 sec / 0.000 sec
I don't consider this bad at all for connecting to it through localhost. Let's not talk about display times. :)