0

I have a query in which I need to display some group function figures SUM COUNT etc from a joined table. I also need to select the latest version (by datetime) of a matching record from a separate table, which I can do using an ordered subquery in the join.

My issue is that the group functions appear to be joining on every row in the subquery (not just the latest one) and thus displaying incorrect inflated results of the SUM/COUNT functions.

So I guess what I'm looking for is one of the following:
1) a better way to do the group functions so that the joins don't affect them
2) a way to select only the single latest row in my subquery so the extraneous rows aren't included in the join
3) a different approach to the joins themselves that aleviates the problem.

Here is a simplified version of the query, any ideas greatly appreciated.

SELECT p.field1, p.field2, p.field3,
       SUM(pp.amount1), SUM(pp.amount2),
       COUNT(c.field1), SUM(c.field1),
       s.Status,
       l.lookupvalues
FROM tableP p
LEFT JOIN tableP pp ON pp.field1 = p.field1 AND pp.field2 = p.field2 AND pp.field3=0
LEFT JOIN someotherlookuptables l on l.id=p.lookupid
LEFT JOIN tableC c ON c.id = p.cid
LEFT JOIN (SELECT * FROM status ORDER BY dateChanged DESC) s ON s.pid = p.id
GROUP BY p.field1

I should add that I have narrowed it down to eliminate other LEFT JOINed tables in the query, which I have tested and have no affect on the group functions.

Also, if I alter the subquery to return only the first row, ie: (SELECT * FROM stats ORDER BY dateChanged DESC LIMIT 1) the group functions return correct values, but obviously only one record is displayed with a Status.

Edited by |-|x: additional information

2
Contributors
3
Replies
4
Views
5 Years
Discussion Span
Last Post by |-|x
0

1) You cannot mix aggregate functions with non-aggregate fields with valid results. If you group by p.field1, what would you expect in result for the tupels
(field1, field2) = (1,1), (1,2), (2,1),(2,2) ?
Should the query return 1 or 2 as the value for field2?
The database cannot decide that. MySQL has the bad practice of just returning the first encountered value for the non-aggregate field, which is often more than mildly confusing.
2) Your left joins look like regular (inner) joins to me. Or do you really expect NULL values on the right side?
3) To get a minimum or maximum date from a subquery, use the max() or min() function rather than some order. An order clause in a subquery does not make sense except for implementation quirks like then one mentioned above.

Better show some test data (a complete test case with CREATE TABLE, INSERT and SELECT statements) for further help.

0

Thanks, and I realise you are correct about aggregate functions, although in this case MySQL's handling of the Grouping is sufficient as the primary table will not return duplicate records for the key fields. I have however included the additional non-aggregate fields in the Group clause now, for precision.

I am actually converting this from a legacy system which may not have been written 100% accurately. In addition, the existing data is also legacy and we cannot guarantee that it is complete or integral (ie: no foreign key enforcement) therefore the Joins must allow for incidental missing or invalid data on the Right side.

To my main issue now, I have a work around solution which gives me the correct result. However, as it is a nested subquery I have concerns about efficiency once the data starts to increase. The new join looks as follows:

LEFT JOIN (SELECT * FROM (SELECT * FROM stats ORDER BY dateChanged) ss GROUP BY ss.pid) s ON s.pid=p.id

Max and Min functions wont achieve what I need as they only return the max date for the entire table, and I need the greatest value for each ID in the table. It would work as another nested subquery, but that still doesn't eleviate the efficiency issue.

I am open to any suggestions as to how to eliminate the subquery nesting. I read an article that suggested to do it joining the table to itself WHERE left.dateChanged>right.dateChanged but I couldn't make it work.

0

I managed to get the join version working, it seems to be more efficient but I will need to verify this further with the full dataset.

Here for interest:

...
LEFT JOIN (status s LEFT JOIN status sx ON sx.pid = s.pid AND sx.DateStamp > s.DateStamp) ON s.pid = p.id
...
WHERE sx.DateStamp IS NULL
This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.