Обсуждение: Understanding aggregation and window functions.
Hi all, I have three tables: theatre, show and ticket - all DDL and DML available at the bottom of this post and on a fiddle here: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529 I've kept things very simple: theatre (theatre_id) is an FK in show (show_theatre_id) show (show_id) is an FK in ticket (ticket_show_id) So, I start off: SELECT th.theatre_id, th.theatre_name AS th_name, s.show_id AS s_id, s.show_theatre_id AS s_th, s.show_name AS s_name, t.ticket_id AS t_id, t.ticket_show_id AS t_show, t.price AS t_p FROM theatre th JOIN show s ON th.theatre_id = s.show_theatre_id JOIN ticket t ON s.show_id = t.ticket_show_id; And I get 29 rows returned - (not all shown) - 1 for every ticket. theatre_id th_name s_id s_th s_name t_id t_show t_p 1 Theatre_1 10 1 show_10 1 10 10 1 Theatre_1 10 1 show_10 2 10 10 That's all fine. So, now I run: SELECT th.theatre_name AS "Theatre name", s.show_name AS "Show name", SUM(t.price) AS "Total sales" FROM theatre th JOIN show s ON th.theatre_id = s.show_theatre_id JOIN ticket t ON s.show_id = t.ticket_show_id GROUP BY th.theatre_name, s.show_name ORDER BY th.theatre_name, s.show_name; And obtain: Theatre name Show name Sales/show Theatre_1 show_10 2050 Theatre_1 show_11 77 Theatre_2 show_20 200 Theatre_2 show_21 63 Again, fine! But what I want is (theatre_1 2127) and (theatre_2, 263). I would also appreciate *_explanations_*, references, URLs... a bit more than just the correct SQL would be great! But, now I want to get the total sales/per theatre in the same query, so I try adding the line (see fiddle): SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.theatre_id), and I receive the error: ERROR: column "t.price" must appear in the GROUP BY clause or be used in an aggregate function LINE 8: SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.... Now, I don't understand why I'm getting the error - I *_already_* have t.price in an aggregate function - i.e. the simple SUM without OVER ()? So, I ran another query to experiment: SELECT th.theatre_name AS "T. name", s.show_id AS "Show name", t.price AS "Tkt price", SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T.", SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id, s.show_id) AS "Sales/show", COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id) "No. of difft tkts/show", SUM(t.price) AS "Tot sales/ticket" FROM theatre th JOIN show s ON th.theatre_id = s.show_theatre_id JOIN ticket t ON s.show_id = t.ticket_show_id GROUP BY th.theatre_id, th.theatre_name, s.show_id, t.price ORDER BY th.theatre_name, s.show_id, t.price DESC; So, the result of this is: T. name Show name Tkt price Sales/T. Sales/show tkts/show Tot sales/ticket Theatre_1 10 500 521 510 2 2000 Theatre_1 10 10 521 510 2 50 Theatre_1 11 11 521 11 1 77 Theatre_2 20 20 41 20 1 200 Theatre_2 21 21 41 21 1 63 I hope this "chart" comes out in the formatting - if not, check out the fiddle - it's far more readable there! So, - t.price AS "Tkt price" gives me 5 prices - i.e. for show 10, there are two types available, 10 and 500 € (or whatever...) and 5 overall (1 each for the others) - OK! - SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T." gives me the total of the prices for the individual different tickets by theatre - i.e. the sum of 500 + 10 + 11 for theatre 1 - that's OK So, I'm aggregating over the theatres... but for individual tickets and not the total? - SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id, s.show_id) AS "Sales/show" give me the sum of the prices of the individual tickets per show - 500 + 10 for theatre 1, show 1 - OK - COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id) "Tkts/show", give 2 for shows 10 and 1 for the others - which is OK - We've already seen the final result above What I want is Theatre_1 2127 Theatre_2 263 i.e. the total sales overall for all shows by theatre - and I just can't seem to understand why my various partitioning and ordering attempts aren't working out! ** BONUS QUESTION ** I would also like to include the number of tickets sold by price point and the number of tickets sold per theatre and per show - this might involve a subselect in the SELECT clause? I tried (see end of fiddle): (https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529) SELECT COUNT(t.ticket_show_id) FROM show s JOIN ticket t ON s.show_id = t.ticket_show_id GROUP BY s.show_id Result count 9 7 3 10 But when I tried to integrate it into another query as a SELECT in the SELECT statement, I received the error: ERROR: more than one row returned by a subquery used as an expression Which I can understand, but can't figure out how to solve. For each row of my main query, the query above should only return 1 row - but... Any advice appreciated - Pól... ================= DDL and DML ================= CREATE TABLE theatre ( theatre_id INTEGER NOT NULL PRIMARY KEY, theatre_name VARCHAR (25) NOT NULL ); INSERT INTO theatre VALUES (1, 'Theatre_1'), (2, 'Theatre_2'); CREATE TABLE show ( show_id INTEGER NOT NULL PRIMARY KEY, show_theatre_id INTEGER NOT NULL, show_name VARCHAR (25) NOT NULL, CONSTRAINT show_theatre_fk FOREIGN KEY (show_theatre_id) REFERENCES theatre (theatre_id) ); INSERT INTO show (show_id, show_theatre_id, show_name) VALUES (10, 1, 'show_10'), (11, 1, 'show_11'), (20, 2, 'show_20'), (21, 2, 'show_21'); CREATE TABLE ticket ( ticket_id SERIAL PRIMARY KEY, ticket_show_id INTEGER NOT NULL, CONSTRAINT ticket_show_fk FOREIGN KEY (ticket_show_id) REFERENCES show (show_id), price INT NOT NULL -- -- Various other fields - seat_no, is_sold, discount... whatever ); INSERT INTO ticket (ticket_show_id, price) VALUES (10, 10), (10, 10), (10, 10), (10, 10), (10, 10), -- 5 tickets for show 10 (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), -- 7/show 11 &c (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), -- 10 (21, 21), (21, 21), (21, 21), -- 3 (10, 500), (10, 500), (10, 500), (10, 500); -- 4 EXPENSIVE tickets for show_10;
On Wed, Jan 6, 2021 at 9:08 AM Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
SELECT
th.theatre_name AS "Theatre name", s.show_name AS "Show name",
SUM(t.price) AS "Total sales"
Theatre name Show name Sales/show
Theatre_1 show_10 2050
Theatre_1 show_11 77
Theatre_2 show_20 200
Theatre_2 show_21 63
Again, fine! But what I want is (theatre_1 2127) and (theatre_2, 263).
Then why are you grouping on "Show name" if the total you want is "Theatre name"?
SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.theatre_id),
and I receive the error:
ERROR: column "t.price" must appear in the GROUP BY clause or be used
in an aggregate function LINE 8: SUM(t.price) OVER (PARTITION BY
th.theatre_id ORDER BY th....
sum(sum(t.price)) over (...)
When using window function in a aggregate query you need to write one aggregate function calls to appease the group by (the inner sum) and one aggregate function call to appease the "over" (the outer sum).
Ignoring the rest of the section, re-ask if you still have questions after getting clarity for the above.
** BONUS QUESTION **
I would also like to include the number of tickets sold by price point
and the number of tickets sold per theatre and per show - this might
involve a subselect in the SELECT clause?
SELECT COUNT(t.ticket_show_id)
FROM show s
JOIN ticket t ON s.show_id = t.ticket_show_id
GROUP BY s.show_id
Result
count
9
7
3
10
I try to never write a group by query that doesn't include a column for each grouped field, then any aggregates.
But when I tried to integrate it into another query as a SELECT in the
SELECT statement, I received the error:
ERROR: more than one row returned by a subquery used as an expression
Which I can understand, but can't figure out how to solve. For each
row of my main query, the query above should only return 1 row -
but...
Then you probably need to correlate the outer and inner queries by referencing a value in the outer query inside of the subquery.
with subquery as (select id, val)
from mainquery
Though, as can be seen from the above, usually a join works better than a subquery in cases like this.
David J.