Understanding aggregation and window functions.

Поиск
Список
Период
Сортировка
От Pól Ua Laoínecháin
Тема Understanding aggregation and window functions.
Дата
Msg-id CAF4RT5Q15EJ0mc7cMv8FZK+Vdq23GvQo03R_xo3U=M-QsXmKVw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Understanding aggregation and window functions.  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-novice
Hi all,

I have three tables: theatre, show and ticket - all DDL and DML
available at the bottom of this post and on a fiddle here:

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529

I've kept things very simple:

theatre (theatre_id) is an FK in show (show_theatre_id)

show (show_id) is an FK in ticket (ticket_show_id)


So, I start off:

SELECT
  th.theatre_id, th.theatre_name AS th_name,
  s.show_id AS s_id, s.show_theatre_id AS s_th, s.show_name AS s_name,
  t.ticket_id AS t_id, t.ticket_show_id AS t_show, t.price AS t_p
FROM
  theatre th
JOIN show s ON th.theatre_id = s.show_theatre_id
JOIN ticket t ON s.show_id = t.ticket_show_id;

And I get 29 rows returned - (not all shown) - 1 for every ticket.

theatre_id th_name s_id s_th s_name t_id t_show t_p
1               Theatre_1 10      1 show_10   1         10 10
1               Theatre_1 10      1 show_10   2         10 10

That's all fine.

So, now I run:

SELECT
  th.theatre_name AS "Theatre name", s.show_name AS "Show name",
  SUM(t.price) AS "Total sales"

FROM
  theatre th
JOIN show s ON th.theatre_id = s.show_theatre_id
JOIN ticket t ON s.show_id = t.ticket_show_id
GROUP BY th.theatre_name, s.show_name
ORDER BY th.theatre_name, s.show_name;


And obtain:

Theatre name    Show name   Sales/show
      Theatre_1         show_10             2050
      Theatre_1          show_11                77
      Theatre_2         show_20               200
      Theatre_2          show_21                63

Again, fine! But what I want is (theatre_1 2127) and (theatre_2, 263).

I would also appreciate *_explanations_*, references, URLs... a bit
more than just the correct SQL would be great!

But, now I want to get the total sales/per theatre in the same query,
so I try adding the line (see fiddle):

  SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.theatre_id),

and I receive the error:

ERROR: column "t.price" must appear in the GROUP BY clause or be used
in an aggregate function LINE 8: SUM(t.price) OVER (PARTITION BY
th.theatre_id ORDER BY th....

Now, I don't understand why I'm getting the error - I *_already_* have
t.price in an aggregate function - i.e. the simple SUM without OVER
()?

So, I ran another query to experiment:

SELECT
  th.theatre_name AS "T. name", s.show_id AS "Show name", t.price AS
"Tkt price",

  SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T.",

  SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id,
s.show_id) AS "Sales/show",

  COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id) "No. of
difft tkts/show",

  SUM(t.price) AS "Tot sales/ticket"

FROM
  theatre th
JOIN show s ON th.theatre_id = s.show_theatre_id
JOIN ticket t ON s.show_id = t.ticket_show_id
GROUP BY th.theatre_id, th.theatre_name, s.show_id, t.price
ORDER BY th.theatre_name, s.show_id, t.price DESC;


So, the result of this is:

      T. name   Show name   Tkt price    Sales/T.    Sales/show
tkts/show     Tot sales/ticket
   Theatre_1                 10           500          521
    510                2                     2000
   Theatre_1                 10             10          521
     510                2                         50
   Theatre_1                 11             11           521
        11                1                         77
   Theatre_2                 20             20            41
        20                1                       200
   Theatre_2                 21             21            41
        21                1                         63


I hope this "chart" comes out in the formatting - if not, check out
the fiddle - it's far more readable there!

So,

- t.price AS "Tkt price" gives me 5 prices - i.e. for show  10, there
are two types available, 10 and 500 € (or whatever...) and 5 overall
(1 each for the others) - OK!

- SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T." gives
me the total of the prices for the individual different tickets by
theatre - i.e. the sum of 500 + 10 + 11 for theatre 1 - that's OK

So, I'm aggregating over the theatres... but for individual tickets
and not the total?


 - SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id,
s.show_id) AS "Sales/show" give me the sum of the prices of the
individual tickets per show - 500 + 10 for theatre 1, show 1 - OK

 - COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id)
"Tkts/show", give 2 for shows 10 and 1 for the others - which is OK

 - We've already seen the final result above

What I want is

Theatre_1 2127
Theatre_2 263

i.e. the total sales overall for all shows by theatre - and I just
can't seem to understand why my various partitioning and ordering
attempts aren't working out!

** BONUS QUESTION **

I would also like to include the number of tickets sold by price point
and the number of tickets sold per theatre and per show - this might
involve a subselect in the SELECT clause?

I tried (see end of fiddle):

(https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529)

  SELECT COUNT(t.ticket_show_id)
  FROM show s
  JOIN ticket t ON s.show_id = t.ticket_show_id
  GROUP BY s.show_id

Result

count
9
7
3
10

But when I tried to integrate it into another query as a SELECT in the
SELECT statement, I received the error:

  ERROR: more than one row returned by a subquery used as an expression

Which I can understand, but can't figure out how to solve. For each
row of my main query, the query above should only return 1 row -
but...

Any advice appreciated -


Pól...


================= DDL and DML =================

CREATE TABLE theatre
(
  theatre_id   INTEGER      NOT NULL PRIMARY KEY,
  theatre_name VARCHAR (25) NOT NULL
);

INSERT INTO theatre VALUES (1, 'Theatre_1'), (2, 'Theatre_2');

CREATE TABLE show
(
  show_id         INTEGER      NOT NULL PRIMARY KEY,
  show_theatre_id INTEGER      NOT NULL,
  show_name       VARCHAR (25) NOT NULL,
  CONSTRAINT show_theatre_fk FOREIGN KEY (show_theatre_id) REFERENCES
theatre (theatre_id)
);

INSERT INTO show (show_id, show_theatre_id, show_name) VALUES
(10, 1, 'show_10'), (11, 1, 'show_11'), (20, 2, 'show_20'), (21, 2, 'show_21');

CREATE TABLE ticket
(
  ticket_id SERIAL PRIMARY KEY,
  ticket_show_id INTEGER NOT NULL,
  CONSTRAINT ticket_show_fk FOREIGN KEY (ticket_show_id) REFERENCES
show (show_id),
  price INT NOT NULL
  --
  -- Various other fields - seat_no, is_sold, discount... whatever
);

INSERT INTO ticket (ticket_show_id, price) VALUES
(10, 10), (10, 10), (10, 10), (10, 10), (10, 10),
-- 5 tickets for show 10
(11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11),
-- 7/show 11 &c
(20, 20), (20, 20), (20, 20), (20, 20), (20, 20),
(20, 20), (20, 20), (20, 20), (20, 20), (20, 20),                     -- 10
(21, 21), (21, 21), (21, 21),                                         -- 3
(10, 500), (10, 500), (10, 500), (10, 500);        -- 4 EXPENSIVE
tickets for show_10;



В списке pgsql-novice по дате отправления:

Предыдущее
От: Stefan Houtzager
Дата:
Сообщение: Re: coalesce(json_agg [..] filter where [..], '[]' in left join returning null
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: coalesce(json_agg [..] filter where [..], '[]' in left join returning null