Обсуждение: STRING_AGG and GROUP BY
Here are my 4 test tables:
CREATE TABLE players (
uid SERIAL PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE games (
gid SERIAL PRIMARY KEY,
player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);
CREATE TABLE moves (
mid BIGSERIAL PRIMARY KEY,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
played timestamptz NOT NULL,
tiles jsonb NOT NULL
);
CREATE TABLE scores (
mid bigint NOT NULL REFERENCES moves ON DELETE CASCADE,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
word text NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),
score integer NOT NULL CHECK(score >= 0)
);
Here they are filled with test data (two players Alice and Bob interchangeably performing moves in game #1):CREATE TABLE players (
uid SERIAL PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE games (
gid SERIAL PRIMARY KEY,
player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);
CREATE TABLE moves (
mid BIGSERIAL PRIMARY KEY,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
played timestamptz NOT NULL,
tiles jsonb NOT NULL
);
CREATE TABLE scores (
mid bigint NOT NULL REFERENCES moves ON DELETE CASCADE,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
word text NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),
score integer NOT NULL CHECK(score >= 0)
);
INSERT INTO players (name) VALUES ('Alice'), ('Bob');
INSERT INTO games (player1, player2) VALUES (1, 2);
INSERT INTO moves (uid, gid, played, tiles) VALUES
(1, 1, now() + interval '1 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "A"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '2 min', '[{"col": 7, "row": 12, "value": 3, "letter": "X"}, {"col": 8, "row": 12, "value": 10, "letter": "X"}, {"col": 9, "row": 12, "value": 1, "letter": "Z"}]
'::jsonb),
(1, 1, now() + interval '3 min', '[{"col": 7, "row": 12, "value": 3, "letter": "K"}, {"col": 8, "row": 12, "value": 10, "letter": "K"}, {"col": 9, "row": 12, "value": 1, "letter": "M"}, {"col": 10, "row": 12, "value": 2, "letter": "N"}]
'::jsonb),
(2, 1, now() + interval '4 min', '[]'::jsonb),
(1, 1, now() + interval '5 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '6 min', '[{"col": 7, "row": 12, "value": 3, "letter": "P"}, {"col": 8, "row": 12, "value": 10, "letter": "P"}]
'::jsonb);
INSERT INTO scores (mid, uid, gid, word, score) VALUES
(1, 1, 1, 'AACD', 40),
(2, 2, 1, 'XXZ', 30),
(2, 2, 1, 'XAB', 30),
(3, 1, 1, 'KKMN', 40),
(3, 1, 1, 'KYZ', 30),
(5, 1, 1, 'ABCD', 40),
(6, 2, 1, 'PP', 20),
(6, 2, 1, 'PABCD', 50);
SELECT
mid,
STRING_AGG(x->>'letter', '') AS tiles,
STRING_AGG(DISTINCT y, ', ') AS words
FROM (
SELECT
mid,
JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
FORMAT('%s (%s)', s.word, s.score) AS y
FROM moves m
LEFT JOIN scores s
USING (mid)
WHERE m.gid = 1
) AS z
GROUP BY mid;
mid tiles words
1 AACD AACD (40)
2 XXZXXZ XAB (30), XXZ (30)
3 KKMNKKMN KKMN (40), KYZ (30)
5 ABCD ABCD (40)
6 PPPP PABCD (50), PP (20)
mid tiles words
1 AACD AACD (40)
2 XXZ XAB (30), XXZ (30)
3 KKMN KKMN (40), KYZ (30)
5 ABCD ABCD (40)
6 PP PABCD (50), PP (20)
Because for example in the last move with mid=6 the player Bob had played 2 tiles, both with letter-value "P" and has formed 2 words (PP and PABCD), but adding distinct would suggest he played a single tile "P".And I can not change the query to: STRING_AGG(DISTINCT x->>'letter', '') AS tiles,
First reaction is to ARRAY_AGG(DISTINCT x) and then write a function that converts that array into a string by extracting 'letter' from each cell in the array.
Thinking it over a bit you have two columns that both are aggregates but that are otherwise independent of each other. Since they are independent they cannot be aggregated at the same time. You need to write a two subqueries, either in the target list or as separate from/join items, and then join the already aggregated queries together on their common group by column.
The presence of DISTINCT here (and, IMO, generally), even if it worked, would be an indicator that something is not quite right.
David J.
Hi David -
On Fri, Mar 16, 2018 at 4:40 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
First reaction is to ARRAY_AGG(DISTINCT x) and then write a function that converts that array into a string by extracting 'letter' from each cell in the array.Thinking it over a bit you have two columns that both are aggregates but that are otherwise independent of each other. Since they are independent they cannot be aggregated at the same time. You need to write a two subqueries, either in the target list or as separate from/join items, and then join the already aggregated queries together on their common group by column.The presence of DISTINCT here (and, IMO, generally), even if it worked, would be an indicator that something is not quite right.
thank you for confirming my feeling that DISTINCT is a bad indicator here...
But you say that "tiles" and "word (score)" are unrelated and this does not seem true to me:
For each move id aka "mid" there is a JSON value, describing how the player played the letter tiles.
And for the same "mid" there is a list of one or more "word (score)"s achieved...
Maybe it is possible to GROUP BY tiles somehow or some kind of special SQL JOIN?
Regards
Alex
But you say that "tiles" and "word (score)" are unrelated and this does not seem true to me:For each move id aka "mid" there is a JSON value, describing how the player played the letter tiles.And for the same "mid" there is a list of one or more "word (score)"s achieved...
For each mid you want to know all tiles played and all word scores achieved - but you want to forget/ignore that a given tile achieved a given word score. IOW, you are intentionally forgetting/ignoring the fact that the tiles and the corresponding word scores are related to each other, beyond the simple/incomplete relationship that both share the same mid. You thus need to write a query that only relates tiles and word scores to mid and not to each other.
Maybe it is possible to GROUP BY tiles somehow or some kind of special SQL JOIN?
I do not know if the tables or columns below match your model but the concept should still come across intact.
SELECT mid,
(SELECT string_agg(tiles, '; ') FROM tiles WHERE ta.mid = moves.mid) AS mid_tiles,
(SELECT string_agg(words, '; ') FROM words WHERE words.mid = moves.mid) AS mid_words
FROM moves
There are other ways to write that that could perform better but the idea holds.
David J.
Thank you, David -
On Fri, Mar 16, 2018 at 5:40 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
SELECT mid,(SELECT string_agg(tiles, '; ') FROM tiles WHERE ta.mid = moves.mid) AS mid_tiles,(SELECT string_agg(words, '; ') FROM words WHERE words.mid = moves.mid) AS mid_wordsFROM movesThere are other ways to write that that could perform better but the idea holds.
I've come up with the following query, wonder if you meant something similar -
WITH cte1 AS (
SELECT
mid,
STRING_AGG(x->>'letter', '') AS tiles
FROM (
SELECT
mid,
JSONB_ARRAY_ELEMENTS(m.tiles) AS x
FROM moves m
WHERE m.gid = 1
) AS z
GROUP BY mid),
cte2 AS (
SELECT
mid,
STRING_AGG(y, ', ') AS words
FROM (
SELECT
mid,
FORMAT('%s (%s)', s.word, s.score) AS y
FROM scores s
WHERE s.gid = 1
) AS z
GROUP BY mid)
SELECT mid, tiles, words
FROM cte1 JOIN cte2 using (mid) ORDER BY mid ASC;
Regards
Alex
P.S. Below is the complete test data in case SQL Fiddle link stops working:
CREATE TABLE players (
uid SERIAL PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE games (
gid SERIAL PRIMARY KEY,
player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);
CREATE TABLE moves (
mid BIGSERIAL PRIMARY KEY,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
played timestamptz NOT NULL,
tiles jsonb NOT NULL
);
CREATE TABLE scores (
mid bigint NOT NULL REFERENCES moves ON DELETE CASCADE,
uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
word text NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),
score integer NOT NULL CHECK(score >= 0)
);
INSERT INTO players (name) VALUES ('Alice'), ('Bob');
INSERT INTO games (player1, player2) VALUES (1, 2);
INSERT INTO moves (uid, gid, played, tiles) VALUES
(1, 1, now() + interval '1 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "A"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '2 min', '[{"col": 7, "row": 12, "value": 3, "letter": "X"}, {"col": 8, "row": 12, "value": 10, "letter": "X"}, {"col": 9, "row": 12, "value": 1, "letter": "Z"}]
'::jsonb),
(1, 1, now() + interval '3 min', '[{"col": 7, "row": 12, "value": 3, "letter": "K"}, {"col": 8, "row": 12, "value": 10, "letter": "K"}, {"col": 9, "row": 12, "value": 1, "letter": "M"}, {"col": 10, "row": 12, "value": 2, "letter": "N"}]
'::jsonb),
(2, 1, now() + interval '4 min', '[]'::jsonb),
(1, 1, now() + interval '5 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '6 min', '[{"col": 7, "row": 12, "value": 3, "letter": "P"}, {"col": 8, "row": 12, "value": 10, "letter": "Q"}]
'::jsonb);
INSERT INTO scores (mid, uid, gid, word, score) VALUES
(1, 1, 1, 'AACD', 40),
(2, 2, 1, 'XXZ', 30),
(2, 2, 1, 'XAB', 30),
(3, 1, 1, 'KKMN', 40),
(3, 1, 1, 'KYZ', 30),
(5, 1, 1, 'ABCD', 40),
(6, 2, 1, 'PQ', 20),
(6, 2, 1, 'PABCD', 50);
I've come up with the following query, wonder if you meant something similar -
WITH cte1 AS (SELECTmid,STRING_AGG(x->>'letter', '') AS tilesFROM (SELECTmid,JSONB_ARRAY_ELEMENTS(m.tiles) AS xFROM moves mWHERE m.gid = 1) AS zGROUP BY mid),cte2 AS (SELECTmid,STRING_AGG(y, ', ') AS wordsFROM (SELECTmid,FORMAT('%s (%s)', s.word, s.score) AS yFROM scores sWHERE s.gid = 1) AS zGROUP BY mid)SELECT mid, tiles, wordsFROM cte1 JOIN cte2 using (mid) ORDER BY mid ASC;
Yes. It does end up presuming that the sets moves.mid and scores.mid are identical but that is probably a safe assumption. Repetition of m.gid = 1 is worth avoiding in theory though depending on how its done the solution can be worse than the problem (if the planner ends up unable to push the predicate down).
David J.