Обсуждение: STRING_AGG and GROUP BY

Поиск

Список

Период

Сортировка

STRING_AGG and GROUP BY

От

Alexander Farber

Дата:

16 марта 2018 г., 20:17:37

Good afternoon,

I have prepared an SQL Fiddle for my question: http://sqlfiddle.com/#!17/4ef8b/2

Here are my 4 test tables:

CREATE TABLE players (
    uid SERIAL PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE games (
    gid SERIAL PRIMARY KEY,
    player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
    player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);

CREATE TABLE moves (
    mid BIGSERIAL PRIMARY KEY,
    uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
    gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
    played timestamptz NOT NULL,
    tiles jsonb NOT NULL
);

CREATE TABLE scores (
    mid     bigint NOT NULL REFERENCES moves ON DELETE CASCADE,
    uid     integer NOT NULL REFERENCES players ON DELETE CASCADE,
    gid     integer NOT NULL REFERENCES games ON DELETE CASCADE,
    word    text    NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),
    score   integer NOT NULL CHECK(score >= 0)
);

Here they are filled with test data (two players Alice and Bob interchangeably performing moves in game #1):

INSERT INTO players (name) VALUES ('Alice'), ('Bob');
INSERT INTO games (player1, player2) VALUES (1, 2);

INSERT INTO moves (uid, gid, played, tiles) VALUES
(1, 1, now() + interval '1 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "A"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '2 min', '[{"col": 7, "row": 12, "value": 3, "letter": "X"}, {"col": 8, "row": 12, "value": 10, "letter": "X"}, {"col": 9, "row": 12, "value": 1, "letter": "Z"}]
'::jsonb),
(1, 1, now() + interval '3 min', '[{"col": 7, "row": 12, "value": 3, "letter": "K"}, {"col": 8, "row": 12, "value": 10, "letter": "K"}, {"col": 9, "row": 12, "value": 1, "letter": "M"}, {"col": 10, "row": 12, "value": 2, "letter": "N"}]
'::jsonb),
(2, 1, now() + interval '4 min', '[]'::jsonb),
(1, 1, now() + interval '5 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb),
(2, 1, now() + interval '6 min', '[{"col": 7, "row": 12, "value": 3, "letter": "P"}, {"col": 8, "row": 12, "value": 10, "letter": "P"}]
'::jsonb);

INSERT INTO scores (mid, uid, gid, word, score) VALUES
(1, 1, 1, 'AACD', 40),
(2, 2, 1, 'XXZ', 30),
(2, 2, 1, 'XAB', 30),
(3, 1, 1, 'KKMN', 40),
(3, 1, 1, 'KYZ', 30),
(5, 1, 1, 'ABCD', 40),
(6, 2, 1, 'PP', 20),
(6, 2, 1, 'PABCD', 50);

For a PHP-script which would display all moves+words+scores played in a certain game I am trying:

SELECT
    mid,
    STRING_AGG(x->>'letter', '') AS tiles,
    STRING_AGG(DISTINCT y, ', ') AS words
FROM (
    SELECT
        mid,
        JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
        FORMAT('%s (%s)', s.word, s.score) AS y
    FROM moves m
    LEFT JOIN scores s
    USING (mid)
    WHERE m.gid = 1
) AS z
GROUP BY mid;

Which produces a slightly wrong result (the played letters are duplicated):

mid   tiles            words
1     AACD                 AACD (40)
2     XXZXXZ             XAB (30), XXZ (30)
3     KKMNKKMN      KKMN (40), KYZ (30)
5     ABCD                 ABCD (40)
6     PPPP                 PABCD (50), PP (20)

My expected result would actually be:

mid   tiles            words
1     AACD                 AACD (40)
2     XXZ              XAB (30), XXZ (30)
3     KKMN      KKMN (40), KYZ (30)
5     ABCD                 ABCD (40)
6     PP                    PABCD (50), PP (20)

Is that actually possible by the means of SQL or should I do it in the PHP script?

And I can not change the query to: STRING_AGG(DISTINCT x->>'letter', '') AS tiles,

Because for example in the last move with mid=6 the player Bob had played 2 tiles, both with letter-value "P" and has formed 2 words (PP and PABCD), but adding distinct would suggest he played a single tile "P".

Thank you

Alex

Re: STRING_AGG and GROUP BY

От

"David G. Johnston"

Дата:

16 марта 2018 г., 21:40:00

On Fri, Mar 16, 2018 at 7:17 AM, Alexander Farber <alexander.farber@gmail.com> wrote:

And I can not change the query to: STRING_AGG(DISTINCT x->>'letter', '') AS tiles,

Because for example in the last move with mid=6 the player Bob had played 2 tiles, both with letter-value "P" and has formed 2 words (PP and PABCD), but adding distinct would suggest he played a single tile "P".

First reaction is to ARRAY_AGG(DISTINCT x) and then write a function that converts that array into a string by extracting 'letter' from each cell in the array.

Thinking it over a bit you have two columns that both are aggregates but that are otherwise independent of each other. Since they are independent they cannot be aggregated at the same time. You need to write a two subqueries, either in the target list or as separate from/join items, and then join the already aggregated queries together on their common group by column.

The presence of DISTINCT here (and, IMO, generally), even if it worked, would be an indicator that something is not quite right.

David J.

Re: STRING_AGG and GROUP BY

От

Alexander Farber

Дата:

16 марта 2018 г., 22:10:02

Hi David -

On Fri, Mar 16, 2018 at 4:40 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

First reaction is to ARRAY_AGG(DISTINCT x) and then write a function that converts that array into a string by extracting 'letter' from each cell in the array.

Thinking it over a bit you have two columns that both are aggregates but that are otherwise independent of each other. Since they are independent they cannot be aggregated at the same time. You need to write a two subqueries, either in the target list or as separate from/join items, and then join the already aggregated queries together on their common group by column.

The presence of DISTINCT here (and, IMO, generally), even if it worked, would be an indicator that something is not quite right.

thank you for confirming my feeling that DISTINCT is a bad indicator here...

But you say that "tiles" and "word (score)" are unrelated and this does not seem true to me:

For each move id aka "mid" there is a JSON value, describing how the player played the letter tiles.

And for the same "mid" there is a list of one or more "word (score)"s achieved...

Maybe it is possible to GROUP BY tiles somehow or some kind of special SQL JOIN?

Regards

Alex

Re: STRING_AGG and GROUP BY

От

"David G. Johnston"

Дата:

16 марта 2018 г., 22:40:07

On Fri, Mar 16, 2018 at 9:10 AM, Alexander Farber <alexander.farber@gmail.com> wrote:

But you say that "tiles" and "word (score)" are unrelated and this does not seem true to me:

For each move id aka "mid" there is a JSON value, describing how the player played the letter tiles.
And for the same "mid" there is a list of one or more "word (score)"s achieved...

For each mid you want to know all tiles played and all word scores achieved - but you want to forget/ignore that a given tile achieved a given word score. IOW, you are intentionally forgetting/ignoring the fact that the tiles and the corresponding word scores are related to each other, beyond the simple/incomplete relationship that both share the same mid. You thus need to write a query that only relates tiles and word scores to mid and not to each other.

Maybe it is possible to GROUP BY tiles somehow or some kind of special SQL JOIN?

I do not know if the tables or columns below match your model but the concept should still come across intact.

SELECT mid,

(SELECT string_agg(tiles, '; ') FROM tiles WHERE ta.mid = moves.mid) AS mid_tiles,

(SELECT string_agg(words, '; ') FROM words WHERE words.mid = moves.mid) AS mid_words

FROM moves

There are other ways to write that that could perform better but the idea holds.

David J.

Re: STRING_AGG and GROUP BY

От

Alexander Farber

Дата:

20 марта 2018 г., 02:54:05

Thank you, David -

On Fri, Mar 16, 2018 at 5:40 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Fri, Mar 16, 2018 at 9:10 AM, Alexander Farber <alexander.farber@gmail.com> wrote:
SELECT mid,
(SELECT string_agg(tiles, '; ') FROM tiles WHERE ta.mid = moves.mid) AS mid_tiles,
(SELECT string_agg(words, '; ') FROM words WHERE words.mid = moves.mid) AS mid_words
FROM moves

There are other ways to write that that could perform better but the idea holds.

I've come up with the following query, wonder if you meant something similar -

http://sqlfiddle.com/#!17/4ef8b/48

WITH cte1 AS (

SELECT

mid,

STRING_AGG(x->>'letter', '') AS tiles

FROM (

SELECT

mid,

JSONB_ARRAY_ELEMENTS(m.tiles) AS x

FROM moves m

WHERE m.gid = 1

) AS z

GROUP BY mid),

cte2 AS (

SELECT

mid,

STRING_AGG(y, ', ') AS words

FROM (

SELECT

mid,

FORMAT('%s (%s)', s.word, s.score) AS y

FROM scores s

WHERE s.gid = 1

) AS z

GROUP BY mid)

SELECT mid, tiles, words

FROM cte1 JOIN cte2 using (mid) ORDER BY mid ASC;

Regards

Alex

P.S. Below is the complete test data in case SQL Fiddle link stops working:

CREATE TABLE players (

uid SERIAL PRIMARY KEY,

name text NOT NULL

);

CREATE TABLE games (

gid SERIAL PRIMARY KEY,

player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,

player2 integer NOT NULL REFERENCES players ON DELETE CASCADE

);

CREATE TABLE moves (

mid BIGSERIAL PRIMARY KEY,

uid integer NOT NULL REFERENCES players ON DELETE CASCADE,

gid integer NOT NULL REFERENCES games ON DELETE CASCADE,

played timestamptz NOT NULL,

tiles jsonb NOT NULL

);

CREATE TABLE scores (

mid bigint NOT NULL REFERENCES moves ON DELETE CASCADE,

uid integer NOT NULL REFERENCES players ON DELETE CASCADE,

gid integer NOT NULL REFERENCES games ON DELETE CASCADE,

word text NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),

score integer NOT NULL CHECK(score >= 0)

);

INSERT INTO players (name) VALUES ('Alice'), ('Bob');

INSERT INTO games (player1, player2) VALUES (1, 2);

INSERT INTO moves (uid, gid, played, tiles) VALUES

(1, 1, now() + interval '1 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "A"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]

'::jsonb),

(2, 1, now() + interval '2 min', '[{"col": 7, "row": 12, "value": 3, "letter": "X"}, {"col": 8, "row": 12, "value": 10, "letter": "X"}, {"col": 9, "row": 12, "value": 1, "letter": "Z"}]

'::jsonb),

(1, 1, now() + interval '3 min', '[{"col": 7, "row": 12, "value": 3, "letter": "K"}, {"col": 8, "row": 12, "value": 10, "letter": "K"}, {"col": 9, "row": 12, "value": 1, "letter": "M"}, {"col": 10, "row": 12, "value": 2, "letter": "N"}]

'::jsonb),

(2, 1, now() + interval '4 min', '[]'::jsonb),

(1, 1, now() + interval '5 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]

'::jsonb),

(2, 1, now() + interval '6 min', '[{"col": 7, "row": 12, "value": 3, "letter": "P"}, {"col": 8, "row": 12, "value": 10, "letter": "Q"}]

'::jsonb);

INSERT INTO scores (mid, uid, gid, word, score) VALUES

(1, 1, 1, 'AACD', 40),

(2, 2, 1, 'XXZ', 30),

(2, 2, 1, 'XAB', 30),

(3, 1, 1, 'KKMN', 40),

(3, 1, 1, 'KYZ', 30),

(5, 1, 1, 'ABCD', 40),

(6, 2, 1, 'PQ', 20),

(6, 2, 1, 'PABCD', 50);

Re: STRING_AGG and GROUP BY

От

"David G. Johnston"

Дата:

20 марта 2018 г., 03:02:09

On Mon, Mar 19, 2018 at 1:54 PM, Alexander Farber <alexander.farber@gmail.com> wrote:

I've come up with the following query, wonder if you meant something similar -

http://sqlfiddle.com/#!17/4ef8b/48

WITH cte1 AS (
SELECT
mid,
STRING_AGG(x->>'letter', '') AS tiles
FROM (
SELECT
mid,
JSONB_ARRAY_ELEMENTS(m.tiles) AS x
FROM moves m
WHERE m.gid = 1
) AS z
GROUP BY mid),
cte2 AS (
SELECT
mid,
STRING_AGG(y, ', ') AS words
FROM (
SELECT
mid,
FORMAT('%s (%s)', s.word, s.score) AS y
FROM scores s
WHERE s.gid = 1
) AS z
GROUP BY mid)
SELECT mid, tiles, words
FROM cte1 JOIN cte2 using (mid) ORDER BY mid ASC;

Yes. It does end up presuming that the sets moves.mid and scores.mid are identical but that is probably a safe assumption. Repetition of m.gid = 1 is worth avoiding in theory though depending on how its done the solution can be worse than the problem (if the planner ends up unable to push the predicate down).

David J.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: STRING_AGG and GROUP BY

STRING_AGG and GROUP BY

Re: STRING_AGG and GROUP BY

Re: STRING_AGG and GROUP BY

Re: STRING_AGG and GROUP BY

Re: STRING_AGG and GROUP BY

Re: STRING_AGG and GROUP BY