Re: An issue with missing rows
От | Viktor Semykin |
---|---|
Тема | Re: An issue with missing rows |
Дата | |
Msg-id | CACaAQPUnj-aqm9aAFc+JXTRjxKw_bmMMbaE-0NnF28sq3iE1sg@mail.gmail.com обсуждение исходный текст |
Ответ на | An issue with missing rows (Viktor Semykin <thesame.ml@gmail.com>) |
Список | pgsql-bugs |
I'm sorry for the noise, my query just missing ORDER BY clause, my fault. On Sat, Feb 13, 2021 at 6:01 PM Viktor Semykin <thesame.ml@gmail.com> wrote: > > I have faced a weird issue with PostgreSQL database in conjunction with > nodejs+Sequelize. I saw this issue happening in 13.1 and 10.5. nodejs is 12.16. > Please let me know if I have to re-format this email somehow for > comfortable reading. > > I'll try to explain step by step what's going on. Sequelize > library test suite includes a test that have been passing successfully until I > broke it. After my changes, the following queries fail about once in 20 times. > > In a nutshell, my queries look like this (here and below all createdAt and > updatedAt fields may be inaccurate): > > ```sql > CREATE SCHEMA IF NOT EXISTS archive; > DROP TABLE IF EXISTS "archive"."Tasks" CASCADE; > DROP TABLE IF EXISTS "archive"."Users" CASCADE; > CREATE TABLE IF NOT EXISTS "archive"."Users" ("id" SERIAL , > "createdAt" TIMESTAMP WITH TIME ZONE NOT NULL, "updatedAt" TIMESTAMP > WITH TIME ZONE NOT NULL, PRIMARY KEY ("id")); > CREATE TABLE IF NOT EXISTS "archive"."Tasks" ("id" INTEGER , "title" > VARCHAR(255), "createdAt" TIMESTAMP WITH TIME ZONE NOT NULL, > "updatedAt" TIMESTAMP WITH TIME ZONE NOT NULL, "UserId" INTEGER > REFERENCES "archive"."Users" ("id") ON DELETE SET NULL ON UPDATE > CASCADE, PRIMARY KEY ("id")); > INSERT INTO "archive"."Users" ("id","createdAt","updatedAt") VALUES > (1,'...','...') RETURNING "id","createdAt","updatedAt"; > INSERT INTO "archive"."Users" ("id","createdAt","updatedAt") VALUES > (2,'...','...') RETURNING "id","createdAt","updatedAt"; > INSERT INTO "archive"."Tasks" > ("id","title","createdAt","updatedAt","UserId") VALUES > (1,'b','...','...',1) RETURNING > "id","title","createdAt","updatedAt","UserId"; > INSERT INTO "archive"."Tasks" > ("id","title","createdAt","updatedAt","UserId") VALUES > (2,'d','...','...',1) RETURNING > "id","title","createdAt","updatedAt","UserId"; > INSERT INTO "archive"."Tasks" > ("id","title","createdAt","updatedAt","UserId") VALUES > (3,'c','...','...',1) RETURNING > "id","title","createdAt","updatedAt","UserId"; > INSERT INTO "archive"."Tasks" > ("id","title","createdAt","updatedAt","UserId") VALUES > (4,'a','...','...',1) RETURNING > "id","title","createdAt","updatedAt","UserId"; > INSERT INTO "archive"."Tasks" > ("id","title","createdAt","updatedAt","UserId") VALUES > (5,'a','...','...',2) RETURNING > "id","title","createdAt","updatedAt","UserId"; > INSERT INTO "archive"."Tasks" > ("id","title","createdAt","updatedAt","UserId") VALUES > (6,'c','...','...',2) RETURNING > "id","title","createdAt","updatedAt","UserId"; > INSERT INTO "archive"."Tasks" > ("id","title","createdAt","updatedAt","UserId") VALUES > (7,'b','...','...',2) RETURNING > "id","title","createdAt","updatedAt","UserId"; > SELECT "User"."id", "User"."createdAt", "User"."updatedAt" FROM > "archive"."Users" AS "User" ORDER BY "User"."id" ASC; > SELECT "Task".* FROM (SELECT * FROM (SELECT "id", "title", > "createdAt", "updatedAt", "UserId" FROM "archive"."Tasks" AS "Task" > WHERE "Task"."UserId" = 1 ORDER BY "Task"."id" ASC LIMIT 2) AS sub > UNION ALL SELECT * FROM (SELECT "id", "title", "createdAt", > "updatedAt", "UserId" FROM "archive"."Tasks" AS "Task" WHERE > "Task"."UserId" = 2 ORDER BY "Task"."id" ASC LIMIT 2) AS sub) AS > "Task"; > DROP SCHEMA IF EXISTS archive CASCADE; > ``` > > in other words, in instantiates a test data like this: > > ``` > sequelize_test=# select * from "archive"."Users"; > id | createdAt | updatedAt > ----+-------------------------------+------------------------------- > 1 | 2021-02-13 16:59:04.190643-03 | 2021-02-13 16:59:04.190643-03 > 2 | 2021-02-13 16:59:04.217696-03 | 2021-02-13 16:59:04.217696-03 > (2 rows) > > sequelize_test=# select * from "archive"."Tasks"; > id | title | createdAt | updatedAt > | UserId > ----+-------+-------------------------------+-------------------------------+-------- > 1 | b | 2021-02-13 16:59:04.222593-03 | 2021-02-13 > 16:59:04.222593-03 | 1 > 2 | d | 2021-02-13 16:59:04.238252-03 | 2021-02-13 > 16:59:04.238252-03 | 1 > 3 | c | 2021-02-13 16:59:04.24318-03 | 2021-02-13 > 16:59:04.24318-03 | 1 > 4 | a | 2021-02-13 16:59:04.256812-03 | 2021-02-13 > 16:59:04.256812-03 | 1 > 5 | a | 2021-02-13 16:59:04.262364-03 | 2021-02-13 > 16:59:04.262364-03 | 2 > 6 | c | 2021-02-13 16:59:04.277525-03 | 2021-02-13 > 16:59:04.277525-03 | 2 > 7 | b | 2021-02-13 16:59:04.281861-03 | 2021-02-13 > 16:59:04.281861-03 | 2 > ``` > > Then I'm trying to fetch all Users, and for each one of them only two tasks > ordered by "id". This is what my query (the Faulty Query) looks like: > > ```sql > SELECT "User"."id", > "User"."createdAt", > "User"."updatedAt", > "tasks"."id" AS "tasks.id", > "tasks"."title" AS "tasks.title", > "tasks"."createdAt" AS "tasks.createdAt", > "tasks"."updatedAt" AS "tasks.updatedAt", > "tasks"."UserId" AS "tasks.UserId" > FROM "archive"."Users" AS "User" > LEFT OUTER JOIN LATERAL > (SELECT * > FROM "archive"."Tasks" AS "tasks" > WHERE "User"."id" = "tasks"."UserId" > LIMIT 2) AS "tasks" ON TRUE > ORDER BY "User"."id" ASC; > ``` > > and most of the time it returns result as expected: > > ``` > id | createdAt | updatedAt | > tasks.id | tasks.title | tasks.createdAt | > tasks.updatedAt | tasks.UserId > ----+-------------------------------+-------------------------------+----------+-------------+-------------------------------+-------------------------------+-------------- > 1 | 2021-02-13 16:59:04.190643-03 | 2021-02-13 16:59:04.190643-03 | > 1 | b | 2021-02-13 16:59:04.222593-03 | 2021-02-13 > 16:59:04.222593-03 | 1 > 1 | 2021-02-13 16:59:04.190643-03 | 2021-02-13 16:59:04.190643-03 | > 2 | d | 2021-02-13 16:59:04.238252-03 | 2021-02-13 > 16:59:04.238252-03 | 1 > 2 | 2021-02-13 16:59:04.217696-03 | 2021-02-13 16:59:04.217696-03 | > 5 | a | 2021-02-13 16:59:04.262364-03 | 2021-02-13 > 16:59:04.262364-03 | 2 > 2 | 2021-02-13 16:59:04.217696-03 | 2021-02-13 16:59:04.217696-03 | > 6 | c | 2021-02-13 16:59:04.277525-03 | 2021-02-13 > 16:59:04.277525-03 | 2 > ``` > > note that Tasks ids are 1,2,5,6. But there are chances that it returns > wrong tasks, for example 1,2,5,7. The problem is that whenever I add logging > it works correctly 100% of the time. > > The good news I was lucky enough to catch the error with Wireshark. Sequelize > uses connection pooling so the queries are spread across 5 separate > tcp connections and > Wireshark output is quite messy. I had to parse the output with an ad-hoc > script that generates a visual graph of the underlying events. This graph is > attached as graph.svg. It's handy to open it in the browser and zoom > out. Graphviz > is not my strongest point, so let me briefly explain what this graph means. > Each node is a pgsql-relevant network frame. Red edges denote TCP connections. > Black arrows denote sequence order. The red nodes are what matters. Each frame > begins with its sequence number and a direction (in/out). Body of each node > shows the list of commands sent/received. > > Frames 1062-1084 are boring preludes. Then at 1086 comes the CREATE > SCHEMA query. > Then come some Sequelize noise like DROP TABLE IF EXISTS of tables that do not > exist. Then in 1098 and 1105 it actually queries table creation. > At 1109 it starts INSERTing data into "Users". Then the pool comes into play. > At 1122 it spawns one more connection and at 1130 it starts INSERTing the second > row into "Users". At 1142 and 1146 the Postgres reports both connections are > "Ready for query". Then at 1148 it starts INSERTing a row into "Tasks" in the > first connection, and at 1165 a row in the second connection. > At 1180 the fun begins, it spawns three more connections and keeps INSERTing > rows into "Tasks" using all the connections. > Around 1263-1270 all the INSERT queries are executed and the connections > reported "Ready for query". > > At 1273 it starts SELECTing the Faulty Query and 1275 returns the seem-to-be > incorrect result. The Tasks rows in the response are > `(1,'b'),(2,'d'),(5,'a'),(7,'b')`. It looks like (6,'c') haven't landed yet. > > So I come to the main question: did I find a bug or my assumptions about > row visibility are just wrong?
В списке pgsql-bugs по дате отправления:
Следующее
От: PG Bug reporting formДата:
Сообщение: BUG #16865: Regression: GIN Negated prefix search returns results that contain the search term