Re: [BUG?] check_exclusion_or_unique_constraint false negative
From: Mihail Nikalayeu
Subject: Re: [BUG?] check_exclusion_or_unique_constraint false negative
Date:
Msg-id: CADzfLwXGhH_qD6RGqPyEeKdmHgr-HpA-tASYdi5onP+RyP5TCw@mail.gmail.com
In reply to: Re: [BUG?] check_exclusion_or_unique_constraint false negative (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: [BUG?] check_exclusion_or_unique_constraint false negative
List: pgsql-hackers
Hello, Amit,

> IIUC, the problem you are worried about can happen with DELETE+INSERT

It seems there was some misunderstanding due to my poor explanation and wording. I wrote "A concurrent transaction deletes a tuple and inserts a new one with a different TID", but I meant a logical UPDATE that makes a new TID appear in the index page because HOT was not applied.

Let me try again; I hope this explanation is better.

At the start, we have a table with a primary key and one extra index (to disable HOT), and a tuple with i=13:

CREATE TABLE tbl (i int PRIMARY KEY, data text);
CREATE INDEX no_more_hot_data_index ON tbl (data);
INSERT INTO tbl (i, data) VALUES (13, 'data');

A btree scan using SnapshotDirty can miss tuples because of its internal locking logic. Here is how the bug shows up:

1) we have a tuple in the index (i=13), committed long ago
2) transaction A starts an index search for that tuple using SnapshotDirty (WHERE i = 13)
3) in parallel, transaction B updates that tuple (SET data = 'updated' WHERE i = 13) and commits (creating a new index entry because HOT is not applied)
4) the scan from step 2 returns nothing at all, as if the tuple never existed

In other words, if you start a SnapshotDirty btree scan for i=13 and update that row at the same physical moment, the scan may:

* return the TID of the pre-update version - correct behavior
* return the TID of the post-update version - also correct
* return nothing - this is the broken case

More broadly: any SnapshotDirty scan may completely miss existing data when there are concurrent updates. SnapshotDirty usage in Postgres is limited, so the impact is not huge, but every case I found is reproducible with the tests from the first commit of v10 in my previous email.

* check_exclusion_or_unique_constraint: only a minor performance impact, handled by the retry logic (a rough SQL-level sketch of how this path can be reached is in the P.S. below)
* logical replication TAP tests: multiple scenarios fail because RelationFindReplTupleByIndex cannot find existing committed tuples

These scenarios look like this:

1) logical replication tries to apply a change for tuple X received from the publisher
2) meanwhile, the subscriber updates the same tuple X and commits in a parallel transaction
3) due to the bug, RelationFindReplTupleByIndex concludes that tuple X does not exist at all, leading to bad outcomes, including:
   * incorrect conflict-type messages (and, in the future, potentially wrong conflict-resolution choices)
   * lost updates (see scenario 2 from [0])

If you look at the tests and play with the $simulate_race_condition flag, you can see the behavior directly. The second commit (a possible fix) in v10 also includes documentation updates that try to explain the issue in a more appropriate context.

I am happy to provide additional reproducers or explanations if that would help.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwWC49oanFSGPTf%3D6FJoTw-kAnpPZV8nVqAyR5KL68LrHQ%40mail.gmail.com#5f6b3be849f8d95c166decfae541df09

Best regards,
Mikhail.
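P.S. A minimal two-session sketch of how the affected SnapshotDirty scan can be reached from plain SQL, using the example table above. The ON CONFLICT path is just my choice for illustration, since its arbiter-index check goes through check_exclusion_or_unique_constraint; the actual race window is inside the btree scan, so hitting it reliably still needs the simulated delay from the test patch, and in this particular path the miss is absorbed by the retry logic anyway.

-- Session B: non-HOT update of the existing row, so a second index entry
-- for i = 13 (pointing to the new TID) appears in the primary key index
BEGIN;
UPDATE tbl SET data = 'updated' WHERE i = 13;
COMMIT;

-- Session A, overlapping with the commit above: the arbiter-index check for
-- ON CONFLICT runs check_exclusion_or_unique_constraint, i.e. a SnapshotDirty
-- scan of the primary key index - the scan that can miss both tuple versions
INSERT INTO tbl (i, data) VALUES (13, 'extra') ON CONFLICT (i) DO NOTHING;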