Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
| От | Bernice Southey |
|---|---|
| Тема | Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update |
| Дата | |
| Msg-id | CAEDh4nw=h5NihhuBgq4GFCfPP9vcg54QJ+K0XsG9S0owWDUhfA@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update (Tender Wang <tndrwang@gmail.com>) |
| Ответы |
Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
|
| Список | pgsql-bugs |
Hi, I've spent some time looking at the code and history of this bug because I also ran into it [1] and I was deeply curious why concurrent updates worked with a hash join + seq scan but not a nested loop + index scan. I was also curious if this bug is why Yuri didn't see deadlocks in this related bug [2]. FWIW, I found a simpler way to reproduce this: create table t(id int primary key, count int); insert into t values (1, 0), (2, 0); --session 1 begin; update t set count = count + 1; --session 2 set enable_seqscan = off; update t set count = count + 1 from (values (1), (2)) v(id) where t.id = v.id; Only one update succeeds in session 2. Tender Wang wrote: > Dean Rasheed wrote: >> I'm somewhat conflicted as to which approach is better. I think maybe >> there is less chance of other unintended side-effects if the set of >> RTEs included in PlannerGlobal.allRelids, unprunableRelids, and >> es_unpruned_relids is not changed. However, as it stands, >> PlannerGlobal.allRelids is misnamed (it should probably have been >> called "relationRelids", in line with the "relationOids" field). >> Making it actually include all RTEs would solve that. > Per the comment, I guess Amit didn't plan to include all RTEs in his design. While digging, I noticed that allRelids seems only to be used to create unprunableRelids. But unprunableRelids is used among other things, to create es_unpruned_relids and as an unpruned_relids parameter to ExecInitRangeTable. It's difficult to follow all the paths where the missing relids will matter. So I think Tender's approach is safer if amended to include all relids. I'm very new to postgres source code, so this is a very tentative opinion. Amit can best answer after his holiday. I was curious enough to test in the meantime. I made an experimental patch that just adds all rels to allRelids. This fixed all the issues I'm aware of. It caused overexplain tests to fail by adding in new non-relation relIds as expected. Apologies if I'm getting the terminology wrong. I ran Yuri's reproducer on my patched branch and indeed got several deadlocks. psql:crash.sql:3: ERROR: deadlock detected DETAIL: Process 71673 waits for ShareLock on transaction 2766; blocked by process 71643. Process 71643 waits for ShareLock on transaction 2777; blocked by process 71673. Curiously, in the test I added (based on Dean's), enable_indexscan or enable_seqscan didn't make any difference to the plan. But when I run the reproducer manually, it does change the plan. Since the bug disappears with a hash join + seq scan, the plan is critical. So I included 'set enable_seqscan to 0' for reference and future proofing. I didn't attempt to fix the comment because I don't understand it. I also didn't check all my tabs vs spaces, or run TAP tests, as this patch is just meant for info. Most of the code is still way over my head, so my apologies if none of this is helpful. As to why this works for a hash join but not a nested loop, I have some ideas, but I'm still digging. Happy New Year! Bernice [1] https://www.postgresql.org/message-id/flat/CAMbWs4-uC8DWRZvByHej%2B9E5em7V_%2BzNyRfFx-UUpR4HwifK4w%40mail.gmail.com [2] https://www.postgresql.org/message-id/ly6rss443iajiv5cfypzu6fgrmxdbwzdtuujkddr4eis7s2gcq%40gcdytigety6e
Вложения
В списке pgsql-bugs по дате отправления: