Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
From: Peter Geoghegan
Subject: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
Date:
Msg-id: CAM3SWZTfTt_fehet3tU3YKCpCYPYnNaUqUZ3Q+NAASnH-60teA@mail.gmail.com
Responses: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch) (Peter Geoghegan <pg@heroku.com>)
           Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch) (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
I've been working on fixing the bugs that Jeff Janes found [1] with approach #2 to value locking [2]. Approach #1 was unaffected. I'm starting this new thread to discuss these issues; let's confine discussion of semantics and syntax to the main thread, since that is where they have mostly been discussed.

Jeff produced a stress testing tool which tests concurrent deletions (and not just concurrent updates); his test case deletes tuples when a counter value goes under zero, which happens occasionally. It also independently does bookkeeping for all values in a range of values that are randomly incremented or decremented, and verifies that everything is as it should be. This revealed problems with #2 around concurrent "super deletion" of "unkept" promise heap tuples. As Jeff pointed out, all-NULL tuples were appearing as visible, or spurious duplicates appeared. For example:

postgres=# select xmin, xmax, ctid, * from upsert_race_test where index = 1287;
  xmin  |  xmax  |   ctid    | index  | count
--------+--------+-----------+--------+--------
      0 | 205438 | (260,27)  | [null] | [null]
 176066 |      0 | (258,158) |   1287 |      1
(2 rows)

I should point out that one of the problems I found here was a garden variety bug, which has been fixed (the conflict flag in ExecInsert() was not being reset correctly, I believe). However, the other remaining problems could only be fixed with further changes to the routines in tqual.c, which I'm not happy about, and which require further discussion here. I attach a differential patch which applies on top of the most recent revision of #2, v1.8.vallock2.tar.gz [3].

Note that I've been extending Jeff's stress testing tool to make it highlight problems earlier, and to make it provide more and better instrumentation (e.g. printing of XIDs to correlate with pg_xlogdump output).
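For anyone who hasn't looked at the stress tool, the invariant it checks can be sketched in a few lines. This is a deliberately trivial, single-process model (all names are mine, not the tool's): the "table" and an independent ledger receive the same stream of random increments/decrements, and rows whose counter drops below zero are deleted. In this model the two trivially agree; the point of the real tool is that the table side is PostgreSQL under concurrent upserts, where a visibility bug shows up as divergence between the two.

```python
import random

def run_workload(n_ops=10_000, n_keys=100, seed=42):
    rng = random.Random(seed)
    table = {}    # stand-in for the upsert target table
    ledger = {}   # independent bookkeeping of the expected state
    for _ in range(n_ops):
        key = rng.randrange(n_keys)
        delta = rng.choice((-1, 1))
        for store in (table, ledger):
            # "upsert": insert the key or adjust its counter
            store[key] = store.get(key, 0) + delta
            # "delete": remove the row when the counter goes under zero
            if store[key] < 0:
                del store[key]
    return table, ledger

table, ledger = run_workload()
# The verification step: table state and independent bookkeeping must match.
assert table == ledger
```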
A good trick I added was to have an INSERT ... ON CONFLICT UPDATE predicate of "WHERE TARGET.index = EXCLUDED.index", which serves as a kind of assertion when used within the Perl script (the Perl script checks RETURNING-projected tuples, and expects to always insert or update; that WHERE clause should always pass, obviously). Jeff's tool is available here (with my revisions):

https://github.com/petergeoghegan/jjanes_upsert

I've been running with fsync off, which Jeff felt increased the likelihood of revealing the bug. It isn't necessary, though, and FWIW it was easy for me to recreate this problem once I understood it, using only my 4 core laptop. I found it important to build without assertions and at optimization level -O2, though.

The attached patch fixes Jeff's test case, as far as it goes. But, as I mentioned, it's not intended as a real fix. For one thing, I have made multiple changes to HeapTupleSatisfies*() routines, but haven't fully considered the consequences for, say, ri_triggers.c. HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not modified. I've modified HeapTupleSatisfiesVacuum() to see an xmin of InvalidTransactionId (not invalid xid infomask bits) as visible for garbage collection (alongside HeapTupleSatisfiesDirty() and HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be surprised if that created new problems in other areas. In short, this patch is just for illustrative purposes.
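The predicate-as-assertion idea above can be illustrated with SQLite's analogous upsert syntax (this is only an analogy, not the patch's proposed syntax; the table and column names here are made up). Because the WHERE clause compares the target row's key to the incoming (excluded) row's key, and that key is exactly what the conflict is arbitrated on, the predicate is a tautology: the statement should always either insert or update, so a statement that does neither indicates broken conflict handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE upsert_race_test (idx INTEGER PRIMARY KEY, count INTEGER)"
)

def upsert(idx):
    cur = conn.execute(
        """INSERT INTO upsert_race_test (idx, count) VALUES (?, 1)
           ON CONFLICT (idx) DO UPDATE
           SET count = upsert_race_test.count + 1
           WHERE upsert_race_test.idx = excluded.idx""",
        (idx,),
    )
    # A rowcount of 0 would mean neither an insert nor an update happened,
    # even though the tautological WHERE clause says one of them must.
    assert cur.rowcount == 1, "upsert did neither insert nor update"

upsert(1287)   # inserts (1287, 1)
upsert(1287)   # conflicts, updates count to 2
row = conn.execute(
    "SELECT count FROM upsert_race_test WHERE idx = 1287"
).fetchone()
```

In the Perl script the same check is done by inspecting the RETURNING-projected tuples rather than a rowcount.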
To see the difference this patch makes, I suggest commenting out the new code and running the following test after a fresh initdb:

$ perl count_upsert.pl 4 100000

I think that the actual nature of the problem I fixed was that concurrent upserters felt entitled to update a promise tuple that they could at least initially see, but that was then "super deleted"; it turned out that once it was super deleted, updating it appeared okay (the tuple wasn't actually duplicated insofar as HeapTupleSatisfiesDirty() was then able to ascertain, due to other concurrent activity). It's pretty complicated, in any case. This is just a band-aid.

Thoughts? Does anyone have any thoughts on my diagnosis of the problem?

[1] http://www.postgresql.org/message-id/CAMkU=1w8e9Q6ZX3U85RtsyCMpdYWFrvVAO3=uNEvtqiRUzntaQ@mail.gmail.com
[2] https://wiki.postgresql.org/wiki/Value_locking#.232._.22Promise.22_heap_tuples_.28Heikki_Linnakangas.29
[3] http://www.postgresql.org/message-id/CAM3SWZRg_hTrOL-6_wfe6_d_UcUYc28JfaPsFh_tra76GkkdNw@mail.gmail.com

--
Peter Geoghegan
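To make the diagnosis concrete, here is a toy model (my own sketch, not real tqual.c code, and greatly simplified: no infomask bits, no hint bits, no locking) of the rule the band-aid applies. In approach #2, an unkept promise tuple is super-deleted by stamping its xmin with InvalidTransactionId (xid 0); the modified visibility checks then treat such a tuple as invisible to everyone, so a concurrent upserter that saw it earlier under a dirty snapshot must not treat it as updatable.

```python
INVALID_TRANSACTION_ID = 0  # PostgreSQL's InvalidTransactionId

class PromiseTuple:
    """Toy heap tuple: only xmin and a payload are modeled."""
    def __init__(self, xmin, values):
        self.xmin = xmin
        self.values = values

    def super_delete(self):
        # "Super deletion" stamps xmin (not xmax) with the invalid xid.
        self.xmin = INVALID_TRANSACTION_ID

def satisfies_dirty(tup, committed_xids, in_progress_xids):
    """Grossly simplified dirty-snapshot visibility check."""
    if tup.xmin == INVALID_TRANSACTION_ID:
        # The band-aid's rule: a super-deleted promise tuple is dead to
        # everyone, including dirty-snapshot readers deciding whether to
        # update it.
        return False
    return tup.xmin in committed_xids or tup.xmin in in_progress_xids

promise = PromiseTuple(xmin=176066, values=(1287, 1))
visible_before = satisfies_dirty(promise, {176066}, set())
promise.super_delete()
visible_after = satisfies_dirty(promise, {176066}, set())
```

Here `visible_before` is True and `visible_after` is False: the window between those two observations is exactly where an upserter in the unpatched code could still feel entitled to update the tuple.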
Attachments