Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

From: Peter Geoghegan
Msg-id: CAM3SWZTfTt_fehet3tU3YKCpCYPYnNaUqUZ3Q+NAASnH-60teA@mail.gmail.com
Responses: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)  (Peter Geoghegan <pg@heroku.com>)
           Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
I've been working on fixing the bugs that Jeff Janes found [1] with
approach #2 to value locking [2]. Approach #1 was unaffected.

I'm starting this new thread to discuss these issues. Let's try to
confine discussion of semantics and syntax to the main thread, since
that is where they have mostly been discussed so far.

Jeff produced a stress testing tool which tests concurrent deletions
(and not just concurrent updates); his test case deletes tuples when a
counter value goes under zero, which happens occasionally. It also
independently does bookkeeping for every value in a range that is
randomly incremented or decremented, and verifies that
everything is as it should be. This revealed problems with #2 around
concurrent "super deletion" of "unkept" promise heap tuples. As Jeff
pointed out, all-NULL tuples were appearing as visible, or spurious
duplicates appeared. For example:

postgres=# select xmin, xmax, ctid, * from upsert_race_test where index = 1287;
  xmin  |  xmax  |   ctid    | index  | count
--------+--------+-----------+--------+--------
      0 | 205438 | (260,27)  | [null] | [null]
 176066 |      0 | (258,158) |   1287 |      1
(2 rows)
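
For reference, the table involved presumably looks something like the
sketch below; I'm inferring the schema from the output above (the
authoritative definitions live in Jeff's Perl driver), and 1287 just
stands in for whichever key an iteration happens to touch. The upsert
statement itself is sketched a little further down.

-- Rough sketch only; not the literal DDL/DML from count_upsert.pl
CREATE TABLE upsert_race_test (index integer PRIMARY KEY, count integer);

-- After a decrementing upsert, the driver deletes the row once its
-- counter goes under zero, which is what exercises concurrent deletion:
DELETE FROM upsert_race_test WHERE index = 1287 AND count < 0;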

I should point out that one of the problems I found here was a garden
variety bug, which has been fixed (the conflict flag in ExecInsert()
was not being reset correctly, I believe). However, the other
remaining problems could only be fixed with further changes to the
routines in tqual.c, which I'm not happy about and which require
further discussion here. I attach a differential patch that applies on top of
the most recent revision of #2, which is v1.8.vallock2.tar.gz [3].

Note that I've been extending Jeff's stress testing tool to make it
highlight problems earlier, and to make it provide more and better
instrumentation (e.g. printing of XIDs to correlate with pg_xlogdump
output). A good trick I added was to have an INSERT...ON CONFLICT
UPDATE predicate of "WHERE TARGET.index = EXCLUDED.index", which
serves as a kind of assertion when used within the Perl script (the
Perl script checks RETURNING-projected tuples, and expects to always
insert or update - that WHERE clause should always pass, obviously).
Jeff's tool is available here (with my revisions):
https://github.com/petergeoghegan/jjanes_upsert
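
Concretely, the assertion-style upsert looks something like the
following. This is only a sketch, written in the patch's syntax with
its TARGET/EXCLUDED aliases, and the SET expression is approximate; the
real statement is in count_upsert.pl:

INSERT INTO upsert_race_test (index, count)
VALUES (1287, 1)
ON CONFLICT UPDATE
SET count = TARGET.count + EXCLUDED.count
WHERE TARGET.index = EXCLUDED.index
RETURNING index, count;

-- TARGET.index and EXCLUDED.index can never legitimately differ, so the
-- WHERE clause should filter nothing; if RETURNING projects no row, the
-- Perl script knows something has gone wrong.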

I've been running with fsync off, which Jeff felt increased the
likelihood of revealing the bug. It isn't necessary, though, and FWIW
it was easy for me to recreate this problem once I understood it,
using only my 4-core laptop. I found it important to build without
assertions and at optimization level -O2, though.
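
(For anyone recreating this: turning fsync off on a throwaway test
cluster is just a GUC change -- "fsync = off" in postgresql.conf, or,
since it's a SIGHUP parameter, the following works without a restart.
Obviously never do this anywhere you care about the data.)

ALTER SYSTEM SET fsync = off;
SELECT pg_reload_conf();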

The attached patch fixes Jeff's test case, as far as it goes. But, as
I mentioned, it's not intended as a real fix. For one thing, I have
made multiple changes to HeapTupleSatisfies*() routines, but haven't
fully considered the consequences for, say, ri_triggers.c.
HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not
modified. I've modified HeapTupleSatisfiesVacuum() to see
InvalidTransactionId (not invalid xid infomask bits) as visible for
garbage collection (alongside HeapTupleSatisfiesDirty() and
HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be
surprised if that created new problems in other areas. In short, this
patch is just for illustrative purposes.

To see the difference this patch makes, I suggest commenting out the
new code, and running the following test after a fresh initdb:

$ perl count_upsert.pl 4 100000

I think that the actual nature of the problem I fixed was that
concurrent upserters felt entitled to update a promise tuple that they
could at least initially see, but which was then "super deleted"; even
once it had been super deleted, updating it still appeared to be okay
(the tuple wasn't actually duplicated, insofar as
HeapTupleSatisfiesDirty() was then able to ascertain due to other
concurrent activity). It's pretty complicated, in any case. This is
just a band-aid.

Thoughts? Does anyone have any thoughts on my diagnosis of the problem?

[1] http://www.postgresql.org/message-id/CAMkU=1w8e9Q6ZX3U85RtsyCMpdYWFrvVAO3=uNEvtqiRUzntaQ@mail.gmail.com
[2] https://wiki.postgresql.org/wiki/Value_locking#.232._.22Promise.22_heap_tuples_.28Heikki_Linnakangas.29
[3] http://www.postgresql.org/message-id/CAM3SWZRg_hTrOL-6_wfe6_d_UcUYc28JfaPsFh_tra76GkkdNw@mail.gmail.com
--
Peter Geoghegan
