Re: UNDO and in-place update

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: UNDO and in-place update
Дата
Msg-id CAPpHfdtW1PrUSvP82rkQ31BDpA-bCyCPSnc0od7O6ZjFFJAotg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: UNDO and in-place update  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: UNDO and in-place update  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Thu, Dec 1, 2016 at 6:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
There's a couple of possible designs here, but there is the
possibility for extra hops in some cases.  But there are things we can
do to mitigate that.

1. If each tuple has a definitely-all-visible bit, you can check that
first; if it's set, the tuple is definitely all-visible you don't need
to do anything further.

2. If we still maintain a visibility map, you can check that next.  If
the bit is set for the target page, then the tuple is definitely
all-visible and you don't need to do anything further.

3. Otherwise, you need to look at the heap page.  But if the heap
page's UNDO pointer is invalid, then the whole page is all-visible, so
you're done.  (You can also set the visibility map bit and/or the
index tuple's definitely-all-visible bit to avoid having to recheck
the heap page in the future.)

4. Each tuple will have a bit indicating whether it's been recently
modified; that is, whether the transaction whose UNDO log is
referenced by the page's UNDO pointer did anything to that tuple; or
if the page points to a TPD entry, whether any of the transactions in
the TPD modified that tuple.  If that bit is not set for that
particular tuple, it's all-visible and you're done.

5. Otherwise you have to look at the TPD entry (if any) and UNDO logs.

If we put in the visibility information in the index as you are
proposing, then we can always resolve the visibility of the index
tuple at step 1; we just test xmin and xmax against the snapshot.  For
index-only scans, that's great, because we never need to consult the
heap at all.  For regular index scans, it's not nearly as great,
because we're still going to need to consult the TPD and UNDO logs to
determine which version of the tuple is visible to that snapshot.  You
can avoid looking at the heap in the case where no version of the
tuple is visible to the scan, but that's probably mostly going to
happen only in cases where the transaction is still in flight so in
most cases the heap will be in shared_buffers anyway and you won't
save very much.

Idea of storing just one visibility bit in index tuple is a subject of serious doubts for me.

1. When definitely-all-visible isn't set then we have to recheck during scanning heap, right?
But our current recheck (i.e. rechecking scan quals) is not enough.  Imagine you have a range
index scan (val >= 100 AND val < 200), and val field was updated from val = 50 and val = 60
in some row.  Then you might have this row duplicated in the output.  Removing index scan and
sticking to only bitmap index scan doesn't seem to be fair.  Thus, we need to introduce another
kind of recheck that heapTuple.indexKey = indexTuple.indexKey.

2. Another question is how it could work while index key being intensively updated.  There is a risk
that we would have to traverse same UNDO-chains multiple times.  In worst case, we would have
to spend quadratic time of number of row versions since our snapshot taken.  We might try to mitigate this
by caching TID => heap tuple map for our snapshot.  But I don't like this approach.
If we have large enough index scan, then we could either run out of cache or consume too much memory
for that cache.

So, I think storing visibility information in index tuple is great for regular index scans too.  At least,
it allow us to evade #2 problem.  That is great by itself.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tobias Bussmann
Дата:
Сообщение: Re: Parallel execution and prepared statements
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: pgbench more operators & functions