Re: UNDO and in-place update

From: Robert Haas
Subject: Re: UNDO and in-place update
Msg-id: CA+Tgmoa3uMvYfiEThPiam7hhfAuB7sU9bFqJZkNasfLk7oqsHA@mail.gmail.com
In reply to: Re: UNDO and in-place update  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Tue, Nov 22, 2016 at 11:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Oracle spends a lot of time on this, and it's really cache-inefficient
>>> because the data is spread all over.  This was what this guy felt in
>>> circa 2001; I'd have to think that the cache unfriendliness problem is
>>> much worse for modern hardware.
>
>> I imagine that temporal locality helps a lot. Most snapshots will be
>> interested in the same row version.
>
> Yeah, but the problem is that all those transactions have to scavenge
> through a big chunk of UNDO log to find out what they need to know.

In my design, there's one UNDO log per transaction, so if a page has
been recently modified only by a single transaction whose effects are
visible to your snapshot, you can very quickly decide that you don't
need to look at UNDO at all.  If it's been recently modified by
multiple transactions all of which are visible to your snapshot, you
can consult the TPD and then decide that none of those transactions
are interesting and therefore their UNDO doesn't need to be examined.
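
To make that concrete, here is a rough C sketch of the reader-side fast
path; every name in it (Snapshot, TPDEntry, PageNeedsUndoLookup, and so
on) is invented for illustration, since none of this code exists:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct Snapshot
{
    TransactionId xmin;     /* everything older than this is visible */
} Snapshot;

/* Grossly simplified stand-in for a real snapshot visibility test. */
static bool
XidVisibleToSnapshot(const Snapshot *snap, TransactionId xid)
{
    return xid < snap->xmin;
}

/* Hypothetical TPD entry: the overflow list of recent modifiers of a page. */
typedef struct TPDEntry
{
    int           nxids;
    TransactionId xids[8];  /* fixed size just for the sketch */
} TPDEntry;

/*
 * Can we use the page image as-is, or must we look at UNDO?
 * page_xid is the single recent modifier recorded on the page itself;
 * tpd is non-NULL only when there have been multiple recent modifiers.
 */
static bool
PageNeedsUndoLookup(const Snapshot *snap, TransactionId page_xid,
                    const TPDEntry *tpd)
{
    if (tpd == NULL)
        return !XidVisibleToSnapshot(snap, page_xid);  /* common fast path */

    for (int i = 0; i < tpd->nxids; i++)
    {
        if (!XidVisibleToSnapshot(snap, tpd->xids[i]))
            return true;    /* at least one recent modifier is not visible */
    }
    return false;           /* all recent modifiers visible: skip UNDO */
}

The point of the sketch is just that the happy path is one transaction
lookup, or a short loop over the TPD entry, with no UNDO access at all.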

When there are transactions whose effects are not visible to your
snapshot, the cost is proportional to the number of times those
transactions have modified the page.  Therefore, the "bad" case for
this design is when the same page gets updated a large number of times
by one or more concurrent transactions.  If the transaction you can't
see has made 437 separate modifications to the page, you're going to
need to look at all 437 UNDO records to figure out what you can see.
That is likely to suck.
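
Continuing with invented names, the shape of that cost is a simple
linear walk over the not-visible transaction's UNDO records for the
page; this is only a sketch of the cost model, not of any real on-disk
format:

#include <stddef.h>

/* Hypothetical: one UNDO record per modification, chained
 * newest-to-oldest within a single transaction's UNDO log. */
typedef struct UndoRecord
{
    struct UndoRecord *prev_for_page;  /* previous change to the same page */
    /* ... payload describing how to reverse one modification ... */
} UndoRecord;

/* Stand-in for reversing one modification on a private copy of the page. */
static void
ApplyUndoToPageCopy(char *page_copy, const UndoRecord *rec)
{
    (void) page_copy;
    (void) rec;
}

/*
 * Roll a private copy of the page back past every change made by one
 * transaction we can't see.  Cost is linear in the number of changes
 * that transaction made to this page: 437 changes, 437 records applied.
 */
static int
RollBackPageForXact(char *page_copy, const UndoRecord *newest)
{
    int n = 0;

    for (const UndoRecord *rec = newest; rec != NULL; rec = rec->prev_for_page)
    {
        ApplyUndoToPageCopy(page_copy, rec);
        n++;
    }
    return n;
}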

I think that one of the crucial questions for this design is how often
that will happen.  For people with mostly static data, it should be
rare.  In something like a pgbench workload, it boils down to how
often multiple transactions hit the same page before the previous
transactions that hit that page become all-visible.  Of course, if you
have too many backends relative to the scale factor, pgbench is going
to become lock-bound anyway and it probably won't matter much, but
it's about two orders of magnitude easier to hit the same page than to
hit the same tuple, so it could be that the point where you start
incurring significant costs for visibility checking and page
reconstruction happens well before the tuple lock contention becomes
noticeable.  It might be worth doing some mathematical modeling to try
to estimate how serious that effect is likely to be.
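
As a first cut at such a model, here is a trivial birthday-problem-style
estimate.  It assumes k backends each update one uniformly random
pgbench_accounts row, something like 60 tuples per 8kB heap page (an
estimate, not a measured figure), scale factor 100, and that all k
updates count as "concurrent"; all of that is a simplification:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    const double tuples_per_page = 60.0;        /* rough pgbench_accounts figure */
    const int    scale = 100;                   /* pgbench -s 100 */
    const double ntuples = 100000.0 * scale;
    const double npages = ntuples / tuples_per_page;

    for (int k = 2; k <= 128; k *= 2)
    {
        double pairs = (double) k * (k - 1) / 2.0;
        /* Birthday approximation: P(at least one collision). */
        double p_tuple = 1.0 - exp(-pairs / ntuples);
        double p_page  = 1.0 - exp(-pairs / npages);

        printf("clients=%3d  P(same tuple)=%.5f  P(same page)=%.5f\n",
               k, p_tuple, p_page);
    }
    return 0;
}

For small probabilities the two numbers differ by roughly the
tuples-per-page factor, which is where the "about two orders of
magnitude" gap comes from; a real model would also need to account for
how long previous transactions take to become all-visible.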

There are also some things that you can do to mitigate this, although
they add even more complexity.  For example:

- If the same transaction touches the same page repeatedly, you could
make a rule that every 5th or every 10th or every 20th modification to
the same page within the same transaction emits a larger summary
record into the UNDO that is a consolidated record of all changes made
to the page by that transaction thus far.  That penalizes writers, but
helps readers; when a reader reaches a summary record, it need look no
further.

- Peter alluded to the possibility - or so I thought - of caching
reconstructed page versions.  That would make a lot of sense for
sequential scans or cases where the same page is being accessed
repeatedly.

- If you're doing an index scan and you only care about one tuple, the
page could contain one bit per tuple which is set if the tuple has
been modified by any transaction the page views as recent.  If the bit
is clear for the one tuple you care about, you can ignore the UNDO
even if it is for a transaction not visible to your snapshot, because you
know it didn't touch that tuple.

- If you create a TPD entry for the page because there are multiple
recent transactions that have modified it, you can put additional
information into the TPD entry to speed up access.  For example, you
could store an array indexed by itemid which indicates which of the
recent transactions have touched any given itemid.  This is mostly
useful for index scans, which can then ignore all the transactions
that don't pertain to the one TID they care about.
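
Sticking with that last idea, a TPD entry carrying a per-itemid map
might look something like this; the layout and every name are invented
for illustration only.  Note that it also gives you the per-tuple
"recently modified" bit from the previous idea for free:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define TPD_MAX_XACTS   16      /* arbitrary for the sketch */
#define TPD_NO_XACT     0xFF    /* no recent transaction touched this item */

/* Invented layout: alongside the list of recent modifiers, keep one
 * byte per line pointer saying which of those transactions (if any)
 * last touched that tuple. */
typedef struct TPDEntrySketch
{
    uint8_t        nxids;
    TransactionId  xids[TPD_MAX_XACTS];
    uint16_t       nitems;
    uint8_t        xact_index[256];     /* one entry per line pointer */
} TPDEntrySketch;

/*
 * Index-scan fast path: does the single TID we care about require an
 * UNDO lookup?  xid_visible is the caller's snapshot test.
 */
static bool
TidNeedsUndo(const TPDEntrySketch *tpd, uint16_t offnum,
             bool (*xid_visible)(TransactionId))
{
    uint8_t idx = tpd->xact_index[offnum];

    if (idx == TPD_NO_XACT)
        return false;           /* untouched by any recent transaction */
    return !xid_visible(tpd->xids[idx]);
}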

Which of those ideas are best?  Are they enough, even if we do them
all?  Will there be other down sides?  I don't know.  But I think it's
clear that bloat is an ongoing and serious concern for our users (cf.
snapshot too old), as is vacuuming (cf. freeze map), and I believe we
are reaching a point where we can't make much further
improvement in those areas without some kind of major architectural
change.  WARM might qualify; this idea certainly does.  I also believe
that we NEED to improve further.  At least for EnterpriseDB, the
problems I'm trying to address through this design are big problems.
If this design isn't good, let's come up with something else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


