Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee
Subject: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Msg-id: CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com
In reply to: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers


On Sun, Feb 26, 2017 at 2:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:


> Fair point, but I've already said why I think the stakes for this
> particular feature are pretty high.

I understand your concerns and am not trying to downplay them. I'm doing my best to test the patch in different ways so that we catch most of the bugs before the patch is committed. Hopefully, with additional reviews and tests, we can plug the remaining holes, if any, and reach a comfortable state.

>> (I have mentioned the idea of overloading ip_posid bits a few times now and
>> haven't heard any objection so far. Well, that could either mean that nobody
>> has read those emails seriously or there is general acceptance to that
>> idea. I am assuming the latter :-))
>
> I'm not sure about that.  I'm not really sure I have an opinion on
> that yet, without seeing the patch.  The discussion upthread was a bit
> vague:

Attached is a complete set of rebased and finished patches. Patches 0002 and 0003 implement what I have in mind for the OffsetNumber bits.

AFAICS, this version is a fully functional implementation of WARM, ready for serious review and testing. Chain conversion is now fully functional and tested with btrees. I've also added support for chain conversion in hash indexes by overloading the high-order bits of ip_posid. Even though a free bit is available in the btree index tuple, the patch now uses the same ip_posid bit for btree indexes too.

A short summary of all attached patches.

0000_interesting_attrs_v15.patch:

This is Alvaro's patch to refactor HeapSatisfiesHOTandKeyUpdate. It now returns the set of modified attributes and lets each caller consume that information in whatever way it needs. The main WARM patch uses this refactored API.
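To make the refactoring concrete, here is a minimal C sketch of the idea, not the patch's code: compute the set of modified "interesting" attributes once, then let each caller test that set against the attributes it cares about. The names AttrSet, sketch_modified_attrs, sketch_hot_possible and sketch_key_changed are hypothetical, attribute values are plain ints, and a uint64_t bitmask stands in for PostgreSQL's Bitmapset (so only attributes 1..63 fit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means attribute N. */
typedef uint64_t AttrSet;

#define ATTR_BIT(attno) ((AttrSet) 1 << (attno))

/* Return the subset of interesting attributes whose values differ
 * between the old and new tuple versions (values are 0-based arrays). */
static AttrSet
sketch_modified_attrs(const int *oldvals, const int *newvals, int natts,
                      AttrSet interesting)
{
    AttrSet modified = 0;

    assert(natts < 64);     /* the bitmask cannot hold more attributes */
    for (int attno = 1; attno <= natts; attno++)
    {
        if ((interesting & ATTR_BIT(attno)) &&
            oldvals[attno - 1] != newvals[attno - 1])
            modified |= ATTR_BIT(attno);
    }
    return modified;
}

/* HOT is possible only if no indexed attribute changed. */
static bool
sketch_hot_possible(AttrSet modified, AttrSet index_attrs)
{
    return (modified & index_attrs) == 0;
}

/* A key update happened if any key attribute changed. */
static bool
sketch_key_changed(AttrSet modified, AttrSet key_attrs)
{
    return (modified & key_attrs) != 0;
}
```

Returning the raw set instead of a pair of booleans is what lets a WARM-aware caller ask a different question of the same data, such as which indexed columns changed.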

0001_track_root_lp_v15.patch:

This implements the logic to store the root offset of the HOT chain in the t_ctid.ip_posid field. We use a free bit in the heap tuple header to mark that a particular tuple is at the end of the chain, and store the root offset in its ip_posid. For pg_upgraded clusters this information may be missing, in which case we do the hard work of scanning the tuples on the page to find the root offset.
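The fast path and fallback described above can be sketched in C as follows. This is a simplified model under stated assumptions, not the patch's implementation: SketchTuple, SketchPage and the sketch_* functions are hypothetical, the page is a 1-based array, and real HOT chain following also checks that each version's xmax matches the next version's xmin, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define SKETCH_MAX_TUPLES 16

/* Hypothetical, heavily simplified heap tuple. */
typedef struct SketchTuple
{
    bool         used;
    bool         heap_only;   /* like HEAP_ONLY_TUPLE: no index entry points here */
    bool         latest;      /* hypothetical "end of HOT chain" flag */
    OffsetNumber ctid_posid;  /* next version, or the root offset if latest */
} SketchTuple;

typedef struct SketchPage
{
    SketchTuple tuples[SKETCH_MAX_TUPLES + 1];  /* 1-based, like line pointers */
} SketchPage;

/* Slow path: walk every chain that starts at a root (non-heap-only) tuple. */
static OffsetNumber
sketch_scan_for_root(const SketchPage *page, OffsetNumber target)
{
    for (OffsetNumber root = 1; root <= SKETCH_MAX_TUPLES; root++)
    {
        OffsetNumber off = root;

        if (!page->tuples[root].used || page->tuples[root].heap_only)
            continue;
        for (int steps = 0; steps < SKETCH_MAX_TUPLES; steps++)
        {
            const SketchTuple *t = &page->tuples[off];

            if (off == target)
                return root;
            /* Stop at the chain end: flagged, self-pointing, or invalid. */
            if (t->latest || t->ctid_posid == off ||
                t->ctid_posid == InvalidOffsetNumber)
                break;
            off = t->ctid_posid;
        }
    }
    return InvalidOffsetNumber;
}

static OffsetNumber
sketch_get_root_offset(const SketchPage *page, OffsetNumber off)
{
    assert(off >= 1 && off <= SKETCH_MAX_TUPLES);
    /* Fast path: the flagged last tuple records its root directly. */
    if (page->tuples[off].latest)
        return page->tuples[off].ctid_posid;
    /* Not flagged (mid-chain, or written before pg_upgrade): scan the page. */
    return sketch_scan_for_root(page, off);
}
```

The flag makes the common case a single field read, while the page scan keeps tuples written by older releases working without any on-disk rewrite.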

0002_clear_ip_posid_blkid_refs_v15.patch:

This is mostly a cleanup patch which removes direct references to ip_posid and ip_blkid from various places and replaces them with the appropriate ItemPointer[Get|Set][Offset|Block]Number macros.

0003_freeup_3bits_ip_posid_v15.patch:

This patch frees up the 3 high-order bits of ip_posid and makes them available for other uses. As noted, we only need 13 bits to represent an OffsetNumber, so the high-order bits are unused. This patch should only be applied on top of 0002_clear_ip_posid_blkid_refs_v15.patch.
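A minimal C sketch of splitting ip_posid into a 13-bit offset and 3 flag bits, assuming the layout described above; it is not the patch's code. SketchItemPointer and the sketch_* accessors are hypothetical (the real ItemPointerData nests the block number in a BlockIdData). The point it illustrates is why 0002 must come first: once every caller goes through accessors, masking out the flag bits happens in one place.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;

/* Low 13 bits: offset number. High 3 bits: freed for other uses. */
#define OFFSET_BITS 13
#define OFFSET_MASK ((uint16_t) ((1u << OFFSET_BITS) - 1))   /* 0x1FFF */
#define FLAG_MASK   ((uint16_t) ~OFFSET_MASK)                /* 0xE000 */

/* Hypothetical, flattened version of ItemPointerData. */
typedef struct SketchItemPointer
{
    uint16_t bi_hi;     /* block number, high half */
    uint16_t bi_lo;     /* block number, low half */
    uint16_t ip_posid;  /* offset number plus 3 spare flag bits */
} SketchItemPointer;

/* Accessors hide the flag bits from callers that only want the offset. */
static inline OffsetNumber
sketch_get_offset(const SketchItemPointer *ip)
{
    return ip->ip_posid & OFFSET_MASK;
}

/* Replace the offset while preserving any flag bits already set. */
static inline void
sketch_set_offset(SketchItemPointer *ip, OffsetNumber off)
{
    assert((off & FLAG_MASK) == 0);     /* must fit in 13 bits */
    ip->ip_posid = (uint16_t) ((ip->ip_posid & FLAG_MASK) | off);
}

/* Replace the 3 flag bits (value 0..7) while preserving the offset. */
static inline void
sketch_set_flags(SketchItemPointer *ip, uint16_t flags)
{
    assert(flags <= 7);
    ip->ip_posid = (uint16_t) ((ip->ip_posid & OFFSET_MASK) |
                               (flags << OFFSET_BITS));
}

static inline uint16_t
sketch_get_flags(const SketchItemPointer *ip)
{
    return ip->ip_posid >> OFFSET_BITS;
}
```

Any code that still read ip_posid directly would see offsets corrupted by the flag bits, which is why the two patches only make sense together.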

0004_warm_updates_v15.patch:

This implements the main WARM logic, except for chain conversion (which is implemented in the last patch of the series). It uses another free bit in the heap tuple header to identify WARM tuples. When the first WARM update happens, both the old and new versions of the tuple are marked with this flag. All subsequent HOT tuples in the chain are also marked with this flag, so we never lose the information about the WARM update, irrespective of whether it commits or aborts. We then implement recheck logic to decide which index pointer should return a given tuple from the HOT chain.
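A simplified C sketch of the recheck idea, under the assumption of a single indexed int column; the patch's actual recheck logic is more involved. After a WARM update, one chain can be reached from two index entries carrying different keys, so an entry should return a chain member only if its key still matches that member's indexed value. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified chain member with one indexed int column. */
typedef struct SketchHeapTuple
{
    int  indexed_col;   /* value of the indexed column in this version */
    bool warm;          /* stand-in for the WARM flag in the tuple header */
    bool visible;       /* result of the snapshot visibility check */
} SketchHeapTuple;

/* Hypothetical index entry: just the stored key. */
typedef struct SketchIndexTuple
{
    int key;
} SketchIndexTuple;

/*
 * Should this index entry return this chain member?
 * Plain HOT chains never change indexed columns, so no recheck is needed.
 * WARM chains may be reached from entries with different keys, so only
 * the entry whose key matches the tuple's current value returns it.
 */
static bool
sketch_index_returns_tuple(const SketchIndexTuple *itup,
                           const SketchHeapTuple *htup)
{
    if (!htup->visible)
        return false;
    if (!htup->warm)
        return true;
    return itup->key == htup->indexed_col;
}
```

Without such a check, a scan through the stale entry could return the updated tuple under its old key, or a scan could return the same tuple twice.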

WARM is currently supported for hash and btree indexes. If a table has an index of any other type, WARM is disabled.
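The eligibility rule can be expressed as a small C check. This is only a sketch of the rule as stated: SketchIndexAm and sketch_table_allows_warm are hypothetical, and where and how the patch actually performs this test is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum standing in for each index's access method. */
typedef enum SketchIndexAm
{
    SKETCH_AM_BTREE,
    SKETCH_AM_HASH,
    SKETCH_AM_GIST,
    SKETCH_AM_GIN,
    SKETCH_AM_BRIN,
    SKETCH_AM_SPGIST
} SketchIndexAm;

/* WARM is allowed only if every index on the table is btree or hash. */
static bool
sketch_table_allows_warm(const SketchIndexAm *ams, size_t nindexes)
{
    for (size_t i = 0; i < nindexes; i++)
    {
        if (ams[i] != SKETCH_AM_BTREE && ams[i] != SKETCH_AM_HASH)
            return false;
    }
    return true;
}
```

A single unsupported index disables WARM for the whole table, because every index must be able to handle the recheck for the same update.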

0005_warm_chain_conversion_v15.patch:

This patch implements WARM chain conversion as discussed upthread and as described in README.WARM. It requires yet another bit in the heap tuple header, but since that bit is only ever set together with the HEAP_WARM_TUPLE bit, we can safely reuse the HEAP_MOVED_OFF bit for this purpose. We also need a bit to distinguish the two index pointers into a chain, so we know which one points to the pre-WARM-update HOT chain (Blue chain) and which to the post-WARM-update HOT chain (Red chain). We steal this bit from the t_tid.ip_posid field in the index tuple header. As part of this patch, I moved XLOG_HEAP2_MULTI_INSERT to RM_HEAP_ID (and renamed it XLOG_HEAP_MULTI_INSERT). While this isn't strictly necessary, it allows us to restrict XLOG_HEAP_INIT_PAGE to RM_HEAP_ID and makes that bit available for defining additional opcodes in RM_HEAP2_ID.
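A C sketch of why reusing HEAP_MOVED_OFF is safe, plus the index-side Red pointer bit. HEAP_MOVED_OFF (0x4000 in t_infomask) is a real flag that only pre-9.0 VACUUM FULL ever set; the WARM flag's field and value (SKETCH_HEAP_WARM_TUPLE) and the index bit (SKETCH_INDEX_RED_POINTER) are hypothetical placeholders, not the patch's definitions. The reused bit is interpreted as "Red" only when the WARM flag is also set, so a legacy moved-off tuple is never misread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEAP_MOVED_OFF           0x4000  /* real t_infomask bit, pre-9.0 VACUUM FULL */
#define SKETCH_HEAP_WARM_TUPLE   0x0800  /* hypothetical t_infomask2 bit */
#define SKETCH_INDEX_RED_POINTER 0x8000  /* hypothetical high bit of index ip_posid */

/* Hypothetical subset of the heap tuple header. */
typedef struct SketchTupleHeader
{
    uint16_t t_infomask;
    uint16_t t_infomask2;
} SketchTupleHeader;

static bool
sketch_is_warm(const SketchTupleHeader *tup)
{
    return (tup->t_infomask2 & SKETCH_HEAP_WARM_TUPLE) != 0;
}

/* The reused bit means "Red" only in combination with the WARM flag. */
static bool
sketch_is_red(const SketchTupleHeader *tup)
{
    return sketch_is_warm(tup) && (tup->t_infomask & HEAP_MOVED_OFF) != 0;
}

/* Red is never set without WARM, which keeps the reuse unambiguous. */
static void
sketch_mark_red(SketchTupleHeader *tup)
{
    tup->t_infomask2 |= SKETCH_HEAP_WARM_TUPLE;
    tup->t_infomask |= HEAP_MOVED_OFF;
}

/* Index side: a freed high-order ip_posid bit marks a Red-chain pointer. */
static bool
sketch_index_pointer_is_red(uint16_t ip_posid)
{
    return (ip_posid & SKETCH_INDEX_RED_POINTER) != 0;
}
```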

I've done some elaborate tests with these patches applied. To test the feature, I've primarily used make-world, pgbench with additional indexes, and the WARM stress test (which was useful in catching the CIC bug). While that does not mean there are no remaining bugs, all bugs known to me are fixed in this version. I'll continue to run more tests, especially around crash recovery and around dropping and recreating indexes, and will also do more performance tests.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services