Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Дата
Msg-id CABOikdMSg-g_CiWga9c=zGrVC_86qjfYKvDrUeyPs-P=U62ffg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Ответы Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Список pgsql-hackers
Hi Alvaro,

On Tue, Jan 17, 2017 at 8:41 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Reading through the track_root_lp patch now.


Thanks for the review.
 
> +             /*
> +              * For HOT (or WARM) updated tuples, we store the offset of the root
> +              * line pointer of this chain in the ip_posid field of the new tuple.
> +              * Usually this information will be available in the corresponding
> +              * field of the old tuple. But for aborted updates or pg_upgraded
> +              * databases, we might be seeing the old-style CTID chains and hence
> +              * the information must be obtained by hard way
> +              */
> +             if (HeapTupleHeaderHasRootOffset(oldtup.t_data))
> +                     root_offnum = HeapTupleHeaderGetRootOffset(oldtup.t_data);
> +             else
> +                     heap_get_root_tuple_one(page,
> +                                     ItemPointerGetOffsetNumber(&(oldtup.t_self)),
> +                                     &root_offnum);

Hmm.  So the HasRootOffset tests the HEAP_LATEST_TUPLE bit, which is
reset temporarily during an update.  So that case shouldn't occur often.

Right. The root offset is stored only in those tuples where HEAP_LATEST_TUPLE is set. This flag should generally be set on the tuples that are being updated, except for the case when the last update failed and the flag was cleared. In other common case is going to be pg-upgraded cluster where none of the existing tuples will have this flag set. So in those cases, we must find the root line pointer hard way.
 

Oh, I just noticed that HeapTupleHeaderSetNextCtid also clears the flag.

Yes, but this should happen only during updates and unless the update fails, the next-to-be-updated tuple should have the flag set.
 

> @@ -4166,10 +4189,29 @@ l2:
>               HeapTupleClearHotUpdated(&oldtup);
>               HeapTupleClearHeapOnly(heaptup);
>               HeapTupleClearHeapOnly(newtup);
> +             root_offnum = InvalidOffsetNumber;
>       }
>
> -     RelationPutHeapTuple(relation, newbuf, heaptup, false);         /* insert new tuple */
> +     /* insert new tuple */
> +     RelationPutHeapTuple(relation, newbuf, heaptup, false, root_offnum);
> +     HeapTupleHeaderSetHeapLatest(heaptup->t_data);
> +     HeapTupleHeaderSetHeapLatest(newtup->t_data);
>
> +     /*
> +      * Also update the in-memory copy with the root line pointer information
> +      */
> +     if (OffsetNumberIsValid(root_offnum))
> +     {
> +             HeapTupleHeaderSetRootOffset(heaptup->t_data, root_offnum);
> +             HeapTupleHeaderSetRootOffset(newtup->t_data, root_offnum);
> +     }
> +     else
> +     {
> +             HeapTupleHeaderSetRootOffset(heaptup->t_data,
> +                             ItemPointerGetOffsetNumber(&heaptup->t_self));
> +             HeapTupleHeaderSetRootOffset(newtup->t_data,
> +                             ItemPointerGetOffsetNumber(&heaptup->t_self));
> +     }

This is repetitive.  I think after RelationPutHeapTuple it'd be better
to assign root_offnum = &heaptup->t_self, so that we can just call
SetRootOffset() on each tuple without the if().

Fixed. I actually ripped off HeapTupleHeaderSetRootOffset() completely and pushed setting of root line pointer into the HeapTupleHeaderSetHeapLatest(). That seems much cleaner because the system expects to find root line pointer whenever HEAP_LATEST_TUPLE flag is set. Hence it makes sense to set them together.
 


> +             HeapTupleHeaderSetHeapLatest((HeapTupleHeader) item);
> +             if (OffsetNumberIsValid(root_offnum))
> +                     HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> +                                     root_offnum);
> +             else
> +                     HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> +                                     offnum);

Just a matter of style, but this reads nicer IMO:

        HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
                OffsetNumberIsValid(root_offnum) ? root_offnum : offnum);

Understood. This code no longer exists in the new patch since HeapTupleHeaderSetRootOffset is merged with HeapTupleHeaderSetHeapLatest.
 


> @@ -740,8 +742,9 @@ heap_page_prune_execute(Buffer buffer,
>   * holds a pin on the buffer. Once pin is released, a tuple might be pruned
>   * and reused by a completely unrelated tuple.
>   */
> -void
> -heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
> +static void
> +heap_get_root_tuples_internal(Page page, OffsetNumber target_offnum,
> +             OffsetNumber *root_offsets)
>  {
>       OffsetNumber offnum,

I think this function deserves more/better/updated commentary.

Sure. I added more commentary. I also reworked the function so that the caller can pass just one item array when it's interested in finding root line pointer for just one item. Hopefully that will save a few bytes on the stack.
 

> @@ -439,7 +439,9 @@ rewrite_heap_tuple(RewriteState state,
>                        * set the ctid of this tuple to point to the new location, and
>                        * insert it right away.
>                        */
> -                     new_tuple->t_data->t_ctid = mapping->new_tid;
> +                     HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> +                                     ItemPointerGetBlockNumber(&mapping->new_tid),
> +                                     ItemPointerGetOffsetNumber(&mapping->new_tid));

I think this would be nicer:
        HeapTupleHeaderSetNextTid(new_tuple->t_data, &mapping->new_tid);
AFAICS all the callers are doing ItemPointerGetFoo for a TID, so this is
overly verbose for no reason.  Also, the "c" in Ctid stands for
"current"; I think we can omit that.

Yes, fixed. I realised that all callers where anyways calling the macro with the block/offset of the same TID. So it makes sense to just pass TID to the macro.
 

> @@ -525,7 +527,9 @@ rewrite_heap_tuple(RewriteState state,
>                               new_tuple = unresolved->tuple;
>                               free_new = true;
>                               old_tid = unresolved->old_tid;
> -                             new_tuple->t_data->t_ctid = new_tid;
> +                             HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> +                                             ItemPointerGetBlockNumber(&new_tid),
> +                                             ItemPointerGetOffsetNumber(&new_tid));

Did you forget to SetHeapLatest here, or ..?  (If not, a comment is
warranted).

Umm probably not. The way I see it, new_tuple is not actually the new tuple when this is called, but it's changed to the unresolved tuple (see the start of the hunk). So what we're doing is setting next CTID in the previous tuple in the chain. SetHeapLatest is called on the new tuple inside raw_heap_insert(). I did not add any more comments, but please let me know if you think it's still confusing or if I'm missing something.
 

> diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
> index 32bb3f9..466609c 100644
> --- a/src/backend/executor/execMain.c
> +++ b/src/backend/executor/execMain.c
> @@ -2443,7 +2443,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
>                * As above, it should be safe to examine xmax and t_ctid without the
>                * buffer content lock, because they can't be changing.
>                */
> -             if (ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
> +             if (HeapTupleHeaderIsHeapLatest(tuple.t_data, tuple.t_self))
>               {
>                       /* deleted, so forget about it */
>                       ReleaseBuffer(buffer);

This is the place where this patch would have an effect.  To test this
bit I think we're going to need an ad-hoc stress-test harness.

Sure. I did some pgbench tests and ran consistency checks during and at the end of tests. I chose a small scale factor and many clients so that same tuple is often concurrently updated. That should exercise the new chain-following code reguorsly. But I'll do more of those on a bigger box. Do you have other suggestions for ad-hoc tests?
 


> +/*
> + * If HEAP_LATEST_TUPLE is set in the last tuple in the update chain. But for
> + * clusters which are upgraded from pre-10.0 release, we still check if c_tid
> + * is pointing to itself and declare such tuple as the latest tuple in the
> + * chain
> + */
> +#define HeapTupleHeaderIsHeapLatest(tup, tid) \
> +( \
> +  ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) || \
> +  ((ItemPointerGetBlockNumber(&(tup)->t_ctid) == ItemPointerGetBlockNumber(&tid)) && \
> +   (ItemPointerGetOffsetNumber(&(tup)->t_ctid) == ItemPointerGetOffsetNumber(&tid))) \
> +)

Please add a "!= 0" to the first arm of the ||, so that we return a boolean.


Done. Also rebased with new master where similar changes have been done.
 

> +/*
> + * Get TID of next tuple in the update chain. Traditionally, we have stored
> + * self TID in the t_ctid field if the tuple is the last tuple in the chain. We
> + * try to preserve that behaviour by returning self-TID if HEAP_LATEST_TUPLE
> + * flag is set.
> + */
> +#define HeapTupleHeaderGetNextCtid(tup, next_ctid, offnum) \
> +do { \
> +     if ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) \
> +     { \
> +             ItemPointerSet((next_ctid), ItemPointerGetBlockNumber(&(tup)->t_ctid), \
> +                             (offnum)); \
> +     } \
> +     else \
> +     { \
> +             ItemPointerSet((next_ctid), ItemPointerGetBlockNumber(&(tup)->t_ctid), \
> +                             ItemPointerGetOffsetNumber(&(tup)->t_ctid)); \
> +     } \
> +} while (0)

This is a really odd macro, I think.  Is any of the callers really
depending on the traditional behavior?  If so, can we change them to
avoid that?  (I think the "else" can be more easily written with
ItemPointerCopy).  In any case, I think the documentation of the macro
leaves a bit to be desired -- I don't think we really care all that much
what we used to do, except perhaps as a secondary comment, but we do
care very much about what it actually does, which the current comment
doesn't really explain.


I reworked this quite a bit and I believe the new code does what you suggested.  The HeapTupleHeaderGetNextTid macro is now much simpler (it just copies the TID) and we leave it to the caller to ensure they don't call this on a tuple which is already at the end of the chain (i.e has HEAP_LATEST_TUPLE set, but we don't look for old-style end-of-the-chain markers). The callers can choose to return the same TID back if their callers rely on that behaviour. But inside this macro, we now assert that HEAP_LATEST_TUPLE is not set. 

One thing that worried me is if there exists a path which sets the t_infomask (and hence HEAP_LATEST_TUPLE) during redo recovery and if we will fail to set the root line pointer correctly along with that. But AFAICS the interesting cases of insert, multi-insert and update are being handled ok. The only other places where I saw t_infomask being copied as-is from the WAL record is DecodeXLogTuple() and DecodeMultiInsert(), but those should not cause any problem AFAICS.

Revised patch is attached. All regression tests, isolation tests and pgbench test with -c40 -j10 pass on my laptop.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Antonin Houska
Дата:
Сообщение: Re: [HACKERS] PoC: Grouped base relation
Следующее
От: "Karl O. Pinc"
Дата:
Сообщение: Re: [HACKERS] Patch to implement pg_current_logfile() function