Re: Custom tuplesorts for extensions

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Custom tuplesorts for extensions
Дата
Msg-id CAPpHfdsOTk+5mSju4tsCOTATUrWrM+nqwgd6OtkLaJN3XKa4eA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Custom tuplesorts for extensions  (Andres Freund <andres@anarazel.de>)
Ответы Re: Custom tuplesorts for extensions  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
.Hi!

On Wed, Jul 6, 2022 at 6:01 PM Andres Freund <andres@anarazel.de> wrote:
> I think this needs to be evaluated for performance...

Surely, this needs.

> I agree with the nearby comment that the commits need a bit of justification
> at least to review them.

Will do this.
> > From 1d78e271b22d7c6a1557defbe15ea5039ff28510 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Tue, 21 Jun 2022 14:03:13 +0300
> > Subject: [PATCH v2 2/6] Tuplesortstate.getdatum1 method
>
> Hm. This adds a bunch of indirect function calls were there previously
> weren't.

Yep.  I think it worth changing this function to deal with many
SortTuple's at once.

> > From 494d46dcf938e5f824a498e38ce77782751208e1 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Tue, 21 Jun 2022 14:13:56 +0300
> > Subject: [PATCH v2 3/6] Put abbreviation logic into puttuple_common()
>
> There's definitely a lot of redundancy removed... But the list of branches in
> puttuple_common() grew.  Perhaps we instead can add a few flags to
> putttuple_common() that determine whether abbreviation should happen, so that
> we only do the work necessary for the "type" of sort?

Good point, will refactor that.

> > From ee2dd46b07d62e13ed66b5a38272fb5667c943f3 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Wed, 22 Jun 2022 00:14:51 +0300
> > Subject: [PATCH v2 4/6] Move freeing memory away from writetup()
>
> Seems to do more than just moving freeing around?

Yes, it also move memory accounting from tuplesort_put*() to
puttuple_common().  Will revise this.

> > @@ -1973,8 +1963,13 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull)
> >  static void
> >  puttuple_common(Tuplesortstate *state, SortTuple *tuple)
> >  {
> > +     MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
> > +
> >       Assert(!LEADER(state));
> >
> > +     if (tuple->tuple != NULL)
> > +             USEMEM(state, GetMemoryChunkSpace(tuple->tuple));
> > +
>
> Adding even more branches into common code...

I will see how to reduce branching here.

> > From 3a0e1fa7c7e4da46a86f7d5b9dd0392549f3b460 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Wed, 22 Jun 2022 18:11:26 +0300
> > Subject: [PATCH v2 5/6] Reorganize data structures
>
> Hard to know what this is trying to achieve.

Split the public interface part out of Tuplesortstate.

> > -struct Tuplesortstate
> > +struct TuplesortOps
> >  {
> > -     TupSortStatus status;           /* enumerated value as shown above */
> > -     int                     nKeys;                  /* number of columns in sort key */
> > -     int                     sortopt;                /* Bitmask of flags used to setup sort */
> > -     bool            bounded;                /* did caller specify a maximum number of
> > -                                                              * tuples to return? */
> > -     bool            boundUsed;              /* true if we made use of a bounded heap */
> > -     int                     bound;                  /* if bounded, the maximum number of tuples */
> > -     bool            tuples;                 /* Can SortTuple.tuple ever be set? */
> > -     int64           availMem;               /* remaining memory available, in bytes */
> > -     int64           allowedMem;             /* total memory allowed, in bytes */
> > -     int                     maxTapes;               /* max number of input tapes to merge in each
> > -                                                              * pass */
> > -     int64           maxSpace;               /* maximum amount of space occupied among sort
> > -                                                              * of groups, either in-memory or on-disk */
> > -     bool            isMaxSpaceDisk; /* true when maxSpace is value for on-disk
> > -                                                              * space, false when it's value for in-memory
> > -                                                              * space */
> > -     TupSortStatus maxSpaceStatus;   /* sort status when maxSpace was reached */
> >       MemoryContext maincontext;      /* memory context for tuple sort metadata that
> >                                                                * persists across multiple batches */
> >       MemoryContext sortcontext;      /* memory context holding most sort data */
> >       MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */
> > -     LogicalTapeSet *tapeset;        /* logtape.c object for tapes in a temp file */
> >
> >       /*
> >        * These function pointers decouple the routines that must know what kind
>
> To me it seems odd to have memory contexts and similar things in a
> datastructure calls *Ops.

Yep, it worth renaming TuplesortOps into TuplesortPublic or something.

> > From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Wed, 22 Jun 2022 21:48:05 +0300
> > Subject: [PATCH v2 6/6] Split tuplesortops.c
>
> I strongly suspect this will cause a slowdown. There was potential for
> cross-function optimization that's now removed.

I wonder how can cross-function optimizations bypass function
pointers.  Is it possible?

------
Regards,
Alexander Korotkov



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: doc: Move enum storage commentary to top of section
Следующее
От: Jacob Champion
Дата:
Сообщение: Re: Make mesage at end-of-recovery less scary.