Re: TupleTableSlot abstraction

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: TupleTableSlot abstraction
Дата
Msg-id CAPpHfdtpQEU8O4V5x+ryNjFXv2QxWT8dxioxYm=2q5WB5bKG0g@mail.gmail.com
обсуждение исходный текст
Ответ на TupleTableSlot abstraction  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi!

Thank you for working on this subject.  See some my comments below.

On Wed, Feb 21, 2018 at 1:43 AM, Andres Freund <andres@anarazel.de> wrote:
- TupleTableSlots have to contain all the necessary information for each
  type of slot. Some implementations might require a buffer pin to be
  hold (our heap), others pointers to multiple underlying points of data
  (columns store), some need additional metadata (our MinimalTuple).

  Unifying the information into one struct makes it much harder to
  provide differing implementations without modifying core postgres
  and/or increasing overhead for every user of slots.

  I think the consequence of that is that we need to allow each type of
  slot to have their own TupleTableSlot embedding struct.
 
+1, I think that it's reasonable to keep minimal common slot information in
TupleTableSlot struct and to let particular slot implementations to extend it.

  Haribabu's patch solved this by adding a tts_storage parameter that
  contains additional data, but imo that's unconvincing due to the added
  indirection in very performance critical codepaths.
 
+1

- The way slots currently are implemented each new variant data is
  stored in slots adds new branches to hot code paths like
  ExecClearTuple(), and they're not extensible.

  Therefore I think we need to move to callback based implementation. In
  my tests, by moving the callback invocations into very thin inline
  functions, the branch prediction accuracy actually goes sligthly
  *up*.

Sounds interesting.  Besides branch prediction accuracy, what can you
say about overall performance?

- Currently we frequently convert from one way of representing a row
  into another, in the same slot. We do so fairly implicilty in
  ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()...

  The more we abstract specific storage representation away, the worse I
  think that is. I think we should make such conversions explicit by
  requiring transfers between two slots if a specific type is required.


- We have a few codepaths that set fields on tuples that can't be
  represented in virtual slots. E.g. postgres_fdw sets ctid, oid,
  ExecProcessReturning() will set tableoid.


In my POC I turned TupleTableSlot into basically an abstract base class:
struct TupleTableSlot
{
        NodeTag         type;

        uint16          flags;
        uint16          nvalid;         /* # of valid values in tts_values */

        const TupleTableSlotOps *const cb;

        TupleDesc       tupleDescriptor;        /* slot's tuple descriptor */

        Datum      *values;             /* current per-attribute values */
        bool       *nulls;              /* current per-attribute null flags */

        /* can we optimize away? */
        MemoryContext mcxt;             /* slot itself is in this context */
};

which then is inherited from for specific implementations of tuple
slots:

typedef struct HeapTupleTableSlot
{
        TupleTableSlot base;
        HeapTuple       tuple;          /* physical tuple */
        uint32          off;            /* saved state for slot_deform_tuple */
} HeapTupleTableSlot;

...
typedef struct MinimalTupleTableSlot
{
        TupleTableSlot base;
        HeapTuple       tuple;          /* tuple wrapper */
        MinimalTuple mintuple;  /* minimal tuple, or NULL if none */
        HeapTupleData minhdr;   /* workspace for minimal-tuple-only case */
        uint32          off;            /* saved state for slot_deform_tuple */
} MinimalTupleTableSlot;


typedef struct TupleTableSlotOps
{
        void (*init)(TupleTableSlot *slot);
        void (*release)(TupleTableSlot *slot);

        void (*clear)(TupleTableSlot *slot);

        void (*getsomeattrs)(TupleTableSlot *slot, int natts);

        void (*materialize)(TupleTableSlot *slot);

        HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
        MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);

        HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
        MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);

} TupleTableSlotOps;

when creating a slot one has to specify the type of slot one wants to
use:

typedef enum TupleTableSlotType
{
        TTS_TYPE_VIRTUAL,
        TTS_TYPE_HEAPTUPLE,
        TTS_TYPE_MINIMALTUPLE,
        TTS_TYPE_BUFFER
} TupleTableSlotType;

extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc, TupleTableSlotType st);
 
Can user define his own slot type?  If so, I assume that hard-coded
enum is a limitation of current prototype, and eventually we would have
an extendable array of slot types.  Is it correct?

You might notice that I've renamed a few fields (tts_values -> values,
tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
in the above structs / the attached patch. I now think that's probably
not a good idea, unnecessarily breaking more code than necessary.

For me "tss_" prefix looks a bit weird, but I agree that we probably should
refrain from renaming this in order to break as small number of things as possible.
 
I generally like this scheme because it allows the TupleTableStruct to
be relatively lean, without knowing much about specific implementations
of slots.

After this change functions like slot_getattr are thin inline wrappers:
+static inline Datum
+slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
+{
+   Assert(attnum > 0);
+
+   if (attnum > slot->nvalid)
+       slot->cb->getsomeattrs(slot, attnum);
+
+   Assert(attnum <= slot->nvalid);
+
+   *isnull = slot->nulls[attnum - 1];
+   return slot->values[attnum - 1];
+}

Note that I've moved system attribute handling out of the slot_getattr
path - there were a few paths accessing them via slot_getattr, and it's
a lot of unnecessary branching.


The fact that slots of different types have different storage means that
we can't just change the type of a slot when needed. In nearly all
places that's not a problem. E.g. ExecHashTableInsert() converts the
"incoming" slot to one containing a minimal tuple with
ExecFetchSlotMinimalTuple(), but that's essentially transient as it's
just used to copy the tuple into the hashtable.  We could even gain
quite some efficiency by directly copying into the allocated space
(saving one serialization/copy step for the common case where input is
either a virtual or a heap tuple).

The slightly bigger issue is that several parts of the code currently
require heaptuples they can scribble upon. E.g. nodeModifyTable.c
materializes the tuple - those can be solved by using a local slot to
store the tuple, rather than using the incoming one. In nearly all cases
we'd still be left with the same number of tuple copy steps, as
materializing the tuple with copy into the local storage anyway.  A few
other cases, e.g. intorel_receive(), just can use ExecCopySlotTuple() or
such instead of materialize.


The biggest issue I don't have a good answer for, and where I don't like
Haribabu's solution, is how to deal with system columns like oid, ctid,
tableoid. Haribabu's patch adds storage of those to the slot, but that
seems like quite the cludge to me.  It appears to me that the best thing
would be to require projections for all accesses to system columns, at
the original level of access. Then we don't need any sort of special
codepaths dealing with them. But that's not exactly a trivial change and
has some added complications too (junkfilter type things).

I also think that projections for accessing system columns is probably
the best solution for them.
 
I made a reference to vectorized execution above. That's because that I
found, while prototyping vectorized execution, that it's a very common
need to interact with parts of the system that aren't vectorized, and
doing so has to be extremely cheap.  With the above we can have a
TupleTableSlot type that's essentially just a cheap cursor into a batch
of tuples.  Besides making it efficiently possible to hand of slots to
part of the system that can't deal with tuple batches, it allows to do
fun things like having the first slot_deform_tuple() call for one tuple
in a batch to deform a full page's worth of tuples, yielding *much*
better cache access patterns.
 
Are you planning to use custom slots only for interaction between
vectorized and non-vectorized parts of executor, or also for interaction
inside vectorized part of executor.  It seems that proposed interface
isn't suitable for interaction between two vectorized nodes.  Are you
planning to introduce special kind of vectorized slots or whatever?

I haven't done that, but I think we should split ExecStoreTuple() into
multiple versions, that only work on specific types of tuple
slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
code dealing with tuples would call ExecStoreHeapTuple().  The relevant
callers already need to know what kind of tuple they're dealing with,
therefore that's not a huge burden.


Please please note that the attached patch is *NOT* meant to be a full
proposal or even a to be reviewed patch, it's just an illustration of
the concepts I'm talking about.

Comments? Better ideas?

As I get, your primary interest in this topic is vectorized execution.  When
passing data from vectorized to non-vectorized parts of executor tuples are
materialized by moving them from vectorized slot to regular slot.
But what do you thing about alternative tuple formats in custom row-oriented
table access methods?  It's likely we should try to minimize tuple format
conversion.  Thus, some executor nodes can switch their tuple format to
match table access method tuple format.  For instance, we're running
aggregate over table with custom table access method.  Then hash
aggregate node may accumulate tuples in the same format as they are
received from table access method.  Does it make sense?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Online enabling of checksums
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Online enabling of checksums