TupleTableSlot abstraction

Hi,

I've recently been discussing with Robert how to abstract
TupleTableSlots in the light of upcoming developments around abstracting
storage away.  Besides that aspect I think we also need to make them
more abstract to later deal with vectorized execution, but I'm more fuzzy
on the details there.

I'd previously, in a response to an early version of the pluggable
storage patch, commented [1] on how I think the abstraction should
roughly look. I'd privately started to prototype that. After my
discussion with Robert I got that prototype into a state that can run
many queries, but it doesn't yet pass the regression tests.

The pluggable storage patch has also been updated [2] to abstract slots
more (see patch 0005).

My position is that there are a number of weaknesses with the current
TupleTableSlot API in light of multiple independent, possibly out of
core, storage implementations:


- TupleTableSlots have to contain all the necessary information for each
  type of slot. Some implementations might require a buffer pin to be
  held (our heap), others pointers to multiple underlying points of data
  (column stores), some need additional metadata (our MinimalTuple).

  Unifying the information into one struct makes it much harder to
  provide differing implementations without modifying core postgres
  and/or increasing overhead for every user of slots.

  I think the consequence of that is that we need to allow each type of
  slot to have their own TupleTableSlot embedding struct.

  Haribabu's patch solved this by adding a tts_storage parameter that
  contains additional data, but imo that's unconvincing due to the added
  indirection in very performance critical codepaths.


- The way slots are currently implemented, each new variant of data
  stored in slots adds new branches to hot code paths like
  ExecClearTuple(), and they're not extensible.

  Therefore I think we need to move to a callback-based implementation.
  In my tests, by moving the callback invocations into very thin inline
  functions, the branch prediction accuracy actually goes slightly
  *up*.


- Currently we frequently convert from one way of representing a row
  into another, in the same slot. We do so fairly implicitly in
  ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()...

  The more we abstract the specific storage representation away, the
  worse I think that is. I think we should make such conversions explicit
  by requiring transfers between two slots if a specific type is required
  (see the sketch just after this list).


- We have a few codepaths that set fields on tuples that can't be
  represented in virtual slots. E.g. postgres_fdw sets ctid, oid,
  ExecProcessReturning() will set tableoid.
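
As an illustration of the explicit-conversion point above: a consumer
needing a minimal tuple could copy into a slot of the matching type
instead of converting the incoming slot in place. A minimal sketch,
assuming a node-owned minimal-tuple slot "mslot" created at node
initialization (the helper name is made up):

static MinimalTuple
fetch_as_minimal(TupleTableSlot *mslot, TupleTableSlot *slot)
{
    /* explicit transfer between two slots; "slot" is left untouched */
    ExecCopySlot(mslot, slot);

    /* the minimal-tuple slot hands out its native representation */
    return ExecFetchSlotMinimalTuple(mslot);
}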


In my POC I turned TupleTableSlot into basically an abstract base class:
struct TupleTableSlot
{
    NodeTag        type;

    uint16        flags;
    uint16        nvalid;        /* # of valid values in tts_values */

    const TupleTableSlotOps *const cb;

    TupleDesc    tupleDescriptor;    /* slot's tuple descriptor */

    Datum       *values;        /* current per-attribute values */
    bool       *nulls;        /* current per-attribute null flags */

    /* can we optimize away? */
    MemoryContext mcxt;        /* slot itself is in this context */
};

which then is inherited from for specific implementations of tuple
slots:

typedef struct HeapTupleTableSlot
{
    TupleTableSlot base;
    HeapTuple    tuple;        /* physical tuple */
    uint32        off;        /* saved state for slot_deform_tuple */
} HeapTupleTableSlot;

...
typedef struct MinimalTupleTableSlot
{
    TupleTableSlot base;
    HeapTuple    tuple;        /* tuple wrapper */
    MinimalTuple mintuple;    /* minimal tuple, or NULL if none */
    HeapTupleData minhdr;    /* workspace for minimal-tuple-only case */
    uint32        off;        /* saved state for slot_deform_tuple */
} MinimalTupleTableSlot;


typedef struct TupleTableSlotOps
{
    void (*init)(TupleTableSlot *slot);
    void (*release)(TupleTableSlot *slot);

    void (*clear)(TupleTableSlot *slot);

    void (*getsomeattrs)(TupleTableSlot *slot, int natts);

    void (*materialize)(TupleTableSlot *slot);

    HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
    MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);

    HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
    MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);

} TupleTableSlotOps;
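
To make the callback scheme a bit more concrete, here's roughly what a
per-type ops table and a trivial callback could look like for virtual
slots, plus ExecClearTuple() reduced to a thin wrapper. This is
illustrative only; the names (TTSOpsVirtual, tts_virtual_*, TTS_EMPTY)
aren't from the attached patch, and the remaining callbacks are assumed
to be defined elsewhere:

static void
tts_virtual_clear(TupleTableSlot *slot)
{
    /* a virtual slot owns no physical tuple; just mark it empty */
    slot->nvalid = 0;
    slot->flags |= TTS_EMPTY;    /* hypothetical flag bit */
}

static const TupleTableSlotOps TTSOpsVirtual = {
    tts_virtual_init,
    tts_virtual_release,
    tts_virtual_clear,
    tts_virtual_getsomeattrs,
    tts_virtual_materialize,
    NULL,                        /* get_heap_tuple: no native heap tuple */
    NULL,                        /* get_minimal_tuple */
    tts_virtual_copy_heap_tuple,
    tts_virtual_copy_minimal_tuple
};

/* hot paths then become branch-free wrappers around the callbacks: */
static inline TupleTableSlot *
ExecClearTuple(TupleTableSlot *slot)
{
    slot->cb->clear(slot);
    return slot;
}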

when creating a slot one has to specify the type of slot one wants to
use:

typedef enum TupleTableSlotType
{
    TTS_TYPE_VIRTUAL,
    TTS_TYPE_HEAPTUPLE,
    TTS_TYPE_MINIMALTUPLE,
    TTS_TYPE_BUFFER
} TupleTableSlotType;

extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc, TupleTableSlotType st);
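
A caller that knows what it will store can then ask for the matching
slot type up front; a small usage sketch ("rel" and "tupdesc" stand for
whatever the caller has at hand):

/* e.g. a seqscan-like node storing buffer-pinned heap tuples: */
TupleTableSlot *scanslot =
    MakeTupleTableSlot(RelationGetDescr(rel), TTS_TYPE_BUFFER);

/* whereas a node producing computed columns would use a virtual slot: */
TupleTableSlot *resultslot =
    MakeTupleTableSlot(tupdesc, TTS_TYPE_VIRTUAL);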

You might notice that I've renamed a few fields (tts_values -> values,
tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't otherwise
changed in the above structs / the attached patch. I now think that's
probably not a good idea, as it breaks more code than necessary.


I generally like this scheme because it allows the TupleTableSlot struct to
be relatively lean, without knowing much about specific implementations
of slots.

After this change functions like slot_getattr are thin inline wrappers:
+static inline Datum
+slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
+{
+   Assert(attnum > 0);
+
+   if (attnum > slot->nvalid)
+       slot->cb->getsomeattrs(slot, attnum);
+
+   Assert(attnum <= slot->nvalid);
+
+   *isnull = slot->nulls[attnum - 1];
+   return slot->values[attnum - 1];
+}

Note that I've moved system attribute handling out of the slot_getattr
path - there were a few paths accessing them via slot_getattr, and it's
a lot of unnecessary branching.
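
The handful of callers that do need system attributes can instead reach
through to the underlying tuple at a point where the slot type is known;
a sketch, assuming a heap-backed slot:

/* system column access done where we know the slot holds a heap tuple */
HeapTupleTableSlot *hslot = (HeapTupleTableSlot *) slot;
bool        isnull;
Datum       ctid;

ctid = heap_getsysattr(hslot->tuple, SelfItemPointerAttributeNumber,
                       slot->tupleDescriptor, &isnull);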


The fact that slots of different types have different storage means that
we can't just change the type of a slot when needed. In nearly all
places that's not a problem. E.g. ExecHashTableInsert() converts the
"incoming" slot to one containing a minimal tuple with
ExecFetchSlotMinimalTuple(), but that's essentially transient as it's
just used to copy the tuple into the hashtable.  We could even gain
quite some efficiency by directly copying into the allocated space
(saving one serialization/copy step for the common case where input is
either a virtual or a heap tuple).
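
For reference, the relevant pattern in nodeHash.c today is roughly the
following (simplified, from memory); the fetched minimal tuple is copied
right away, so it never needs to survive in the input slot:

MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
HashJoinTuple hashTuple;
int         hashTupleSize = HJTUPLE_OVERHEAD + tuple->t_len;

hashTuple = (HashJoinTuple) dense_alloc(hashtable, hashTupleSize);
hashTuple->hashvalue = hashvalue;
memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);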

The slightly bigger issue is that several parts of the code currently
require heaptuples they can scribble upon. E.g. nodeModifyTable.c
materializes the tuple - those can be solved by using a local slot to
  store the tuple, rather than using the incoming one. In nearly all cases
  we'd still be left with the same number of tuple copy steps, as
  materializing the tuple copies it into local storage anyway.  A few
  other cases, e.g. intorel_receive(), can just use ExecCopySlotTuple() or
  such instead of materializing.
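
A minimal sketch of the local-slot approach in nodeModifyTable.c; the
mt_scribble_slot field and the surrounding condition are made up for
illustration:

/* instead of ExecMaterializeSlot(slot) on the incoming slot: */
if (operation_needs_writable_tuple)
{
    /* node-owned heap tuple slot, created at ExecInitModifyTable() time */
    ExecCopySlot(mtstate->mt_scribble_slot, slot);
    slot = mtstate->mt_scribble_slot;
}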


The biggest issue I don't have a good answer for, and where I don't like
Haribabu's solution, is how to deal with system columns like oid, ctid,
tableoid. Haribabu's patch adds storage of those to the slot, but that
seems like quite the kludge to me.  It appears to me that the best thing
would be to require projections for all accesses to system columns, at
the original level of access. Then we don't need any sort of special
codepaths dealing with them. But that's not exactly a trivial change and
has some added complications too (junkfilter type things).


I made a reference to vectorized execution above. That's because I
found, while prototyping vectorized execution, that it's a very common
need to interact with parts of the system that aren't vectorized, and
doing so has to be extremely cheap.  With the above we can have a
TupleTableSlot type that's essentially just a cheap cursor into a batch
of tuples.  Besides making it efficiently possible to hand off slots to
parts of the system that can't deal with tuple batches, it allows fun
things like having the first slot_deform_tuple() call for one tuple in
a batch deform a full page's worth of tuples, yielding *much* better
cache access patterns.
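
To give a rough idea, such a "cursor into a batch" slot could look
something like the following; this is purely illustrative and not part
of the attached patch:

typedef struct TupleBatch
{
    int         ntuples;
    Datum     **columns;        /* per-attribute arrays, deformed a page
                                 * (or more) at a time */
    bool      **nulls;
} TupleBatch;

typedef struct BatchCursorTupleTableSlot
{
    TupleTableSlot base;
    TupleBatch *batch;          /* batch this slot currently points into */
    int         row;            /* current row within the batch */
} BatchCursorTupleTableSlot;

/* its getsomeattrs callback just copies values for the current row */
static void
tts_batchcursor_getsomeattrs(TupleTableSlot *slot, int natts)
{
    BatchCursorTupleTableSlot *bslot = (BatchCursorTupleTableSlot *) slot;
    int         attno;

    for (attno = slot->nvalid; attno < natts; attno++)
    {
        slot->values[attno] = bslot->batch->columns[attno][bslot->row];
        slot->nulls[attno] = bslot->batch->nulls[attno][bslot->row];
    }
    slot->nvalid = natts;
}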


I haven't done that, but I think we should split ExecStoreTuple() into
multiple versions that only work on specific types of tuple
slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
code dealing with tuples would call ExecStoreHeapTuple().  The relevant
callers already need to know what kind of tuple they're dealing with,
therefore that's not a huge burden.
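
The split could look roughly like this; the signatures are just a guess,
loosely mirroring ExecStoreTuple()'s current arguments:

/* only valid for TTS_TYPE_BUFFER slots; manages the buffer pin */
extern TupleTableSlot *ExecStoreBufferHeapTuple(HeapTuple tuple,
                                                TupleTableSlot *slot,
                                                Buffer buffer);

/* only valid for TTS_TYPE_HEAPTUPLE slots */
extern TupleTableSlot *ExecStoreHeapTuple(HeapTuple tuple,
                                          TupleTableSlot *slot,
                                          bool shouldFree);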


Please note that the attached patch is *NOT* meant to be a full
proposal or even a to-be-reviewed patch; it's just an illustration of
the concepts I'm talking about.

Comments? Better ideas?

[1] http://archives.postgresql.org/message-id/20170815065348.afkj45dgmg22ecfh%40alap3.anarazel.de
[2] http://archives.postgresql.org/message-id/CAJrrPGdY_hgvu06R_vCibUktKRLp%3Db8nBfeZkh%3DKRc95tq4euA%40mail.gmail.com

Greetings,

Andres Freund
