TupleTableSlot abstraction
От | Andres Freund |
---|---|
Тема | TupleTableSlot abstraction |
Дата | |
Msg-id | 20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de обсуждение исходный текст |
Ответы |
Re: TupleTableSlot abstraction
Re: TupleTableSlot abstraction Re: TupleTableSlot abstraction Re: TupleTableSlot abstraction |
Список | pgsql-hackers |
Hi, I've recently been discussing with Robert how to abstract TupleTableSlots in the light of upcoming developments around abstracting storage away. Besides that aspect I think we also need to make them more abstract to later deal with vectorized excution, but I'm more fuzzy on the details there. I'd previously, in a response to an early version of the pluggable storage patch, commented [1] how I think the abstraction should roughly look like. I'd privately started to prototype that. After my discussion with Robert I got that prototype in a state that can run many queries, but doesn't pass the regression tests. The pluggable storage patch has also been updated [2] to abstract slots more (see patch 0005). My position is that there's a number of weaknesses with the current TupleTableSlot API in light of multiple independent, possibly out of core, storage implementations: - TupleTableSlots have to contain all the necessary information for each type of slot. Some implementations might require a buffer pin to be hold (our heap), others pointers to multiple underlying points of data (columns store), some need additional metadata (our MinimalTuple). Unifying the information into one struct makes it much harder to provide differing implementations without modifying core postgres and/or increasing overhead for every user of slots. I think the consequence of that is that we need to allow each type of slot to have their own TupleTableSlot embedding struct. Haribabu's patch solved this by adding a tts_storage parameter that contains additional data, but imo that's unconvincing due to the added indirection in very performance critical codepaths. - The way slots currently are implemented each new variant data is stored in slots adds new branches to hot code paths like ExecClearTuple(), and they're not extensible. Therefore I think we need to move to callback based implementation. In my tests, by moving the callback invocations into very thin inline functions, the branch prediction accuracy actually goes sligthly *up*. - Currently we frequently convert from one way of representing a row into another, in the same slot. We do so fairly implicilty in ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()... The more we abstract specific storage representation away, the worse I think that is. I think we should make such conversions explicit by requiring transfers between two slots if a specific type is required. - We have a few codepaths that set fields on tuples that can't be represented in virtual slots. E.g. postgres_fdw sets ctid, oid, ExecProcessReturning() will set tableoid. In my POC I turned TupleTableSlot into basically an abstract base class: struct TupleTableSlot { NodeTag type; uint16 flags; uint16 nvalid; /* # of valid values in tts_values */ const TupleTableSlotOps *const cb; TupleDesc tupleDescriptor; /* slot's tuple descriptor */ Datum *values; /* current per-attribute values */ bool *nulls; /* current per-attribute null flags */ /* can we optimize away? */ MemoryContext mcxt; /* slot itself is in this context */ }; which then is inherited from for specific implementations of tuple slots: typedef struct HeapTupleTableSlot { TupleTableSlot base; HeapTuple tuple; /* physical tuple */ uint32 off; /* saved state for slot_deform_tuple */ } HeapTupleTableSlot; ... typedef struct MinimalTupleTableSlot { TupleTableSlot base; HeapTuple tuple; /* tuple wrapper */ MinimalTuple mintuple; /* minimal tuple, or NULL if none */ HeapTupleData minhdr; /* workspace for minimal-tuple-only case */ uint32 off; /* saved state for slot_deform_tuple */ } MinimalTupleTableSlot; typedef struct TupleTableSlotOps { void (*init)(TupleTableSlot *slot); void (*release)(TupleTableSlot *slot); void (*clear)(TupleTableSlot *slot); void (*getsomeattrs)(TupleTableSlot *slot, int natts); void (*materialize)(TupleTableSlot *slot); HeapTuple (*get_heap_tuple)(TupleTableSlot *slot); MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot); HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot); MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot); } TupleTableSlotOps; when creating a slot one has to specify the type of slot one wants to use: typedef enum TupleTableSlotType { TTS_TYPE_VIRTUAL, TTS_TYPE_HEAPTUPLE, TTS_TYPE_MINIMALTUPLE, TTS_TYPE_BUFFER } TupleTableSlotType; extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc, TupleTableSlotType st); You might notice that I've renamed a few fields (tts_values -> values, tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed in the above structs / the attached patch. I now think that's probably not a good idea, unnecessarily breaking more code than necessary. I generally like this scheme because it allows the TupleTableStruct to be relatively lean, without knowing much about specific implementations of slots. After this change functions like slot_getattr are thin inline wrappers: +static inline Datum +slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull) +{ + Assert(attnum > 0); + + if (attnum > slot->nvalid) + slot->cb->getsomeattrs(slot, attnum); + + Assert(attnum <= slot->nvalid); + + *isnull = slot->nulls[attnum - 1]; + return slot->values[attnum - 1]; +} Note that I've moved system attribute handling out of the slot_getattr path - there were a few paths accessing them via slot_getattr, and it's a lot of unnecessary branching. The fact that slots of different types have different storage means that we can't just change the type of a slot when needed. In nearly all places that's not a problem. E.g. ExecHashTableInsert() converts the "incoming" slot to one containing a minimal tuple with ExecFetchSlotMinimalTuple(), but that's essentially transient as it's just used to copy the tuple into the hashtable. We could even gain quite some efficiency by directly copying into the allocated space (saving one serialization/copy step for the common case where input is either a virtual or a heap tuple). The slightly bigger issue is that several parts of the code currently require heaptuples they can scribble upon. E.g. nodeModifyTable.c materializes the tuple - those can be solved by using a local slot to store the tuple, rather than using the incoming one. In nearly all cases we'd still be left with the same number of tuple copy steps, as materializing the tuple with copy into the local storage anyway. A few other cases, e.g. intorel_receive(), just can use ExecCopySlotTuple() or such instead of materialize. The biggest issue I don't have a good answer for, and where I don't like Haribabu's solution, is how to deal with system columns like oid, ctid, tableoid. Haribabu's patch adds storage of those to the slot, but that seems like quite the cludge to me. It appears to me that the best thing would be to require projections for all accesses to system columns, at the original level of access. Then we don't need any sort of special codepaths dealing with them. But that's not exactly a trivial change and has some added complications too (junkfilter type things). I made a reference to vectorized execution above. That's because that I found, while prototyping vectorized execution, that it's a very common need to interact with parts of the system that aren't vectorized, and doing so has to be extremely cheap. With the above we can have a TupleTableSlot type that's essentially just a cheap cursor into a batch of tuples. Besides making it efficiently possible to hand of slots to part of the system that can't deal with tuple batches, it allows to do fun things like having the first slot_deform_tuple() call for one tuple in a batch to deform a full page's worth of tuples, yielding *much* better cache access patterns. I haven't done that, but I think we should split ExecStoreTuple() into multiple versions, that only work on specific types of tuple slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other code dealing with tuples would call ExecStoreHeapTuple(). The relevant callers already need to know what kind of tuple they're dealing with, therefore that's not a huge burden. Please please note that the attached patch is *NOT* meant to be a full proposal or even a to be reviewed patch, it's just an illustration of the concepts I'm talking about. Comments? Better ideas? [1] http://archives.postgresql.org/message-id/20170815065348.afkj45dgmg22ecfh%40alap3.anarazel.de [2] http://archives.postgresql.org/message-id/CAJrrPGdY_hgvu06R_vCibUktKRLp%3Db8nBfeZkh%3DKRc95tq4euA%40mail.gmail.com Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: