Thread: Understanding TupleQueue impact and overheads?

Understanding TupleQueue impact and overheads?

From
Tom Mercha
Date:
I have been looking at PostgreSQL's Tuple Queue 
(/include/executor/tqueue.h) which provides functionality for queuing 
tuples between processes through shm_mq. I am still familiarising myself 
with the bigger picture and TupTableStores. I can see that a copy (not a 
reference) of a HeapTuple (obtained from TupleTableSlot or SPI_TupTable 
etc) can be sent to a queue using shm_mq. Then, another process can 
receive these HeapTuples, probably later placing them in 'output' 
TupleTableSlots.

What I am having difficulty understanding is what happens to the 
location of the HeapTuple as it moves from one TupleTableSlot to the 
other as described above. Since there most likely is a reference to a 
physical tuple involved, am I incurring a disk-access overhead with each 
copy of a tuple? This would seem like a massive overhead; how can I keep 
such overheads to a minimum?

Furthermore, to what extent can I expect other modules to impact a 
queued HeapTuple? If some external process updates this tuple, when will 
I see the change? Would it be a possibility that the update is not 
reflected on the queued HeapTuple but the external process is not 
blocked/delayed from updating? In other words, like operating on some 
kind of multiple snapshots? When does DBMS logging kick in whilst I am 
transferring a tuple from TupTableStore to another?

Thanks,
Tom

Re: Understanding TupleQueue impact and overheads?

From
Andres Freund
Date:
Hi,

On 2019-10-16 01:24:04 +0000, Tom Mercha wrote:
> What I am having difficulty understanding is what happens to the
> location of the HeapTuple as it moves from one TupleTableSlot to the
> other as described above. Since there most likely is a reference to a
> physical tuple involved, am I incurring a disk-access overhead with each
> copy of a tuple? This would seem like a massive overhead; how can I keep
> such overheads to a minimum?

The tuple is fully "materialized" on the sending side, due to
    tuple = ExecFetchSlotHeapTuple(slot, true, &should_free);

so there are no direct references to disk data at that point. But if
there are toasted columns, they may only be accessed on the receiving
side.
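
To illustrate what "materialized" implies here, below is a minimal toy
sketch in plain C. It is not PostgreSQL code: ToySlot, ToyQueue,
toy_queue_send, toy_queue_receive and toy_demo are hypothetical names
made up for this example, and the queue is an ordinary in-process ring
buffer rather than shm_mq. It only shows that once the tuple bytes have
been copied into the queue, the queued copy no longer depends on the
buffer it came from, so a later change to that buffer is not reflected
in it. It does not model toasted columns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_TUPLE   64
#define TOY_QUEUE_SLOTS 4

/* Hypothetical stand-in for a slot whose tuple still points into a page. */
typedef struct ToySlot {
    const char *data;   /* points into the "page"; not owned by the slot */
    size_t      len;
} ToySlot;

/* Hypothetical stand-in for a queue that stores tuple bytes by value. */
typedef struct ToyQueue {
    char   bytes[TOY_QUEUE_SLOTS][TOY_MAX_TUPLE];
    size_t lens[TOY_QUEUE_SLOTS];
    size_t head;    /* index of the next entry to read */
    size_t count;   /* number of entries currently queued */
} ToyQueue;

/* Copy the tuple bytes into the queue; returns 0 if the queue is full. */
int toy_queue_send(ToyQueue *q, const ToySlot *slot)
{
    assert(slot->len <= TOY_MAX_TUPLE);
    if (q->count == TOY_QUEUE_SLOTS)
        return 0;
    size_t tail = (q->head + q->count) % TOY_QUEUE_SLOTS;
    memcpy(q->bytes[tail], slot->data, slot->len);
    q->lens[tail] = slot->len;
    q->count++;
    return 1;
}

/* Copy the oldest queued tuple into out; returns 0 if the queue is empty. */
int toy_queue_receive(ToyQueue *q, char *out, size_t *len)
{
    if (q->count == 0)
        return 0;
    memcpy(out, q->bytes[q->head], q->lens[q->head]);
    *len = q->lens[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_SLOTS;
    q->count--;
    return 1;
}

/* Returns 1 if the queued copy keeps its contents after the page changes. */
int toy_demo(void)
{
    char     page[TOY_MAX_TUPLE] = "row-v1";
    ToySlot  slot = { page, strlen(page) + 1 };
    ToyQueue q = {0};
    char     out[TOY_MAX_TUPLE];
    size_t   out_len = 0;

    if (!toy_queue_send(&q, &slot))
        return 0;
    strcpy(page, "row-v2");     /* later change to the source buffer */
    if (!toy_queue_receive(&q, out, &out_len))
        return 0;
    return out_len == 7 && strcmp(out, "row-v1") == 0;
}
```

In this toy model the receiver only ever reads the bytes held in the
queue, which corresponds to the point that no disk access is implied by
moving an already materialized tuple.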

Side-note: This should very likely use a minimal tuple rather than a
full heap tuple.


> Furthermore, to what extent can I expect other modules to impact a
> queued HeapTuple? If some external process updates this tuple, when will
> I see the change? Would it be a possibility that the update is not
> reflected on the queued HeapTuple but the external process is not
> blocked/delayed from updating? In other words, like operating on some
> kind of multiple snapshots? When does DBMS logging kick in whilst I am
> transferring a tuple from TupTableStore to another?

I'm not quite sure what you're actually trying to get at. Whether a
tuple is ferried through the queue or not shouldn't have an impact on
visibility / snapshot and locking considerations. For parallel query etc.
the snapshots are synchronized between the "leader" and its workers. If
you want to use tuple queues for something separate, it's your
responsibility to synchronize the snapshots yourself.

Greetings,

Andres Freund