Re: Use virtual tuple slot for Unique node

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Use virtual tuple slot for Unique node
Дата
Msg-id CAExHW5uku=D2Dy6ST4JHO1sokZF8arecymk2RksioKk2BGb4UA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Use virtual tuple slot for Unique node  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Use virtual tuple slot for Unique node  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
On Fri, Oct 27, 2023 at 8:48 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 25 Oct 2023 at 22:48, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > We may save the size of data in VirtualTupleTableSlot, thus avoiding
> > the first loop. I assume that when allocating
> > VirtualTupleTableSlot->data, we always know what size we are
> > allocating so it should be just a matter of saving it in
> > VirtualTupleTableSlot->size. This should avoid the first loop in
> > tts_virtual_materialize() and give some speed up. We will need a loop
> > to repoint non-byval Datums. I imagine that the pointers to non-byval
> > Datums can be computed as dest_slot->tts_values[natts] =
> > dest_vslot->data + (src_slot->tts_values[natts] - src_vslot->data).
> > This would work as long as all the non-byval datums in the source slot
> > are all stored flattened in source slot's data. I am assuming that
> > that would be true in a materialized virtual slot. The byval datums
> > are copied as is. I think, this will avoid multiple memcpy calls, one
> > per non-byval attribute and hence some speedup. I may be wrong though.
>
> hmm, do you think it's common enough that we copy an already
> materialised virtual slot?
>
> I tried adding the following code totts_virtual_copyslot and didn't
> see the NOTICE message when running each of your test queries. "make
> check" also worked without anything failing after adjusting
> nodeUnique.c to always use a TTSOpsVirtual slot.
>
> +       if (srcslot->tts_ops == &TTSOpsVirtual && TTS_SHOULDFREE(srcslot))
> +               elog(NOTICE, "We copied a materialized virtual slot!");
>
> I did get a failure in postgres_fdw's tests:
>
>    server loopback options (table_name 'tab_batch_sharded_p1_remote');
>  insert into tab_batch_sharded select * from tab_batch_local;
> +NOTICE:  We copied a materialized virtual slot!
> +NOTICE:  We copied a materialized virtual slot!
>
> so I think it's probably not that common that we'd gain from that optimisation.

Thanks for this analysis. If we aren't copying a materialized virtual
slot often, no point in adding that optimization.

>
> What might buy us a bit more would be to get rid of the for loop
> inside tts_virtual_copyslot() and copy the guts of
> tts_virtual_materialize() into tts_virtual_copyslot() and set the
> dstslot tts_isnull and tts_values arrays in the same loop that we use
> to calculate the size.
>
> I tried that in the attached patch and then tested it alongside the
> patch that changes the slot type.
>
> master = 74604a37f
> 1 = [1]
> 2 = optimize_tts_virtual_copyslot.patch
>
> Using the script from [2] and the setup from [3] but reduced to 10k
> tuples instead of 1 million.
>
> Times the average query time in milliseconds for a 60 second pgbench run.
>
> query    master      master+1        master+1+2    m/m+1         m/m+1+2
> Q1        2.616         2.082                   1.903       125.65%
>     137.47%
> Q2        9.479        10.593                10.361         89.48%
>      91.49%
> Q3        10.329      10.357                10.627         99.73%
>     97.20%
> Q4        7.272          7.306                 6.941          99.53%
>      104.77%
> Q5        7.597          7.043                 6.645        107.87%
>      114.33%
> Q6        162.177  161.037              162.807        100.71%         99.61%
> Q7        59.288      59.43                  61.294          99.76%
>      96.73%
>
>      103.25%        105.94%
>
> I only expect Q2 and Q3 to gain from this. Q1 shouldn't improve but
> did, so the results might not be stable enough to warrant making any
> decisions from.

I am actually surprised to see that eliminating loop is showing
improvements. There aren't hundreds of attributes involved in those
queries. So I share your stability concerns. And even with these
patches, Q2 and Q3 are still slower.

Q1 is consistently giving performance > 25% for both of us. But other
queries aren't showing a whole lot improvement. So I am wondering
whether it's worth pursuing this change; similar to the opinion you
expressed a few emails earlier.

>
> I was uncertain if the old behaviour of when srcslot contains fewer
> attributes than dstslot was intended or not.  What happens there is
> that we'd leave the additional old dstslot tts_values in place and
> only overwrite up to srcslot->natts but then we'd go on and
> materialize all the dstslot attributes. I think this might not be
> needed as we do dstslot->tts_nvalid = srcdesc->natts. I suspect we may
> be ok just to materialize the srcslot attributes and ignore the
> previous additional dstslot attributes.  Changing it to that would
> make the attached patch more simple.

We seem to use both tt_nvalid and tts_tupleDescriptor->natts. I forgot
what's the difference. If we do what you say, we might end up trying
to access unmaterialized values beyond tts_nvalid. Better to
investigate whether such a hazard exists.

--
Best Wishes,
Ashutosh Bapat



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alena Rybakina
Дата:
Сообщение: Invalid Path with UpperRel
Следующее
От: tender wang
Дата:
Сообщение: Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails