Re: Detoasting optionally to make Explain-Analyze less misleading

Поиск
Список
Период
Сортировка
От stepan rutz
Тема Re: Detoasting optionally to make Explain-Analyze less misleading
Дата
Msg-id 03f952d6-b773-be93-e5d5-8f9ea9c2d461@gmx.de
обсуждение исходный текст
Ответ на Re: Detoasting optionally to make Explain-Analyze less misleading  (stepan rutz <stepan.rutz@gmx.de>)
Ответы Re: Detoasting optionally to make Explain-Analyze less misleading  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
Hi,

please see a revised version yesterday's mail. The patch attached now
provides the following:

EXPLAIN(ANALYZE,SERIALIZE)

and

EXPLAIN(ANALYZE,SERIALIZEBINARY)

and timing output.

Both options perform the serialization during analyze and provide an
additional output in the plan like this:


template1=# explain (analyze,serialize) select * from t12 limit 1;
                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------

  ...

  Serialized Bytes: 36 bytes
  Execution Time: 0.035 ms
(5 rows)

or also this


template1=# explain (analyze,serialize) select * from t1;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=19) (actual
time=0.101..0.111 rows=5 loops=1)
  Planning Time: 0.850 ms
  Serialized Bytes: 85777978 bytes
  Execution Time: 354.284 ms
(4 rows)


Its tempting to divide Serialized-Bytes by Execution-Time to get an idea
of the serialization bandwidth. This is /dev/null serialization though.
The results are length-counted and then discarded.

Since detoasting happens implicitly during serialization, the number of
bytes becomes huge in this case and accounts for the detoasted lengths
as well. I tried to get the number of bytes send for the protocol's
messages and the attribute headers correctly. For the actual values I am
quite sure I get the correct measures, as one can really tell by sending
more values across. Null is 4 bytes on the wire interestingly. I didn't
know that, but it makes sense, since its using the same prefix
length-field as all values do.

I have checked the JBDC driver and it uses binary and text formats
depending on an attribute's type oid. So having the SERIALIZEBINARY
option is not accurate, as in reality both formats can be occur for the
same tuple.

Please provide some feedback on the new patch and let me know if this
makes sense. In general this kind of option for EXPLAIN is a good thing
for sure.


Greetings,

Stepan


On 14.09.23 21:27, stepan rutz wrote:
> Hi Tom, Hi Matthias,
>
> you are right of course. I have looked at the code from printtup.c and
> made a new version of the patch.
>
> Thanks for the MemoryContextReset hint too (@Matthias)
>
> This time is called  EXPLAIN(ANALYZE,SERIALIZE) (hey, it also sounds
> nicer phonetically)
>
> If the option SERIALIZE is active, the output functions are called and
> they perform the detoasting, which I have even checked.
>
> So things are better this way, however I hardcoded the output option
> "Text" (format=0). In printtup.c there is an incoming array which
> applies Text (format=0) or Binary (format=1) for each column
> individually. I am not sure whether this is even needed. I left in the
> if-statement from printtup.c which calls the binary output method of a
> given type. The result of the output is ignored and apparently free'd
> because of the memory-context-reset at the end.
>
> Please also note, that I added a call to DestReceiver's rDestroy hook,
> which was missing from explain.c before altogether.
>
> Feedback is appreciated.
>
> /Stepan
>
>
> On 12.09.23 17:26, Tom Lane wrote:
>> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
>>> Hmm, maybe we should measure the overhead of serializing the tuples
>>> instead.
>>> The difference between your patch and "serializing the tuples, but not
>>> sending them" is that serializing also does the detoasting, but also
>>> includes any time spent in the serialization functions of the type. So
>>> an option "SERIALIZE" which measures all the time the server spent on
>>> the query (except the final step of sending the bytes to the client)
>>> would likely be more useful than "just" detoasting.
>> +1, that was my immediate reaction to the proposal as well. Some
>> output functions are far from cheap.  Doing only the detoast part
>> seems like it's still misleading.
>>
>> Do we need to go as far as offering both text-output and binary-output
>> options?
>>
>>             regards, tom lane

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Yurii Rashkovskii
Дата:
Сообщение: ALTER ROLE documentation improvement
Следующее
От: "David E. Wheeler"
Дата:
Сообщение: Re: JSON Path and GIN Questions