"Measuring timing overhead" in docs seems misleading

From: Andres Freund
Subject: "Measuring timing overhead" in docs seems misleading
Msg-id: 20230116213913.4oseovlzvc2674z7@awork3.anarazel.de
List: pgsql-hackers
Hi,

I was trying to extract a committable piece out of [1]. To be able to judge
changes in timing overhead more accurately, I thought it'd be sensible to
update pg_test_timing to report nanoseconds instead of microseconds, which
led me to try updating pg_test_timing's docs [2].

The "Measuring Executor Timing Overhead" section seems misleading:
  <para>
   The i7-860 system measured runs the count query in 9.8 ms while
   the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each
   processing just over 100,000 rows.  That 6.8 ms difference means the timing
   overhead per row is 68 ns, about twice what pg_test_timing estimated it
   would be.  Even that relatively small amount of overhead is making the fully
   timed count statement take almost 70% longer.  On more substantial queries,
   the timing overhead would be less problematic.
  </para>

The main reason for the ~2x discrepancy is that we make two timestamp calls
for each invocation of an executor node: one in InstrStartNode(), one in
InstrStopNode(). I think this should be clarified in the section.


I also think we should update the section to compare
EXPLAIN (ANALYZE, TIMING OFF) with
EXPLAIN (ANALYZE, TIMING ON)
rather than comparing the "bare" statement with EXPLAIN ANALYZE. There's
plenty of other overhead in EXPLAIN, even without TIMING ON.

With the instr_time-as-nanosec patches applied (I'll post a new version in a
few minutes), I get the following:

pg_test_timing:
Per loop time including overhead: 13.97 ns
Histogram of timing durations:
     < ns   % of total      count
       16     97.48221  209400569
       32      2.51201    5396022
       64      0.00477      10246
      128      0.00030        640
      256      0.00005        117
      512      0.00000          0
     1024      0.00000          3
     2048      0.00034        729
     4096      0.00001         14
     8192      0.00000          8
    16384      0.00015        320
    32768      0.00014        303
    65536      0.00001         26
   131072      0.00000          0
   262144      0.00000          1

psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM generate_series(1, 100000) g(i);' && \
  pgbench -n -r -t 100 -f <(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") | grep '^ '
 
DROP TABLE
SELECT 100000
         3.431           0  SELECT COUNT(*) FROM t;
         3.888           0  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;
         6.671           0  EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;

pgbench reports about 11% overhead just from EXPLAIN (ANALYZE, TIMING OFF),
and a further 45% from TIMING ON. The per-row overhead, comparing TIMING ON
with TIMING OFF:

((6.187ms - 3.423 ms) * 1000000)/(100000 * 2) = 13.82ns

which is within the run-to-run variance of the pg_test_timing result.

Greetings,

Andres Freund

[1] https://postgr.es/m/20230116023639.rn36vf6ajqmfciua%40awork3.anarazel.de
[2] https://www.postgresql.org/docs/current/pgtesttiming.html#id-1.9.5.11.7.3


