[PATCH] Improve EXPLAIN ANALYZE overhead by sampling

This was a suggestion made back in March that would dramatically reduce
the overhead of EXPLAIN ANALYZE on queries that loop continuously over
the same nodes.

http://archives.postgresql.org/pgsql-hackers/2006-03/msg01114.php

What it does is behave normally for the first 50 tuples of any node, but
after that it starts sampling at ever increasing intervals, with the
intervals controlled by an exponential function. So for a node
iterating over 1 million tuples it takes around 15,000 samples. The
result is that EXPLAIN ANALYZE has a much smaller effect on the total
execution time.
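
To make the idea concrete, the decision of whether to time a given
tuple can look roughly like the sketch below. This is illustrative
only, not the code in the patch; SAMPLE_THRESHOLD, sample_this_tuple
and the 1.01 growth factor are made-up names and numbers:

#include <stdbool.h>

#define SAMPLE_THRESHOLD 50     /* always time the first 50 tuples */

/*
 * Decide whether to time the tuple numbered "tuplecount" (counting
 * from 1 within the current loop).  The first SAMPLE_THRESHOLD tuples
 * are always timed; after that the gap to the next timed tuple grows
 * geometrically, so a node producing millions of tuples is only timed
 * a bounded number of times.
 */
static bool
sample_this_tuple(double tuplecount, double *next_sample)
{
    if (tuplecount <= SAMPLE_THRESHOLD)
        return true;

    if (tuplecount >= *next_sample)
    {
        /* push the next sample a little further away each time */
        *next_sample = tuplecount * 1.01 + 1.0;
        return true;
    }
    return false;
}

The exact growth function determines how many samples you end up with;
the figure of around 15,000 samples per million tuples quoted above
comes from the function in the patch, not from the toy factor used
here.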

Without EXPLAIN ANALYZE:

postgres=# select count(*) from generate_series(1,1000000);
  count
---------
 1000000
(1 row)

Time: 2303.599 ms

EXPLAIN ANALYZE without patch:

postgres=# explain analyze select count(*) from generate_series(1,1000000);
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15.00..15.01 rows=1 width=0) (actual time=8022.070..8022.073 rows=1 loops=1)
   ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=1381.762..4873.369 rows=1000000 loops=1)
 Total runtime: 8042.472 ms
(3 rows)

Time: 8043.401 ms

EXPLAIN ANALYZE with patch:

postgres=# explain analyze select count(*) from generate_series(1,1000000);
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15.00..15.01 rows=1 width=0) (actual time=2469.491..2469.494 rows=1 loops=1)
   ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=1405.002..2187.417 rows=1000000 loops=1)
 Total runtime: 2496.529 ms
(3 rows)

Time: 2497.488 ms

As you can see, the overhead goes from 5.7 seconds to 0.2 seconds.
Obviously this is an extreme case, but it will probably help in a lot
of other cases people have been complaining about.

- To get this close, it needs an estimate of the sampling overhead.
It gets this from a small calibration loop that is run once per backend
(a rough sketch of the idea appears below, after these notes). If you
don't do this, you end up assuming all tuples take the same time as the
tuples that carry the measurement overhead, resulting in nodes
apparently taking longer than their parent nodes. Incidentally, I
measured the overhead to be about 3.6us per tuple per node on my
(admittedly slightly old) machine.

Note that the resulting times still include the overhead actually
incurred; I didn't filter it out. I want the times to reflect reality
as closely as possible.

- I also removed InstrStopNodeMulti and made InstrStopNode take a tuple
count parameter instead. This is much clearer all round.

- I also didn't make it optional. I'm unsure whether it should be,
given that the number of cases where it will make a difference appears
to be very few.

- The tuple counter for sampling restarts every loop. Thus a node that
is called repeatedly, returning only one value each time, will still
measure every tuple, though its parent node won't. We'll need some
field testing to see whether that remains a significant effect.

- I don't let the user know anywhere how many samples it took. Is this
something users should care about?
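
To illustrate the calibration and the extrapolation from sampled
tuples mentioned above, here is a rough sketch. Again this is
illustrative only, not the patch code; CALIBRATION_CALLS,
calibrate_sample_overhead and estimate_total_time are invented names:

#include <stdbool.h>
#include <sys/time.h>

#define CALIBRATION_CALLS 10000     /* arbitrary; run once per backend */

static double sample_overhead = 0.0;    /* estimated seconds per timed tuple */
static bool   overhead_known = false;

/*
 * Estimate what one timer read costs on this machine.  A real
 * calibration would time the full instrumentation start/stop path,
 * but the timer call dominates, so it serves as a stand-in here.
 */
static void
calibrate_sample_overhead(void)
{
    struct timeval start, stop, dummy;
    double      elapsed;
    int         i;

    gettimeofday(&start, NULL);
    for (i = 0; i < CALIBRATION_CALLS; i++)
        gettimeofday(&dummy, NULL);
    gettimeofday(&stop, NULL);

    elapsed = (stop.tv_sec - start.tv_sec) +
              (stop.tv_usec - start.tv_usec) / 1000000.0;
    sample_overhead = elapsed / CALIBRATION_CALLS;
    overhead_known = true;
}

/*
 * Extrapolate a node's total time from the tuples that were actually
 * timed.  Untimed tuples are assumed to cost the sampled per-tuple
 * average minus the measurement overhead they never incurred, so the
 * reported time still includes the overhead that really happened and
 * nothing more.  (A real implementation would guard against
 * per_tuple ending up below sample_overhead.)
 */
static double
estimate_total_time(double sampled_time, double sampled_tuples,
                    double total_tuples)
{
    double      per_tuple = sampled_time / sampled_tuples;
    double      untimed = total_tuples - sampled_tuples;

    if (!overhead_known)
        calibrate_sample_overhead();

    return sampled_time + untimed * (per_tuple - sample_overhead);
}

Without subtracting sample_overhead for the untimed tuples, a million
of them at roughly 3.6us apiece would add several seconds of phantom
time to the node, which is exactly the child-appears-slower-than-parent
artifact described above.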

Any comments?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
