Re: auto_explain vs. parallel query

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: auto_explain vs. parallel query
Дата
Msg-id 57bbc52c-5d40-bb80-2992-7e1027d0b67c@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: auto_explain vs. parallel query  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: auto_explain vs. parallel query
Список pgsql-hackers
On 11/01/2016 02:15 PM, Robert Haas wrote:
> On Mon, Oct 31, 2016 at 6:35 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> While debugging something on 9.6, I've noticed that auto_explain handles
>> parallel queries in a slightly strange way - both the leader and all the
>> workers log their chunk of the query (i.e. the leader logs explain for the
>> whole query, while workers only log the parts they've been executing).
>>
>> So for example for a query with 2 workers, the log look like this:
>>
>>    2016-10-31 23:10:23.481 CET [12278] LOG:  duration: 32.562 ms  pl ...
>>        Query Text: <parallel query>                                  ...
>>        Parallel Seq Scan on tables  (cost=0.00..15158.25 rows=220 wi ...
>>          Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
>>          Rows Removed by Filter: 140614                              ...
>>    2016-10-31 23:10:23.481 CET [12277] LOG:  duration: 32.325 ms  pl ...
>>        Query Text: <parallel query>                                  ...
>>        Parallel Seq Scan on tables  (cost=0.00..15158.25 rows=220 wi ...
>>          Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
>>          Rows Removed by Filter: 80948                               ...
>>    2016-10-31 23:10:23.483 CET [12259] LOG:  duration: 38.997 ms  pl ...
>>        Query Text: explain analyze select * from tables where table_ ...
>>        Gather  (cost=1000.00..16211.15 rows=529 width=356) (actual t ...
>>          Workers Planned: 2                                          ...
>>          Workers Launched: 2                                         ...
>>          ->  Parallel Seq Scan on tables  (cost=0.00..15158.25 rows= ...
>>                Filter: ((table_datoid < '1000000000'::oid) AND (tabl ...
>>                Rows Removed by Filter: 105570                        ...
>>
>> I'd say that's fairly surprising, and I haven't found any discussion about
>> auto_explain vs. parallel queries in the archives (within the last year), so
>> I assume this may not be entirely intentional.
>
> Yeah, that's not entirely intentional.  :-(
>
>> Not only this does not match the output when running EXPLAIN ANALYZE
>> manually, it also provides no information about which messages from workers
>> belong to which "leader" message.
>>
>> Another thing is that this interacts with sampling in a rather unfortunate
>> way, because each process evaluates the sampling condition independently. So
>> for example with auto_explain.sample_rate=0.5 a random subset of
>> worker/leader explain plans will be logged.
>>
>> But this also applies to the conditions in ExecutorStart, which enables
>> instrumentation only when the query gets sampled - so when the leader gets
>> sampled but not all the workers, the counters in the query logged by the
>> leader are incomplete.
>>
>> I do believe those are bugs that make auto_explain rather unusable with
>> parallel queries. Adding IsParallelWorker() to the condition in ExecutorEnd
>> should fix the extra logging messages (and log only from the leader).
>>
>> Not sure how to fix the sampling - simply adding IsParallelWorker() to
>> ExecutorStart won't do the trick, because we don't want to enable
>> instrumentation only for sample queries. So I guess this needs to be decided
>> in the leader, and communicated to the workers somehow.
>
> Yes, deciding in the leader and communicating to workers seems like
> the way to go, but I'm not sure how simple that's going to be to
> implement.
>

Clearly we need to pass some information to the worker processes, so
that they know whether to instrument the query or not. I don't know if
there's a good non-invasive way to do that from an extension - the
easiest way I can think of is using a bit of shared memory to pass the
"sample query" flag - attached is a patch that does that, and it seems
to be working fine for me.

The bad thing is that this requires auto_explain to be loaded from
shared_preload_libraries, because it needs to allocate the chunk of
shared memory. That's quite annoying, because auto_explain is often
useful for ad-hoc investigations, so being able to load it on the fly is
valuable. But perhaps DSM would fix this (sorry, my knowledge of the DSM
infrastructure is limited)?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: [PATCH] pgpassfile connection option
Следующее
От: Robert Haas
Дата:
Сообщение: Re: auto_explain vs. parallel query