Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions
Дата
Msg-id CAA4eK1KHhRwBEgtrgGcTjO+VENYTk_u2UPsPMjfT7KY3in7L2A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Sun, Feb 26, 2017 at 4:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 26, 2017 at 6:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sat, Feb 25, 2017 at 9:47 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> On Sat, Feb 25, 2017 at 5:12 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> Sure, but that should only happen if the function is *not* declared as
>>>> parallel safe (aka in parallel safe functions, we should not generate
>>>> parallel plans).
>>>
>>> So basically we want to put a restriction that parallel-safe function
>>> can not use the parallel query? This will work but it seems too
>>> restrictive to me. Because by marking function parallel safe we enable
>>> it to be used with the outer parallel query that is fine. But, that
>>> should not restrict the function from using the parallel query if it's
>>> used with the other outer query which is not having the parallel
>>> plan(or function is executed directly).
>>
>> I think if the user is explicitly marking a function as parallel-safe,
>> then it doesn't make much sense to allow parallel query in such
>> functions as it won't be feasible for the planner (or at least it will
>> be quite expensive) to detect the same.  By the way, if the user has
>> any such expectation from a function, then he can mark the function as
>> parallel-restricted or parallel-unsafe.
>
> However, if a query is parallel-safe, it might not end up getting run
> in parallel.  In that case, it could still benefit from parallelism
> internally.  I think we want to allow that.  For example, suppose you
> run a query like:
>
> SELECT x, sum(somewhat_expensive_function(y)) FROM sometab GROUP BY 1;
>
> If sometab isn't very big, it's probably better to use a non-parallel
> plan for this query, because then somewhat_expensive_function() can
> still use parallelism internally, which might be better. However, if
> sometab is large enough, then it might be better to parallelize the
> whole query using a Partial/FinalizeAggregate and force each call to
> somewhat_expensive_function() to run serially.
>

Is there any easy way to find out which way is less expensive?  Even
if we find some way or just make a rule that when an outer query uses
parallelism, then force function call to run serially, how do we
achieve that?  I mean in each worker we can ensure that each
individual statements from a function can run serially (by having a
check of isparallelworker() in gather node), but having a similar
check in the master backend is tricky or maybe we don't want to care
for the same in master backend.  Do you have any suggestions on how to
make it work?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] dropping partitioned tables without CASCADE
Следующее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] error detail when partition not found