Re: [HACKERS] why not parallel seq scan for slow functions

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: [HACKERS] why not parallel seq scan for slow functions
Дата	20 февраля 2018 г. 17:43:03
Msg-id	CAA4eK1LtQCjFNm6yLRy4Do0kiB5adDiW3D139jteyMof=oFnmQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] why not parallel seq scan for slow functions (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Список	pgsql-hackers

Дерево обсуждения

On Mon, Feb 19, 2018 at 9:56 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Feb 19, 2018 at 9:35 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Sat, Feb 17, 2018 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat
>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>> On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>> On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
>>>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>>> I happened to look at the patch for something else. But here are some
>>>>>> comments. If any of those have been already discussed, please feel
>>>>>> free to ignore. I have gone through the thread cursorily, so I might
>>>>>> have missed something.
>>>>>>
>>>>>> In grouping_planner() we call query_planner() first which builds the
>>>>>> join tree and creates paths, then calculate the target for scan/join
>>>>>> rel which is applied on the topmost scan join rel. I am wondering
>>>>>> whether we can reverse this order to calculate the target list of
>>>>>> scan/join relation and pass it to standard_join_search() (or the hook)
>>>>>> through query_planner().
>>>>>>
>>>>>
>>>>> I think the reason for doing in that order is that we can't compute
>>>>> target's width till after query_planner().  See the below note in
>>>>> code:
>>>>>
>>>>> /*
>>>>> * Convert the query's result tlist into PathTarget format.
>>>>> *
>>>>> * Note: it's desirable to not do this till after query_planner(),
>>>>> * because the target width estimates can use per-Var width numbers
>>>>> * that were obtained within query_planner().
>>>>> */
>>>>> final_target = create_pathtarget(root, tlist);
>>>>>
>>>>> Now, I think we can try to juggle the code in a way that the width can
>>>>> be computed later, but the rest can be done earlier.
>>>>
>>>> /* Convenience macro to get a PathTarget with valid cost/width fields */
>>>> #define create_pathtarget(root, tlist) \
>>>>     set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
>>>> create_pathtarget already works that way. We will need to split it.
>>>>
>>>> Create the Pathtarget without widths. Apply width estimates once we
>>>> know the width of Vars somewhere here in query_planner()
>>>>     /*
>>>>      * We should now have size estimates for every actual table involved in
>>>>      * the query, and we also know which if any have been deleted from the
>>>>      * query by join removal; so we can compute total_table_pages.
>>>>      *
>>>>      * Note that appendrels are not double-counted here, even though we don't
>>>>      * bother to distinguish RelOptInfos for appendrel parents, because the
>>>>      * parents will still have size zero.
>>>>      *
>>>> The next step is building the join tree. Set the pathtarget before that.
>>>>
>>>>> However, I think
>>>>> that will be somewhat major change
>>>>
>>>> I agree.
>>>>
>>>>>  still won't address all kind of
>>>>> cases (like for ordered paths) unless we can try to get all kind of
>>>>> targets pushed down in the call stack.
>>>>
>>>> I didn't understand that.
>>>>
>>>
>>> The places where we use a target different than the target which is
>>> pushed via query planner can cause a similar problem (ex. see the
>>> usage of adjust_paths_for_srfs) because the cost of that target
>>> wouldn't be taken into consideration for Gather's costing.
>>>
>>
>> Right. Right now apply_projection_to_path() or adjust_paths_for_srfs()
>> do not take into consideration the type of path whose cost is being
>> adjusted for the new targetlist. That works for most of the paths but
>> not all the paths like custom, FDW or parallel paths. The idea I am
>> proposing is to compute final targetlist before query planner so that
>> it's available when we create paths for the topmost scan/join
>> relation. That way, any path creation logic can then take advantage of
>> this list to compute costs, not just parallel paths.
>
> In fact, we should do this not just for scan/join relations, but all
> the upper relations as well. Upper relations too create parallel
> paths, whose costs need to be adjusted after their targetlists are
> updated by adjust_paths_for_srfs(). Similar adjustments are needed for
> any FDWs, custom paths which cost targetlists differently.
>

I think any such change in planner can be quite tricky and can lead to
a lot of work.  I am not denying that it is not possible to think
along the lines you are suggesting, but OTOH, I don't see it as a
realistic approach for this patch where we can deal with the majority
of cases with the much smaller patch.  In future, if you are someone
can have a patch along those lines for some other purpose (considering
it is feasible to do so which I am not completely sure), then we can
adjust the things for parallel paths as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: David Rowley
Дата: 20 февраля 2018 г., 17:38:27
Сообщение: Re: ALTER TABLE ADD COLUMN fast default

Следующее

От: Amit Kapila
Дата: 20 февраля 2018 г., 18:39:41
Сообщение: Re: heap_lock_updated_tuple_rec can leak a buffer refcount

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] why not parallel seq scan for slow functions

Предыдущее

Следующее