Re: [HACKERS] Runtime Partition Pruning

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] Runtime Partition Pruning
Дата
Msg-id CAExHW5t5Q7JuUW28QMRO7szuHcbsfx4M9=WL+up40h3PCd7dXw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Runtime Partition Pruning  (Andy Fan <zhihui.fan1213@gmail.com>)
Ответы Re: [HACKERS] Runtime Partition Pruning  (Andy Fan <zhihui.fan1213@gmail.com>)
Re: [HACKERS] Runtime Partition Pruning  (Amit Langote <amitlangote09@gmail.com>)
Список pgsql-hackers
On Wed, Oct 7, 2020 at 7:00 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>
>
> On Wed, Oct 7, 2020 at 5:05 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>
>>
>>
>> On Sun, Oct 4, 2020 at 3:10 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> Now, in my experience, the current system for custom plans vs. generic
>>>> plans doesn't approach the problem in this way at all, and in my
>>>> experience that results in some pretty terrible behavior.  It will do
>>>> things like form a custom plan every time because the estimated cost
>>>> of the custom plan is lower than the estimated cost of the generic
>>>> plan even though the two plans are structurally identical; only the
>>>> estimates differ.  It will waste gobs of CPU cycles by replanning a
>>>> primary key lookup 5 times just on the off chance that a lookup on the
>>>> primary key index isn't the best option.  But this patch isn't going
>>>> to fix any of that.  The best we can probably do is try to adjust the
>>>> costing for Append paths in some way that reflects the costs and
>>>> benefits of pruning.  I'm tentatively in favor of trying to do
>>>> something modest in that area, but I don't have a detailed proposal.
>>>>
>>>
>>> I just realized this issue recently and reported it at [1], then Amit pointed
>>> me to this issue being discussed here, so I would like to continue this topic
>>> here.
>>>
>>> I think we can split the issue into 2 issues.  One is the partition prune in initial
>>> partition prune, which maybe happen in custom plan case only and caused
>>> the above issue.  The other one happens in the "Run-Time" partition prune,
>>> I admit that is an important issue to resolve as well, but looks harder.   So I
>>> think we can fix the first one at first.
>>>
>>> ... When we count for the cost of a
>>> generic plan, we can reduce the cost based on such information.
>>
>>
>> This way doesn't work since after the initial partition prune,  not only the
>> cost of the Append node should be reduced,  the cost of other plans should
>> be reduced as well [1]
>>
>> However I think if we can use partition prune information from a custom plan
>> at the cost_append_path stage,  it looks the issue can be fixed.  If so, the idea
>> is similar to David's idea in [2],  however Robert didn't agree with this[2].
>> Can anyone elaborate this objection?  for a partkey > $1 or BETWEEN cases,
>> some real results from the past are probably better than some hard-coded
>> assumptions IMO.
>
>
> I can understand Robert's idea now,  he intended to resolve both the
> "initial-partition-prune" case and "runtime partition prune" case while I always think
> about the former case (Amit reminded me about that at the beginning, but I just
> wake up right now..)
>
> With the Robert's idea,  I think we can do some hack at create_append_path,
> we can figure out the pruneinfo (it was done at create_append_plan now) at that
> stage, and then did a fix_append_path with such pruneinfo.  We need to fix not
> only the cost of AppendPath,  but also rows of AppendPath, For example:
> SELECT .. FROM t1, t2, p where t1.a = p.partkey and t1.b = t2.b, if the
> plan is:
> Nest Loop
>    Nest Loop
>       t1
>       Append (p)
>    t2
>
> Then the rows of Append (p) will be very important.
>
> For this idea,  how to estimate how many partitions/rows can be pruned is a key
> part.  Robert said "I was thinking, rather, that if we know for example that
> we've doing pruning on partition_column = $1, then we know that only
> one partition will match.  That's probably a common case.  If we've
> got partition_column > $1, we could assume that, say, 75% of the
> partitions would match.  partition_column BETWEEN $1 and $2 is
> probably a bit more selective, so maybe we assume 50% of the
> partitions would match.".   I think we can't say the 75% or 50% is a good
> number,  but the keypoint may be "partition_column = $1" is a common
> case.  And for the others case, we probably don't make it worse.
>
> I think we need to do something here, any thoughts? Personally I'm more
> like this idea now.

Yes, I think we have to do something on those lines. But I am
wondering why our stats machinery is failing to do that already. For
equality I think it's straight forward. But even for other operators
we should use the statistics from the partitioned table to estimate
the selectivity and then adjust number of scanned partitions = (number
of partitions * fraction of rows scanned). That might be better than
50% or 75%.

-- 
Best Wishes,
Ashutosh Bapat



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Dolgov
Дата:
Сообщение: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Следующее
От: Onder Kalaci
Дата:
Сообщение: Assertion failure with LEFT JOINs among >500 relations