Re: [HACKERS] Runtime Partition Pruning

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: [HACKERS] Runtime Partition Pruning
Дата
Msg-id CAKU4AWotokoT=36RU-AV4OBTHu=4aKyDxLa7FPmS6=hFKpHyGw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Runtime Partition Pruning  (Andy Fan <zhihui.fan1213@gmail.com>)
Ответы Re: [HACKERS] Runtime Partition Pruning  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список pgsql-hackers


On Wed, Oct 7, 2020 at 5:05 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:


On Sun, Oct 4, 2020 at 3:10 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:


Now, in my experience, the current system for custom plans vs. generic
plans doesn't approach the problem in this way at all, and in my
experience that results in some pretty terrible behavior.  It will do
things like form a custom plan every time because the estimated cost
of the custom plan is lower than the estimated cost of the generic
plan even though the two plans are structurally identical; only the
estimates differ.  It will waste gobs of CPU cycles by replanning a
primary key lookup 5 times just on the off chance that a lookup on the
primary key index isn't the best option.  But this patch isn't going
to fix any of that.  The best we can probably do is try to adjust the
costing for Append paths in some way that reflects the costs and
benefits of pruning.  I'm tentatively in favor of trying to do
something modest in that area, but I don't have a detailed proposal.


I just realized this issue recently and reported it at [1], then Amit pointed
me to this issue being discussed here, so I would like to continue this topic
here. 

I think we can split the issue into 2 issues.  One is the partition prune in initial
partition prune, which maybe happen in custom plan case only and caused
the above issue.  The other one happens in the "Run-Time" partition prune, 
I admit that is an important issue to resolve as well, but looks harder.   So I
think we can fix the first one at first. 

... When we count for the cost of a
generic plan, we can reduce the cost based on such information. 
 
This way doesn't work since after the initial partition prune,  not only the
cost of the Append node should be reduced,  the cost of other plans should
be reduced as well [1]

However I think if we can use partition prune information from a custom plan
at the cost_append_path stage,  it looks the issue can be fixed.  If so, the idea
is similar to David's idea in [2],  however Robert didn't agree with this[2]. 
Can anyone elaborate this objection?  for a partkey > $1 or BETWEEN cases, 
some real results from the past are probably better than some hard-coded 
assumptions IMO. 

I can understand Robert's idea now,  he intended to resolve both the 
"initial-partition-prune" case and "runtime partition prune" case while I always think
about the former case (Amit reminded me about that at the beginning, but I just
wake up right now..)

With the Robert's idea,  I think we can do some hack at create_append_path,
we can figure out the pruneinfo (it was done at create_append_plan now) at that
stage, and then did a fix_append_path with such pruneinfo.  We need to fix not
only the cost of AppendPath,  but also rows of AppendPath, For example:  
SELECT .. FROM t1, t2, p where t1.a = p.partkey and t1.b = t2.b, if the
plan is:
Nest Loop
   Nest Loop
      t1
      Append (p)
   t2

Then the rows of Append (p) will be very important. 

For this idea,  how to estimate how many partitions/rows can be pruned is a key
part.  Robert said "I was thinking, rather, that if we know for example that
we've doing pruning on partition_column = $1, then we know that only
one partition will match.  That's probably a common case.  If we've
got partition_column > $1, we could assume that, say, 75% of the
partitions would match.  partition_column BETWEEN $1 and $2 is
probably a bit more selective, so maybe we assume 50% of the
partitions would match.".   I think we can't say the 75% or 50% is a good
number,  but the keypoint may be "partition_column = $1" is a common
case.  And for the others case, we probably don't make it worse.

I think we need to do something here, any thoughts? Personally I'm more
like this idea now.

--
Best Regards
Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Anastasia Lubennikova
Дата:
Сообщение: Re: [PATCH] Automatic HASH and LIST partition creation
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_dump bug for extension owned tables