Re: [HACKERS] Runtime Partition Pruning

Поиск

Список

Период

Сортировка

От	Andy Fan
Тема	Re: [HACKERS] Runtime Partition Pruning
Дата	7 октября 2020 г. 16:30:42
Msg-id	CAKU4AWotokoT=36RU-AV4OBTHu=4aKyDxLa7FPmS6=hFKpHyGw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] Runtime Partition Pruning (Andy Fan <zhihui.fan1213@gmail.com>)
Ответы	Re: [HACKERS] Runtime Partition Pruning
Список	pgsql-hackers

Дерево обсуждения

On Wed, Oct 7, 2020 at 5:05 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Sun, Oct 4, 2020 at 3:10 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Now, in my experience, the current system for custom plans vs. generic
plans doesn't approach the problem in this way at all, and in my
experience that results in some pretty terrible behavior. It will do
things like form a custom plan every time because the estimated cost
of the custom plan is lower than the estimated cost of the generic
plan even though the two plans are structurally identical; only the
estimates differ. It will waste gobs of CPU cycles by replanning a
primary key lookup 5 times just on the off chance that a lookup on the
primary key index isn't the best option. But this patch isn't going
to fix any of that. The best we can probably do is try to adjust the
costing for Append paths in some way that reflects the costs and
benefits of pruning. I'm tentatively in favor of trying to do
something modest in that area, but I don't have a detailed proposal.

I just realized this issue recently and reported it at [1], then Amit pointed
me to this issue being discussed here, so I would like to continue this topic
here.

I think we can split the issue into 2 issues. One is the partition prune in initial
partition prune, which maybe happen in custom plan case only and caused
the above issue. The other one happens in the "Run-Time" partition prune,
I admit that is an important issue to resolve as well, but looks harder. So I
think we can fix the first one at first.

... When we count for the cost of a
generic plan, we can reduce the cost based on such information.

This way doesn't work since after the initial partition prune, not only the
cost of the Append node should be reduced, the cost of other plans should
be reduced as well [1]

However I think if we can use partition prune information from a custom plan
at the cost_append_path stage, it looks the issue can be fixed. If so, the idea
is similar to David's idea in [2], however Robert didn't agree with this[2].
Can anyone elaborate this objection? for a partkey > $1 or BETWEEN cases,
some real results from the past are probably better than some hard-coded
assumptions IMO.

I can understand Robert's idea now, he intended to resolve both the

"initial-partition-prune" case and "runtime partition prune" case while I always think

about the former case (Amit reminded me about that at the beginning, but I just

wake up right now..)

With the Robert's idea, I think we can do some hack at create_append_path,

we can figure out the pruneinfo (it was done at create_append_plan now) at that

stage, and then did a fix_append_path with such pruneinfo. We need to fix not

only the cost of AppendPath, but also rows of AppendPath, For example:

SELECT .. FROM t1, t2, p where t1.a = p.partkey and t1.b = t2.b, if the

plan is:

Nest Loop

Append (p)

Then the rows of Append (p) will be very important.

For this idea, how to estimate how many partitions/rows can be pruned is a key

part. Robert said "I was thinking, rather, that if we know for example that

we've doing pruning on partition_column = $1, then we know that only
one partition will match. That's probably a common case. If we've
got partition_column > $1, we could assume that, say, 75% of the
partitions would match. partition_column BETWEEN $1 and $2 is
probably a bit more selective, so maybe we assume 50% of the
partitions would match.". I think we can't say the 75% or 50% is a good

number, but the keypoint may be "partition_column = $1" is a common

case. And for the others case, we probably don't make it worse.

I think we need to do something here, any thoughts? Personally I'm more

like this idea now.

Best Regards

Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Anastasia Lubennikova
Дата: 07 октября 2020 г., 16:05:08
Сообщение: Re: [PATCH] Automatic HASH and LIST partition creation

Следующее

От: Tom Lane
Дата: 07 октября 2020 г., 16:37:01
Сообщение: Re: pg_dump bug for extension owned tables

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Runtime Partition Pruning

Предыдущее

Следующее