Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andy Fan
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Msg-id: CAKU4AWo9Mod6ou-PY63Z4wVwtD98fqRF1WB+7-XGb8a7opAUXg@mail.gmail.com
In response to: Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers


On Wed, Aug 26, 2020 at 8:14 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 26 Aug 2020 at 05:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>
> On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
>> > Also, just in case anyone is misunderstanding Andres' argument here:
>> > it's entirely based on the performance impact of having an additional
>> > node.
>>
>> Not entirely, no. It's also just that it doesn't make sense to have two
>> nodes setting parameters that are then half-magically picked up by a
>> special subsidiary node type and used as a cache key. This is pseudo
>> modularity, not real modularity. And it makes it harder to display
>> useful information in explain etc. And it makes it harder to e.g. clear
>> the cache in cases where we know that there's no further use of the
>> current cache. At least without piercing the abstraction veil.
>>
>>
>> > However, given the correct planner choice, there will never be
>> > a gross slowdown due to having the extra node.
>>
>> There'll be a significant reduction in the performance increase.
>
>
> If this is a key blocking factor for this topic, I'd like to do a simple hack
> to put the cache function into the subplan node, then do some tests to
> show the real difference.  But it would be better to decide first how much
> difference counts as a big difference.  And for education purposes,
> I'd like to understand where these differences come from.  To my
> current knowledge, my basic idea is that it saves some function calls?

If testing this, the cache hit ratio will be pretty key to the
results. You'd notice the overhead much less with a larger cache hit
ratio, since you're not pulling the tuple through as deeply nested a node.
I'm unsure how you'd determine what is a good cache hit ratio to
test it with.

I wanted to test the worst case, where the cache hit ratio is 0, and then
compare the difference between putting the cache in a dedicated
node and in a SubPlan node.  However, we have a better way
to test the difference based on your message below.
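
Just for reference, one way to pin the expected hit ratio in such a test
would be to control the number of distinct join keys on the outer side.
This is only a sketch with invented table names; it assumes the patch's
enable_resultcache GUC and a cache large enough to hold every distinct key:

create table outer_t (key int);
create table inner_t (key int primary key, payload text);

insert into inner_t select i, md5(i::text) from generate_series(1, 10000) i;

-- 100,000 probes over 10,000 distinct keys: only the first probe of each
-- key can miss, so the expected cache hit ratio is about 90%.
insert into outer_t select (i % 10000) + 1 from generate_series(1, 100000) i;
analyze outer_t;
analyze inner_t;

-- Force a parameterized nested loop so the cache sits over the inner index
-- scan, then compare timings with the cache enabled and disabled.
set enable_hashjoin = off;
set enable_mergejoin = off;
set enable_resultcache = on;   -- repeat the query with this set to off
select count(*) from outer_t o join inner_t i on i.key = o.key;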
 
 
The lower the expected cache hit ratio, the higher
the cost of the Result Cache node will be, so the planner has less
chance of choosing to use it.

IIRC, we add the ResultCache for subplan nodes unconditionally now.
The main reason is that we lack ndistinct estimation during subquery
planning.  Tom suggested converting the AlternativeSubPlan to a SubPlan
in setrefs.c [1], and I also ran into a case that can be resolved if we do
such a conversion even earlier [2].  The basic idea is that we can do the
conversion once we can get the actual values for the subplan.

something like

if (bms_is_subset(subplan->deps_relids, rel->relids))
{
    convert_alternativesubplans_to_subplan(rel);
}
You can see if that can be helpful for ResultCache in this use case.  My
patch in [2] is still at a very PoC stage, so it only takes care of
subplans in rel->reltarget.
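
For context, the typical shape that lands in rel->reltarget is an EXISTS
in the target list, which can't be flattened into a join and so is planned
as an AlternativeSubPlan; foo and bar below are just illustration names:

-- EXISTS in a target list can't be pulled up into a semijoin, so the
-- planner keeps an AlternativeSubPlan: a plain correlated SubPlan plus a
-- hashed SubPlan, with the cheaper one chosen based on the expected
-- number of outer rows.
select foo.a,
       exists (select 1 from bar where bar.a = foo.a) as has_match
from foo;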


Say you find a case with a hit ratio of 90%.  Going by [1] I found
pulling a tuple through an additional node to cost about 12
nanoseconds on an Intel 4712HQ CPU.  With a hit ratio of 90% we'll
only pull 10% of tuples through the additional node, so that's about
1.2 nanoseconds per tuple, or 1.2 milliseconds per million tuples. It
might become hard to measure above the noise. More costly inner scans
will have the planner choose a Result Cache at lower estimated hit
ratios, but in that case, pulling the tuple through the additional
node during a cache miss will be less noticeable due to the more
costly inner side of the join.

Likely you could measure just that overhead without going to the
trouble of adapting the code to make SubPlan and Nested Loop do the
caching internally.  If you just modify ExecResultCache() to have it
simply return its subnode, then measure the performance with and
without enable_resultcache, you should get an idea of the per-tuple
overhead of pulling the tuple through the additional node on your CPU.

Thanks for the hints.  I think we can test it even more easily with a Limit node.

create table test_pull_tuples(a int);
insert into test_pull_tuples select i from generate_series(1, 100000)i;

-- test with pgbench.
select * from test_pull_tuples;                    -- 18.850 ms
select * from test_pull_tuples limit 100000;       -- 20.500 ms
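
The plan shape confirms that the LIMIT variant differs only by one extra
executor node; explain should show something like:

explain (costs off) select * from test_pull_tuples limit 100000;
--  Limit
--    ->  Seq Scan on test_pull_tuples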

Basically it is about 16 nanoseconds per tuple on my Intel(R) Xeon(R) CPU
E5-2650: (20.500 - 18.850) ms over 100,000 tuples is roughly 16.5 ns per
tuple.  Personally, I'd say the performance difference is negligible
unless I see some different numbers.


--
Best Regards
Andy Fan
