Re: [PoC] Reducing planning time when tables have many partitions

Поиск

Список

Период

Сортировка

От	Yuya Watari
Тема	Re: [PoC] Reducing planning time when tables have many partitions
Дата	5 июля 2022 г. 11:57:14
Msg-id	CAJ2pMkZ4Ag+Ogu6RBa5F0sOTFKwnoreGwTNiEBCSetK-G7K1nQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [PoC] Reducing planning time when tables have many partitions (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: [PoC] Reducing planning time when tables have many partitions
Список	pgsql-hackers

Дерево обсуждения

Dear Tom,

Thank you for replying to my email.

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not real thrilled with trying to throw hashtables at the problem,
> though.  As David noted, they'd be counterproductive for simple
> queries.

As you said, my approach that utilizes hash tables has some overheads,
leading to degradation in query planning time.

I tested the degradation by a brief experiment. In this experiment, I
used a simple query shown below.

===
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
===

Here, students, scores, and gpas tables have no partitions, i.e., they
are regular tables. Therefore, my techniques do not work for this
query and instead may lead to some regression. I repeatedly issued
this query 1 million times and measured their planning times.

The attached figure describes the distribution of the planning times.
The figure indicates that my patch has no severe negative impacts on
the planning performance. However, there seems to be a slight
degradation.

I show the mean and median of planning times below. With my patch, the
planning time became 0.002-0.004 milliseconds slower. We have to deal
with this problem, but reducing time complexity while keeping
degradation to zero is significantly challenging.

Planning time (ms)
             |  Mean | Median
------------------------------
 Master      | 0.682 |  0.674
 Patched     | 0.686 |  0.676
------------------------------
 Degradation | 0.004 |  0.002

Of course, the attached result is just an example. Significant
regression might occur in other types of queries.

> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?
>
> Perhaps the bms_is_subset class could be handled in a similar
> way, ie do a one-time pass to make a List of all EquivalenceMembers
> that use a RelOptInfo.

Thank you for giving your idea. I will try to polish up my algorithm
based on your suggestion.

-- 
Best regards,
Yuya Watari

Вложения

figure.png

В списке pgsql-hackers по дате отправления:

Предыдущее

От: John Naylor
Дата: 05 июля 2022 г., 11:49:14
Сообщение: Re: [PoC] Improve dead tuple storage for lazy vacuum

Следующее

От: Peter Eisentraut
Дата: 05 июля 2022 г., 12:20:53
Сообщение: Re: [PATCH] Add result_types column to pg_prepared_statements view

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PoC] Reducing planning time when tables have many partitions

Вложения

Предыдущее

Следующее