Re: Making empty Bitmapsets always be NULL

Поиск

Список

Период

Сортировка

От	Yuya Watari
Тема	Re: Making empty Bitmapsets always be NULL
Дата	27 июня 2023 г. 12:11:14
Msg-id	CAJ2pMkZFv9PjoyTs+SEjY2utGdiB-0WG5uax15wjys8UpYMKBA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Making empty Bitmapsets always be NULL (David Rowley <dgrowleyml@gmail.com>)
Ответы	Re: Making empty Bitmapsets always be NULL
Список	pgsql-hackers

Дерево обсуждения

Hello,

Thank you for your reply and for creating the patch to measure planning times.

On Sat, Jun 24, 2023 at 1:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Thanks for benchmarking. It certainly looks like a win for larger
> sets.  Would you be able to profile the 256 partition case to see
> where exactly master is so slow? (I'm surprised this patch improves
> performance that much.)

I have profiled the 256 partition case of the Join Order Benchmark
using the perf command. The attached figures are its frame graphs.
From these figures, we can see that bms_equal() function calls in blue
circles were heavy, and their performance improved after applying the
patch.

To investigate this further, I have created a patch
(profile-patch-for-*.txt) that profiles the bms_equal() function in
more detail. This patch
(1) prints what we are comparing by bms_equal, and
(2) measures the number of loops executed within bms_equal.
(1) is for debugging. (2) intends to see the effect of the
optimization to remove trailing zero words. The guarantee that the
last word is always non-zero enables us to immediately determine two
Bitmapsets having different nwords are not the same. When the patch
works effectively, the (2) will be much smaller than the total number
of the function calls. I will show the results as follows.

=== Master ===
[bms_equal] Comparing (b 335) and (b 35)
[bms_equal] Comparing (b 1085) and (b 61)
[bms_equal] Comparing (b 1208) and (b 86)
[bms_equal] Comparing (b 781) and (b 111)
[bms_equal] Comparing (b 361) and (b 135)
...
[bms_equal] Comparing (b 668) and (b 1773)
[bms_equal] Comparing (b 651) and (b 1781)
[bms_equal] Comparing (b 1191) and (b 1789)
[bms_equal] Comparing (b 771) and (b 1797)
[bms_equal] Total 3950748839 calls, 3944762037 loops executed
==============

=== Patched ===
[bms_equal] Comparing (b 335) and (b 35)
[bms_equal] Comparing (b 1085) and (b 61)
[bms_equal] Comparing (b 1208) and (b 86)
[bms_equal] Comparing (b 781) and (b 111)
[bms_equal] Comparing (b 361) and (b 135)
...
[bms_equal] Comparing (b 668) and (b 1773)
[bms_equal] Comparing (b 651) and (b 1781)
[bms_equal] Comparing (b 1191) and (b 1789)
[bms_equal] Comparing (b 771) and (b 1797)
[bms_equal] Total 3950748839 calls, 200215204 loops executed
===============

The above results reveal that the bms_equal() in this workload
compared two singleton Bitmapsets in most cases, and their members
were more than 64 apart. Therefore, we could have omitted 94.9% of
3,950,748,839 loops with the patch, whereas the percentage was only
0.2% in the master. This is why we obtained a significant performance
improvement and is evidence that the optimization of this patch worked
very well.

The attached figures show these bms_equal() function calls exist in
make_pathkey_from_sortinfo(). The actual location is
get_eclass_for_sort_expr(). I quote the code below.

=====
EquivalenceClass *
get_eclass_for_sort_expr(PlannerInfo *root,
                         Expr *expr,
                         List *opfamilies,
                         Oid opcintype,
                         Oid collation,
                         Index sortref,
                         Relids rel,
                         bool create_it)
{
    ...

    foreach(lc1, root->eq_classes)
    {
        EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
        ...

        foreach(lc2, cur_ec->ec_members)
        {
            EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);

            /*
             * Ignore child members unless they match the request.
             */
            if (cur_em->em_is_child &&
                !bms_equal(cur_em->em_relids, rel)) // <--- Here
                continue;

            ...
        }
    }
    ...
}
=====

The bms_equal() is used to find an EquivalenceMember satisfying some
conditions. The above heavy loop was the bottleneck in the master.
This bottleneck is what I am trying to optimize in another thread [1]
with you. I hope the optimization in this thread will help [1]'s speed
up. (Looking at CFbot, I noticed that [1]'s patch does not compile due
to some compilation errors. I will send a fixed version soon.)

> I think it's also important to check we don't slow anything down for
> more normal-sized sets.  The vast majority of sets will contain just a
> single word, so we should probably focus on making sure we're not
> slowing anything down for those.

I agree with you and thank you for sharing the results. I ran
installcheck with your patch. The result is as follows. The speedup
was 0.33%. At least in my environment, I did not observe any
regression with this test. So, the patch looks very good.

Master:  2.559648 seconds
Patched: 2.551116 seconds (0.33% faster)

[1] https://www.postgresql.org/message-id/CAJ2pMkY10J_PA2jpH5M-VoOo6BvJnTOO_-t_znu_pOaP0q10pA@mail.gmail.com

--
Best regards,
Yuya Watari

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее

От: "Joel Jacobson"
Дата: 27 июня 2023 г., 11:26:44
Сообщение: Re: Do we want a hashset type?

Следующее

От: David Rowley
Дата: 27 июня 2023 г., 12:19:26
Сообщение: Add bump memory context type and use it for tuplesorts

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Making empty Bitmapsets always be NULL

Вложения

Предыдущее

Следующее