Re: Speeding Up Bitmapset

Поиск
Список
Период
Сортировка
От Ranier Vilela
Тема Re: Speeding Up Bitmapset
Дата
Msg-id CAEudQAqArnSDczkz7a_wq9NCQFQ_r=6kHx_LODrOP+VVG0ywVA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Speeding Up Bitmapset  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Speeding Up Bitmapset  (David Rowley <dgrowleyml@gmail.com>)
Re: Speeding Up Bitmapset  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
Em dom., 25 de jun. de 2023 às 17:49, David Rowley <dgrowleyml@gmail.com> escreveu:
There's no reason in the world
that those will speed up Bitmapsets, so why include them?
Of course optimization is the most important thing,
but since you're going to touch the source, why not make it more readable.
 

> v7 outperforms v4 by 29% (Query A)

I tested v7 with query-a and I also see additional gains. However,
it's entirely down to your changes to bms_is_subset(). It seems, by
chance, with the given Bitmapsets that looping backwards for the given
sets is able to determine the result more quickly
I redid the tests and it seems that most of the difference comes from bms_subset.
 

Here's some results from "perf top"

query-a
v4

  30.08%  postgres          [.] bms_is_subset
  15.84%  postgres          [.] create_join_clause
  13.54%  postgres          [.] bms_equal
  11.03%  postgres          [.] get_eclass_for_sort_expr
   8.53%  postgres          [.] generate_implied_equalities_for_column
   3.11%  postgres          [.] generate_join_implied_equalities_normal
   1.03%  postgres          [.] add_child_rel_equivalences
   0.82%  postgres          [.] SearchCatCacheInternal
   0.73%  postgres          [.] AllocSetAlloc
   0.53%  postgres          [.] find_ec_member_matching_expr
   0.40%  postgres          [.] hash_search_with_hash_value
   0.36%  postgres          [.] palloc
   0.36%  postgres          [.] palloc0

latency average = 452.480 ms

v7
  20.51%  postgres          [.] create_join_clause
  15.33%  postgres          [.] bms_equal
  14.17%  postgres          [.] get_eclass_for_sort_expr
  12.05%  postgres          [.] bms_is_subset
  10.40%  postgres          [.] generate_implied_equalities_for_column
   3.90%  postgres          [.] generate_join_implied_equalities_normal
   1.34%  postgres          [.] add_child_rel_equivalences
   1.06%  postgres          [.] AllocSetAlloc
   1.00%  postgres          [.] SearchCatCacheInternal
   0.72%  postgres          [.] find_ec_member_matching_expr
   0.58%  postgres          [.] palloc0
   0.49%  postgres          [.] palloc
   0.47%  postgres          [.] hash_search_with_hash_value
   0.44%  libc.so.6         [.] __memmove_avx_unaligned_erms


latency average = 350.543 ms

modified v7's bms_is_subset to go forwards then I get latency average
= 445.987 ms.

If I add some debugging to bms_is_subset to have it record how many
words it checks, I see:

postgres=# select sum(nwords) from forward;
    sum
-----------
 181490660
(1 row)

postgres=# select sum(nwords) from backwards;
   sum
----------
 11322564
(1 row)

So, it took about 181 million loops in bms_is_member to plan query-a
when looping forward, but only 11 million when looping backwards.

I think unless you've got some reason that you're able to justify why
we're always more likely to have to perform fewer word checks in
bms_is_subset() by looping backwards instead of forwards then I think
the performance gains you're showing here only happen to show better
results due to the given workload.  It's just as easy to imagine that
you'll apply the equivalent slowdown for some other workload where the
initial words differ but all the remaining words all match.
Have you seen bms_compare?
For some reason someone thought it would be better to loop backward the array.

Since bms_subset performed very well with backward, I think that's a good reason to leave it as bms_compare.
As in most cases, the array size is small, in general in both modes the performance will be equivalent.
 
Anyway, I made a new patch v8, based on v4 but with some changes that I believe improve it.

Windows 64 bits
msvc 2019 64 bits

== Query A ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-a.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-a.sql
=============

patched v4:
Time: 2305,445 ms (00:02,305)
Time: 2185,972 ms (00:02,186)
Time: 2177,434 ms (00:02,177)
Time: 2169,883 ms (00:02,170)

patched v8:
Time: 2143,532 ms (00:02,144)
Time: 2140,313 ms (00:02,140)
Time: 2138,481 ms (00:02,138)
Time: 2130,290 ms (00:02,130)

== Query B ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-b.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-b.sql

patched v4:
Time: 2684,360 ms (00:02,684)
Time: 2482,571 ms (00:02,483)
Time: 2452,699 ms (00:02,453)
Time: 2465,223 ms (00:02,465)

patched v8:
Time: 2493,281 ms (00:02,493)
Time: 2490,090 ms (00:02,490)
Time: 2432,515 ms (00:02,433)
Time: 2426,860 ms (00:02,427)


linux Ubuntu 64 bit
gcc 64 bits

== Query A ==
/usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/create-tables-a.sql
/usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/query-a.sql
=============

patched v4:
Time: 933,181 ms
Time: 931,520 ms
Time: 906,496 ms
Time: 872,446 ms

patched v8:
Time: 937,079 ms
Time: 930,408 ms
Time: 865,548 ms
Time: 865,382 ms

== Query B ==
/usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/create-tables-b.sql
/usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/query-b.sql

patched v4:
Time: 1581,317 ms (00:01,581)
Time: 1568,371 ms (00:01,568)
Time: 1468,036 ms (00:01,468)
Time: 1445,698 ms (00:01,446)

patched v8:
Time: 1437,997 ms (00:01,438)
Time: 1437,435 ms (00:01,437)
Time: 1440,422 ms (00:01,440)
Time: 1436,112 ms (00:01,436)

regards,
Ranier Vilela
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alena Rybakina
Дата:
Сообщение: Re: eqjoinsel_semi still sucks ...
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: Row pattern recognition