Re: WIP: BRIN multi-range indexes

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: WIP: BRIN multi-range indexes
Дата
Msg-id 061b24c0-87eb-51a7-17d6-e1ba8c59fd23@enterprisedb.com
обсуждение исходный текст
Ответ на Re: WIP: BRIN multi-range indexes  (John Naylor <john.naylor@enterprisedb.com>)
Ответы Re: WIP: BRIN multi-range indexes  (Thomas Munro <thomas.munro@gmail.com>)
Re: WIP: BRIN multi-range indexes  (John Naylor <john.naylor@enterprisedb.com>)
Список pgsql-hackers
Hi,

Attached is an updated version of the patch series, rebased on current 
master, and results for benchmark comparing the various bloom variants.

The improvements are fairly minor:

1) Rejecting bloom filters that are clearly too large (larger than page) 
early. This is imperfect, as it works for individual index keys, not the 
whole row. But per discussion it seems useful.

2) I've added sort_mode opclass parameter, allowing disabling the sorted 
mode the bloom indexes start in by default. I'm not convinced we should 
commit this, I've needed this for the benchmarking.


The benchmarking compares the three parts with different Bloom variants:

0004 - single hash with mod by (nbits) and (nbits-1)
0005 - two independent hashes (two random seeds)
0006 - partitioned approach, proposed by John Naylor

I'm attaching the shell script used to run the benchmark, and a summary 
of the results. The 0004 is used as a baseline, and the comparisons show 
speedups for 0005 and 0006 relative to that (if you scroll to the 
right). Essentially, green means "faster than 0004" while red means slower.

I don't think any of those approaches comes as a clearly superior. The 
results for most queries are within 2%, which is mostly just noise. 
There are cases where the differences are more significant (~10%), but 
it's in either direction and if you compare duration of the whole 
benchmark (by summing per-query averages) it's within 1% again.

For the "mismatch" case (i.e. looking for values not contained in the 
table) the differences are larger, but that's mostly due to luck and 
hitting false positives for that particular query - on average the 
differences are negligible, just like for the "match" case.

So based on this I'm tempted to just use the version with two hashes, as 
implemented in 0005. It's much simpler than the partitioning scheme, 
does not need any of the logic to generate primes etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Deleting older versions in unique indexes to avoid page splits
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: pg_preadv() and pg_pwritev()