Re: WIP: BRIN multi-range indexes

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: WIP: BRIN multi-range indexes
Дата	18 сентября 2020 г. 11:38:24
Msg-id	20200918113824.n7wfrccp7jiv7mvk@development обсуждение исходный текст
Ответ на	Re: WIP: BRIN multi-range indexes (John Naylor <john.naylor@2ndquadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

On Thu, Sep 17, 2020 at 05:42:59PM -0400, John Naylor wrote:
>On Thu, Sep 17, 2020 at 12:34 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Thu, Sep 17, 2020 at 10:33:06AM -0400, John Naylor wrote:
>> >On Sun, Sep 13, 2020 at 12:40 PM Tomas Vondra
>> ><tomas.vondra@2ndquadrant.com> wrote:
>> >> <20200913 patch set>
>> But those are opclass parameters, so the parameters are not specified in
>> WITH clause, but right after the opclass name:
>>
>> CREATE INDEX idx ON table USING brin (
>>         bigint_col int8_minmax_multi_ops(values_per_range = 15)
>> );
>
>D'oh!
>
>> >+ * The index only supports equality operator, similarly to hash indexes.
>> >
>> >s/operator/operators/
>> >
>>
>> Hmmm, are there really multiple equality operators?
>
>Ah, I see what you meant -- then "_the_ equality operator" is what we want.
>
>> The number of items the comment refers to is this:
>>
>>      /* how many uint32 hashes can we fit into the bitmap */
>>      int maxvalues = filter->nbits / (8 * sizeof(uint32));
>>
>> where nbits is the size of the bloom filter. So I think the "same" is
>> quite right here.
>
>Ok, I get it now.
>
>> >+ /*
>> >+ * The 1% value is mostly arbitrary, it just looks nice.
>> >+ */
>> >+#define BLOOM_DEFAULT_FALSE_POSITIVE_RATE 0.01 /* 1% fp rate */
>> >
>> >I think we'd want better stated justification for this default, even
>> >if just precedence in other implementations. Maybe I can test some
>> >other values here?
>> >
>>
>> Well, I don't know how to pick a better default :-( Ultimately it's a
>> tarde-off between larger indexes and scanning larger fraction of a table
>> due to lower false positive. Then there's the restriction that the whole
>> index tuple needs to fit into a single 8kB page.
>
>Well, it might be a perfectly good default, and it seems common in
>articles on the topic, but the comment is referring to aesthetics. :-)
>I still intend to test some cases.
>

I think we may formulate this as a question of how much I/O we need to
do for a random query, and pick the false positive rate minimizing that.
For a single BRIN range an approximation might look like this:

   bloom_size(fpr, ...) + (fpr * range_size) + (selectivity * range_size)

The "selectivity" shows the true selectivity of ranges, and it might be
esimated from a per-row selectivity I guess. But it does not matter much
because this is constant and independent of the false-positive rate, so
we can ignore it. Which leaves us with

   bloom_size(fpr, ...) + (fpr * range_size)

We might solve this for fixed parameters (range_size, ndistinct, ...),
either analytically or by brute force, giving us the "optimal" fpr.

The trouble is the bloom_size is restricted, and we don't really know
the limit - the whole index tuple needs to fit on a single 8kB page, and
there may be other BRIN summaries etc. So I've opted to use a somewhat
defensive default for the false positive rate.

>> >Also, is there a reference for the algorithm for hash values that
>> >follows? I didn't see anything like it in my cursory reading of the
>> >topic. Might be good to include it in the comments.
>> >
>>
>> This was suggested by Yura Sokolov [1] in 2017. The post refers to a
>> paper [2] but I don't recall which part describes "our" algorithm.
>>
>> [1] https://www.postgresql.org/message-id/94c173b54a0aef6ae9b18157ef52f03e@postgrespro.ru
>> [2] https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
>
>Hmm, I came across that paper while doing background reading. Okay,
>now I get that "% (filter->nbits - 1)" is the second hash function in
>that scheme. But now I wonder if that second function should actually
>act on the passed "value" (the original hash), so that they are
>actually independent, as required. In the language of that paper, the
>patch seems to have
>
>g(x) = h1(x) + i*h2(h1(x)) + f(i)
>
>instead of
>
>g(x) = h1(x) + i*h2(x) + f(i)
>
>Concretely, I'm wondering if it should be:
>
> big_h = DatumGetUint32(hash_uint32(value));
> h = big_h % filter->nbits;
>-d = big_h % (filter->nbits - 1);
>+d = value % (filter->nbits - 1);
>
>But I could be wrong.
>
>Also, I take it that f(i) = 1 in the patch, hence the increment here:
>
>+ h += d++;
>
>But it's a little hidden. That might be worth commenting, if I haven't
>completely missed something.
>

OK

>> >+ * Tweak the ndistinct value based on the pagesPerRange value. First,
>> >
>> >Nitpick: "Tweak" to my mind means to adjust an existing value. The
>> >above is only true if ndistinct is negative, but we're really not
>> >tweaking, but using it as a scale factor. Otherwise it's not adjusted,
>> >only clamped.
>> >
>>
>> OK. Perhaps 'adjust' would be a better term?
>
>I felt like rewriting the whole thing, but your original gets the
>point across ok, really.
>
>"If the passed ndistinct value is positive, we can just use that, but
>we also clamp the value to prevent over-sizing the bloom filter
>unnecessarily. If it's negative, it represents a multiplier to apply
>to the maximum number of tuples in the range (assuming each page gets
>MaxHeapTuplesPerPage tuples, which is likely a significant
>over-estimate), similar to the corresponding value in table
>statistics."
>
>> >+ /* TODO include the sorted/unsorted values */
>> >
>>
>> This was simplemented as part of the discussion about pageinspect, and
>> I wanted some confirmation if the approach is acceptable or not before
>> spending more time on it. Also, the values are really just hashes of the
>> column values, so I'm not quite sure it makes sense to include that.
>> What would you suggest?
>
>My gut feeling is the hashes are not useful for this purpose, but I
>don't feel strongly either way.
>

OK. I share this feeling.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: WIP: BRIN multi-range indexes