Re: WIP: BRIN multi-range indexes

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: WIP: BRIN multi-range indexes
Дата
Msg-id 28c69eb3-3b93-f9f8-1e76-cfac1eda5254@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: WIP: BRIN multi-range indexes  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
Apparently cputube did not pick the last version of the patches I've
submitted in December (and I don't see the message in the thread in
archive either), so it's listed as broken.

So here we go again, hopefully this time everything will go through ...

regards

On 12/28/18 12:45 AM, Tomas Vondra wrote:
> Hi all,
> 
> Attached is an updated/rebased version of the patch series. There are no
> changes to behavior, but let me briefly summarize the current state:
> 
> 0001 and 0002
> -------------
> 
> The first two parts are "just" refactoring the existing code to pass all
> scankeys to the opclass at once - this is needed by the new minmax-like
> opclass, but per discussion with Alvaro it seems worthwhile even
> independently. I tend to agree with that. Similarly for the second part,
> which moves all IS NULL checks entirely to bringetbimap().
> 
> 0003 bloom opclass
> ------------------
> 
> The first new opclasss, based on bloom filters. For each page range
> (i.e. 1MB by default) a small bloom filter is built (with hash values of
> the original values as inputs), and then used to evaluate equality
> queries. A small optimization is that initially the actual (hash) values
> are kept until reaching the bloom filter size. This improves behavior in
> low-cardinality data sets.
> 
> Picking the bloom filter parameters is the tricky part - we don't have a
> reliable source of such information (namely number of distinct values
> per range), and e.g. the false positive rate actually has to be picked
> by the user because it's a compromise between index size and accuracy.
> Essentially, false positive rate is the fraction of the table that has
> to be scanned for a random value (on average). But it also makes the
> index larger, because the per-range bloom filters will be larger.
> 
> Another reason why this needs to be defined by the user is that the
> space for index tuple is limited by one page (8kB by default), so we
> can't allow the bloom filter to be larger (we have to assume it's
> non-compressible, because in the optimal fill it's 50% 0s and 1s). But
> the BRIN index may be multi-column, and the limit applies to the whole
> tuple. And we don't know what the opclasses or parameters of other
> columns are.
> 
> So the patch simply adds two reloptions
> 
> a) n_distinct_per_range - number of distinct values per range
> b) false_positive_rate - false positive rate of the filter
> 
> There are some simple heuristics to ensure the values are reasonable
> (e.g. upper limit for number of distinct values, etc.) and perhaps we
> might consider stats from the underlying table (when not empty), but the
> patch does not do that.
> 
> 
> 0004 multi-minmax opclass
> -------------------------
> 
> The second opclass addresses a common issue for minmax indexes, where
> the table is initially nicely correlated with the index, and it works
> fine. But then deletes/updates route data into other parts of the table
> making the ranges very wide ad rendering the BRIN index inefficient.
> 
> One way to deal improve this would be considering the index(es) while
> routing the new tuple, i.e. looking not only for page with enough free
> space, but for pages in already matching ranges (or close to it).
> 
> A partitioning is a possible approach so segregate the data. But it's
> certainly much higher overhead, both in terms of maintenance and
> planning (particularly with  1:1 of ranges vs. partitions).
> 
> So the new multi-minmax opclass takes a different approach, replacing
> the one minmax range with multiple ranges (64 boundary values or 32
> ranges by default). Initially individual values are stored, and after
> reaching the maximum number of values the values are merged into ranges
> by distance. This allows handling outliers very efficiently, because
> they will not be merged with the "main" range for as long as possible.
> 
> Similarly to the bloom opclass, the main challenge here is deciding the
> parameter - in this case, it's "number of values per range". Again, it's
> a compromise vs. index size and efficiency. The default (64 values) is
> fairly reasonable, but ultimately it's up to the user - there is a new
> reloption "values_per_range".
> 
> 
> 
> regards
> 

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: FETCH FIRST clause PERCENT option
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: explain plans with information about (modified) gucs