Re: maximum columns for brin bloom indexes
| От | Jaime Casanova | 
|---|---|
| Тема | Re: maximum columns for brin bloom indexes | 
| Дата | |
| Msg-id | 20210408144918.GA3086@ahch-to обсуждение исходный текст | 
| Ответ на | Re: maximum columns for brin bloom indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>) | 
| Ответы | Re: maximum columns for brin bloom indexes | 
| Список | pgsql-hackers | 
On Thu, Apr 08, 2021 at 12:18:36PM +0200, Tomas Vondra wrote:
> On 4/8/21 9:08 AM, Jaime Casanova wrote:
> > Hi everyone,
> > 
> > When testing brin bloom indexes I noted that we need to reduce the
> > PAGES_PER_RANGE parameter of the index to allow more columns on it.
> > 
> > Sadly, this could be a problem if you create the index before the table
> > grows, once it reaches some number of rows (i see the error as early as
> > 1000 rows) it starts error out.
> > 
> >     create table t1(i int, j int);
> >     
> >     -- uses default PAGES_PER_RANGE=128
> >     create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
> >     
> >     insert into t1 
> >         select random()*1000, random()*1000 from generate_series(1, 1000);
> >     ERROR:  index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx"
> > 
> > if instead you create the index with a minor PAGES_PER_RANGE it goes
> > fine, in this case it works once you reduce it to at least 116
> > 
> >     create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) 
> >         with (pages_per_range=116);
> > 
> > 
> > so, for having:
> > two int columns - PAGES_PER_RANGE should be max 116
> > three int columns - PAGES_PER_RANGE should be max 77
> > one int and one timestamp - PAGES_PER_RANGE should be max 121 
> > 
> > and so on
> > 
> 
> No, because this very much depends on the number if distinct values in
> the page page range, which determines how well the bloom filter
> compresses. You used 1000, but that's just an arbitrary value and the
> actual data might have any other value. And it's unlikely that all three
> columns will have the same number of distinct values.
>
Ok, that makes sense. Still I see a few odd things: 
    """
    drop table if exists t1;
    create table t1(i int, j int);
    create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
    -- This one will succeed, I guess because it has less different
    -- values
    insert into t1
    select random()*20, random()*100 from generate_series(1, 1000);
    -- succeed
    insert into t1
    select random()*20, random()*100 from generate_series(1, 100000);
    -- succeed
    insert into t1
    select random()*200, random()*1000 from generate_series(1, 1000);
    -- succeed
    insert into t1
    select random()*200, random()*1000 from generate_series(1, 1000);
    -- succeed? This is the case it has been causing problems before
    insert into t1
    select random()*1000, random()*1000 from generate_series(1, 1000);
    """
Maybe this makes sense, but it looks random to me. If it makes sense
this is something we should document better. 
Let's try another combination:
    """
    drop table if exists t1;
    create table t1(i int, j int);
    create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
    -- this fails again
    insert into t1
    select random()*1000, random()*1000 from generate_series(1, 1000);
    -- and this starts to fail now, but this worked before
    insert into t1
    select random()*20, random()*100 from generate_series(1, 1000);
    """
> Of course, this also depends on the false positive rate.
> 
How the false positive rate work?
> FWIW I doubt people are using multi-column BRIN indexes very often.
> 
true. 
Another question, should we allow to create a brin multi column index
that uses different opclasses?
CREATE INDEX ON t1 USING brin (i int4_bloom_ops, j int4_minmax_ops);
-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
		
	В списке pgsql-hackers по дате отправления: