Re: Indexing queries with bit masks

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Indexing queries with bit masks
Дата
Msg-id 26517.1272644990@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Indexing queries with bit masks  (Peter Hunsberger <peter.hunsberger@gmail.com>)
Ответы Re: Indexing queries with bit masks  (Peter Hunsberger <peter.hunsberger@gmail.com>)
Re: Indexing queries with bit masks  (Mike Christensen <mike@kitchenpc.com>)
Список pgsql-general
Peter Hunsberger <peter.hunsberger@gmail.com> writes:
> If all subscriptions are roughly equal in popularity then any single
> select should give ~ 10% of the data.  That would seem to be selective
> enough that you'd really want an index?

My personal rule of thumb is that 10% is around the threshold where
indexes stop being very helpful.  At that selectivity, you're going
to be having to read every page of the table anyway, and it's not
clear that the extra I/O to read the index is going to get repaid in
CPU savings.  (Now if the table+index are fully cached in RAM, the
threshold's probably a bit higher, but there still is not reason to
think that an index is going to make for a huge improvement.)

> If so, any answers to the OP's main question; what would be the most
> efficient way to handle this type of thing?

Well, btree's right out for indexing bit selections.  In principle you
could maybe do something with a GIN index, but I don't think we ship
any prefab GIN opclasses for this.

[ thinks for a bit ]

The best idea that comes to mind offhand is to not use an integer, but a
boolean array, such that the queries look like

    select ... where subscriptions[4];

This already gives you one big advantage, which is that you're not
hard-wiring an assumption about how many notification types there can
ever be.  What I would then do is build a separate partial index for
each subscription column, ie,

    create index ... where subscriptions[1];
    create index ... where subscriptions[2];
    .. etc ..

Now this only works as long as the queries are referencing explicit
constant subscription numbers, else the planner won't be able to
match the WHERE clause to any of the partial indexes.  But if that
is a reasonable restriction for your app then it seems like it
should work.

The main disadvantage of this is that you need N indexes, which could
get a bit expensive if the table is updated heavily.  But you don't need
to bother maintaining indexes corresponding to subscriptions that are
too popular to be worth indexing, so some of that could be bought back
by careful index selection.

Another point is that the partial indexes could be created on some other
column(s) and thereby serve double duty.  This depends on the details of
your typical queries though.  Is the subscriptions[] clause usually used
by itself, or together with additional WHERE conditions?

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: akp geek
Дата:
Сообщение: Re: Function to Table reference
Следующее
От: Grzegorz Jaśkiewicz
Дата:
Сообщение: information_schema.parameters