Re: Performant queries on table with many boolean columns

Поиск
Список
Период
Сортировка
От bricklen
Тема Re: Performant queries on table with many boolean columns
Дата
Msg-id CAGrpgQ9PR8zh2Hdnoa3VnThsbAYo4GA1khbbo59+t9ddAwMqTg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Performant queries on table with many boolean columns  (Rob Imig <rimig88@gmail.com>)
Ответы Re: Performant queries on table with many boolean columns  (bricklen <bricklen@gmail.com>)
Список pgsql-performance


On Fri, Apr 22, 2016 at 6:57 AM, Rob Imig <rimig88@gmail.com> wrote:
Just to followup where I'm at, I've constructed a new column which is a 100 bit bitstring representing all the flags. Created a b-tree index on that column and can now do super fast lookups (2) for specific scenarios however getting the behavior I need would require a huge amount of OR conditions (as Rick mentioned earlier). Another option is to do bitwiser operators (3) but that seems really slow. Not sure how I can speed that up.

I tried a slightly different tact - how about creating a function-based md5() index over your columns and doing the same for you input values? For the test I ran, I used a char datatype with two possible values: '1' (true) and '0' (false).
The columns were named (for simplicity), c1 to c100.

eg.
create index lots_of_columns_md5_idx on lots_of_columns (
md5(c1||c2||c3||c4||c5||c6||c7||c8||c9||c10||
c11||c12||c13||c14||c15||c16||c17||c18||c19||c20||
c21||c22||c23||c24||c25||c26||c27||c28||c29||c30||
c31||c32||c33||c34||c35||c36||c37||c38||c39||c40||
c41||c42||c43||c44||c45||c46||c47||c48||c49||c50||
c51||c52||c53||c54||c55||c56||c57||c58||c59||c60||
c61||c62||c63||c64||c65||c66||c67||c68||c69||c70||
c71||c72||c73||c74||c75||c76||c77||c78||c79||c80||
c81||c82||c83||c84||c85||c86||c87||c88||c89||c90||
c91||c92||c93||c94||c95||c96||c97||c98||c99||c100)
) with (fillfactor=100);

The query then looked like:
select ...
from ...
where md5(all||the||columns) = md5(all||your||values);

The test data I fabricated wasn't necessarily 85% true as you expect your data to be, but the tests I ran were returning results in single-digit milliseconds for a 1M row table. The queries become a bit more difficult to create as you need to concatenate all the values together. You could pass the list of columns into a function to abstract that away from the query, but that might mess with the planner.
Note that the method suggested here relies on column ordering always being the same, otherwise the hash will be different/inaccurate.
 

В списке pgsql-performance по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: Performance problems with postgres and null Values?
Следующее
От: bricklen
Дата:
Сообщение: Re: Performant queries on table with many boolean columns