Re: visualizing B-tree index coverage

Поиск
Список
Период
Сортировка
От PFC
Тема Re: visualizing B-tree index coverage
Дата
Msg-id opsk8jtbemth1vuj@musicbox
обсуждение исходный текст
Ответ на Re: visualizing B-tree index coverage  ("TJ O'Donnell" <tjo@acm.org>)
Ответы Re: visualizing B-tree index coverage  (TJ O'Donnell <tjo@acm.org>)
Список pgsql-general
    I think you missed an important "feature" of multicolumn indexes, that
you better not use 'OR' in your expressions. You seem to want only to use
'>=' so this should be OK.

    Suppose you have 3 columns a,z,e containing values linearly distributed
between ...

select min(a),max(a),min(z),max(z),min(e),max(e) from test;
  min | max | min | max | min | max
-----+-----+-----+-----+-----+-----
    0 |  13 |   0 |  99 |   0 |  99


For instance the following query is indexed :
explain analyze select * from test where a>=0 and z>=90 and e>=0;
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Index Scan using testa on test  (cost=0.00..1637.56 rows=11345 width=16)
(actual time=0.085..51.441 rows=13000 loops=1)
    Index Cond: ((a >= 0) AND (z >= 90) AND (e >= 0))
  Total runtime: 56.307 ms

The following is only partially indexed :

explain analyze select * from test where (a=1 or a=2) and (z=1 or z=8) and
e>=0;
                                                          QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
  Index Scan using testa, testa on test  (cost=0.00..3269.06 rows=346
width=16) (actual time=0.328..52.961 rows=400 loops=1)
    Index Cond: ((a = 1) OR (a = 2))
    Filter: (((z = 1) OR (z = 8)) AND (e >= 0))
  Total runtime: 53.297 ms

    You see the 'index cond' field which is what determines the fetched rows,
which are then fetched and filtered with the 'filter' expression. Having
the most selective index cond is important because it will diminish the
number of rows to be fetched. However, in your case the filter expression
is also beneficial because any row eliminated by the filter will not need
to go through your expensive matching function.

In this case :

SELECT count(*) FROM test;
    => 131072

SELECT count(*) FROM test WHERE ((a = 1) OR (a = 2));
    => 20000

SELECT count(*) FROM test WHERE (a=1 or a=2) and (z=1 or z=8) and e>=0;
    => 400

In this case the index fetches 20k rows out of 131072 but only 400 are
used...

If you don't use OR, index use is more likely :

explain analyze select * from test where (a,z,e) >= (0,50,80);
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Index Scan using testa on test  (cost=0.00..1669.78 rows=12627 width=16)
(actual time=0.087..58.316 rows=13000 loops=1)
    Index Cond: ((a >= 0) AND (z >= 50) AND (e >= 80))
  Total runtime: 63.049 ms

Here you have a full index scan.

To determine the efficiency of your indexes, you can thus use this method,
and look at the 'index cond' and 'filter' expressions, and counting the
rows matched by each.





















> particular number of columns
> for indexing.  I don't want to use too many, nor too few columns.  I also
> want to optimize the nature(which atom types, bond types, etc.)
> of the count columns.  While I could do this
> and use the speedup as the measure of success, I think
> that if my B-tree were "covering" the data well, I would get the best
> results.
> Covering means finding that optimal situation where there is not one
> index for all rows
> and also not a unique index for every row - something inbetween would be
> ideal,
> or is that basically a wrong idea?

В списке pgsql-general по дате отправления:

Предыдущее
От: PFC
Дата:
Сообщение: Re: Recursive queries
Следующее
От: phil campaigne
Дата:
Сообщение: Numeric type