Re: [PERFORM] Number of characters in column preventing index usage

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: [PERFORM] Number of characters in column preventing index usage
Дата
Msg-id bdba910d-f3f5-7eaa-f66d-02ff29361cec@2ndquadrant.com
обсуждение исходный текст
Ответ на [PERFORM] Number of characters in column preventing index usage  (Hustler DBA <hustlerdba@gmail.com>)
Ответы Re: [PERFORM] Number of characters in column preventing index usage  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-performance
Hi,

On 02/17/2017 11:19 PM, Hustler DBA wrote:
> I am seeing this strange behavior, I don't know if this is by design by
> Postgres.
>
> I have an index on a column which is defined as "character
> varying(255)". When the value I am searching for is of a certain length,
> the optimizer uses the index but when the value is long, the optimizer
> doesn't use the index but does a seq scan on the table. Is this by
> design? How can I make the optimizer use the index no matter what the
> size/length of the value being searched for?
>

AFAIK there are no such checks, i.e. the optimizer does not consider the
length of the value when deciding between scan types.

>
> PostgreSQL version: 9.4
>

That's good to know, but we also need information about the table
involved in your queries. I'd bet the table is tiny (it seems to be just
12 pages, so ~100kB), making the indexes rather useless.

> my_db=# explain (analyze, buffers) select count(*) from tab where ID =
> '01625cfa-2bf8-45cf' ;
>                                                               QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=8.29..8.30 rows=1 width=0) (actual time=0.048..0.048
> rows=1 loops=1)
>    Buffers: shared read=2
>    ->  Index Only Scan using tab_idx1 on tab  (cost=0.27..8.29 rows=1
> width=0) (actual time=0.043..0.043 rows=0 loops=1)
>          Index Cond: (ID = '01625cfa-2bf8-45cf'::text)
>          Heap Fetches: 0
>          Buffers: shared read=2
>  Planning time: 0.250 ms
>  Execution time: 0.096 ms
> (8 rows)
>
> my_db=# explain (analyze, buffers) select count(*) from tab where ID =
> '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea' ;
>                                                 QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=14.80..14.81 rows=1 width=0) (actual time=0.115..0.115
> rows=1 loops=1)
>    Buffers: shared hit=12
>    ->  Seq Scan on tab  (cost=0.00..14.79 rows=5 width=0) (actual
> time=0.031..0.108 rows=5 loops=1)
>          Filter: ((ID)::text = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'::text)
>          Rows Removed by Filter: 218
>          Buffers: shared hit=12
>  Planning time: 0.122 ms
>  Execution time: 0.180 ms
> (8 rows)

The only difference I see is that for the long value the planner expects
5 rows, while for the short one it expects 1 row. That may seem a bit
strange, but I'd bet it finds the short value in some statistic (MCV,
histogram) ans so can provide very accurate estimate. While for the
longer one, it ends up using some default (0.5% for equality IIRC) or
value deduced from ndistinct. Or something like that.

The differences between the two plans are rather negligible, both in
terms of costs (8.3 vs. 14.81) and runtime (0.1 vs 0.2 ms). The choice
of a sequential scan seems perfectly reasonable for such tiny tables.

FWIW it's impossible to draw conclusions based on two EXPLAIN ANALYZE
executions. The timing instrumentation from EXPLAIN ANALYZE may have
significant impact impact (different for each plan!). You also need to
testing with more values and longer runs, not just a single execution
(there are caching effects etc.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-performance по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: [PERFORM] Number of characters in column preventing index usage
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: [PERFORM] Number of characters in column preventing index usage