Обсуждение: Using the wrong index (very suboptimal), why?

Поиск
Список
Период
Сортировка

Using the wrong index (very suboptimal), why?

От
"Shane Wright"
Дата:
Hi,

I have somewhat of a quandary with a large table in my database;
PostgreSQL is choosing the 'wrong' index for a certain kind of query;
causing performance to become an order of magnitude slower (query times
usually measured in milliseconds now become seconds/minutes!).

It's not that it isn't using any index (so enable_seqscan=off doesn't
help), it's that the index it picks is suboptimal.

The query is based on 3 of the table columns - there is an index on all
three, but it prefers to use an index on just two of them, then
filtering by the 3rd (EXPLAIN output is below - showing the query as
well).

I've tried everything I can find to make it not do this (config
parameters, statistics target, vacuuming more...)  - what am I missing?

It happened a couple of times before - after some unusual activity
patterns in a client script - but repeatedly vacuuming until the planner
changed its mind made it go away.  We then made config changes
(random_page_cost and effective_cache_size) thinking that fixed the
reason why it made the bad decision in the first place...but it would
appear not

The only thing remaining I can think of is rebuilding the 'preferred'
index; that should reduce its size from 27Gb down to something more
manageable; maybe that will let the database want to use it more because
there'll be less I/O?

I've considered upgrading to 8.3.5 as well, but I can't see anything in
the changelogs that would fix this issue (I appreciate upgrading is a
good idea anyway, that will be done at some point soon anyway).


Hopefully all relevant info is listed below - if anyone's got any ideas
I'd appreciate any help or pointers anyone can give, thanks...



The server is PostgreSQL 8.3.0 on Linux with 32Gb RAM.  /var/lib/pgsql/
is on a fibre-channel SAN.  This table has around 680 million rows - and
has been reasonably regularly vacuumed, but is probably in dire need of
a VACUUM FULL and REINDEX to reclaim dead space (see disk space info at
the bottom of the post).


emystery=# \d answers
                              Table "public.answers"
 Column |  Type   |                             Modifiers
--------+---------+-----------------------------------------------------
---------------
 aid    | integer | not null default
nextval(('seq_answers_aid'::text)::regclass)
 rid    | integer | not null
 nid    | integer | not null
 iid1   | integer |
 iid2   | integer |
 iid3   | integer |
 text   | text    |
 extra  | bigint  |
Indexes:
    "answers_pkey" PRIMARY KEY, btree (aid)
    "index_answers_iid1" btree (iid1) WHERE iid1 > 0
    "index_answers_iid2" btree (iid2) WHERE iid2 > 0
    "index_answers_iid3" btree (iid3) WHERE iid3 > 0
    "index_answers_nidiid1" btree (nid, iid1)
    "index_answers_ridnidiid1" btree (rid, nid, iid1)



This is what it is doing [slowly] - but for many values of rid/nid/iid1
there are a lot of rows to filter through (tens/hundreds of thousands)
so this can take many seconds or minutes:

emystery=# explain select * from answers where rid=1 and nid=2 and
iid1=3;
------------------------------------------------------------------------
-------------------------
 Index Scan using index_answers_nidiid1 on answers  (cost=0.00..28.74
rows=1 width=62)
   Index Cond: ((nid = 2) AND (iid1 = 3))
   Filter: (rid = 1)



This is the pattern it *should* use (and does use on other installations
of similar/older data).  When this pattern is used the query always
completes in <1 second, usually ~0.2 seconds!


emystery20080821=# explain select * from answers where rid=1 and nid=2
and iid1=3;
------------------------------------------------------------------------
----------------------------
 Index Scan using index_answers_ridnidiid1 on answers  (cost=0.00..99.04
rows=1 width=67)
   Index Cond: ((rid = 1) AND (nid = 2) AND (iid1 = 3))




The PostgreSQL configuration has been altered to help prefer random
lookups [via an index], and to indicate to the database how much data
the OS

#seq_page_cost = 1.0                    # measured on an arbitrary scale
random_page_cost = 1.0                  # same scale as above
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
#cpu_operator_cost = 0.0025             # same scale as above
effective_cache_size = 31GB


The table has been VACUUM ANALYSE'd (repeatedly!) to no avail.

We've also increased the statistics target for the columns in the table
to 200 (from 10) which still isn't making any difference (I thought
because the table is so large and the dataset is quite uneven that this
should help...it's now at 1000 and vacuuming again so we might be
lucky...)

alter table answers alter column rid set statistics 200;
alter table answers alter column nid set statistics 200;
alter table answers alter column iid1 set statistics 200;



Here is the [full] output from the most recent VACUUM:

emystery=# vacuum verbose analyse answers;
INFO:  vacuuming "public.answers"
INFO:  scanned index "index_answers_nidiid1" to remove 38 row versions
DETAIL:  CPU 39.09s/64.46u sec elapsed 1077.90 sec.
INFO:  scanned index "index_answers_iid3" to remove 38 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "index_answers_iid2" to remove 38 row versions
DETAIL:  CPU 19.72s/37.15u sec elapsed 404.70 sec.
INFO:  scanned index "index_answers_iid1" to remove 38 row versions
DETAIL:  CPU 28.92s/59.76u sec elapsed 414.62 sec.
INFO:  scanned index "index_answers_ridnidiid1" to remove 38 row
versions
DETAIL:  CPU 50.99s/71.36u sec elapsed 742.03 sec.
INFO:  scanned index "answers_pkey" to remove 38 row versions
DETAIL:  CPU 28.45s/65.63u sec elapsed 376.34 sec.
INFO:  "answers": removed 38 row versions in 3 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "index_answers_nidiid1" now contains 671965676 row versions
in 2646699 pages
DETAIL:  38 index row versions were removed.
2497 index pages have been deleted, 2497 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_answers_iid3" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_answers_iid2" now contains 392408456 row versions in
1302348 pages
DETAIL:  24 index row versions were removed.
9760 index pages have been deleted, 9760 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_answers_iid1" now contains 626231453 row versions in
2044032 pages
DETAIL:  38 index row versions were removed.
1576 index pages have been deleted, 1576 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_answers_ridnidiid1" now contains 671965676 row
versions in 3604651 pages
DETAIL:  38 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "answers_pkey" now contains 671965676 row versions in
2022030 pages
DETAIL:  38 index row versions were removed.
18670 index pages have been deleted, 18670 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "answers": found 38 removable, 671965676 nonremovable row
versions in 4481083 pages
DETAIL:  192913 dead row versions cannot be removed yet.
There were 39603896 unused item pointers.
380386 pages contain useful free space.
0 pages are entirely empty.
CPU 235.00s/366.10u sec elapsed 4100.83 sec.
INFO:  vacuuming "pg_toast.pg_toast_1118174"
INFO:  index "pg_toast_1118174_index" now contains 4479 row versions in
33 pages
DETAIL:  0 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.25 sec.
INFO:  "pg_toast_1118174": found 0 removable, 4479 nonremovable row
versions in 1232 pages
DETAIL:  14 dead row versions cannot be removed yet.
There were 1971 unused item pointers.
441 pages contain useful free space.
0 pages are entirely empty.
CPU 0.01s/0.00u sec elapsed 1.18 sec.
INFO:  analyzing "public.answers"
INFO:  "answers": scanned 60000 of 4481083 pages, containing 9003061
live rows and 2964 dead rows; 60000 rows in sample, 672391060 estimated
total rows
VACUUM


This information might help tell you a bit more about the contents of
the table:

emystery=# SELECT relname, relkind, reltuples, relpages FROM pg_class
WHERE relname LIKE '%answers%';
           relname                 | relkind |  reltuples  | relpages
-----------------------------------+---------+-------------+----------
 answers                           | r       | 6.71966e+08 |  4481083
 seq_answers_aid                   | S       |           1 |        1
 index_answers_nidiid1             | i       | 6.71966e+08 |  2646699
 index_answers_iid3                | i       |           0 |        1
 index_answers_iid2                | i       | 3.92408e+08 |  1302348
 index_answers_iid1                | i       | 6.26231e+08 |  2044032
 index_answers_ridnidiid1          | i       | 6.71966e+08 |  3604651
 answers_pkey                      | i       | 6.71966e+08 |  2022030

emystery=# SELECT attname,null_frac, n_distinct,correlation FROM
pg_stats WHERE tablename = 'answers';
 attname | null_frac | n_distinct | correlation
---------+-----------+------------+-------------
 aid     |         0 |         -1 |     0.78259
 rid     |         0 |     187498 | -0.00133284
 nid     |         0 |      27179 |   0.0342285
 iid1    |    0.0668 |      49520 |   0.0306873
 iid2    |    0.4187 |       5140 |    0.317481
 iid3    |         1 |          0 |
 text    |  0.933117 |       2573 |    0.112144
 extra   |   0.99995 |         -1 |           1


Disk space usage, as calculated from pg_class.relpages * 8 (in
kilobytes) is:

Table data = 34.19Gb
Toast = 64Kb
index_ecos_answers_ridnidiid1    27.50Gb
index_ecos_answers_nidiid1    20.19Gb
index_ecos_answers_iid1    15.59Gb
ecos_answers_pkey    15.43Gb
index_ecos_answers_iid2    9.94Gb
index_ecos_answers_iid3    8.00Kb


Re: Using the wrong index (very suboptimal), why?

От
Tom Lane
Дата:
"Shane Wright" <shane.wright@edigitalresearch.com> writes:
> It's not that it isn't using any index (so enable_seqscan=off doesn't
> help), it's that the index it picks is suboptimal.

> The query is based on 3 of the table columns - there is an index on all
> three, but it prefers to use an index on just two of them, then
> filtering by the 3rd (EXPLAIN output is below - showing the query as
> well).

Well, that isn't ipso facto a stupid choice to make.  The 3-column
index is larger and slower to search, so it should only use it if
there's a substantial reduction in the number of heap rows to be
visited.  You say that there is, but the estimated costs in the EXPLAINs
you provide sure don't indicate that the planner thinks so.  You need to
look into what the estimated vs actual rowcounts are for just the
two-column condition (ie, where nid = something AND iid1 = something).
Getting those estimates to match reality is the key to getting a sane
plan choice here.

            regards, tom lane

Re: Using the wrong index (very suboptimal), why?

От
"Shane Wright"
Дата:
Tom,

> You need to
> look into what the estimated vs actual rowcounts are for just the
> two-column condition (ie, where nid = something AND iid1 = something).
> Getting those estimates to match reality is the key to getting a sane
> plan choice here.

Many thanks - I agree in principle it isn't automatically the worst
decision, however for this data set it unfortunately is :(

Improving the accuracy of the cost estimates is exactly what I'm trying
to achieve, so far I've tried....

- increasing the statistics target to the maximum setting with SET
STATISTICS 1000 on columns rid, nid and iid1 of answers, then
re-vacuuming

- adjusting random_page_cost downwards even more to try to make it
prefer index IO

- increasing cpu_tuple_cost up to 1.0 (from 0.01) to try to discourage
it from processing data rows

- decreating cpu_index_tuple_cost to 0.0005 (from 0.005) to encourage it
to spend more time in index rows.


None of the above had any effect :(

Interestingly, I didn't revacuum between changing the cost figues in the
config file (I did reload).  The cost figures from EXPLAIN did change,
but using the queries below the 'wrong' index always comes up just over
1/3rd more expensive.

Here's a query that matches actual data in the table; it sees the cost
of the 2-column index as much less, but it takes an order of magnitude
longer to run (using nid & iid1 means filtering through around 22,000
rows).  The three queries; first is the one that's used in practise;
against all 3 columns.  The next 2 queries use the two different
combinations of just 2 columns:

emystery=# explain analyse select * FROM answers WHERE rid=668332334 AND
nid=253869913 AND iid1=535292129;
                                                                  QUERY
PLAN
------------------------------------------------------------------------
----------------------------------------------------------------------
 Index Scan using index_answers_nidiid1 on answers  (cost=0.00..114.14
rows=1 width=60) (actual time=0.168..790.262 rows=1 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
   Filter: (rid = 668332334)
 Total runtime: 790.305 ms
(4 rows)

emystery=# explain analyse select * FROM answers WHERE rid=668332334 AND
nid=253869913;
                                                                   QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
 Index Scan using index_answers_ridnidiid1 on answers
(cost=0.00..152.49 rows=1 width=60) (actual time=0.120..0.141 rows=18
loops=1)
   Index Cond: ((rid = 668332334) AND (nid = 253869913))
 Total runtime: 0.207 ms
(3 rows)

emystery=# explain analyse select * FROM ecos_answers WHERE
nid=253869913 AND iid1=535292129;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---
 Index Scan using index_ecos_answers_nidiid1 on ecos_answers
(cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645
rows=21891 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
 Total runtime: 2424.769 ms
(3 rows)




What setting should I be tweaking to make the two column index more
expensive?  I thought I would need to either...

- inform the database that there are likely to be a lot of rows that
need filtering (SET STATISTICS [maximum]???)
or
- tell the database that filtering rows is expensive (cpu_tuple_cost++,
random_page_cost--  ???)

...but both of which I've already tried....but it would seem that from
the EXPLAIN output that the first one is key; it shouldn't think (for
the last example( that it's only going to get 1 row back!) but aside
from increasing SET STATISTICS what other setting is there to inform it
of this?


The only thing I can think that's remaining is to rebuild/replace the
preferred index, that should make it smaller and *if* the index cost
estimates are partly based on # disk pages that should reduce the
perceived cost of using it.  Is this the case?   Even if it does that
still doesn't fix the underlying problem.

Any help appreciated,

Many thanks

Shane

Re: Using the wrong index (very suboptimal), why?

От
Tom Lane
Дата:
"Shane Wright" <shane.wright@edigitalresearch.com> writes:
> Tom,
>> You need to
>> look into what the estimated vs actual rowcounts are for just the
>> two-column condition (ie, where nid = something AND iid1 = something).

> Improving the accuracy of the cost estimates is exactly what I'm trying
> to achieve, so far I've tried....

> - increasing the statistics target to the maximum setting with SET
> STATISTICS 1000 on columns rid, nid and iid1 of answers, then
> re-vacuuming

I hope you meant re-analyzing.

> emystery=# explain analyse select * FROM ecos_answers WHERE
> nid=253869913 AND iid1=535292129;

> QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> ---
>  Index Scan using index_ecos_answers_nidiid1 on ecos_answers
> (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645
> rows=21891 loops=1)
>    Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>  Total runtime: 2424.769 ms
> (3 rows)

Well, here's the problem all right: 1 row estimated vs 21891 actual.
Is there something odd about the joint distribution of these two
columns?

            regards, tom lane

Re: Using the wrong index (very suboptimal), why?

От
"Shane Wright"
Дата:
Hi Tom,

>> - increasing the statistics target to the maximum setting with SET
>> STATISTICS 1000 on columns rid, nid and iid1 of answers, then
>> re-vacuuming.
> I hope you meant re-analyzing.

Hehe absolutely yes; I always VACUUM VERBOSE ANALYSE :)

>>  Index Scan using index_answers_nidiid1 on answers
>> (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645
>> rows=21891 loops=1)
>>    Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>>  Total runtime: 2424.769 ms
>Well, here's the problem all right: 1 row estimated vs 21891 actual.
>Is there something odd about the joint distribution of these two
>columns?

Good :) - I've been chasing the same thing, albeit not with any luck yet
:(

Shouldn't be anything odd about the data I wouldn't have thought...
There are ~670 million rows.  No nulls in nid, ~6% of iid1 are null
(they will always be null or not null for a given nid.  I.e. rows with a
given nid value will either all be null or all be not null).

nids are randomly selected, there are only ~27000 distinct values, all
between 100000000 and 999999999.
iid1s are also random in the same range, ~50000 distinct values.  All
the rows for a given value of nid will have one of a small set of
possible iid1 values; usually 5-15 distinct values.  The frequency at
which each nid may occurs is quite uneven; some will be tens of times,
others will be a couple of hundred thousand.  Same applies to
corresponding iid1 values.

(the table stores answers to questions; nid is the question ID, iid1 is
the answer ID [for questions where the user picks from a pre-defined
list] - iid1 is NULL for textual answers.  iid1 values are grouped into
sets of options the user can pick from, defined elsewhere.  These sets
can be shared across questions/nids but not often)

Does the above sound strange?

There aren't any strange errors from the database, autovacuum is enabled
(but not vacuuming the table often enough).  This problem was triggered
this time when I manually vacuumed the table - which cleared a lot of
dead rows (again, I always VACUUM VERBOSE ANALYSE).


A separate installation with a similar data set (actually the same but
older; seems the most comparable) estimated 10 rows returned for the
same query.





Re: Using the wrong index (very suboptimal), why?

От
"Shane Wright"
Дата:
Tom,

>>  Index Scan using index_answers_nidiid1 on answers
>> (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645
>> rows=21891 loops=1)
>>    Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>>  Total runtime: 2424.769 ms
>Well, here's the problem all right: 1 row estimated vs 21891 actual.
>Is there something odd about the joint distribution of these two
>columns?

I should also clarify, there will almost always be many many rows that
match any nid/iid1 pair (for any nid or iid1 value that exists in this
table anyway).  The chance of there actually being only one result is
very small.  Most of the time this will be hundreds/thousands - but will
also be tens of thousands on a regular basis.  A hundred thousand or
more periodically.


Many thanks

Shane