Setting Statistics on Functional Indexes

Поиск
Список
Период
Сортировка
От Shaun Thomas
Тема Setting Statistics on Functional Indexes
Дата
Msg-id 50881D73.100@optionshouse.com
обсуждение исходный текст
Ответы Re: Setting Statistics on Functional Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Hey everyone,

So recently we upgraded to 9.1 and have noticed a ton of our queries got
much worse. It turns out that 9.1 is *way* more optimistic about our
functional indexes, even when they're entirely the wrong path. So after
going through the docs, I see that the normal way to increase stats is
to alter columns directly on a table, or change the
default_statistics_target itself.

But there doesn't seem to be any way to increase stats for a functional
index unless you modify default_statistics_target. I did some testing,
and for a particularly bad plan, we don't get a good result until the
stats are at 5000 or higher. As you can imagine, this drastically
increases our analyze time, and there's no way we're setting that
system-wide.

I tested this by:

SET default_statistics_target = 5000;

ANALYZE my_table;

EXPLAIN SELECT [ugly query];

I only tested 1000, 2000, 3000, 4000, and 5000 before it switched plans.
This is a 30M row table, and the "good" plan is 100x faster than the bad
one. You can see this behavior yourself with this test case:

CREATE TABLE date_test (
   id SERIAL,
   col1 varchar,
   col2 numeric,
   action_date TIMESTAMP WITHOUT TIME ZONE
);

insert into date_test (col1, col2, action_date)
select 'S:' || ((random()*a.num)::int % 10000),
        (random()*a.num)::int % 15000,
        current_date - (random()*a.num)::int % 1000
   from generate_series(1,10000000) a(num);

create index idx_date_test_action_date_trunc
     on date_test (date_trunc('day', action_date));

create index idx_date_test_col1_col2
     on date_test (col1, col2);

explain analyze
select *
   from date_test
  where col1 IN ('S:96')
    and col2 = 657
    and date_trunc('day', action_date) >= '2012-10-24'
  order by id desc, action_date


This seems to cause the problem more consistently when using a value
where col1 and col2 have no matches. In this particular example, I
didn't get the good plan until using 1000 as the default stats target.
It can't be a coincidence that there are 1000 distinct values in the
table for that column, and we get a terrible plan until a statistic is
recorded for each and every one in the functional index so it can
exclude itself. This seems counter-intuitive to pg_stats with default
stats at 500:

SELECT attname,n_distinct FROM pg_stats WHERE tablename='date_test';

    attname   | n_distinct
-------------+------------
  id          |         -1
  action_date |       1000
  col2        |      14999
  col1        |      10000

SELECT stadistinct FROM pg_statistic
  WHERE starelid='idx_date_test_col1_col2'::regclass

  stadistinct
-------------
         1000

Just on pure selectivity, it should prefer the index on col1 and col2.
Anyway, we're getting all the devs to search out that particular
functional index and eradicate it, but that will take a while to get
through testing and deployment. The overriding problem seems to be two-fold:

1. Is there any way to specifically set stats on a functional index?
2. Why is the planner so ridiculously optimistic with functional
indexes, even in the case of much higher selectivity as reported by
pg_stats on the named columns?

Thanks!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


В списке pgsql-performance по дате отправления:

Предыдущее
От: Böckler Andreas
Дата:
Сообщение: Query-Planer from 6seconds TO DAYS
Следующее
От: Böckler Andreas
Дата:
Сообщение: Re: Query-Planer from 6seconds TO DAYS