Problem with n_distinct being consistently inaccurate.

Поиск
Список
Период
Сортировка
От Nick Fankhauser
Тема Problem with n_distinct being consistently inaccurate.
Дата
Msg-id NEBBLAAHGLEEPCGOBHDGKEHOIJAA.nickf@ontko.com
обсуждение исходный текст
Ответы Re: Problem with n_distinct being consistently inaccurate.
Список pgsql-admin
Hi-

I tried to post this to the performance list, but that list seems to have a
problem at the moment. I think the question fits "admin" as well:

I have a table- called "event" with a field event_date_time that is indexed.
There are 1,700,000 rows in the table and 92,000 distinct values of
event_date_time with anywhere from 1 to 400 rows sharing the same value of
event_date_time. (I did a count grouped by event_date_time & scanned the
results to get this info.)

When I look at the pg_stats on this table after running analyze with the
defaults, I always see 15,000 or lower in the n_distinct column for
event_date_time. I re-ran analyze several times & then checked pg_stats to
see if the numbers varied significantly.

Since this is off by about a factor of 6, I think the planner is missing the
chance to use this table as the "driver" in a complex query plan that I'm
trying to optimize.

So the question is- how can I get a better estimate of n_distinct from
analyze?

If I alter the stats target as high as it will go, I get closer, but it
still shows the index to be about 1/2 as selective as it actually is:

alpha=# alter table event alter column event_date_time set statistics 1000;
ALTER TABLE
alpha=# analyze event;
ANALYZE
alpha=# select n_distinct from pg_stats where tablename='event' and
attname='event_date_time';
 n_distinct
------------
      51741
(1 row)

This number seems to be consistently around 51,000 if I re-run analyze a few
times.

I guess my question is two-part:

(1)Is there any tweak to make this estimate work better?

(2)Since I'm getting numbers that are consistent but way off, is there a bug
here?

(2-1/2) Or alternately, am I totally missing what n-distinct is supposed to
contain ?

Thanks!
   -Nick

---------------------------------------------------------------------
Nick Fankhauser

    nickf@doxpop.com  Phone 1.765.965.7363  Fax 1.765.962.9788
doxpop - Court records at your fingertips - http://www.doxpop.com/



В списке pgsql-admin по дате отправления:

Предыдущее
От: "shreedhar"
Дата:
Сообщение: PostgreSQL query failed: ERROR: pg_atoi: zero-length string
Следующее
От: "Nick Fankhauser"
Дата:
Сообщение: PG version for n_distinct question.