Trouble managing planner for timestamptz columns

Поиск
Список
Период
Сортировка
От Marc Morin
Тема Trouble managing planner for timestamptz columns
Дата
Msg-id 2BCEB9A37A4D354AA276774EE13FB8C2BCC46E@mailserver.sandvine.com
обсуждение исходный текст
Ответы Re: Trouble managing planner for timestamptz columns
Список pgsql-performance
We have large tables that hold statistics based on time. They are of the
form.

CREATE TABLE stats (
    id serial primary key,
    logtime timestamptz,
    d1 int,
    s1 bigint
);

CREATE INDEX idx on stats(logtime);

Some of these tables have new data inserted at a rate of 500,000+ rows /
hour.  The entire table will grow to being 10's to 100's of millions of
rows in size.  (Yes, we are also paritioning these, it's the size of an
individual partition that we're talking about).

We tend to analyze these tables every day or so and this doesn't always
prove to be sufficient....

Our application is a reporting application and the end users typically
like to query the newest data the most. As such,  the queries of the
form...


select
 *
from stats
inner join dimension_d1 using (d1)
where logtime between X and Y and d1.something = value;

This usually results in a hash join (good thing) where the dimension
table is loaded into the hash table and it index scans stats using idx
index.

The trouble starts when both X and Y are times "after" the last analyze.
This restriction clause is outside the range of values in the historgram
created by the last analyze.  Postgres's estimate on the number of rows
returned here is usually very low and incorrect, as you'd expect...

Trouble can occur when the planner will "flip" its decision and decide
to hash join by loading the results of the index scan on idx into the
hash table instead of the dimension table....

Since the table is so large and the system is busy (disk not idle at
all), doing an analyze on this table in the production system can take
1/2 hour!  (statistics collector set to 100).  We can't "afford" to
analyze more often...

It certainly would be nice if postgres could understand somehow that
some columns are "dynamic" and that it's histogram could be stretched to
the maximal values or some other technique for estimating rows to the
right of the range of values in the histogram...

Or have some concept of error bars on it's planner decisions....

Suggestions? Comments?


Marc

В списке pgsql-performance по дате отправления:

Предыдущее
От: Evgeny Gridasov
Дата:
Сообщение: Re: one-field index vs. multi-field index planner
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Trouble managing planner for timestamptz columns