Re: Random Page Cost and Planner

Поиск

Список

Период

Сортировка

От	David Jarvis
Тема	Re: Random Page Cost and Planner
Дата	26 мая 2010 г. 15:55:46
Msg-id	AANLkTinPELm_Ckm1zvcm7-XtWZAm3vMClMQGJ5qMAxdy@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Random Page Cost and Planner (tv@fuzzy.cz)
Список	pgsql-performance

Дерево обсуждения

Hi,

sc.taken_end <= '1996-12-31'::date AND
m.taken BETWEEN sc.taken_start AND sc.taken_end AND

category of data at a certain time. But I'm afraid this makes the planning
much more difficult, as the select from measurements depend on the data
returned by other parts of the query (rows from category).

Right. Users can select 1900 - 2009. Station data hardly ever spans that range.

The station_category is used to create a unique key into the measurement data for every station: station_id, category_id, and taken_start. The measurement data should be contiguous until taken_end.

I thought that that combination would be a pointer to the exact spot in the measurement table where the data starts, which should be ridiculously fast to find.

See this http://explain.depesz.com/s/H1 and this
http://explain.depesz.com/s/GGx

I was getting some red lines when I looked at a different plan. It's a great site.

How many rows does the query return without the group by clause? About
140000 in both cases, right?

SELECT
*
FROM
climate.measurement m
WHERE
m.station_id = 5148 AND
m.taken BETWEEN '1900-08-01'::date AND '2009-12-31'::date AND
m.category_id = 1

5397 rows (10 seconds cold; 0.5 seconds hot); estimated too high by 2275 rows?

http://explain.depesz.com/s/uq

OK, I haven't noticed the table is already partitioned by category_id and
I didn't mean to partition by (taken, category_id) - that would produce a
lot of partitions. Yes, that might cause problems related to number of
files, but that's rather a filesystem related issue.

Constrained as:

CONSTRAINT measurement_013_category_id_ck CHECK (category_id = 7)

I'd expect rather issues related to RULEs or triggers (not sure which of
them you use to redirect the data into partitions). But when partitioning

I created seven child tables of measurement. Each of these has a constraint by category_id. This makes it extremely fast to select the correct partition.

I'm not sure what you mean by 'year wrapping issue' but I think it might
work quite well - right not the problem is PostgreSQL decides to scan the
whole partition (all data for a given category_id).

I'll give it another try. :-)

Use Case #1
User selects: Mar 22 to Dec 22
User selects: 1900 to 2009

Result: Query should average 9 months of climate data per year between Mar 22 and Dec 22 of Year.

Use Case #2

User selects: Dec 22 to Mar 22
User selects: 1900 to 2009

Result: Query should average 3 months of climate data per year between Dec 22 of Year and Mar 22 of Year+1.

So if a user selects 1950 to 1960:

first case should average between 1950 and 1960; and
second case should average between 1950 and 1961.

Dave

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Random Page Cost and Planner