Re: Random Page Cost and Planner

From: David Jarvis
Subject: Re: Random Page Cost and Planner
Date:
Msg-id: AANLkTimw-OvwIJ2t_L4kAB7itZFxfDogCjbC0WnPDyRG@mail.gmail.com
In response to: Re: Random Page Cost and Planner  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses: Re: Random Page Cost and Planner  (David Jarvis <thangalin@gmail.com>)
List: pgsql-performance
Hi, Kevin.

> below something in the range of 1.5 to 2 is probably not going to be
> a good choice for the mix as a whole.

Good to know; thanks.
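That range is easy to experiment with per session before touching postgresql.conf. A minimal sketch of what I have in mind (the table and query below are placeholders, and 2.0 is just one point in the suggested range):

SET random_page_cost = 2.0;   -- one value in the suggested 1.5-2 range, this session only
EXPLAIN ANALYZE
SELECT count(*)
FROM measurement              -- placeholder table name
WHERE taken BETWEEN DATE '1990-01-01' AND DATE '1999-12-31';
RESET random_page_cost;       -- back to the server default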
 
> This should probably be set to something on the order of 3GB.  This
> will help the optimizer make more intelligent choices about when use
> of the index will be a win.

I'll try this.
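Assuming the 3GB suggestion refers to effective_cache_size (as earlier in the thread), it can be tried for a single session before being committed to postgresql.conf:

SET effective_cache_size = '3GB';  -- assumption: this is the setting meant here
SHOW effective_cache_size;         -- confirm the value the planner will use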
 
> times.  You'll get your best information if you can simulate a
> more-or-less realistic load, and try that with various settings and

I have no idea what a realistic load will be. The system is still in development and not open to the general public, and I don't know how much publicity it will receive when finished. It could be a few hundred hits, or it could be over ten thousand.

I want the system to be ready for the latter case, which means it needs to return data for many different query parameters (date span, elevation, year, radius, etc.) in under two seconds.
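If simulating a load is the way to go, my current thinking is to drive a representative query through pgbench's custom-script mode. Everything below -- the script name, the table and column names, and the parameter range -- is a placeholder until the real queries are settled:

-- load_test.sql, run with: pgbench -n -f load_test.sql -c 8 -T 60 climate
\setrandom year 1960 2009
SELECT count(*)
FROM measurement
WHERE taken >= to_date(:year || '-01-01', 'YYYY-MM-DD')
AND taken < to_date((:year + 1) || '-01-01', 'YYYY-MM-DD');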
 
> indexes.  The cache turnover and resource contention involved in
> production can influence performance, and are hard to estimate any
> other way.

Another person suggested taking a look at the data.

I ran a query to see whether it makes sense to split the data by year. The trouble is that there are 110 years and 7 categories. The data is already split into child tables by category, which is logical because reporting across two categories is nonsensical -- it is meaningless to report on snow depth and temperature together: we already know it needs to be cold for snow.
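A sketch of the sort of query that produces the breakdown below -- it buckets each station's first and last readings by decade and counts the stations in each combination (the table and column names here are placeholders, not the real schema):

SELECT count(*),
       floor(extract(year FROM min_date) / 10) * 10 AS decade_start,
       floor(extract(year FROM max_date) / 10) * 10 AS decade_end,
       min(min_date) AS min_date,
       max(max_date) AS max_date
FROM (SELECT station_id, min(taken) AS min_date, max(taken) AS max_date
      FROM measurement
      GROUP BY station_id) AS spans
GROUP BY decade_start, decade_end
ORDER BY 1 DESC;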

count; decade start; decade end; min date; max date
3088;1990;2000;"1990-01-01";"2009-12-31"
2925;1980;2000;"1980-01-01";"2009-12-31"
2752;2000;2000;"2000-01-01";"2009-12-31"
2487;1970;1970;"1970-01-01";"1979-12-31"
2391;1980;1990;"1980-02-01";"1999-12-31"
2221;1980;1980;"1980-01-01";"1989-12-31"
1934;1960;2000;"1960-01-01";"2009-12-31"
1822;1960;1960;"1960-01-01";"1969-12-31"
1659;1970;1980;"1970-01-01";"1989-12-31"
1587;1960;1970;"1960-01-01";"1979-12-31"
1524;1970;2000;"1970-01-01";"2009-12-31"

The majority of the data collected by weather stations falls between 1960 and 2009, which makes sense: transistor technology would have made for (relatively) inexpensive automated monitoring stations. Or maybe there were more people and more tax revenue, and thus a bigger budget for weather study. Either way. ;-)

The point is that the top three decades (1990, 1980, 2000) have the most data, which gives me a few options:
  • Split the seven tables twice more: before 1960 and after 1960.
  • Split the seven tables by decade.
The first case gives 14 tables. The second case gives 102 tables (at about 2.5M rows per table), as there are roughly 17 decades of data in total. That still seems like a manageable number of tables, since the data might eventually span 22 decades, which would be 132 tables.
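For the by-decade option, each category's child table would get per-decade children of its own, so the planner can skip decades a query cannot touch. A rough sketch, with placeholder names (measurement_temp, taken) standing in for the real schema:

-- Two example children for one category; the CHECK constraints let
-- constraint exclusion prune children outside the query's date range.
CREATE TABLE measurement_temp_1980 (
    CHECK (taken >= DATE '1980-01-01' AND taken < DATE '1990-01-01')
) INHERITS (measurement_temp);

CREATE TABLE measurement_temp_1990 (
    CHECK (taken >= DATE '1990-01-01' AND taken < DATE '2000-01-01')
) INHERITS (measurement_temp);

SET constraint_exclusion = on;  -- let the planner use the CHECK constraints

-- A query confined to the 1990s should then scan only measurement_temp_1990:
EXPLAIN
SELECT count(*) FROM measurement_temp
WHERE taken BETWEEN DATE '1995-01-01' AND DATE '1995-12-31';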

Even though users will be selecting 1900 to 2009, most of the stations themselves fall within the 1960-2009 range, and the majority of those were active between 1980 and 2009.

Would splitting by decade improve the speed?

Thank you very much.

Dave
