Re: optimizing daily data storage in Pg

From: P Kishor
Subject: Re: optimizing daily data storage in Pg
Date:
Msg-id: AANLkTimDtr=nF3L6xwPZ3hh_DFuh0JY9naZcGWyvF2ST@mail.gmail.com
In reply to: Re: optimizing daily data storage in Pg  (Andy Colson <andy@squeakycode.net>)
Responses: Re: optimizing daily data storage in Pg  (Andy Colson <andy@squeakycode.net>)
List: pgsql-general
On Thu, Jul 22, 2010 at 4:56 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 7/22/2010 9:41 AM, P Kishor wrote:
>>
>> I have been struggling with this for a while now, have even gone down
>> a few paths but struck out, so I turn now to the community for ideas.
>> First, the problem: Store six daily variables for ~ 25 years for cells
>> in a grid.
>>
>>  * Number of vars = 6
>>  * Number of cells ~ 13 million
>>  * Number of days ~ 9125 (25 * 365)
>>
>> Optimize the store for two different kinds of queries:
>>
>> Query one: Retrieve the value of a single var for all or a portion of
>> the cells for a single day. This is analogous to an image where every
>> pixel is the value of a single var.
>
>>      SELECT <var> FROM d WHERE yr = ? AND yday = ?;
>>      SELECT <var> FROM d WHERE yr = ? AND yday = ? AND cell_id IN (?,?,?...);
>
>
>>
>> Query two: Retrieve values for all the days or a duration of days for
>> a single var for a single cell. This is like grabbing a column out of
>> a table in which each row holds all the vars for a single day.
>
>>      SELECT <var> FROM d WHERE cell_id = ?;
>>      SELECT <var> FROM d WHERE cell_id IN (?,?,?...);
>
>
>
> First, I must admit to not reading your entire email.

I am not sure how to respond to your feedback given that you haven't
read the entire email. Nevertheless, thanks for writing...

>
> Second, Query 1 should be fast, regardless of how you layout the tables.

It is not fast. Right now I have data for about 250,000 cells loaded.
That comes to circa 92 million rows per year. Performance is pretty
sucky.
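
For concreteness, here is a minimal sketch of the single-table layout
implied by the queries above (column names and types are stand-ins,
not my exact DDL), plus the composite index I would expect Query one
to lean on:

    -- one row per cell per day; var1..var6 stand in for the six variables
    CREATE TABLE d (
        cell_id  integer   NOT NULL,
        yr       smallint  NOT NULL,
        yday     smallint  NOT NULL,
        var1     real,
        var2     real,
        var3     real,
        var4     real,
        var5     real,
        var6     real
    );

    -- lets 'WHERE yr = ? AND yday = ?' resolve via one index scan
    CREATE INDEX d_yr_yday_idx ON d (yr, yday);

Even with such an index, a single day is 250,000 rows right now, and
will be ~13 million rows once all the cells are loaded.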


>
> Third, Query 2 will return 13M rows?  I don't think it matters how you lay out
> the tables, returning 13M rows is always going to be slow.
>

Yes, I understand that. In reality I will never fetch all 13 M rows.
For display purposes, I will probably fetch around 10,000 to 50,000
rows. When more rows are needed, it will be to feed a model, so that
can happen offline (without an impatient human being waiting on the
other end).
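
Assuming the same single-table sketch as above, Query two would want
an index led by cell_id, something like:

    -- lets 'WHERE cell_id = ?' (or a small IN list) use an index scan
    -- instead of walking the whole table
    CREATE INDEX d_cell_yr_yday_idx ON d (cell_id, yr, yday);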

Right now, my main problem is that I have either too many rows (~4 B
rows) in a manageable number of tables (25 tables) or a manageable
number of rows (~13 M rows) in too many tables (~9,000 tables).
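
For what it's worth, if the 25-table variant is one table per year
(which the counts suggest), it amounts to partitioning by inheritance.
A rough sketch, using 1980 as an illustrative year:

    -- child table holds one year; the CHECK constraint lets the
    -- planner skip it when the WHERE clause names a different year
    CREATE TABLE d_1980 (
        CHECK (yr = 1980)
    ) INHERITS (d);

    -- constraint exclusion must be enabled for that pruning to happen
    SET constraint_exclusion = on;

With that in place, Query one touches only one child table, while
Query two still has to visit all 25.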


>
> -Andy
>



--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================
