Re: optimizing daily data storage in Pg

From: Andy Colson
Subject: Re: optimizing daily data storage in Pg
Date:
Msg-id: 4C499D9B.3050101@squeakycode.net
In reply to: Re: optimizing daily data storage in Pg  (P Kishor <punk.kish@gmail.com>)
List: pgsql-general
On 7/23/2010 12:39 AM, P Kishor wrote:
> On Thu, Jul 22, 2010 at 4:56 PM, Andy Colson<andy@squeakycode.net>  wrote:
>> On 7/22/2010 9:41 AM, P Kishor wrote:
>>>
>>> I have been struggling with this for a while now, have even gone down
>>> a few paths but struck out, so I turn now to the community for ideas.
>>> First, the problem: Store six daily variables for ~ 25 years for cells
>>> in a grid.
>>>
>>>   * Number of vars = 6
>>>   * Number of cells ~ 13 million
>>>   * Number of days ~ 9125 (25 * 365)
>>>
>>> Optimize the store for two different kinds of queries:
>>>
>>> Query one: Retrieve the value of a single var for all or a portion of
>>> the cells for a single day. This is analogous to an image where every
>>> pixel is the value of a single var.
>>
>>>       SELECT <var> FROM d WHERE yr = ? AND yday = ?;
>>>       SELECT <var> FROM d WHERE yr = ? AND yday = ? AND cell_id IN
>>> (?,?,?...);
>>
>>
>>>
>>> Query two: Retrieve values for all the days or a duration of days for
>>> a single var for a single cell. This is like grabbing a column out of
>>> a table in which each row holds all the vars for a single day.
>>
>>>       SELECT <var> FROM d WHERE cell_id = ?;
>>>       SELECT <var> FROM d WHERE cell_id IN (?,?,?...);
>>
>>
>>
>> First, I must admit to not reading your entire email.
>
> I am not sure how to respond to your feedback given that you haven't
> read the entire email. Nevertheless, thanks for writing...
>

Heh, sorry. My point was that you put a lot of information into your email,
and I was going to use only one bit of it: the row counts.

>>
>> Second, Query 1 should be fast, regardless of how you layout the tables.
>
> It is not fast. Right now I have data for about 250,000 cells loaded.
> That comes to circa 92 million rows per year. Performance is pretty
> sucky.
>
>

This query should return one record, correct?  It should be very fast:
PG should be able to find the record in the index within 5 seeks, and
then find the data in one more.  Can you post 'explain analyze' output
for this query?  (Or it could be that I totally misunderstood your data.)
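
For reference, the kind of output that helps here comes from something
like the following (a sketch only; `tmax` stands in for whichever of the
six vars is being selected, and the literal values are made up):

```sql
-- EXPLAIN ANALYZE actually runs the query and reports the real plan,
-- row counts, and timings, not just the planner's estimates.
EXPLAIN ANALYZE
SELECT tmax FROM d WHERE yr = 2009 AND yday = 120 AND cell_id = 12345;
```

The interesting parts of the output are whether an index scan is used at
all, and how far the planner's estimated row count is from the actual one.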


>>
>> Third, Query 2 will return 13M rows?  I don't think it matters how you layout
>> the tables, returning 13M rows is always going to be slow.
>>
>
> Yes, I understand that. In reality I will never get 13 M rows. For
> display purposes, I will probably get around 10,000 rows to 50,000
> rows. When more rows are needed, it will be to feed a model, so that
> can be offline (without an impatient human being waiting on the other
> end).
>
> Right now, my main problem is that I have either too many rows (~4 B
> rows) in a manageable number of tables (25 tables) or manageable
> number of rows (~13 M rows) in too many tables (~9000 tables).
>
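
As a middle ground between those two layouts, one table per year can be
expressed as partitions of a single parent. This is only a sketch, using
the inheritance-style partitioning PostgreSQL offered at the time; column
names are assumed from the thread, and `tmax` is a stand-in for one of
the six vars:

```sql
-- Parent table: queries go through this; children hold the actual rows.
CREATE TABLE d (
    cell_id integer,
    yr      smallint,
    yday    smallint,
    tmax    real    -- one of the six vars; the other five omitted here
);

-- One child per year, with a CHECK constraint the planner can use.
CREATE TABLE d_1990 (
    CHECK (yr = 1990)
) INHERITS (d);

CREATE INDEX d_1990_yr_yday_idx ON d_1990 (yr, yday);
CREATE INDEX d_1990_cell_idx    ON d_1990 (cell_id);
-- ...and so on for each of the 25 years. With constraint_exclusion
-- enabled, a query filtering on yr scans only the matching child.
```

That keeps the table count at ~25 while letting per-year queries touch
only ~1/25th of the 4B rows.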

Searching by just cell_id is not going to be very selective, and with
large result-sets I can see this one being slow.  As Scott talked about
in his response, this one will come down to hardware.  Have you
dd-tested your hardware?
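
For what it's worth, a rough dd sequential-throughput check looks
something like this (a sketch; in a real test the file should be at
least 2x RAM so the OS page cache doesn't flatter the numbers, and the
small count here is just for illustration):

```shell
# Write test: stream zeros to a file; conv=fdatasync forces the data to
# disk before dd reports its MB/s figure (GNU dd).
dd if=/dev/zero of=ddtest.bin bs=8k count=1000 conv=fdatasync

# Read test: stream the file back. On Linux, drop the page cache first
# as root for an honest number:  sync; echo 3 > /proc/sys/vm/drop_caches
dd if=ddtest.bin of=/dev/null bs=8k

rm -f ddtest.bin
```

The 8k block size matches PostgreSQL's default page size, so the numbers
are at least in the right ballpark for what the server will see.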

-Andy
