Re: Using Postgres to store high volume streams of sensor readings

From: Scott Marlowe
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date:
Msg-id: dcc563d10811231023g4d9c91b2yebdec169ff2f2b4@mail.gmail.com
In reply to: Re: Using Postgres to store high volume streams of sensor readings  ("Ciprian Dorin Craciun" <ciprian.craciun@gmail.com>)
List: pgsql-general
On Sun, Nov 23, 2008 at 10:01 AM, Ciprian Dorin Craciun
<ciprian.craciun@gmail.com> wrote:
> On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> * Ciprian Dorin Craciun (ciprian.craciun@gmail.com) wrote:
>>> > Even better might be partitioning on the timestamp.  IF all access is
>>> > in a certain timestamp range it's usually a big win, especially
>>> > because he can move to a new table every hour / day / week or whatever
>>> > and merge the old one into a big "old data" table.
>>>
>>>     Yes, it would speed up the inserts tremendously... I've tested
>>> it and the insert speed is somewhere at 200k->100k.
>>>
>>>     But unfortunately the query speed is not good at all because most
>>> queries are for a specific client (and sensor) in a given time
>>> range...
>>
>> Have you set up your partitions correctly (eg, with appropriate CHECK
>> constraints and with constraint_exclusion turned on)?  Also, you'd want
>> to keep your indexes on the individual partitions, of course..  That
>> should improve query time quite a bit since it should only be hitting
>> the partitions where the data might be.
>>
>>        Stephen
>>
>
>    Well, now that I've read the previous two emails better, I
> understand what Scott and Stephen are talking about...
>
>    So if I understood it correctly: I should build indexes only for
> certain parts of the data (like the previous full hour and so on).
> But I see a problem: wouldn't this lead to a lot of indices being
> created (24 / day, ~150 / week, ...)?

No, not exactly what I'm talking about.  I'm talking about
pre-creating partitions that the data will soon go into (let's say a
new one every hour) with indexes in place, and having a trigger that
fires on insert to put the data into the right partition.  Once that
partition is no longer being inserted into, and we aren't running a
bunch of queries on it, we migrate it to a historical partition.
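
In 8.x terms the setup looks roughly like this -- just a sketch, with
made-up table and column names rather than your actual schema:

  -- Parent table; all inserts and queries go through it.
  CREATE TABLE readings (
      client integer   NOT NULL,
      sensor integer   NOT NULL,
      ts     timestamp NOT NULL,
      value  bigint    NOT NULL
  );

  -- Pre-create the next hour's partition, with its index already in
  -- place, before any data arrives for it.
  CREATE TABLE readings_2008_11_23_18 (
      CHECK (ts >= '2008-11-23 18:00' AND ts < '2008-11-23 19:00')
  ) INHERITS (readings);

  CREATE INDEX readings_2008_11_23_18_idx
      ON readings_2008_11_23_18 (client, sensor, ts);

  -- Trigger routes each insert into the partition whose CHECK matches.
  CREATE OR REPLACE FUNCTION readings_insert() RETURNS trigger AS $$
  BEGIN
      IF NEW.ts >= '2008-11-23 18:00' AND NEW.ts < '2008-11-23 19:00' THEN
          INSERT INTO readings_2008_11_23_18 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for timestamp %', NEW.ts;
      END IF;
      RETURN NULL;  -- the row went into a child, skip the parent
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER readings_partition
      BEFORE INSERT ON readings
      FOR EACH ROW EXECUTE PROCEDURE readings_insert();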

So, your table looks something like this all the time:

|**|^^|##|##|##|##|$$$$$$$$$$$$$$...|

Where:

** is a partition we have created in advance of needing it.
^^ is the partition we are currently writing to
## are the partitions we're still using in select queries a lot
$$$... are the old data stuffed into the monolithic history table.

When it's time to switch to writing to the new partition (i.e. **) we
make a new one ahead of that, and the trigger starts writing to what
was a ** partition but is now the new ^^, and the ^^ becomes a ##.  At
the end of the day / week whatever, we take all the old ## partitions
and move their data into the $$$ and drop the ## partitions.
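
The archive step itself is just a couple of statements, e.g. (same
made-up names as in the sketch above):

  -- one-time setup: the monolithic history table (the $$$ above)
  CREATE TABLE readings_history (LIKE readings);

  BEGIN;
  -- fold an old hourly partition into the history table, then drop it
  INSERT INTO readings_history SELECT * FROM readings_2008_11_23_18;
  DROP TABLE readings_2008_11_23_18;
  COMMIT;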

Note that the only reason to fold old data into a monolithic archive
partition is to keep from having hundreds or thousands of partitions.
Somewhere past a few hundred partitions things start getting slow
again due to planner overhead.
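
And like Stephen said, make sure constraint_exclusion is on, then
check with EXPLAIN that a time-ranged query only scans the partitions
it needs:

  SET constraint_exclusion = on;

  -- the plan should show the (empty) parent plus just the one
  -- matching hourly child, not every partition
  EXPLAIN SELECT * FROM readings
   WHERE client = 42 AND sensor = 7
     AND ts >= '2008-11-23 18:00' AND ts < '2008-11-23 19:00';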
