Re: Using Postgres to store high volume streams of sensor readings

Поиск
Список
Период
Сортировка
От Ciprian Dorin Craciun
Тема Re: Using Postgres to store high volume streams of sensor readings
Дата
Msg-id 8e04b5820811230936w48a4cd24kcca7086e2b1db0e7@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Using Postgres to store high volume streams of sensor readings  ("V S P" <toreason@fastmail.fm>)
Список pgsql-general
    Thanks for your info! Please see my observations below.

    By the way, we are planning to also try Informix (the time series
extension?)... Do you have some other tips about Informix?

    Ciprian Craciun.


On Sun, Nov 23, 2008 at 6:06 PM, V S P <toreason@fastmail.fm> wrote:
> While most of my experience with oracle/informix
>
> I would also recommend
> a) partitioning on DB level
> Put partitions on on separate hard disks, have the system to be
> at least dual core, and make the disks to be attached via SCSI
> controller (not IDE) for parallel performance.

    This I found out, but for now I'm not able to change the disk layout...


> b) partitioning on application level (that is having
> the insert code dynamically figure out what DB/and what table to go
> (this complicates the application for inserts as well as for reports)

    We wanted to do this, and in this circumstance the winner for the
moment is BerkeleyDB as it's super fast for readings. (This is what
it's currently called sharding, right?)


> c) may be there is a chance to remove the index (if all you are doing
> is inserts) -- and then recreate it later?

    Not possible, as we would like to use the same database (and
table) for both inserts and real time queries... Otherwise the
application would complicate a lot...


> e) I did not see the type of index but if the value of at least
> some of the indexed fields repeated a lot -- Oracle had what's called
> 'bitmap index'
> Postgresql might have something similar, where that type of index
> is optimized for the fact that values are the same for majority
> of the rows (it becomes much smaller, and therefore quicker to update).

    For the moment the index type is the default one (btree), and from
the documentation I didn't see another matching (with the current
usage) one.


> f) review that there are no insert triggers and
> constraints (eithe field or foreign) on those tables
> if there -- validate why they are there and see if they can
> be removed -- and the application would then need to gurantee
> correctness

    Nop, no triggers or constraints (other than not null).


> VSP
>
>
> On Sun, 23 Nov 2008 08:34:57 +0200, "Ciprian Dorin Craciun"
> <ciprian.craciun@gmail.com> said:
>> On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Alvaro Herrera <alvherre@commandprompt.com> writes:
>> >> The problem is, most likely, on updating the indexes.  Heap inserts
>> >> should always take more or less the same time, but index insertion
>> >> requires walking down the index struct for each insert, and the path to
>> >> walk gets larger the more data you have.
>> >
>> > It's worse than that: his test case inserts randomly ordered keys, which
>> > means that there's no locality of access during the index updates.  Once
>> > the indexes get bigger than RAM, update speed goes into the toilet,
>> > because the working set of index pages that need to be touched also
>> > is bigger than RAM.  That effect is going to be present in *any*
>> > standard-design database, not just Postgres.
>> >
>> > It's possible that performance in a real-world situation would be
>> > better, if the incoming data stream isn't so random; but it's
>> > hard to tell about that with the given facts.
>> >
>> > One possibly useful trick is to partition the data by timestamp with
>> > partition sizes chosen so that the indexes don't get out of hand.
>> > But the partition management might be enough of a PITA to negate
>> > any win.
>> >
>> >                        regards, tom lane
>>
>>     Thanks for your feedback! This is just as I supposed, but i didn't
>> had the Postgres experience to be certain.
>>     I'll include your conclusion to my report.
>>
>>     Ciprian Craciun.
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
> --
>  V S P
>  toreason@fastmail.fm
>
> --
> http://www.fastmail.fm - Email service worth paying for. Try it for free

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Postgres mail list traffic over time
Следующее
От: Scara Maccai
Дата:
Сообщение: Re: Using Postgres to store high volume streams of sensor readings