Re: Using Postgres to store high volume streams of sensor readings

Поиск
Список
Период
Сортировка
От Nikolas Everett
Тема Re: Using Postgres to store high volume streams of sensor readings
Дата
Msg-id d4e11e980811210733k541768ecg28187c742b760a87@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Using Postgres to store high volume streams of sensor readings  ("Ciprian Dorin Craciun" <ciprian.craciun@gmail.com>)
Список pgsql-general
You might want to look into how OpenNMS uses RRDTool.  It is able to handle a huge number of nodes by queuing inserts into the RRDs and using JRobin. 

I'm not sure if it is a great solution for what you are looking for, but I've found its performance scales quite well.  I'm getting well over 500 updates per second using JRobin and an NFS disk.  I'm sure I could do better but by my application is limited by hibernate.  Each of my files stores an MRTG's worth of data and keeps the average and max of four points in 136k.

Here is an iostat -dmx 1 of the rrd update:
Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              7.00    33.00 122.00 2093.00     0.48     8.34     8.16   150.70   69.92   0.45 100.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda            493.00    26.00 248.00 540.00     2.84     2.21    13.13    43.70   55.42   1.26  99.60

My big problem with RRD is not being able to query it like you can a database and it tending not to keep exact values.  Oh, and it being pretty cryptic.

Sorry to clutter up the list of RRD stuff.  I just thought it might be pertinent to Ciprian.

On Fri, Nov 21, 2008 at 8:03 AM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
On Fri, Nov 21, 2008 at 2:55 PM, Gerhard Heift
<ml-postgresql-20081012-3518@gheift.de> wrote:
> On Fri, Nov 21, 2008 at 02:50:45PM +0200, Ciprian Dorin Craciun wrote:
>>     Hello all!
>>
>>     I would like to ask some advice about the following problem
>> (related to the Dehems project: http://www.dehems.eu/ ):
>>     * there are some clients; (the clients are in fact house holds;)
>>     * each device has a number of sensors (about 10), and not all the
>> clients have the same sensor; also sensors might appear and disappear
>> dynamicaly; (the sensors are appliances;)
>>     * for each device and each sensor a reading is produced (at about
>> 6 seconds); (the values could be power consumptions;)
>>     * I would like to store the following data: (client, sensor,
>> timestamp, value);
>>     * the usual queries are:
>>         * for a given client (and sensor), and time interval, I need
>> the min, max, and avg of the values;
>>         * for a given time interval (and sensor), I need min, max, and
>> avg of the values;
>>         * other statistics;
>>
>>     Currently I'm benchmarking the following storage solutions for this:
>>     * Hypertable (http://www.hypertable.org/) -- which has good insert
>> rate (about 250k inserts / s), but slow read rate (about 150k reads /
>> s); (the aggregates are manually computed, as Hypertable does not
>> support other queries except scanning (in fact min, and max are easy
>> beeing the first / last key in the ordered set, but avg must be done
>> by sequential scan);)
>>     * BerkeleyDB -- quite Ok insert rate (about 50k inserts / s), but
>> fabulos read rate (about 2M reads / s); (the same issue with
>> aggregates;)
>>     * Postgres -- which behaves quite poorly (see below)...
>>     * MySQL -- next to be tested;
>
> For this case I would use RRDTool: http://oss.oetiker.ch/rrdtool/
>
> Regards,
>  Gerhard
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAkkmr9wACgkQa8fhU24j2flejQCeKl650qmSlomovOXDl6IKrADb
> cTAAnRebAFq420MuW9aMmhoFOo+sPIje
> =Zcoo
> -----END PGP SIGNATURE-----

   Hy Gerhard, I know about RRDTool, but it has some limitations:
   * I must know in advance the number of sensors;
   * I must create for each client a file (and If I have 10 thousand clients?);
   * I have a limited amount of history;
   * (I'm not sure about this one but i think that) I must insert
each data point by executing a command;
   * and also I can not replicate (distribute) it easily;

   Or have you used RRDTool in a similar context as mine? Do you have
some benchmarks?

   Ciprian.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

В списке pgsql-general по дате отправления:

Предыдущее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: Postgres mail list traffic over time
Следующее
От: "Pavel Stehule"
Дата:
Сообщение: Re: converter pgplsql funcion