Re: Using Postgres to store high volume streams of sensor readings
| From | Ciprian Dorin Craciun |
|---|---|
| Subject | Re: Using Postgres to store high volume streams of sensor readings |
| Date | |
| Msg-id | 8e04b5820811210526s44767e8fte8445bbbda95ec59@mail.gmail.com |
| In reply to | Re: Using Postgres to store high volume streams of sensor readings ("Grzegorz Jaśkiewicz" <gryzman@gmail.com>) |
| Responses | Re: Using Postgres to store high volume streams of sensor readings; Re: Using Postgres to store high volume streams of sensor readings |
| List | pgsql-general |
On Fri, Nov 21, 2008 at 3:18 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> you'll have to provide us with some sort of test-case to get some answers,
> please. (set of scripts, queries, etc).

Below is the content of my original post. Inside I mention exactly the way the benchmark was conducted. In short, the data is inserted by using COPY sds_benchmark_data FROM STDIN, in batches of 500 thousand data points. I'll also paste the important part here (followed by a concrete sample of what one such batch looks like):

> * Postgres version: 8.3.3;
>
> * database schema:
>> create table sds_benchmark_data (
>>     client int4 not null,
>>     sensor int4 not null,
>>     timestamp int4 not null,
>>     value int4
>> );
>>
>> alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>>
>> create index sds_benchmark_data__client_sensor__index
>>     on sds_benchmark_data (client, sensor);
>
> * postgres.conf (default values; I list only what has been changed):
>> max_connections = 20
>> shared_buffers = 24MB
>> work_mem = 64MB
>> maintenance_work_mem = 256MB
>> fsync = off
>> checkpoint_segments = 32
>> effective_cache_size = 1024MB
>
> * inserts are done like this:
>     * generated 100 million readings by using the following rule:
>         * client is randomly chosen between 0 and 10 thousand;
>         * sensor is randomly chosen between 0 and 10;
>         * the timestamp is always increasing by one;
>     * the inserts are done in batches of 500 thousand (I've also tried 5, 25, 50 and 100 thousand without big impact);
>     * the batch inserts are done through COPY sds_benchmark_data FROM STDIN through libpq (by using UNIX (local) sockets);
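For concreteness, here is a minimal sketch of what one such batch looks like, written in psql form (the actual benchmark streams the same COPY statement through libpq; the three data rows are made up for illustration):

    -- one batch: stream the rows, then terminate the COPY
    -- (space-delimited here for readability; COPY's default format is tab-delimited)
    COPY sds_benchmark_data (client, sensor, timestamp, value) FROM STDIN WITH DELIMITER ' ';
    1842 7 1 230
    905 3 2 117
    4711 0 3 42
    \.

(In the benchmark each such batch carries 500 thousand rows, not three.)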
Unfortunately I don't know what more information to give...

Thanks,
Ciprian Dorin Craciun.

On Fri, Nov 21, 2008 at 2:50 PM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
> Hello all!
>
> I would like to ask some advice about the following problem (related to the Dehems project: http://www.dehems.eu/ ):
> * there are some clients; (the clients are in fact households;)
> * each device has a number of sensors (about 10), and not all the clients have the same sensors; also sensors might appear and disappear dynamically; (the sensors are appliances;)
> * for each device and each sensor a reading is produced (about every 6 seconds); (the values could be power consumptions;)
> * I would like to store the following data: (client, sensor, timestamp, value);
> * the usual queries are:
>     * for a given client (and sensor) and time interval, I need the min, max, and avg of the values;
>     * for a given time interval (and sensor), I need the min, max, and avg of the values;
>     * other statistics;
>
> Currently I'm benchmarking the following storage solutions for this:
> * Hypertable (http://www.hypertable.org/) -- which has a good insert rate (about 250k inserts / s), but a slow read rate (about 150k reads / s); (the aggregates are computed manually, as Hypertable supports no queries other than scanning; (in fact min and max are easy, being the first / last key in the ordered set, but avg must be done by a sequential scan);)
> * BerkeleyDB -- a quite OK insert rate (about 50k inserts / s), but a fabulous read rate (about 2M reads / s); (the same issue with aggregates;)
> * Postgres -- which behaves quite poorly (see below)...
> * MySQL -- next to be tested;
>
> So what can I do / how could I optimize the use of Postgres for this usage?
>
> (I'm aware that there could be optimizations for this problem (like computing the aggregates in memory and storing only these aggregates at 10 minute -- or other -- intervals), but I want to have the full history (for data mining tasks, for example);)
>
> I would also like to present how the Postgres benchmark was implemented:
>
> * test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500Gb SCSI, RAID 0, 8Gb RAM, 2 x Intel Xeon 5160 x 2 cores (3GHz, 4Mb L2);
> * Postgres version: 8.3.3;
>
> * database schema:
>> create table sds_benchmark_data (
>>     client int4 not null,
>>     sensor int4 not null,
>>     timestamp int4 not null,
>>     value int4
>> );
>>
>> alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>>
>> create index sds_benchmark_data__client_sensor__index
>>     on sds_benchmark_data (client, sensor);
>
> * postgres.conf (default values; I list only what has been changed):
>> max_connections = 20
>> shared_buffers = 24MB
>> work_mem = 64MB
>> maintenance_work_mem = 256MB
>> fsync = off
>> checkpoint_segments = 32
>> effective_cache_size = 1024MB
>
> * inserts are done like this:
>     * generated 100 million readings by using the following rule:
>         * client is randomly chosen between 0 and 10 thousand;
>         * sensor is randomly chosen between 0 and 10;
>         * the timestamp is always increasing by one;
>     * the inserts are done in batches of 500 thousand (I've also tried 5, 25, 50 and 100 thousand without big impact);
>     * the batch inserts are done through COPY sds_benchmark_data FROM STDIN through libpq (by using UNIX (local) sockets);
>
> What I have observed / tried:
> * I've tested without the primary key and the index, and the results were the best for inserts (600k inserts / s), but reads worked extremely slowly (due to the lack of indexing);
> * with only the index (or only the primary key), the insert rate is good at the start (for the first 2 million readings), but then drops to about 200 inserts / s;
>
> So could someone point out where I'm wrong, or what I can do to optimize Postgres for this particular task?
>
> Thanks for your help,
> Ciprian Dorin Craciun.
>
> P.S.: I want to publish the benchmark results after they are done, and I want to squeeze as much power out of Postgres as possible.
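For completeness, the aggregate queries described in the post boil down to roughly this shape (a sketch against the schema above; the literal client, sensor, and interval bounds are placeholders):

    -- min / max / avg for one client and sensor over a time interval
    select min(value), max(value), avg(value)
      from sds_benchmark_data
     where client = 123
       and sensor = 4
       and timestamp between 1000000 and 2000000;

    -- the same for one sensor, across all clients, over a time interval
    select min(value), max(value), avg(value)
      from sds_benchmark_data
     where sensor = 4
       and timestamp between 1000000 and 2000000;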