Re: Using Postgres to store high volume streams of sensor readings
| From | Ciprian Dorin Craciun |
|---|---|
| Subject | Re: Using Postgres to store high volume streams of sensor readings |
| Date | |
| Msg-id | 8e04b5820811210526s44767e8fte8445bbbda95ec59@mail.gmail.com |
| In reply to | Re: Using Postgres to store high volume streams of sensor readings ("Grzegorz Jaśkiewicz" <gryzman@gmail.com>) |
| Responses | Re: Using Postgres to store high volume streams of sensor readings; Re: Using Postgres to store high volume streams of sensor readings |
| List | pgsql-general |
On Fri, Nov 21, 2008 at 3:18 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> you'll have to provide us with some sort of test-case to get some answers,
> please. (set of scripts, queries, etc).

Below is the content of my original post. Inside I mention exactly the way the benchmark was conducted. In short, the data is inserted by using COPY sds_benchmark_data FROM STDIN, in batches of 500 thousand data points. I'll also paste the important part here (followed by a concrete sample of what one such batch looks like):

> * Postgres version: 8.3.3;
>
> * database schema:
>> create table sds_benchmark_data (
>>     client int4 not null,
>>     sensor int4 not null,
>>     timestamp int4 not null,
>>     value int4
>> );
>>
>> alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>>
>> create index sds_benchmark_data__client_sensor__index
>>     on sds_benchmark_data (client, sensor);
>
> * postgres.conf (default values; I list only what has been changed):
>> max_connections = 20
>> shared_buffers = 24MB
>> work_mem = 64MB
>> maintenance_work_mem = 256MB
>> fsync = off
>> checkpoint_segments = 32
>> effective_cache_size = 1024MB
>
> * inserts are done like this:
>     * generated 100 million readings by using the following rule:
>         * client is randomly chosen between 0 and 10 thousand;
>         * sensor is randomly chosen between 0 and 10;
>         * the timestamp is always increasing by one;
>     * the inserts are done in batches of 500 thousand (I've also tried 5, 25, 50 and 100 thousand without big impact);
>     * the batch inserts are done through COPY sds_benchmark_data FROM STDIN through libpq (by using UNIX (local) sockets);
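For concreteness, here is a minimal sketch of what one such batch looks like, written in psql form (the actual benchmark streams the same COPY statement through libpq; the three data rows are made up for illustration):

    -- one batch: stream the rows, then terminate the COPY
    -- (space-delimited here for readability; COPY's default format is tab-delimited)
    COPY sds_benchmark_data (client, sensor, timestamp, value) FROM STDIN WITH DELIMITER ' ';
    1842 7 1 230
    905 3 2 117
    4711 0 3 42
    \.

(In the benchmark each such batch carries 500 thousand rows, not three.)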
Unfortunately I don't know what more information to give...

Thanks,
Ciprian Dorin Craciun.

On Fri, Nov 21, 2008 at 2:50 PM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
> Hello all!
>
> I would like to ask some advice about the following problem (related to the Dehems project: http://www.dehems.eu/ ):
> * there are some clients; (the clients are in fact households;)
> * each device has a number of sensors (about 10), and not all the clients have the same sensors; also sensors might appear and disappear dynamically; (the sensors are appliances;)
> * for each device and each sensor a reading is produced (about every 6 seconds); (the values could be power consumptions;)
> * I would like to store the following data: (client, sensor, timestamp, value);
> * the usual queries are:
>     * for a given client (and sensor) and time interval, I need the min, max, and avg of the values;
>     * for a given time interval (and sensor), I need the min, max, and avg of the values;
>     * other statistics;
>
> Currently I'm benchmarking the following storage solutions for this:
> * Hypertable (http://www.hypertable.org/) -- which has a good insert rate (about 250k inserts / s), but a slow read rate (about 150k reads / s); (the aggregates are computed manually, as Hypertable supports no queries other than scanning; (in fact min and max are easy, being the first / last key in the ordered set, but avg must be done by a sequential scan);)
> * BerkeleyDB -- a quite OK insert rate (about 50k inserts / s), but a fabulous read rate (about 2M reads / s); (the same issue with aggregates;)
> * Postgres -- which behaves quite poorly (see below)...
> * MySQL -- next to be tested;
>
> So what can I do / how could I optimize the use of Postgres for this usage?
>
> (I'm aware that there could be optimizations for this problem (like computing the aggregates in memory and storing only these aggregates at 10 minute -- or other -- intervals), but I want to have the full history (for data mining tasks, for example);)
>
> I would also like to present how the Postgres benchmark was implemented:
>
> * test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500Gb SCSI, RAID 0, 8Gb RAM, 2 x Intel Xeon 5160 x 2 cores (3GHz, 4Mb L2);
> * Postgres version: 8.3.3;
>
> * database schema:
>> create table sds_benchmark_data (
>>     client int4 not null,
>>     sensor int4 not null,
>>     timestamp int4 not null,
>>     value int4
>> );
>>
>> alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>>
>> create index sds_benchmark_data__client_sensor__index
>>     on sds_benchmark_data (client, sensor);
>
> * postgres.conf (default values; I list only what has been changed):
>> max_connections = 20
>> shared_buffers = 24MB
>> work_mem = 64MB
>> maintenance_work_mem = 256MB
>> fsync = off
>> checkpoint_segments = 32
>> effective_cache_size = 1024MB
>
> * inserts are done like this:
>     * generated 100 million readings by using the following rule:
>         * client is randomly chosen between 0 and 10 thousand;
>         * sensor is randomly chosen between 0 and 10;
>         * the timestamp is always increasing by one;
>     * the inserts are done in batches of 500 thousand (I've also tried 5, 25, 50 and 100 thousand without big impact);
>     * the batch inserts are done through COPY sds_benchmark_data FROM STDIN through libpq (by using UNIX (local) sockets);
>
> What I have observed / tried:
> * I've tested without the primary key and the index, and the results were the best for inserts (600k inserts / s), but reads worked extremely slowly (due to the lack of indexing);
> * with only the index (or only the primary key), the insert rate is good at the start (for the first 2 million readings), but then drops to about 200 inserts / s;
>
> So could someone point out where I'm wrong, or what I can do to optimize Postgres for this particular task?
>
> Thanks for your help,
> Ciprian Dorin Craciun.
>
> P.S.: I want to publish the benchmark results after they are done, and I want to squeeze as much power out of Postgres as possible.
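For completeness, the aggregate queries described in the post boil down to roughly this shape (a sketch against the schema above; the literal client, sensor, and interval bounds are placeholders):

    -- min / max / avg for one client and sensor over a time interval
    select min(value), max(value), avg(value)
      from sds_benchmark_data
     where client = 123
       and sensor = 4
       and timestamp between 1000000 and 2000000;

    -- the same for one sensor, across all clients, over a time interval
    select min(value), max(value), avg(value)
      from sds_benchmark_data
     where sensor = 4
       and timestamp between 1000000 and 2000000;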