Discussion: PG Schema to be used as log and monitoring store


PG Schema to be used as log and monitoring store

From:
Stefan Keller
Date:
Hi,

Given this kind of sensor (Internet-of-Things) log and monitoring scenario:

* There are 3 production machines, monitored every few seconds for the
coming ~2 years.
* Machine m1 emits 20 boolean and 20 float4 values, captured by sensors
(m1s1..m1s40).
* Machine m2 has the same attributes as m1 plus 10+10 more (m2s1..m2s20).
* Machine m3: like m2, but half of the attributes are different.
* Queries happen once a day, like:
  SELECT m1s1, m1s2 FROM m1 WHERE logged BETWEEN '2017-11-01' AND '2017-11-30';

So this is a kind of "Immutable DB" where there is
* a rather static schema, with sources that have overlapping attributes,
* a heavy write load,
* periodic reads.

Would you also model this schema like my proposal below, which saves
space but makes inserts/updates a little more complex due to the
arrays?

  create table m1 (
    id bigint,
    created timestamp,
    -- 20 boolean sensors, packed into one bit string
    b20 bit(20) default b'00000000000000000000',
    -- 20 float sensors, packed into one array
    farr20 float8[20]
  );
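
For illustration, writing and reading the packed form would look
roughly like this (a sketch; the bit positions and array indexes for
s1/s21 are my assumption):

  -- one sample: 20 packed booleans plus 20 floats
  insert into m1 (id, created, b20, farr20)
  values (1, now(),
          b'10100000000000000000',
          array[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
                1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]);

  -- read sensor s1 (leftmost bit) and s21 (first float)
  select substring(b20 from 1 for 1) = b'1' as s1,
         farr20[1] as s21
  from m1;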

:Stefan


Re: PG Schema to be used as log and monitoring store

From:
James Keener
Date:
My initial inclination is always to build the simplest-to-understand system first. Space is cheap and pg is pretty efficient; engineering time is expensive, and debugging time doubly so, with a side of anxiety when production goes down. It will also allow more flexibility later on to describe your current setup semantically rather than in a condensed form.

I would suggest building a simpler schema and benchmarking insert times and storage space.

If you go with your condensed form, I would suggest writing functions to extract any data that is more complicated than one expression involving one field.
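
For example (a sketch; the function name is mine, and it assumes the
packed schema from the original mail):

  -- hide the bit arithmetic behind a function, so queries never touch
  -- the packed column directly
  create function m1_s1(rec m1) returns boolean
  language sql immutable
  as $$ select substring(rec.b20 from 1 for 1) = b'1' $$;

  -- usage: select m1_s1(m1) from m1;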

Just my 2¢.

Best wishes,

Jim


Re: PG Schema to be used as log and monitoring store

From:
Steven Lembark
Date:
On Sat, 9 Dec 2017 20:22:02 +0100
Stefan Keller <sfkeller@gmail.com> wrote:

>   create table m1 (
>     id bigint,
>     created timestamp,
>     b20 bit(20) default b'00000000000000000000',
>     farr20 float8[20]
>   );

In general this is a bad idea *unless* you have benchmarked the
database and found that the amount of space saved really does make
some difference. Using the packed format makes most SQL a lot harder
to write and makes indexing impossible (or at least messy and rather
error prone). It also makes adding additional fields harder.

If you were really intent on doing this I'd add a few million records
in both formats on a database tuned to handle the load and see if
the packed bits really do make a difference. My guess is that you
won't see all that much difference in storage, and that query speed
with effective indexing is going to be decent.

Using this database might be a lot simpler with a view that
breaks the sub-fields out, or with indexes on the sub-fields
within the packed data.
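
Such a view and an expression index might look like this (a sketch;
the view name and the choice of exposed sub-fields are mine):

  -- expose the packed sub-fields as ordinary columns
  create view m1_flat as
  select id,
         created,
         substring(b20 from 1 for 1) = b'1' as s1,
         substring(b20 from 2 for 1) = b'1' as s2,
         farr20[1] as s21,
         farr20[2] as s22
  from m1;

  -- index a single packed sub-field via an expression index
  create index m1_s21_idx on m1 ((farr20[1]));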


-- 
Steven Lembark                                       1505 National Ave
Workhorse Computing                                 Rockford, IL 61103
lembark@wrkhors.com                                    +1 888 359 3508


Re: PG Schema to be used as log and monitoring store

From:
Stefan Keller
Date:
Hi

Thanks James and Steven!
I hoped somebody would advise me not to do this.
I was just bothered by the NoSQL databases.
Even TimescaleDB made me wonder, because it says it scales Postgres [1]
for IoT, which implies that Postgres does not scale...

2017-12-09 23:01 GMT+01:00 Steven Lembark <lembark@wrkhors.com>:
> In general this is a bad idea *unless* you have benchmarked the
> database and found that the amount of space saved really does make
> some difference.

I actually ran some tests of my own (using generate_series) and did
not find any disk space or performance issues yet.
I've also found this paper from 2012, "Sensor Data Storage
Performance: SQL or NoSQL, Physical or Virtual" [2], which confirms my
observations.
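
A minimal sketch of the kind of test I ran (the row count and value
generation are illustrative):

  -- one million synthetic samples in the packed format
  insert into m1 (id, created, b20, farr20)
  select g,
         timestamp '2017-01-01' + g * interval '5 seconds',
         (g % 1048576)::bit(20),
         array_fill(random(), array[20])
  from generate_series(1, 1000000) as g;

  -- table plus index size on disk
  select pg_size_pretty(pg_total_relation_size('m1'));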

Now, you have to know that there are about 100 attributes for the
machines/tables - not only 40 - so I initially thought it would be
easier to set up the schema using bit(50) and float8[50].

Below I re-modeled it as a relational schema, as you suggested, and
also tried to utilize the INHERITS feature.
Does that look better?

:Stefan

[1] https://blog.timescale.com/choose-postgresql-for-iot-19688efc60ca
[2] https://www.ceid.upatras.gr/webpages/faculty/vasilis/Courses/SpatialTemporalDM/Papers/SQLorNoSQL2012.pdf


/*  Pure relational logging and monitoring schema */

  create table m_meta (
    id int primary key,
    name text
  );

  drop table if exists m cascade;

  -- common attributes shared by all machines
  create table m (
    id bigint primary key,
    gid int references m_meta not null,
    created timestamp,
    b1 bit,
    b2 bit,
    b3 bit,
    -- b4 .. b20 bit,
    f1 float8,
    f2 float8,
    f3 float8
    -- f4 .. f20 float8
  );

  -- machine-specific attributes; primary and foreign key constraints
  -- are not inherited, so they are declared again here
  create table m1 (
    b21 bit,
    -- b22 .. b50 bit,
    f21 float8,
    -- f22 .. f50 float8,
    primary key (id),
    foreign key (gid) references m_meta
  ) inherits (m);

  -- create table m2 ( ... ) inherits (m);

/* end */
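
With this schema, the daily query from the scenario stays plain SQL,
e.g. (which sensor maps to which column is my assumption):

  select b1, f1
  from m1
  where created between '2017-11-01' and '2017-11-30';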




Re: PG Schema to be used as log and monitoring store

From:
John R Pierce
Date:
On 12/9/2017 5:46 PM, Stefan Keller wrote:
> Below I re-modeled it to a relational schema as you suggested and also
> tried to utilize the INHERITS feature.
> Does that look better?


I believe I would use boolean, not bit.


-- 
john r pierce, recycling bits in santa cruz



Re: PG Schema to be used as log and monitoring store

From:
Thomas Kellerer
Date:
> I actually made some tests on my own (using generate_series) and did
> not find any disk space or performance issues yet.
> I've also found this paper from 2012 about "Sensor Data Storage
> Performance: SQL or NoSQL, Physical or Virtual"  [2] which confirms my
> observations.
> 
> Now, you have to know that there are about 100 attributes for the
> machines/tables - not only 40 - so I initially thought, it's easier to
> setup the schema using bit(50) and float8[50].

Did you try using a (single) hstore or jsonb column instead, where the attribute name is the key?

You'd lose some type safety, but those data types can be compressed, so that might be worth the trade-off.
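
A sketch of that variant (the table and column names are illustrative,
not from this thread):

  create table m_log (
    id      bigint primary key,
    created timestamp,
    payload jsonb  -- e.g. {"s1": true, "s21": 0.42, ...}
  );

  -- the monthly query then casts the extracted text back to float8
  select (payload ->> 's21')::float8 as s21
  from m_log
  where created between '2017-11-01' and '2017-11-30';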

Thomas




Re: PG Schema to be used as log and monitoring store

From:
Stefan Keller
Date:
Hi,

2017-12-10 2:53 GMT+01:00 John R Pierce <pierce@hogranch.com>:
> I believe I would use boolean, not bit.

Agreed.

2017-12-10 10:01 GMT+01:00 Thomas Kellerer <spam_eater@gmx.net>:
> Did you try to use a (single) hstore or jsonb column instead where the
> attribute name is the key?

Thought about that and I'm using hstore extensively with OpenStreetMap
data in my PostGIS Terminal.

But it "hurts" really with key-values of type text (with numeric
optimization)...

1. to see boolean values stored in text values
2. to see tables m1 and m2 which have a relatively unchanging schema
(since sensors are HW)
3. and knowing that and m1 and m2 have a large common set of sensors
(i.e. common set of attributes).
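
For instance, a minimal illustration of point 1 (requires the hstore
extension):

  -- the stored boolean comes back as text and needs a cast on every read
  select ('"s1"=>"true"'::hstore -> 's1')::boolean as s1;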

:Stefan

