Discussion: logger table
Hello,

I need some ideas for creating a PG-based logger. I have a job which can run more than one time, so the PK is at the moment jobid & cycle number. The inserts into this table happen in parallel, with the same username, from different hosts (clustering). The user calls the executable "myprint" and the message is inserted into this table, but at the moment I don't know a good structure for the table. Each print call can have a different length, so I think a text field is a good choice, but I don't know how I can create a good PK value. IMHO a sequence can create problems because I'm logged in with the same user on multiple hosts, and a hash key like SHA1 based on the content is not a good choice, because the content is not unique, so I can get key collisions. I would like to create a record of its own for each "print" call, but how can I create a good key value with no problems under parallel access? I think there can be more than 1000 inserts each second.

Can anybody post a good idea?

Thanks
Phil
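The question above has a standard answer that the replies below hint at: let the server assign the key. A minimal sketch, assuming an invented table and column names (the SQL strings here are illustrative, not from the thread) — a `bigserial` column hands out collision-free keys no matter how many hosts insert in parallel, while jobid and cycle stay as ordinary indexed columns:

```python
# Hedged sketch of one possible schema. All names (job_log, jobid, cycle,
# message, logged_at) are assumptions for illustration, not from the thread.
ddl = """
CREATE TABLE job_log (
    id        bigserial   PRIMARY KEY,           -- server-assigned, collision-free
    jobid     bigint      NOT NULL,
    cycle     integer     NOT NULL,
    message   text        NOT NULL,              -- print calls vary in length
    logged_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX job_log_job_cycle_idx ON job_log (jobid, cycle);
"""

# Each "myprint" call would then run a parameterized insert; the key is
# never supplied by the client, so parallel hosts cannot collide.
insert_sql = "INSERT INTO job_log (jobid, cycle, message) VALUES (%s, %s, %s)"
print("bigserial" in ddl)  # True
```

With a driver such as psycopg2, `cursor.execute(insert_sql, (jobid, cycle, msg))` would run the insert without the client ever generating a key.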
Have you tried pg_audit?
https://github.com/jcasanov/pg_audit
From: Philipp Kraus <philipp.kraus@flashpixx.de>
To: pgsql-general@postgresql.org
Sent: Sunday, December 23, 2012, 22:01
Subject: [GENERAL] logger table
> Hello,
>
> I need some ideas for creating a PG-based logger. I have a job which can run more than one time, so the PK is at the moment jobid & cycle number.
> The inserts into this table happen in parallel, with the same username, from different hosts (clustering). The user calls the executable "myprint"
> and the message is inserted into this table, but at the moment I don't know a good structure for the table. Each print call can have a different
> length, so I think a text field is a good choice, but I don't know how I can create a good PK value. IMHO a sequence can create problems because
> I'm logged in with the same user on multiple hosts, and a hash key like SHA1 based on the content is not a good choice, because the content is
> not unique, so I can get key collisions.
> I would like to create a record of its own for each "print" call, but how can I create a good key value with no problems under parallel access?
> I think there can be more than 1000 inserts each second.
> Can anybody post a good idea?
>
> Thanks
> Phil
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 12/23/2012 7:01 PM, Philipp Kraus wrote:

> I don't know how I can create a good PK value. IMHO a sequence can create
> problems because I'm logged in with the same user on multiple hosts,

why is that a reason? sequences work no matter how many clients there are.

> PK is at the moment jobid & cycle number.

how is this jobid assigned? is this something external? how are you keeping track of the cycle number for a given job?
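The point above, that a sequence stays unique no matter how many clients draw from it, can be sketched with a lock-protected counter standing in for PostgreSQL's `nextval()` (the threads model the clustered hosts; all names and counts are invented for illustration):

```python
# Simulation of many parallel "hosts" drawing keys from one shared
# sequence-like counter. In PostgreSQL, nextval() gives this guarantee
# server-side; the lock here just makes the Python stand-in safe.
import itertools
import threading

counter = itertools.count(1)   # stand-in for a PostgreSQL sequence
lock = threading.Lock()
drawn = []

def client(n_inserts):
    for _ in range(n_inserts):
        with lock:
            key = next(counter)        # like SELECT nextval('log_seq')
            drawn.append(key)

# 10 "hosts", 100 inserts each, all in parallel
threads = [threading.Thread(target=client, args=(100,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every drawn key is unique even under parallel access.
print(len(drawn), len(set(drawn)))  # 1000 1000
```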
2012/12/24 Philipp Kraus <philipp.kraus@flashpixx.de>:
> I need some ideas for creating a PG-based logger. I have a
> job which can run more than one time, so the PK is at the
> moment jobid & cycle number. The inserts into this table happen
> in parallel, with the same username, from different hosts
> (clustering). The user calls the executable "myprint" and
> the message is inserted into this table, but at the moment I
> don't know a good structure for the table. Each print call can
> have a different length, so I think a text field is a good choice,
> but I don't know how I can create a good PK value. IMHO a
> sequence can create problems because I'm logged in with the
> same user on multiple hosts, and a hash key like SHA1 based
> on the content is not a good choice, because the content is not
> unique, so I can get key collisions. I would like to create
> a record of its own for each "print" call, but how can I
> create a good key value with no problems under parallel
> access? I think there can be more than 1000 inserts each
> second.
>
> Can anybody post a good idea?

Why is it necessary to have a primary key? What is the "cycle number"?

For what it is worth, I put all my syslog in PG and have so far been fine without primary keys. (I keep only an hour there at a time, though, and it's only a few hundred megs.)

In the past, I have had trouble maintaining a high TPS while having lots (hundreds) of connected clients; maybe you'll want to use a connection pool.

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B
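The connection-pool suggestion above can be sketched as follows: a fixed set of connections shared by many workers through a queue, so hundreds of clients don't each hold their own server connection. The "connections" here are plain strings standing in for real DB handles; all names and counts are invented:

```python
# Toy connection pool: workers borrow a handle, do their insert, and
# always return the handle, so at most POOL_SIZE connections exist
# regardless of how many workers run in parallel.
import queue
import threading

POOL_SIZE = 5
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(f"conn-{i}")              # stand-ins for real connections

inserts_done = 0
count_lock = threading.Lock()

def worker(n):
    global inserts_done
    for _ in range(n):
        conn = pool.get()              # borrow; blocks if none are free
        try:
            # ... would execute the INSERT on conn here ...
            with count_lock:
                inserts_done += 1
        finally:
            pool.put(conn)             # always return it to the pool

# 20 workers sharing only 5 "connections"
workers = [threading.Thread(target=worker, args=(50,)) for _ in range(20)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(inserts_done)  # 1000
```

In practice a real pool (e.g. psycopg2's `pool` module, or an external pooler such as pgbouncer) plays this role; the queue above only illustrates the borrow/return discipline.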
On 25.12.2012 17:19, Jason Dusek wrote:
> 2012/12/24 Philipp Kraus <philipp.kraus@flashpixx.de>:
>> I need some ideas for creating a PG-based logger. I have a
>> job which can run more than one time, so the PK is at the
>> moment jobid & cycle number. The inserts into this table happen
>> in parallel, with the same username, from different hosts
>> (clustering). The user calls the executable "myprint" and
>> the message is inserted into this table, but at the moment I
>> don't know a good structure for the table. Each print call can
>> have a different length, so I think a text field is a good choice,
>> but I don't know how I can create a good PK value. IMHO a
>> sequence can create problems because I'm logged in with the
>> same user on multiple hosts, and a hash key like SHA1 based
>> on the content is not a good choice, because the content is not
>> unique, so I can get key collisions. I would like to create
>> a record of its own for each "print" call, but how can I
>> create a good key value with no problems under parallel
>> access? I think there can be more than 1000 inserts each
>> second.
>>
>> Can anybody post a good idea?
>
> Why is it necessary to have a primary key? What is the "cycle
> number"?

the cycle number is an incrementing number, starting at 0 and running to cycle-1

> For what it is worth, I put all my syslog in PG and have so far
> been fine without primary keys. (I keep only an hour there at a
> time, though, and it's only a few hundred megs.)
>
> In the past, I have had trouble maintaining a high TPS while
> having lots (hundreds) of connected clients; maybe you'll want
> to use a connection pool.

I use a connection pool at the moment. I have an MPI process:

    for (std::size_t i = 0; i < cycle; ++i)
        for (std::size_t n = 0; n < iterations; ++n)
        {
            // ...
            log_to_pg_table(i, "log message");
            // ...
            mpi::barrier();
        }

so the clients are synchronized on each inner loop. The primary key is also an order number, so a message with a lower number gets a lower index than a message that is pushed to the table later. So with a primary key I can say that only the messages within one iteration are unordered.
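The ordering argument above can be sketched as follows: because `mpi::barrier()` separates the loop passes, every key drawn before a barrier is smaller than every key drawn after it, so sorting by a sequence-style key can only interleave messages from the same cycle. The simulation below uses invented counts and host names purely for illustration:

```python
# Model: hosts log within a cycle in arbitrary interleaving, a barrier
# ends the cycle, then the next cycle starts. Keys come from one shared
# increasing counter (like a sequence), so keys are cycle-monotonic.
import itertools
import random

counter = itertools.count(1)
records = []
cycles, iterations, hosts = 3, 2, 2

for cycle in range(cycles):
    for it in range(iterations):
        for host in range(hosts):
            records.append((next(counter), cycle, f"host{host} message"))
    # mpi::barrier(): every host finishes this cycle before the next starts

random.shuffle(records)                # retrieval order is arbitrary
records.sort(key=lambda r: r[0])       # order by the sequence-style key
cycle_order = [c for _, c, _ in records]

# Cycles never interleave after sorting by key; only messages inside
# one cycle may appear in either order.
print(cycle_order == sorted(cycle_order))  # True
```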