Discussion: Postgresql "FIFO" Tables, How-To ?

Postgresql "FIFO" Tables, How-To ?

From
"Kirill Ponazdyr"
Date:
Hello,

We are currently working on a project where we need to limit the number of
records in a table to a certain number. As soon as the number has been
reached, for each new row the oldest row should be deleted (kinda FIFO),
thus keeping the total number of rows at a predefined number.

The actual limits would be anywhere from 250k to 10mil rows per table.

It would be great if this could be achieved by the RDBMS engine itself. Does
Postgres support this kind of table? And if not, what would be the most
elegant solution to achieve our goal, in your opinion?

Regards

Kirill

--
Kirill Ponazdyr
Technical Director
Codeangels Solutions GmbH
Tel: +41 (0)43 844 90 10
Fax: +41 (0)43 844 90 12
Web: www.codeangels.com

Re: Postgresql "FIFO" Tables, How-To ?

From
"Nigel J. Andrews"
Date:
On Wed, 16 Jul 2003, Kirill Ponazdyr wrote:

> Hello,
>
> We are currently working on a project where we need to limit the number of
> records in a table to a certain number. As soon as the number has been
> reached, for each new row the oldest row should be deleted (kinda FIFO),
> thus keeping the total number of rows at a predefined number.
>
> The actual limits would be anywhere from 250k to 10mil rows per table.
>
> It would be great if this could be achieved by the RDBMS engine itself. Does
> Postgres support this kind of table? And if not, what would be the most
> elegant solution to achieve our goal, in your opinion?
>

An after insert trigger springs to mind.
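A minimal sketch of that idea (table and column names here are illustrative,
not from the original post):

CREATE FUNCTION trim_oldest() RETURNS TRIGGER AS '
BEGIN
    -- Assumes the table is already at its limit and that ts records
    -- the insertion time: drop the oldest row after each insert.
    DELETE FROM t WHERE ts = (SELECT min(ts) FROM t);
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER t_trim AFTER INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE trim_oldest();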


--
Nigel Andrews



Re: Postgresql "FIFO" Tables, How-To ?

From
Dennis Gearon
Date:
Use a PL/pgSQL function. Just do count(*) to find out how many there are,
calculate how many to be deleted, and put a timestamp field in the table.
Now, how to select the correct ones to delete is probably done by:

DELETE FROM table_in_question
WHERE some_primary_key_id IN
   (SELECT some_primary_key_id
    FROM table_in_question
    ORDER BY the_time_stamp_field
    LIMIT the_qty_to_be_deleted);

More than likely, in a concurrent environment, you will oscillate between:

(the maximum number you want)

and

(the maximum number you want - the maximum current connections).

Unless you do some kind of table locking.


Kirill Ponazdyr wrote:

> Hello,
>
> We are currently working on a project where we need to limit the number of
> records in a table to a certain number. As soon as the number has been
> reached, for each new row the oldest row should be deleted (kinda FIFO),
> thus keeping the total number of rows at a predefined number.
>
> The actual limits would be anywhere from 250k to 10mil rows per table.
>
> It would be great if this could be achieved by the RDBMS engine itself. Does
> Postgres support this kind of table? And if not, what would be the most
> elegant solution to achieve our goal, in your opinion?
>
> Regards
>
> Kirill
>


Re: Postgresql "FIFO" Tables, How-To ?

From
"Shridhar Daithankar"
Date:
On 16 Jul 2003 at 17:59, Kirill Ponazdyr wrote:

> Hello,
>
> We are currently working on a project where we need to limit the number of
> records in a table to a certain number. As soon as the number has been
> reached, for each new row the oldest row should be deleted (kinda FIFO),
> thus keeping the total number of rows at a predefined number.
>
> The actual limits would be anywhere from 250k to 10mil rows per table.
>
> It would be great if this could be achieved by the RDBMS engine itself. Does
> Postgres support this kind of table? And if not, what would be the most
> elegant solution to achieve our goal, in your opinion?

It is practically impossible due to concurrency limitations unless you
explicitly serialize everything, which might be a bad idea.

I think it is doable. Create a normal table 't' and write a before insert
trigger. Create another table 'control' which contains the limit value and the
oid of the last row deleted. In the before insert trigger, do a select for
update on table 'control' so that no other transaction can update it. Proceed
to insertion/deletion in table 't'.

It would be a bad idea to update the control table itself; you need the lock
to be released at transaction commit. (I hope it gets released with the
commit.) If you update the control table, you would generate a dead row for
every insertion in the main table, which could be a major performance penalty
for the sizes you are talking about.
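A rough sketch of that scheme (table and column names are illustrative; the
table is assumed to be pre-filled to its limit, so the trigger only has to
recycle the oldest row):

CREATE TABLE control (max_rows INTEGER NOT NULL);
INSERT INTO control VALUES (250000);

CREATE FUNCTION t_fifo() RETURNS TRIGGER AS '
BEGIN
    -- Serialize concurrent inserters; the row lock is held until commit.
    PERFORM max_rows FROM control FOR UPDATE;
    -- Make room for the incoming row (ts is the ordering column).
    DELETE FROM t WHERE ts = (SELECT min(ts) FROM t);
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER t_fifo_trig BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_fifo();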

Frankly, I would like to know first why you want to do this. Unless there are
good enough practical reasons, I would not recommend this approach at all. Can
you tell us why you want to do this?

Bye
 Shridhar

--
Lieberman's Law:    Everybody lies, but it doesn't matter since nobody listens.


Re: Postgresql "FIFO" Tables, How-To ?

From
"Viorel Dragomir"
Date:
----- Original Message -----
From: "Nigel J. Andrews" <nandrews@investsystems.co.uk>
To: "Kirill Ponazdyr" <softlist@codeangels.com>
Cc: "pg_general" <pgsql-general@postgresql.org>
Sent: Wednesday, July 16, 2003 7:06 PM
Subject: Re: [GENERAL] Postgresql "FIFO" Tables, How-To ?


> An after insert trigger springs to mind.
>

I see that the tables are quite big, and I think a procedure launched by cron
at certain times to truncate the tables is a better solution.
If the server runs well with the trigger, then create the trigger;
otherwise...


Re: Postgresql "FIFO" Tables, How-To ?

From
"Kirill Ponazdyr"
Date:
> An after insert trigger springs to mind.

Ouch, this means that for every insert we would have to trigger a
procedure which will:

COUNT
IF > Limit
DELETE OLDEST

This would be pretty much damn resource intensive on a table with millions
of records, wouldn't it?

Regards

--
Kirill Ponazdyr
Technical Director
Codeangels Solutions GmbH
Tel: +41 (0)43 844 90 10
Fax: +41 (0)43 844 90 12
Web: www.codeangels.com

Re: Postgresql "FIFO" Tables, How-To ?

From
"scott.marlowe"
Date:
On Wed, 16 Jul 2003, Shridhar Daithankar wrote:

> It is practically impossible due to concurrency limitations unless you
> explicitly serialize everything, which might be a bad idea.
>
> I think it is doable. Create a normal table 't' and write a before insert
> trigger. Create another table 'control' which contains the limit value and
> the oid of the last row deleted. In the before insert trigger, do a select
> for update on table 'control' so that no other transaction can update it.
> Proceed to insertion/deletion in table 't'.
>
> It would be a bad idea to update the control table itself; you need the lock
> to be released at transaction commit. (I hope it gets released with the
> commit.) If you update the control table, you would generate a dead row for
> every insertion in the main table, which could be a major performance
> penalty for the sizes you are talking about.
>
> Frankly, I would like to know first why you want to do this. Unless there
> are good enough practical reasons, I would not recommend this approach at
> all. Can you tell us why you want to do this?

If he only needs an approximate number of rows (i.e. having max +/- 100
rows is ok), then maybe just use a sequence and delete any rows whose id
falls below current_max_seq - max_records?
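A sketch of that, assuming an id column fed by a sequence named
bigtable_id_seq (both names illustrative):

DELETE FROM bigtable
 WHERE id < (SELECT last_value FROM bigtable_id_seq) - 250000;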


Re: Postgresql "FIFO" Tables, How-To ?

From
Dennis Gearon
Date:
Or, you could make a view, or have all of your queries include the phrase

'ORDER BY the_time_stamp_field
 LIMIT the_qty_to_be_deleted'

in them. That'd be the standard way.
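For example, with the names from the DELETE above (DESC so the view keeps
the newest rows):

CREATE VIEW current_rows AS
    SELECT * FROM table_in_question
    ORDER BY the_time_stamp_field DESC
    LIMIT 250000;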

Dennis Gearon wrote:
> Use a PL/pgSQL function. Just do count(*) to find out how many there are,
> calculate how many to be deleted, and put a timestamp field in the table.
> Now, how to select the correct ones to delete is probably done by:
>
> DELETE FROM table_in_question
> WHERE some_primary_key_id IN
>   (SELECT some_primary_key_id
>    FROM table_in_question
>    ORDER BY the_time_stamp_field
>    LIMIT the_qty_to_be_deleted);
>
> More than likely, in a concurrent environment, you will oscillate between:
>
> (the maximum number you want)
>
> and
>
> (the maximum number you want - the maximum current connections).
>
> Unless you do some kind of table locking.


Re: Postgresql "FIFO" Tables, How-To ?

From
Jonathan Bartlett
Date:
Two possiblities I might try:

1) a view:

create view foo as select * from x order by y limit WHATEVER;

Then, periodically do a physical cleaning of the table.

2) Fake records

Have an extra field, a boolean, called "fake".  Initialize the table with
however many fakes you need.  Have a date_inserted field, and simply
replace the oldest record for every insert.  Then create a view which
filters out the fakes.
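A sketch of the table and the filtering view (names are illustrative):

CREATE TABLE log (
    fake          BOOLEAN NOT NULL DEFAULT false,
    date_inserted TIMESTAMP NOT NULL DEFAULT now(),
    msg           TEXT
);

CREATE VIEW real_log AS
    SELECT date_inserted, msg FROM log WHERE NOT fake;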

Jon

On Wed, 16 Jul 2003, Viorel Dragomir wrote:

> I see that the tables are quite big, and I think a procedure launched by cron
> at certain times to truncate the tables is a better solution.
> If the server runs well with the trigger, then create the trigger;
> otherwise...
>
>


Re: Postgresql "FIFO" Tables, How-To ?

From
Bruno Wolff III
Date:
On Wed, Jul 16, 2003 at 18:18:09 +0200,
  Kirill Ponazdyr <softlist@codeangels.com> wrote:
>
> Ouch, this means that for every insert we would have to trigger a
> procedure which will:
>
> COUNT
> IF > Limit
> DELETE OLDEST
>
> This would be pretty much damn resource intensive on a table with millions
> of records, wouldn't it?

If you preload the database with the required number of records, then you
don't need to count. You can just delete the oldest record. You can use
a sequence or timestamp to order the records for this purpose. If you
don't want to see the dummy records, you can flag them somehow. But if you
reach the limit in a short time, you might not really care about that enough
to add the extra overhead.
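A sketch of the insert path under that scheme (names are illustrative; the
table is preloaded and t_seq never cycles, so min(id) is always the oldest
row):

BEGIN;
DELETE FROM t WHERE id = (SELECT min(id) FROM t);
INSERT INTO t (id, msg) VALUES (nextval('t_seq'), 'new message');
COMMIT;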

Re: Postgresql "FIFO" Tables, How-To ?

From
"Shridhar Daithankar"
Date:
On 16 Jul 2003 at 10:13, scott.marlowe wrote:

> If he only needs an approximate number of rows (i.e. having max +/- 100
> rows is ok), then maybe just use a sequence and delete any rows whose id
> falls below current_max_seq - max_records?

Surely there is more than one way to do it, depending upon how strict the OP
wants to be. This seems to be a much better solution, along with periodic
vacuum analyze if required.

Bye
 Shridhar

--
"On a normal ascii line, the only safe condition to detect is a 'BREAK'-
everything else having been assigned functions by Gnu EMACS."(By Tarl
Neustaedter)


Re: Postgresql "FIFO" Tables, How-To ?

From
"Kirill Ponazdyr"
Date:
> Frankly, I would like to know first why you want to do this. Unless there
> are good enough practical reasons, I would not recommend this approach at
> all. Can you tell us why you want to do this?
>

It is for an advanced syslog server product we are currently developing.

The very basic idea is to feed all syslog messages into a DB and allow
easy monitoring and even correlation. We use Postgres as our DB backend;
in big environments the machine would be hit with dozens of syslog
messages a second, and the messaging tables could grow out of control
pretty soon (we are talking of up to 10mil messages daily).

We do have "cleansing" logic built in which runs at certain times and
could delete the records over the limit. So FIFO is not a requirement; it
was actually rather a theoretical question.

Regards

Kirill

--
Kirill Ponazdyr
Technical Director
Codeangels Solutions GmbH
Tel: +41 (0)43 844 90 10
Fax: +41 (0)43 844 90 12
Web: www.codeangels.com

Re: Postgresql "FIFO" Tables, How-To ?

From
Dmitry Tkach
Date:
>Ouch, this means that for every insert we would have to trigger a
>procedure which will:
>
>COUNT
>IF > Limit
>DELETE OLDEST
>
>This would be pretty much damn resource intensive on a table with millions
>of records, wouldn't it?
>
>

You can keep the count in a table on the side, and have it updated by
the same trigger (after insert or delete)...

Dima



Re: Postgresql "FIFO" Tables, How-To ?

From
Jean-Luc Lachance
Date:
OUCH!!! Do a COUNT(*) on a 10M row table???? Forget it.

Here is a simple solution.
Add a SERIAL field to the table.
Set the maximum value for that sequence to the number of records you
want to keep.
Use a before insert trigger to replace the insert with an update if the
key already exists.

No need for a cron.
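A rough sketch of that trigger (names are illustrative; the sequence is
assumed to wrap at the row limit):

CREATE SEQUENCE t_id_seq MAXVALUE 250000 CYCLE;

CREATE FUNCTION t_recycle() RETURNS TRIGGER AS '
BEGIN
    NEW.id := nextval(''t_id_seq'');
    -- If that slot is already taken, turn the INSERT into an UPDATE
    -- and suppress the original insert by returning NULL.
    UPDATE t SET msg = NEW.msg, ts = now() WHERE id = NEW.id;
    IF FOUND THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER t_recycle_trig BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_recycle();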



Dennis Gearon wrote:
>
> Use a PL/pgSQL function. Just do count(*) to find out how many there are,
> calculate how many to be deleted, and put a timestamp field in the table.
> Now, how to select the correct ones to delete is probably done by:
>
> DELETE FROM table_in_question
> WHERE some_primary_key_id IN
>    (SELECT some_primary_key_id
>     FROM table_in_question
>     ORDER BY the_time_stamp_field
>     LIMIT the_qty_to_be_deleted);
>
> More than likely, in a concurrent environment, you will oscillate between:
>
> (the maximum number you want)
>
> and
>
> (the maximum number you want - the maximum current connections).
>
> Unless you do some kind of table locking.

Re: Postgresql "FIFO" Tables, How-To ?

From
Dennis Gearon
Date:
Good idea!

Jean-Luc Lachance wrote:

> OUCH!!! Do a COUNT(*) on a 10M row table???? Forget it.
>
> Here is a simple solution.
> Add a SERIAL field to the table.
> Set the maximum value for that sequence to the number of records you
> want to keep.
> Use a before insert trigger to replace the insert with an update if the
> key already exists.
>
> No need for a cron.


Re: Postgresql "FIFO" Tables, How-To ?

From
"Leon Oosterwijk"
Date:
All,

There's a database format called "rrd", or round robin database, which was
specifically designed for wrap-around data storage. Since you are trying to
store 10mil+ syslog messages, this might not be the right tool; I'm just
mentioning it because the way rrd keeps track of wrap-around might be a good
way to implement this in postgres.

Sincerely,

Leon Oosterwijk



> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Dennis Gearon
> Sent: Wednesday, July 16, 2003 1:14 PM
> To: Jean-Luc Lachance
> Cc: softlist@codeangels.com; pg_general
> Subject: Re: [GENERAL] Postgresql "FIFO" Tables, How-To ?
>
>
> Good idea!
>
> Jean-Luc Lachance wrote:
>
> > OUCH!!! Do a COUNT(*) on a 10M row table???? Forget it.
> >
> > Here is a simple solution.
> > Add a SERIAL field to the table.
> > Set the maximum value for that sequence to the number of records you
> > want to keep.
> > Use a before insert trigger to replace the insert with an update if the
> > key already exists.
> >
> > No need for a cron.


Re: Postgresql "FIFO" Tables, How-To ?

From
Andrew Sullivan
Date:
On Wed, Jul 16, 2003 at 01:33:35PM -0500, Leon Oosterwijk wrote:
> store 10mil+ syslog messages, this might not be the right tool; I'm just
> mentioning it because the way rrd keeps track of wrap-around might be a
> good way to implement this in postgres.

Hmm.  Using the cycling feature of a sequence, couldn't you create a
trigger which either inserts (if, e.g., the sequence value is not
already there) or updates (if it is there)?  I'm
not sure how to do it efficiently, but I haven't thought about it
very much.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Postgresql "FIFO" Tables, How-To ?

From
greg@turnstep.com
Date:

> It would be great if this could be achieved by the RDBMS engine itself. Does
> Postgres support this kind of table? And if not, what would be the most
> elegant solution to achieve our goal, in your opinion?

First question: no. :)

Second question: two ideas.

I. First idea:

Assuming that it is okay for the tables to run over a little bit, the best
way I can think of is to run a periodic cronjob. Assuming you have an
indexed column "id" which increments on new inserts:

DELETE FROM bigtable WHERE id <
  (SELECT id FROM bigtable ORDER BY id DESC LIMIT 1 OFFSET 100000);

If you don't have a unique incrementing field but have a timestamp:

DELETE FROM bigtable WHERE oid <
  (SELECT oid FROM bigtable ORDER BY in_time DESC LIMIT 1 OFFSET 100000);

This assumes your oid is not about to roll over, and that you have an
index on the oid column, of course.

Running a VACUUM immediately after the DELETE is highly advised.

Even running this job every 10 minutes would probably not be too bad - if
the table is not "full", no harm is done. Deleting and vacuuming for the
rows that have built up in the previous 10 minutes will also not
be too expensive. If it is, decrease the time. Even a cronjob running once
a minute is probably better than a trigger that fires upon every insert,
if the table is very active and getting hundreds or thousands of inserts
per minute.


II. Second idea:

Prepopulate the table, use a wrapping sequence, and a timestamp.

CREATE SEQUENCE mmlog_seq MINVALUE 1 MAXVALUE 500 CYCLE;

CREATE TABLE mmlog (
  id INTEGER NOT NULL,
  message VARCHAR,
  ctime TIMESTAMP WITH TIME ZONE NOT NULL
);

CREATE INDEX mmlog_id ON mmlog(id);
CREATE INDEX mmlog_ctime ON mmlog(ctime);

BEGIN;

-- Run this next command exactly 500 times:
INSERT INTO mmlog (id,message,ctime) VALUES (nextval('mmlog_seq'),'',now());

COMMIT;

REVOKE INSERT ON mmlog FROM PUBLIC;

To add a new row, run this:

UPDATE mmlog SET message=?, ctime=now() WHERE id = (SELECT nextval('mmlog_seq'));


Voila! A self-limiting table: INSERTING is not allowed, and the oldest record
is always overwritten. Remember to vacuum of course. And ORDER BY on the
ctime field to keep things in the proper order.


--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200307161435




Re: Postgresql "FIFO" Tables, How-To ?

From
Sean Chittenden
Date:
> > store 10mil+ syslog messages, this might not be the right tool; I'm
> > just mentioning it because the way rrd keeps track of wrap-around
> > might be a good way to implement this in postgres.
>
> Hmm.  Using the cycling feature of a sequence, couldn't you create a
> trigger which either inserts (if, e.g., the sequence value is not
> already there) or updates (if it is there)?  I'm
> not sure how to do it efficiently, but I haven't thought about it
> very much.

I use this very approach.

CREATE SEQUENCE syslog_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 250000 CYCLE;
CREATE TABLE syslog (
  id INT NOT NULL,
  msg TEXT NOT NULL
);
CREATE UNIQUE INDEX syslog_id_udx ON syslog(id);
CREATE FUNCTION syslog_ins(TEXT)
  RETURNS INT
  EXTERNAL SECURITY DEFINER
  AS '
DECLARE
    a_msg ALIAS FOR $1;
    v_id syslog.id%TYPE;
BEGIN
    v_id := NEXTVAL(''syslog_id_seq''::TEXT);
    PERFORM TRUE FROM syslog WHERE id = v_id;
    IF FOUND THEN
        UPDATE syslog SET msg = a_msg WHERE id = v_id;
    ELSE
        INSERT INTO syslog (id,msg) VALUES (v_id, a_msg);
    END IF;

    RETURN v_id;
' LANGUAGE 'plpgsql';

Though this is the inefficient way of doing this.  If you wanted to be
really slick about it and incur some upfront disk space, populate the
table with your 250000 rows of bogus data, empty strings, then use the
following instead to save yourself a SELECT (which is only of use for
the first 250000 syslog msgs, then it becomes a given after the
sequence wraps):

CREATE FUNCTION syslog_ins(TEXT)
  RETURNS INT
  EXTERNAL SECURITY DEFINER
  AS '
DECLARE
    a_msg ALIAS FOR $1;
    v_id syslog.id%TYPE;
BEGIN
    v_id := NEXTVAL(''syslog_id_seq''::TEXT);
    UPDATE syslog SET msg = a_msg WHERE id = v_id;
    RETURN v_id;
' LANGUAGE 'plpgsql';


You may want to add a time component to the table/function, but I'll
leave that as an exercise to the reader.  Just make sure you're
VACUUMing on a regular basis.  :) -sc

--
Sean Chittenden

Re: Postgresql "FIFO" Tables, How-To ?

From
Sean Chittenden
Date:
> > > store 10mil+ syslog messages, this might not be the right tool; I'm
> > > just mentioning it because the way rrd keeps track of wrap-around
> > > might be a good way to implement this in postgres.
> >
> > Hmm.  Using the cycling feature of a sequence, couldn't you create a
> > trigger which either inserts (if, e.g., the sequence value is not
> > already there) or updates (if it is there)?  I'm
> > not sure how to do it efficiently, but I haven't thought about it
> > very much.
>
> I use this very approach.
>
> CREATE SEQUENCE syslog_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 250000 CYCLE;
> CREATE TABLE syslog (
>   id INT NOT NULL,
>   msg TEXT NOT NULL
> );
> CREATE UNIQUE INDEX syslog_id_udx ON syslog(id);
> CREATE FUNCTION syslog_ins(TEXT)
>   RETURNS INT
>   EXTERNAL SECURITY DEFINER
>   AS '
> DECLARE
>     a_msg ALIAS FOR $1;
>     v_id syslog.id%TYPE;
> BEGIN
>     v_id := NEXTVAL(''syslog_id_seq''::TEXT);
>     PERFORM TRUE FROM syslog WHERE id = v_id;
>     IF FOUND THEN
>         UPDATE syslog SET msg = a_msg WHERE id = v_id;
>     ELSE
>         INSERT INTO syslog (id,msg) VALUES (v_id, a_msg);
>     END IF;
>
>     RETURN v_id;

END; -- *blush*

> ' LANGUAGE 'plpgsql';
>
> Though this is the inefficient way of doing this.  If you wanted to be
> really slick about it and incur some upfront disk space, populate the
> table with your 250000 rows of bogus data, empty strings, then use the
> following instead to save yourself a SELECT (which is only of use for
> the first 250000 syslog msgs, then it becomes a given after the
> sequence wraps):
>
> CREATE FUNCTION syslog_ins(TEXT)
>   RETURNS INT
>   EXTERNAL SECURITY DEFINER
>   AS '
> DECLARE
>     a_msg ALIAS FOR $1;
>     v_id syslog.id%TYPE;
> BEGIN
>     v_id := NEXTVAL(''syslog_id_seq''::TEXT);
>     UPDATE syslog SET msg = a_msg WHERE id = v_id;
>     RETURN v_id;
END;
> ' LANGUAGE 'plpgsql';

-sc
--
Sean Chittenden

Re: Postgresql "FIFO" Tables, How-To ?

From
Dmitry Tkach
Date:
Sean Chittenden wrote:

>>>store 10mil+ syslog messages, this might not be the right tool; I'm
>>>just mentioning it because the way rrd keeps track of wrap-around
>>>might be a good way to implement this in postgres.
>>>
>>>
>>Hmm.  Using the cycling feature of a sequence, couldn't you create a
>>trigger which either inserts (if, e.g., the sequence value is not
>>already there) or updates (if it is there)?  I'm
>>not sure how to do it efficiently, but I haven't thought about it
>>very much.
>>
>>
>
>I use this very approach.
>
>CREATE SEQUENCE syslog_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 250000 CYCLE;
>CREATE TABLE syslog (
>  id INT NOT NULL,
>  msg TEXT NOT NULL
>);
>CREATE UNIQUE INDEX syslog_id_udx ON syslog(id);
>CREATE FUNCTION syslog_ins(TEXT)
>  RETURNS INT
>  EXTERNAL SECURITY DEFINER
>  AS '
>DECLARE
>    a_msg ALIAS FOR $1;
>    v_id syslog.id%TYPE;
>BEGIN
>    v_id := NEXTVAL(''syslog_id_seq''::TEXT);
>    PERFORM TRUE FROM syslog WHERE id = v_id;
>    IF FOUND THEN
>        UPDATE syslog SET msg = a_msg WHERE id = v_id;
>    ELSE
>        INSERT INTO syslog (id,msg) VALUES (v_id, a_msg);
>    END IF;
>
>    RETURN v_id;
>' LANGUAGE 'plpgsql';
>

I believe you can save one query by replacing the 'if exists then update
else insert' part with just 'delete unconditionally, then insert'.

>Though this is the inefficient way of doing this.  If you wanted to be
>really slick about it and incur some upfront disk space, populate the
>table with your 250000 rows of bogus data, empty strings, then use the
>following instead to save yourself a SELECT (which is only of use for
>the first 250000 syslog msgs, then it becomes a given after the
>sequence wraps):
>
>CREATE FUNCTION syslog_ins(TEXT)
>  RETURNS INT
>  EXTERNAL SECURITY DEFINER
>  AS '
>DECLARE
>    a_msg ALIAS FOR $1;
>    v_id syslog.id%TYPE;
>BEGIN
>    v_id := NEXTVAL(''syslog_id_seq''::TEXT);
>    UPDATE syslog SET msg = a_msg WHERE id = v_id;
>    RETURN v_id;
>' LANGUAGE 'plpgsql';
>
>

... or you could have another sequence (with no limit, and no cycle) to
count the number of inserts - you'd then increment both in the trigger,
and, if the insert count is greater than the limit, you'd update, else
insert.

... or you could do it with the single sequence still, if you get rid of
the limit and cycle, and just do
if nextval >= limit then update ... where id = nextval % limit else insert
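A sketch of that single-sequence variant (names follow the earlier example;
syslog_seq here is an ordinary non-cycling sequence):

CREATE SEQUENCE syslog_seq;

CREATE FUNCTION syslog_ins2(TEXT) RETURNS BIGINT AS '
DECLARE
    a_msg ALIAS FOR $1;
    v_n   BIGINT;
BEGIN
    v_n := nextval(''syslog_seq'');
    IF v_n > 250000 THEN
        -- Past the limit: recycle the oldest slot.
        UPDATE syslog SET msg = a_msg WHERE id = ((v_n - 1) % 250000) + 1;
    ELSE
        -- Still filling the table.
        INSERT INTO syslog (id, msg) VALUES (v_n, a_msg);
    END IF;
    RETURN v_n;
END;
' LANGUAGE 'plpgsql';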


Dima



Re: Postgresql "FIFO" Tables, How-To ?

From
"Jim C. Nasby"
Date:
On Wed, Jul 16, 2003 at 03:40:47PM -0400, Dmitry Tkach wrote:
> I believe you can save one query by replacing the 'if exists then update
> else insert' part with just 'delete unconditionally, then insert'.

I was going to mention this. Because updates in PGSQL are actually
DELETE/INSERT in a single statement, you can just code it that way.
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Postgresql "FIFO" Tables, How-To ?

From
Sean Chittenden
Date:
> >I use this very approach.
> >
> >CREATE SEQUENCE syslog_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 250000 CYCLE;
> >CREATE TABLE syslog (
> > id INT NOT NULL,
> > msg TEXT NOT NULL
> >);
[snip slower function]
> I believe you can save one query by replacing the 'if exists then
> update else insert' part with just 'delete unconditionally, then
> insert'.

Which is why I use something similar to the function below.

> >CREATE FUNCTION syslog_ins(TEXT)
> > RETURNS INT
> > EXTERNAL SECURITY DEFINER
> > AS '
> >DECLARE
> >    a_msg ALIAS FOR $1;
> >    v_id syslog.id%TYPE;
> >BEGIN
> >    v_id := NEXTVAL(''syslog_id_seq''::TEXT);
> >    UPDATE syslog SET msg = a_msg WHERE id = v_id;
> >    RETURN v_id;
> >END;
> >' LANGUAGE 'plpgsql';
> >
> >
>
> ... or you could have another sequence (with no limit, and no cycle)
> to count the number of inserts - you'd then increment both in the
> trigger, and, if the insert count is greater than the limit, you'd
> update, else insert.

That doesn't help you limit the number of rows in the table though
because what row is going to get "pushed out" of the table?  The nifty
thing about using a wrapping sequence is that the id's are sequential
across transactions, which correctly maps to the progression of time,
which obviates the need for relying on any kind of a date column for
doing syslog message ordering.

> ... or you could do it with the single sequence still, if you get
> rid of the limit and cycle, and just do if nextval >= limit then
> update ... where id = nextval % limit else insert

There's no logic necessary with the above function.  You get the next
val in the sequence and update the ID.  No need to delete, no nothing
other than VACUUM to remove the dead tuples, which you have to do
regardless of the method chosen.  A DELETE + INSERT is just a waste of
CPU and IO with the table being MVCC backed.  Just UPDATE and be done
with it.

That said, it'd be cool if there was a GLOBAL TEMP table that could be
used for this kinda stuff... not having something WAL backed on this
kind of an application would make things really scream.

-sc

--
Sean Chittenden

Re: Postgresql "FIFO" Tables, How-To ?

From
Dennis Gearon
Date:
You could skip the extra select in any case if you used a boolean column
named 'valid_data', and always updated it with TRUE but prepopulated it
with FALSE. Your selects OUT of the table would use the WHERE clause
'valid_data = TRUE'. Updating and selecting on a bool would not take much
time.

Sean Chittenden wrote:

>>>store 10mil+ syslog messages, this might not be the right tool; I'm
>>>just mentioning it because the way rrd keeps track of wrap-around
>>>might be a good way to implement this in postgres.
>>
>>Hmm.  Using the cycling feature of a sequence, couldn't you create a
>>trigger which either inserts (if, e.g., the sequence value is not
>>already there) or updates (if it is there)?  I'm
>>not sure how to do it efficiently, but I haven't thought about it
>>very much.
>
>
> I use this very approach.
>
> CREATE SEQUENCE syslog_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 250000 CYCLE;
> CREATE TABLE syslog (
>   id INT NOT NULL,
>   msg TEXT NOT NULL
> );
> CREATE UNIQUE INDEX syslog_id_udx ON syslog(id);
> CREATE FUNCTION syslog_ins(TEXT)
>   RETURNS INT
>   EXTERNAL SECURITY DEFINER
>   AS '
> DECLARE
>     a_msg ALIAS FOR $1;
>     v_id syslog.id%TYPE;
> BEGIN
>     v_id := NEXTVAL(''syslog_id_seq''::TEXT);
>     PERFORM TRUE FROM syslog WHERE id = v_id;
>     IF FOUND THEN
>         UPDATE syslog SET msg = a_msg WHERE id = v_id;
>     ELSE
>         INSERT INTO syslog (id,msg) VALUES (v_id, a_msg);
>     END IF;
>
>     RETURN v_id;
> ' LANGUAGE 'plpgsql';
>
> Though this is the inefficient way of doing this.  If you wanted to be
> really slick about it and incur some upfront disk space, populate the
> table with your 250000 rows of bogus data, empty strings, then use the
> following instead to save yourself a SELECT (which is only of use for
> the first 250000 syslog msgs, then it becomes a given after the
> sequence wraps):
>
> CREATE FUNCTION syslog_ins(TEXT)
>   RETURNS INT
>   EXTERNAL SECURITY DEFINER
>   AS '
> DECLARE
>     a_msg ALIAS FOR $1;
>     v_id syslog.id%TYPE;
> BEGIN
>     v_id := NEXTVAL(''syslog_id_seq''::TEXT);
>     UPDATE syslog SET msg = a_msg WHERE id = v_id;
>     RETURN v_id;
> ' LANGUAGE 'plpgsql';
>
>
> You may want to add a time component to the table/function, but I'll
> leave that as an exercise to the reader.  Just make sure you're
> VACUUMing on a regular basis.  :) -sc
>


Re: Postgresql "FIFO" Tables, How-To ?

From
Dmitry Tkach
Date:
>That doesn't help you limit the number of rows in the table though
>because what row is going to get "pushed out" of the table?  The nifty
>thing about using a wrapping sequence is that the id's are sequential
>across transactions, which correctly maps to the progression of time,
>which obviates the need for relying on any kind of a date column for
>doing syslog message ordering.
>
You must have misunderstood my suggestion - you'd still have a wrapping
sequence that would generate your ids and determine which row would be
pushed out - you would just have *another* sequence *in addition* to
that, to count the total number of inserts, so that you would know
whether the wraparound had already happened without having to issue
another query. That query, by the way, is not very robust, because another
connection may be inserting at the same time, so your update will not
see the new rowid, but the insert will fail... Keeping two sequences -
one for id generation, and one for counting - solves that.

>
>
>
>>... or you could do it with the single sequence still, if you get
>>rid of the limit and cycle, and just do if nextval >= limit then
>>update ... where id = nextval % limit else insert
>>
>>
>
>There's no logic necessary with the above function.  You get the next
>val in the sequence and update the ID.  No need to delete, no nothing
>other than VACUUM to remove the dead tuples, which, you have to do
>regardless of the method chosen.  A DELETE + UPDATE is just a waste of
>CPU and IO with the table being MVCC backed.  Just UPDATE and be done
>with it.
>
Well... first of all, "just UPDATE" is *exactly* the same internally as
DELETE+INSERT.
Secondly, I just don't like having to populate the table with millions
of empty rows just to keep that 'no logic' function happy :-)
And finally, nextval % limit hardly qualifies as 'logic'  to me... and
that's really all there is.

>
>That said, it'd be cool if there was a GLOBAL TEMP table that could be
>used for this kinda stuff... not having something WAL backed on this
>kind of an application would make things really scream.
>
>
Hmmm... That brings up the question I wanted to ask from the very
beginning, but just didn't dare :-)
What is the reason to throw logs into the database in the first place?
What's wrong with 'normal' syslog?
The only difference I can imagine is MVCC, transaction consistency, etc...
But if you don't care about all that, why not just throw things into a
file? Now *that* indeed would scream :-)

Dima


Re: Postgresql "FIFO" Tables, How-To ?

From
Sean Chittenden
Date:
> >That doesn't help you limit the number of rows in the table though
> >because what row is going to get "pushed out" of the table?  The
> >nifty thing about using a wrapping sequence is that the id's are
> >sequential across transactions, which correctly maps to the
> >progression of time, which obviates the need for relying on any
> >kind of a date column for doing syslog message ordering.
>
> You must have misunderstood my suggestion - you'd still have a wrapping
> sequence that would generate your ids and determine which row would be
> pushed out - you would just have *another* sequence *in addition* to
> that, to count the total number of inserts, so that you would know
> whether the wraparound had already happened without having to issue
> another query. That query, by the way, is not very robust, because another
> connection may be inserting at the same time, so your update will not
> see the new rowid, but the insert will fail... Keeping two sequences -
> one for id generation, and one for counting - solves that.

*nods*  Got 'cha.

> Well... first of all, "just UPDATE" is *exactly* the same internally
> as DELETE+INSERT.  Secondly, I just don't like having to populate
> the table with millions of empty rows just to keep that 'no logic'
> function happy :-) And finally, nextval % limit hardly qualifies as
> 'logic' to me... and that's really all there is.

True, but in theory, because there is less latency between the internal
DELETE+INSERT when using an UPDATE, fewer resources will be used.

> >That said, it'd be cool if there was a GLOBAL TEMP table that could
> >be used for this kinda stuff... not having something WAL backed on
> >this kind of an application would make things really scream.
>
> Hmmm... That brings up the question I wanted to ask from the very
> beginning, but just didn't dare :-) What is the reason to throw logs
> into the database in the first place?  What's wrong with 'normal'
> syslog?  The only difference I can imagine is MVCC, transaction
> consistency, etc...  But if you don't care about all that, why not
> just throw things into a file? Now *that* indeed would scream :-)

At Cisco, we had syslog messages coming in from tens of thousands of
machines that generated millions of syslog entries a day.  The
messages were dumped into text files and once a minute the data was
loaded and stored in the database.  Why a database?  *shrug* The only
useful thing that we did was run Perl regexps stored in the DB over
the data and send out custom alerts.  I suppose the baby puke colors
for the web interface were aesthetically pleasing to some, but I never
saw the attraction.

You can design to use or not use any one particular data storage
format.  What's nice about using a DB for this kind of stuff, however,
is that it's easily searchable via SQL, which can be quite valuable.  Data
that isn't accessible/processable is pretty worthless data.  All that
said and done, I use a similar system for my hosting clients.  I'll
archive their syslog bits for as long as they'd like me to, provided
they pay for the disk that their bits take up.  :)

-sc

--
Sean Chittenden

Re: Postgresql "FIFO" Tables, How-To ?

From
Thomas Kellerer
Date:
Sean Chittenden schrieb:
> That doesn't help you limit the number of rows in the table though
> because what row is going to get "pushed out" of the table?  The nifty
> thing about using a wrapping sequence is that the id's are sequential
> across transactions, which correctly maps to the progression of time,
> which obviates the need for relying on any kind of a date column for
> doing syslog message ordering.
>
But isn't that exactly the problem? Once the sequence wraps around, how do I
know that id=1 is actually later than id=2 without a date column?

Thomas


Re: Postgresql "FIFO" Tables, How-To ?

From
Dmitry Tkach
Date:
>At Cisco, we had syslog messages coming in from tens of thousands of
>machines that generated millions of syslog entries a day.  The
>messages were dumped into text files and once a minute the data was
>loaded and stored in the database.  Why a database?  *shrug* The only
>useful thing that we did was run Perl regexps stored in the DB over
>the data and send out custom alerts.  I suppose the baby puke colors
>for the web interface were aesthetically pleasing to some, but I never
>saw the attraction.
>
Yeah.... but the question is - why could you not run those same regexps
on a text file :-)

>
>You can design to use or not use any one particular data storage
>format.  What's nice about using a DB for this kind of stuff, however,
>is it's easily searchable via SQL, which, can be quite valuable.
>
Searchable for what? A regexp? I'll bet you my lunch, a simple grep on a
text file will be quicker :-)
And you don't need to vacuum it, and/or concern yourself with writing
triggers ...


Dima



Re: Postgresql "FIFO" Tables, How-To ?

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> But isn't that exactly the problem? Once the sequence wraps around, how do I
> know that id=1 is actually later than id=2 without a date column?

If you use an int8 sequence column, I doubt you need to worry about
wraparound.  A date column probably hasn't got enough resolution,
so the other workable approach is to use a timestamp column.  Ends up
costing 8 bytes either way.

            regards, tom lane

Re: Postgresql "FIFO" Tables, How-To ?

From
"Shridhar Daithankar"
Date:
On 16 Jul 2003 at 19:09, Kirill Ponazdyr wrote:
> It is for an advanced syslog server product we are currently developing.
>
> The very basic idea is to feed all syslog messages into a DB and allow
> easy monitoring and even correlation. We use Postgres as our DB backend;
> in big environments the machine would be hit with dozens of syslog
> messages a second, and the messaging tables could grow out of control
> pretty soon (we are talking of up to 10mil messages daily).

You are looking at the wrong end of the pipe with the FIFO approach.

What I would do:

1. Create an empty table.
2. Create a table stamped with date and time. Let's call it 'T'.
3. Create a rule on the original table to direct all the processing to 'T'.
4. Do record counting in the middleware application. A mutex in the
application would be far less expensive than the PG locking mechanism with
disk.

When 'T' fills

 a. create another table 'T1',
 b. rewrite rules accordingly

and bingo you are done.

Do (a) and (b) in a transaction and nobody would notice the difference. No
vacuum drag, no dead tuples, nothing. And you are free to play with 'T' the
way you want, completely independently.

I am sure this will work, barring the interaction between rules and
transactions. I don't know whether rewriting rules is transaction safe or
not. Most probably yes, but I would like to confirm that.

Bye
 Shridhar

--
What's this script do?  unzip ; touch ; finger ; mount ; gasp ; yes ; umount ; sleep
Hint for the answer: not everything is computer-oriented. Sometimes you're
in a sleeping bag, camping out. (Contributed by Frans van der Zande.)


Re: Postgresql "FIFO" Tables, How-To ?

From
"Kirill Ponazdyr"
Date:
> Searchable for what? A regexp? I'll bet you my lunch, a simple grep on a
> text file will be quicker :-)
> And you don't need to vacuum it, and/or concern yourself with writing
> triggers ...

Well, a simple grep might find you a single occurrence of the event, but as
soon as you try to correlate many events together, maybe set real-time
alerts which are triggered by several events from different devices /
services within a certain timeframe, then you will really want some kind
of frontend. And that is what our product does (and quite a bit more).

But this is OT for this mailing list :)

Anyway, I would like to thank everyone for the great hints; I have certainly
got ideas which we will try, and we'll see how they work!

Best Regards

Kirill

--
Kirill Ponazdyr
Technical Director
Codeangels Solutions GmbH
Tel: +41 (0)43 844 90 10
Fax: +41 (0)43 844 90 12
Web: www.codeangels.com

Re: Postgresql "FIFO" Tables, How-To ?

From
Thomas Kellerer
Date:
Tom Lane schrieb:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>
>>But isn't that exactly the problem? Once the sequence wraps around, how do I
>>know that id=1 is actually later than id=2 without a date column?
>
>
> If you use an int8 sequence column, I doubt you need to worry about
> wraparound.  A date column probably hasn't got enough resolution,
> so the other workable approach is to use a timestamp column.  Ends up
> costing 8 bytes either way.
>

I'm aware of that, I was referring to Sean's comment:

 > The nifty thing about using a wrapping sequence is that the id's are
 > sequential across transactions, which correctly maps to the
 > progression of time, which obviates the need for relying on any kind
 > of a date column for doing syslog message ordering.

If you only use the id, you can't really tell the message ordering by
the ID, as id=1 could well be inserted *after* id=2 due to the wrapping
of the sequence.

Cheers
Thomas


Re: Postgresql "FIFO" Tables, How-To ?

From
Dennis Gearon
Date:
What you don't get is a look at the maximum number of records when T is full
and T1 is half full. How will you pull out half of the records in T1 and half
of the records in T?

Shridhar Daithankar wrote:

> You are looking at the wrong end of the pipe with the FIFO approach.
>
> What I would do:
>
> 1. Create an empty table.
> 2. Create a table stamped with date and time. Let's call it 'T'.
> 3. Create a rule on the original table to direct all the processing to 'T'.
> 4. Do record counting in the middleware application. A mutex in the
> application would be far less expensive than the PG locking mechanism with
> disk.
>
> When 'T' fills
>
>  a. create another table 'T1',
>  b. rewrite rules accordingly
>
> and bingo you are done.
>
> Do (a) and (b) in a transaction and nobody would notice the difference. No
> vacuum drag, no dead tuples, nothing. And you are free to play with 'T' the
> way you want, completely independently.
>
> I am sure this will work, barring the interaction between rules and
> transactions. I don't know whether rewriting rules is transaction safe or
> not. Most probably yes, but I would like to confirm that.
>
> Bye
>  Shridhar
>
> --
> What's this script do?  unzip ; touch ; finger ; mount ; gasp ; yes ; umount ; sleep
> Hint for the answer: not everything is computer-oriented. Sometimes you're
> in a sleeping bag, camping out. (Contributed by Frans van der Zande.)
>
>


Re: Postgresql "FIFO" Tables, How-To ?

From
"Shridhar Daithankar"
Date:
On 17 Jul 2003 at 8:15, Dennis Gearon wrote:

> What you don't get is a look at the maximum number of records when T is full
> and T1 is half full. How will you pull out half of the records in T1 and half
> of the records in T?

Wouldn't a view do that, or inheritance, if indexing is not a problem?

Bye
 Shridhar

--
Slurm, n.: The slime that accumulates on the underside of a soap bar when it
sits in the dish too long.  -- Rich Hall, "Sniglets"


Application & Authentication

From
"Sean Mullen"
Date:
Hi,

I'm a bit concerned about a 'conceptual' issue.

I'm working on a web-based app and am using postgresql users/groups for
authentication, and using functions/rules/views to ensure users only
access/modify their own data.

i.e. many references to 'current_user'
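A minimal sketch of that pattern (hypothetical table; writes would be wrapped
the same way with rules or functions):

CREATE TABLE notes (
    owner NAME NOT NULL DEFAULT current_user,
    body  TEXT
);

-- Each user only ever sees their own rows.
CREATE VIEW my_notes AS
    SELECT body FROM notes WHERE owner = current_user;

REVOKE ALL ON notes FROM PUBLIC;
GRANT SELECT ON my_notes TO PUBLIC;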

However, in my travels around sf.net, freshmeat, etc. I haven't come
across any projects built like mine, which is concerning to me.

Other projects I've seen use their app for authentication/security
and bypass/ignore the extremely 'useful' security system built into
postgresql and build their own security/authentication system.

I'm wondering if the reason for this is:

A) Necessity.
i.e. their project frontends run on a mysql backend - and have
to do 'everything'

OR

B) There is some horrible limitation that is going to ruin my day down
the track

Thanks for any comments,

Sean




Re: Application & Authentication

From
"Shridhar Daithankar"
Date:
On 18 Jul 2003 at 16:58, Sean Mullen wrote:
> Other projects I've seen use their app for authentication/security
> and bypass/ignore the extremely 'useful' security system built into
> postgresql and build their own security/authentication system.
>
> I'm wondering if the reason for this is:
>
> A) Necessity.
> i.e. their project frontends run on a mysql backend - and have
> to do 'everything'

That is a strong accusation.

> OR
>
> B) There is some horrible limitation that is going to ruin my day down
> the track

I designed a web app which needed authentication. However, since my middleware
was using connection pooling, the only way I could authenticate each user was
via PAM.

PostgreSQL supports SET SESSION AUTHORIZATION, but while doing so it does not
accept the new user's password.

So I was forced to have the app connect to the database as a single user and
maintain its own authentication database. I had to give up the access control
offered by postgresql..:-(

I raised this issue on -hackers but it never gained traction. IMO postgresql
needs separate authentication APIs exposed to the user, so that people can
use postgresql authentication in their systems without using PAM etc.


Bye
 Shridhar

--
COBOL:    An exercise in Artificial Inelegance.


Re: Postgresql "FIFO" Tables, How-To ?

From
Florian Weimer
Date:
"Kirill Ponazdyr" <softlist@codeangels.com> writes:

> It is for an advanced syslog server product we are currently developing.
>
> The very basic idea is to feed all syslog messages into a DB and allow
> easy monitoring and even correlation. We use Postgres as our DB backend;
> in big environments the machine would be hit with dozens of syslog
> messages a second, and the messaging tables could grow out of control
> pretty soon (we are talking of up to 10mil messages daily).

We have something similar (with about 50 log entries written per
second).  I guess you too have got a time-based column which is
indexed.  This means that you'll run into the "creeping index
syndrome" (as far as I understand it, pages in the index are never
reused because your column values are monotonically increasing).
Expiring old rows is a problem, too.  We now experiment with per-day
tables, which makes expiry rather cheap and avoids growing indices.
And backup is much easier, too.

If you have some time for toying with different ideas, you might want
to look at TelegraphCQ.

Re: Application & Authentication

From
"Chris Travers"
Date:
When I started writing Hermes (http://hermesweb.sourceforge.net), I was
faced with this problem as well.  I wanted both to support MySQL (because it
is widely supported) and to provide a flexible way of supporting some of the
more advanced features of PostgreSQL.  You might find my project refreshing
if you are looking for something using native database accounts, but it is
not a small program.  I settled on doing the following:

1:  A pluggable authentication module system so that I could authenticate
using database native accounts without worrying that maybe someday I would
be limited to one account and have to "make do."

2:  A full database abstraction model which reduces most of the operations
to a (mostly) ANSI 92 compliant database.

3:  Permissions administration abstraction layer which I could use to wrap
the fact that MySQL does not support roles or groups (what a horrid hack I
used for MySQL ;)).

Here are the issues I think most people are facing with these applications:
1:  Most hosted solutions do not offer a dedicated instance of the database
manager so that user accounts are global and the subscriber cannot create
them.
2:  No easy way for users to change their passwords without being able to
change anyone else's (if the application can be bypassed).  A stored
procedure in PostgreSQL should solve this issue.
3:  Supporting multiple database backends with different user permissions
systems can result in both QA problems and much extra work.

But here is what database native accounts give you:
1:  Enforcing the permissions as far "back" as possible so that they cannot
be bypassed by a simple application exploit.
2:  More extensible user account management.
3:  If properly managed, credential separation is possible so that even a
web server compromise is not sufficient alone to access the database.

Anyway, my $0.02,
Chris Travers

----- Original Message -----
From: "Sean Mullen" <smullen@optusnet.com.au>
To: "'pg_general'" <pgsql-general@postgresql.org>
Sent: Thursday, July 17, 2003 11:58 PM
Subject: [GENERAL] Application & Authentication


> Hi,
>
> I'm a bit concerned about a 'conceptual' issue.
>
> I'm working on a web-based app and am using postgresql users/groups for
> authentication, and using functions/rules/views to ensure users only
> access/modify their own data.
>
> i.e. many references to 'current_user'
>
> However, in my travels around sf.net, freshmeat, etc. I haven't come
> across any projects built like mine, which is concerning to me.
>
> Other projects I've seen use their app for authentication/security
> and bypass/ignore the extremely 'useful' security system built into
> postgresql and build their own security/authentication system.
>
> I'm wondering if the reason for this is:
>
> A) Necessity.
> i.e. their project frontends run on a mysql backend - and have
> to do 'everything'
>
> OR
>
> B) There is some horrible limitation that is going to ruin my day down
> the track
>
> Thanks for any comments,
>
> Sean
>
>
>
>