Re: Can Postgres Not Do This Safely ?!?
From:        Andy Colson
Subject:     Re: Can Postgres Not Do This Safely ?!?
Msg-id:      4CCB4344.3010602@squeakycode.net
In reply to: Re: Can Postgres Not Do This Safely ?!? (Merlin Moncure <mmoncure@gmail.com>)
Responses:   Re: Can Postgres Not Do This Safely ?!? (Jeff Davis <pgsql@j-davis.com>)
List:        pgsql-general
On 10/29/2010 9:49 AM, Merlin Moncure wrote:
> On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl.pickett@gmail.com> wrote:
>> Hello Postgres Hackers,
>>
>> We have a simple 'event log' table that is insert only (by multiple
>> concurrent clients). It has an integer primary key. We want to do
>> incremental queries of this table every 5 minutes or so, i.e. "select
>> * from events where id > LAST_ID_I_GOT" to insert into a separate
>> reporting database. The problem is, this simple approach has a race
>> that will forever skip uncommitted events. I.e., if 5000 was
>> committed sooner than 4999, and we get 5000, we will never go back
>> and get 4999 when it finally commits. How can we solve this?
>> Basically it's a phantom row problem, but it spans transactions.
>>
>> I looked at checking the internal 'xmin' column, but the docs say it
>> is 32 bit, while something like 'txid_current_snapshot' returns a
>> 64-bit value. I don't get it. All I want is to make sure I skip over
>> any rows that are newer than the oldest currently running
>> transaction. Has nobody else run into this before?
>
> You don't have a sequence problem so much as a wrong implementation
> problem. Sequences are always *grabbed* in order, but they can hit
> the table out of order, and there is a time lag between when the
> sequence value is generated and when the transaction commits. If I
> issue 'begin', insert a log record, and hold the commit for 5
> minutes, you are going to skip the record, because you are only
> looking at the last processed record. Your algorithm is going to
> fail if you use a sequence, timestamp, or gapless sequence to manage
> your queue position. You need to divide your log records into two
> logical sets, processed and unprocessed, and look at the set as a
> whole.
>
> I would suggest staging your unprocessed records to a queue table and
> having your writer consume them and move them to a processed table.
> You can also look at already-built queuing implementations like PGQ,
> written by our spectacularly skilled friends at Skype (I haven't used
> it myself, but I've heard it's good!).
>
> merlin

Yep, you don't want a sequence. You want a flag.

Add a boolean "processed" flag, defaulted to false. Then every 5
minutes run this:

begin;
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;

Or, if you want to select the rows and do something with them:

begin;
select * from events where processed = false;
... do your processing on each row, which would include inserting it ...
update events set processed = true where processed = false;
commit;

Just make sure you do it all in the same transaction, so the update
sees the same set of rows as the select. You could also create a
partial index on processed (where processed = false) to keep track of
just the rows that are still false.

-Andy
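A minimal sketch of that flag approach, assuming hypothetical events
and logged tables (the id and payload columns are made up for
illustration). One caveat: under the default read committed
isolation, each statement takes its own snapshot, so a row committed
between the insert and the update above could be flagged without ever
being copied. A writable CTE (PostgreSQL 9.1 and later) marks and
copies the batch in a single atomic statement:

create table events (
    id        bigserial primary key,
    payload   text not null,
    processed boolean not null default false
);

create table logged (
    id      bigint primary key,
    payload text not null
);

-- partial index so the five-minute sweep only scans pending rows
create index events_pending_idx on events (id) where processed = false;

-- flip the flag and copy the affected rows in one atomic statement,
-- so the two steps cannot disagree about which rows were pending
with batch as (
    update events
       set processed = true
     where processed = false
 returning id, payload
)
insert into logged (id, payload)
select id, payload from batch;

Writers whose transactions have not yet committed are simply not
visible to the update, so their rows stay false and get picked up on
the next sweep, which is exactly the behavior the "id > LAST_ID_I_GOT"
approach failed to give.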