Обсуждение: Re: Bug #613: Sequence values fall back to previously chec

Поиск
Список
Период
Сортировка

Re: Bug #613: Sequence values fall back to previously chec

От
"Tom Pfau"
Дата:
I don't fully understand the xlog files or WAL records but...

Why isn't the writing of the WAL record based on the CACHE value of the
sequence?  If a request to nextval() can't be satisfied by the cache,
the sequence on disk should be updated resulting in a WAL record being
written.

If two sessions are accessing the sequence, they will likely end up not
writing sequential values as they have both taken a chunk of values by
calling nextval() but the effect of this could be controlled by the user
by selecting an acceptable value for CACHE.  If I don't mind having
session 1 write records 1-10 while session 2 interleaves those with
records 11-20, I should set my cache to 10.  If I want my id's to map to
insertion time as closely as possible I should set the cache lower (or
NOCACHE, is that an option?).

I'm concerned that the discussion here has been of the opinion that
since no records were written to the database using the value retrieved
from the sequence that no damage has been done.  We are using database
sequences to get unique identifiers for things outside the database.  If
a sequence could ever under any circumstances reissue a value, this
could be damaging to the integrity of our software.


Re: Bug #613: Sequence values fall back to previously chec

От
Дата:
On Thu, 14 Mar 2002, Tom Pfau wrote:

> I don't fully understand the xlog files or WAL records but...
> 
> Why isn't the writing of the WAL record based on the CACHE value of the
> sequence?  If a request to nextval() can't be satisfied by the cache,
> the sequence on disk should be updated resulting in a WAL record being
> written.
>
> If two sessions are accessing the sequence, they will likely end up not
> writing sequential values as they have both taken a chunk of values by
> calling nextval() but the effect of this could be controlled by the user
> by selecting an acceptable value for CACHE.  


I was thinking that too, just leave it up to the end user to decide the
level of performance gain they want.  But the cool thing about writing
ahead to the WAL when compared to CACHE is that the on disk copy is
advanced ahead of the cached value so that if you have a cache value of
1 you still get time ordered sequences from multiple backends AND you're
only writing to the WAL once every 32 nextval's -- though the write
ahead should really be based on a multiple of CACHE since if your cache
value is 32 then you're not getting any benefit from the WAL savings.

> If I don't mind having
> session 1 write records 1-10 while session 2 interleaves those with
> records 11-20, I should set my cache to 10.  If I want my id's to map to
> insertion time as closely as possible I should set the cache lower (or
> NOCACHE, is that an option?).

The CACHE value defaults to 1 which means no cache (each time nextval is 
called it only grabs 1 value).  

> I'm concerned that the discussion here has been of the opinion that
> since no records were written to the database using the value retrieved
> from the sequence that no damage has been done.  We are using database
> sequences to get unique identifiers for things outside the database.  If
> a sequence could ever under any circumstances reissue a value, this
> could be damaging to the integrity of our software.

Absolutely, we use sequences the same way.  And the problem exhibits
itself regardless of whether data is being inserted or not, and
independantly of CACHE value.  So this has to be fixed for both
scenarios.

-- Ben



Re: Bug #613: Sequence values fall back to previously chec

От
Tom Lane
Дата:
"Tom Pfau" <T.Pfau@emCrit.com> writes:
> I'm concerned that the discussion here has been of the opinion that
> since no records were written to the database using the value retrieved
> from the sequence that no damage has been done.

Um, you certainly didn't hear me saying that ;-)

There are two different bugs involved here.  One is the no-WAL-flush-
if-transaction-is-only-nextval problem.  AFAIK everyone agrees we must
fix that.  The other issue is this business about "logging ahead"
(to reduce the number of WAL records written) not interacting correctly
with checkpoints.  What we're arguing about is exactly how to fix that
part.

> We are using database
> sequences to get unique identifiers for things outside the database.  If
> a sequence could ever under any circumstances reissue a value, this
> could be damaging to the integrity of our software.

If you do a SELECT nextval() and then use the returned value externally
*without waiting for a commit acknowledgement*, then I think you are
risking trouble; there's no guarantee that the WAL record (if one is
needed) has hit disk yet, and so a crash could roll back the sequence.

This isn't an issue for a SELECT nextval() standing on its own ---
AFAIK the result will not be transmitted to the client until after the
commit happens.  But it would be an issue for a select executed inside
a transaction block (begin/commit).
        regards, tom lane


Re: Bug #613: Sequence values fall back to previously chec

От
Ben Grimm
Дата:
On Thu, 14 Mar 2002, Tom Lane wrote:
> 
> If you do a SELECT nextval() and then use the returned value externally
> *without waiting for a commit acknowledgement*, then I think you are
> risking trouble; there's no guarantee that the WAL record (if one is
> needed) has hit disk yet, and so a crash could roll back the sequence.
> 
> This isn't an issue for a SELECT nextval() standing on its own ---
> AFAIK the result will not be transmitted to the client until after the
> commit happens.  But it would be an issue for a select executed inside
> a transaction block (begin/commit).
> 

The behavior of SELECT nextval() should not be conditional on being in or 
out of a transaction block.  What you implying by saying that the data is 
that it would be possible to rollback an uncommited call to nextval().
Am I missing some terminology?  

I think I finally realized why my old patch that forces a log right off 
the bat works to fix at least part of the problem.  When the database 
is shutdown properly all of the sequences that are in memory are written 
back to disk in their state at that time.   But the problem with that is
that their state at that time can have log_cnt > 0.  This is why after
startup that the sequence in memory is 'behind' the one on disk, the 
code sees log > fetch and doesn't log.  When you really think about it
log_cnt should not be part of the sequence record at all since there
is never a valid case for storing a log_cnt on disk with a value other 
than 0.  

Maybe the purpose for the on disk value of log_cnt should be changed?
It could be the value used in place of the static SEQ_LOG_VALS, which 
could then be definable on a per sequence basis.  And then log_cnt 
could be moved into elm->log_cnt.  Anyway, that's just a thought.

Here's my latest patch to work around the problem.  If there is a way 
to prevent log_cnt from being written out with a value greater than
zero, that would be better than this.  With this behavior log_cnt is 
reset to 0 each time a backend accesses a sequence for the first time.
That's probably overkill...  But I still believe that the XLogFlush 
after XLogInsert is necessary to ensure that the WAL value is written 
to disk immediately.  In my testing this patch works fine, YMMV.

-- Ben



*** src/backend/commands/sequence.c.orig        Tue Mar 12 18:58:55 2002
--- src/backend/commands/sequence.c     Thu Mar 14 17:34:25 2002
***************
*** 62,67 ****
--- 62,68 ----       int64           cached;       int64           last;       int64           increment;
+       bool            reset_logcnt;       struct SeqTableData *next; } SeqTableData;

***************
*** 270,275 ****
--- 271,277 ----
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
+               XLogFlush(recptr);       }       END_CRIT_SECTION();

***************
*** 314,321 ****               PG_RETURN_INT64(elm->last);       }

!       seq = read_info("nextval", elm, &buf);          /* lock page' buffer and
!                                                                                                * read tuple */
       last = next = result = seq->last_value;       incby = seq->increment_by;
--- 316,322 ----               PG_RETURN_INT64(elm->last);       }

!       seq = read_info("nextval", elm, &buf);          /* lock page' buffer and read tuple */
       last = next = result = seq->last_value;       incby = seq->increment_by;
***************
*** 331,339 ****               log--;       }

!       if (log < fetch)       {
!               fetch = log = fetch - log + SEQ_LOG_VALS;               logit = true;       }

--- 332,340 ----               log--;       }

!       if (log < fetch)       {
!               fetch = log = fetch - log + SEQ_LOG_VALS * cache;               logit = true;       }

***************
*** 403,409 ****               rdata[0].data = (char *) &xlrec;               rdata[0].len = sizeof(xl_seq_rec);
      rdata[0].next = &(rdata[1]);
 
!               seq->last_value = next;               seq->is_called = true;               seq->log_cnt = 0;
--- 404,410 ----               rdata[0].data = (char *) &xlrec;               rdata[0].len = sizeof(xl_seq_rec);
      rdata[0].next = &(rdata[1]);
 
!               seq->last_value = next;               seq->is_called = true;               seq->log_cnt = 0;
***************
*** 417,423 ****
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
!               if (fetch)                              /* not all numbers were fetched */                       log -=
fetch;      }
 
--- 418,425 ----
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
!               XLogFlush(recptr);
!               if (fetch)                              /* not all numbers were fetched */                       log -=
fetch;      }
 
***************
*** 507,513 ****               XLogRecPtr      recptr;               XLogRecData rdata[2];               Page
page = BufferGetPage(buf);
 
!               xlrec.node = elm->rel->rd_node;               rdata[0].buffer = InvalidBuffer;
rdata[0].data= (char *) &xlrec;
 
--- 509,516 ----               XLogRecPtr      recptr;               XLogRecData rdata[2];               Page
page = BufferGetPage(buf);
 
!
!               xlrec.node = elm->rel->rd_node;               rdata[0].buffer = InvalidBuffer;
rdata[0].data= (char *) &xlrec;
 
***************
*** 527,532 ****
--- 530,536 ----
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
+               XLogFlush(recptr);       }       /* save info in sequence relation */       seq->last_value = next;
   /* last fetched number */
 
***************
*** 660,665 ****
--- 664,674 ----
       seq = (Form_pg_sequence) GETSTRUCT(&tuple);

+       if (elm->reset_logcnt)
+       {
+               seq->log_cnt = 0;
+               elm->reset_logcnt = false;
+       }       elm->increment = seq->increment_by;
       return seq;
***************
*** 703,708 ****
--- 712,718 ----                                name, caller);                       elm->relid =
RelationGetRelid(seqrel);                      elm->cached = elm->last = elm->increment = 0;
 
+                       elm->reset_logcnt = true;               }       }       else
***************
*** 721,726 ****
--- 731,737 ----               elm->relid = RelationGetRelid(seqrel);               elm->cached = elm->last =
elm->increment= 0;               elm->next = (SeqTable) NULL;
 
+               elm->reset_logcnt = true;
               if (seqtab == (SeqTable) NULL)                       seqtab = elm;





Re: Bug #613: Sequence values fall back to previously chec

От
Tom Lane
Дата:
Ben Grimm <bgrimm@zaeon.com> writes:
> The behavior of SELECT nextval() should not be conditional on being in or 
> out of a transaction block.

Nonsense.  The behavior of INSERT or UPDATE is "conditional" in exactly
the same way: you should not rely on the reported result until it's
committed.

Given Vadim's performance concerns, I doubt he'll hold still for forcing
an XLogFlush immediately every time a sequence XLOG record is written
-- but AFAICS that'd be the only way to guarantee durability of a
nextval result in advance of commit.  Since I don't think that's an
appropriate goal for the system to have, I don't care for it either.

I'm planning to try coding up Vadim's approach (pay attention to page's
old LSN to see if a WAL record must be generated) tonight or tomorrow
and see if it seems reasonable.
        regards, tom lane


Re: [HACKERS] Bug #613: Sequence values fall back to previously chec

От
"Dave Cramer"
Дата:
I noticed a message asking if this scenario was consistent with the
other reports, and yes it is. We have seen this occuring on our system
with versions as old as 7.0.

Glad to see someone has finally nailed this one.

Dave