Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.119
diff -c -r1.119 config.sgml
*** doc/src/sgml/config.sgml 2 Apr 2007 15:27:02 -0000 1.119
--- doc/src/sgml/config.sgml 5 Apr 2007 21:51:18 -0000
***************
*** 1314,1372 ****
Settings
!
! fsync> configuration parameter
- fsync (boolean)
! If this parameter is on, the PostgreSQL> server
! will try to make sure that updates are physically written to
! disk, by issuing fsync()> system calls or various
! equivalent methods (see ).
! This ensures that the database cluster can recover to a
! consistent state after an operating system or hardware crash.
!
!
!
! However, using fsync results in a
! performance penalty: when a transaction is committed,
! PostgreSQL must wait for the
! operating system to flush the write-ahead log to disk. When
! fsync is disabled, the operating system is
! allowed to do its best in buffering, ordering, and delaying
! writes. This can result in significantly improved performance.
! However, if the system crashes, the results of the last few
! committed transactions might be lost in part or whole. In the
! worst case, unrecoverable data corruption might occur.
! (Crashes of the database software itself are not>
! a risk factor here. Only an operating-system-level crash
! creates a risk of corruption.)
!
!
!
! Due to the risks involved, there is no universally correct
! setting for fsync. Some administrators
! always disable fsync, while others only
! turn it off during initial bulk data loads, where there is a clear
! restart point if something goes wrong. Others
! always leave fsync enabled. The default is
! to enable fsync, for maximum reliability.
! If you trust your operating system, your hardware, and your
! utility company (or your battery backup), you can consider
! disabling fsync.
! This parameter can only be set in the postgresql.conf>
! file or on the server command line.
! If you turn this parameter off, also consider turning off
! .
!
wal_sync_method (string)
--- 1314,1344 ----
Settings
!
! wal_buffers (integer)
! wal_buffers> configuration parameter
! The amount of memory used in shared memory for WAL data. The
! default is 64 kilobytes (64kB>). The setting need only
! be large enough to hold the amount of WAL data generated by one
! typical transaction, since the data is written out to disk at
! every transaction commit. This parameter can only be set at server
! start.
! Increasing this parameter might cause PostgreSQL>
! to request more System V> shared
! memory than your operating system's default configuration
! allows. See for information on how to
! adjust those parameters, if necessary.
!
wal_sync_method (string)
***************
*** 1445,1451 ****
Turning this parameter off speeds normal operation, but
might lead to a corrupt database after an operating system crash
or power failure. The risks are similar to turning off
! fsync>, though smaller. It might be safe to turn off
this parameter if you have hardware (such as a battery-backed disk
controller) or file-system software that reduces
the risk of partial page writes to an acceptably low level (e.g., ReiserFS 4).
--- 1417,1423 ----
Turning this parameter off speeds normal operation, but
might lead to a corrupt database after an operating system crash
or power failure. The risks are similar to turning off
! transaction_guarantee>. It might be safe to turn off
this parameter if you have hardware (such as a battery-backed disk
controller) or file-system software that reduces
the risk of partial page writes to an acceptably low level (e.g., ReiserFS 4).
***************
*** 1465,1495 ****
!
! wal_buffers (integer)
! wal_buffers> configuration parameter
! The amount of memory used in shared memory for WAL data. The
! default is 64 kilobytes (64kB>). The setting need only
! be large enough to hold the amount of WAL data generated by one
! typical transaction, since the data is written out to disk at
! every transaction commit. This parameter can only be set at server
! start.
! Increasing this parameter might cause PostgreSQL>
! to request more System V> shared
! memory than your operating system's default configuration
! allows. See for information on how to
! adjust those parameters, if necessary.
!
commit_delay (integer)
--- 1437,1546 ----
!
! wal_writer_delay> configuration parameter
+ wal_writer_delay (integer)
! If this parameter is greater than zero, the PostgreSQL>
! server will start a separate server process called the
! WAL writer>, whose sole function is to issue writes
! of dirty> WAL buffers. This enables functionality
! that is new in PostgreSQL> 8.3 and that
! supersedes the previous fsync parameter
! (see ).
!
!
!
! The WAL writer will flush the write-ahead log to disk every
! wal_writer_delay milliseconds. Typical
! settings would be in the range 50ms to 250ms, though the
! allowed range is 0 to 1000ms. The default value is 0,
! meaning this process is disabled by default. Note that on many
! systems, the effective resolution of sleep delays is 10
! milliseconds; setting wal_writer_delay> to a value
! that is not a multiple of 10 might have the same results as
! setting it to the next higher multiple of 10. This parameter
! can only be set in the postgresql.conf> file or on
! the server command line.
!
!
!
!
!
!
! transaction_guarantee> configuration parameter
!
! transaction_guarantee (boolean)
!
!
! By default, PostgreSQL
! will try to make sure that updates are physically written to
! disk, by issuing fsync()> system calls or various
! equivalent methods (see ).
! This ensures that the database cluster can recover to a
! consistent state after an operating system or hardware crash.
! However, using transaction_guarantee results
! in a performance penalty: when a transaction is committed,
! PostgreSQL must wait for the
! operating system to flush the write-ahead log to disk. When
! transaction_guarantee is disabled, the user's process can
! start the next transaction without waiting for that flush. This can
! significantly improve performance for single or multiple sessions
! executing reasonably short write transactions.
! However, if the system crashes, the results of the last few
! committed transactions will very likely be lost in part or whole.
+
+
+ The data lost when this parameter is off is the set of
+ unguaranteed transactions that were committed within the last
+ wal_writer_delay> milliseconds. The data loss
+ is both certain and predictable: unguaranteed transactions
+ that have not been written to WAL files will definitely be lost.
+ Only those affected transactions will be lost,
+ and the rest of the system will be in a safe, consistent state.
+ This parameter can only be disabled when wal_writer_delay>
+ is set to a value higher than zero.
+
+
+
+ It is safe to use a mix of transactions with
+ transaction_guarantee> on and off. Only the
+ transaction_guarantee> = off transactions will be
+ at risk. In no circumstances will the
+ transaction_guarantee> = on transactions be at risk.
+ Any changes made by an unguaranteed transaction may be readable
+ later by guaranteed transactions, but a guaranteed commit
+ always flushes the commits of any earlier unguaranteed transactions
+ as well, so guaranteed transactions live up to their name.
+ The parameter affects transaction commits only, not aborts.
+ It also has no effect on most utility commands such as VACUUM FULL
+ and other commands that cannot run inside a transaction block.
+ Any transaction that causes files to be deleted will always
+ be a guaranteed transaction.
+
+
+
+ This parameter can be set in postgresql.conf>, though
+ it is better set only for those users or sessions for which
+ the potential data loss is acceptable. Disabling this parameter
+ globally is not recommended unless you have explained to the system
+ owner the full implications of their decision to use this feature.
+
+
+
+ There is no legal meaning to the phrase guaranteed>
+ and the terms of the PostgreSQL> licence remain unchanged.
+
!
commit_delay (integer)
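The loss window described in the transaction_guarantee documentation above can be modelled in a few lines of C. This is only an illustrative sketch, not code from the patch: CommitModel and count_lost_commits are hypothetical names, times are plain milliseconds, and commit time is assumed to match WAL order. It counts the unguaranteed commits made after the most recent flush, where a flush is either the WAL writer's last cycle or any later guaranteed commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one committed transaction (illustration only). */
typedef struct
{
    long commit_ms;     /* time the commit returned to the client */
    bool guaranteed;    /* transaction_guarantee setting at commit */
} CommitModel;

/*
 * Count the commits that would be lost if the server crashed now.
 * A guaranteed commit flushes WAL itself, which also makes every
 * earlier unguaranteed commit durable, so the effective flush point
 * is the later of the WAL writer's last flush and the latest
 * guaranteed commit.
 */
static size_t
count_lost_commits(const CommitModel *commits, size_t n,
                   long last_writer_flush_ms)
{
    long   flushed_to = last_writer_flush_ms;
    size_t lost = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (commits[i].guaranteed && commits[i].commit_ms > flushed_to)
            flushed_to = commits[i].commit_ms;

    for (i = 0; i < n; i++)
        if (!commits[i].guaranteed && commits[i].commit_ms > flushed_to)
            lost++;

    return lost;
}
```

In this model, guaranteed commits are never counted as lost, which matches the statement that only transaction_guarantee = off transactions are at risk.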
Index: doc/src/sgml/wal.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/wal.sgml,v
retrieving revision 1.43
diff -c -r1.43 wal.sgml
*** doc/src/sgml/wal.sgml 31 Jan 2007 20:56:19 -0000 1.43
--- doc/src/sgml/wal.sgml 5 Apr 2007 21:51:19 -0000
***************
*** 267,273 ****
performing a LogFlush. This delay allows other
server processes to add their commit records to the log so as to have all
of them flushed with a single log sync. No sleep will occur if
!
is not enabled, nor if fewer than
other sessions are currently in active transactions; this avoids
sleeping when it's unlikely that any other session will commit soon.
--- 267,273 ----
performing a LogFlush. This delay allows other
server processes to add their commit records to the log so as to have all
of them flushed with a single log sync. No sleep will occur if
!
is not enabled, nor if fewer than
other sessions are currently in active transactions; this avoids
sleeping when it's unlikely that any other session will commit soon.
Index: doc/src/sgml/ref/postgres-ref.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/postgres-ref.sgml,v
retrieving revision 1.50
diff -c -r1.50 postgres-ref.sgml
*** doc/src/sgml/ref/postgres-ref.sgml 16 Feb 2007 02:10:07 -0000 1.50
--- doc/src/sgml/ref/postgres-ref.sgml 5 Apr 2007 21:51:21 -0000
***************
*** 183,189 ****
Disables fsync calls for improved
performance, at the risk of data corruption in the event of a
system crash. Specifying this option is equivalent to
! disabling the configuration
parameter. Read the detailed documentation before using this!
--- 183,189 ----
Disables fsync calls for improved
performance, at the risk of data corruption in the event of a
system crash. Specifying this option is equivalent to
! disabling the configuration
parameter. Read the detailed documentation before using this!
Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.42
diff -c -r1.42 clog.c
*** src/backend/access/transam/clog.c 5 Jan 2007 22:19:23 -0000 1.42
--- src/backend/access/transam/clog.c 5 Apr 2007 21:51:21 -0000
***************
*** 79,85 ****
* for most uses; TransactionLogUpdate() in transam.c is the intended caller.
*/
void
! TransactionIdSetStatus(TransactionId xid, XidStatus status)
{
int pageno = TransactionIdToPage(xid);
int byteno = TransactionIdToByte(xid);
--- 79,85 ----
* for most uses; TransactionLogUpdate() in transam.c is the intended caller.
*/
void
! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
{
int pageno = TransactionIdToPage(xid);
int byteno = TransactionIdToByte(xid);
***************
*** 94,99 ****
--- 94,112 ----
LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
+ /*
+ * SimpleLruReadPage() calls SlruSelectLRUPage(), which
+ * never returns until I/O has finished on a page. All I/O
+ * starts while holding the control lock, so this next call never
+ * returns until we have completed all I/O on the block.
+ * This assumption is important because unguaranteed
+ * transaction commits must *never* reach disk until
+ * XLogFlush() confirms the flush. Allowing a page write
+ * concurrently with writing to the page might allow the
+ * committed status to reach disk ahead of a flush, so
+ * for unguaranteed transactions it is important that we
+ * never allow this to occur.
+ */
slotno = SimpleLruReadPage(ClogCtl, pageno, xid);
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
***************
*** 110,115 ****
--- 123,138 ----
ClogCtl->shared->page_dirty[slotno] = true;
+ /*
+ * Update the page LSN if the transaction completion LSN is higher.
+ * lsn will be invalid when supplied during InRecovery processing,
+ * so we don't need to do anything special to avoid LSN updates
+ * during recovery. After recovery completes the next clog change
+ * will set the LSN correctly.
+ */
+ if (XLByteLT(ClogCtl->shared->page_lsn[slotno], lsn))
+ ClogCtl->shared->page_lsn[slotno] = lsn;
+
LWLockRelease(CLogControlLock);
}
***************
*** 157,162 ****
--- 180,187 ----
ClogCtl->PagePrecedes = CLOGPagePrecedes;
SimpleLruInit(ClogCtl, "CLOG Ctl", NUM_CLOG_BUFFERS,
CLogControlLock, "pg_clog");
+ ClogCtl->do_wal_flush = true;
+
}
/*
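The page-LSN bookkeeping added to TransactionIdSetStatus() above can be sketched on its own. This is a simplified model, not the patch's code: LsnModel stands in for the two-field XLogRecPtr, and lsn_less_than and advance_page_lsn are hypothetical names for the comparison and the conditional update. It shows why an invalid (0,0) LSN supplied during recovery needs no special case: it never compares greater than the stored value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the two-field XLogRecPtr used by the patch. */
typedef struct
{
    uint32_t xlogid;    /* log file number */
    uint32_t xrecoff;   /* byte offset within the log file */
} LsnModel;

/* True if a is strictly earlier than b, comparing xlogid first. */
static bool
lsn_less_than(LsnModel a, LsnModel b)
{
    return a.xlogid < b.xlogid ||
        (a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

/* Raise the page's LSN to lsn if lsn is later; never move it backwards. */
static void
advance_page_lsn(LsnModel *page_lsn, LsnModel lsn)
{
    if (lsn_less_than(*page_lsn, lsn))
        *page_lsn = lsn;
}
```

Because the stored value only ever increases, it always names the latest WAL position that any status change on that page depends on.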
Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.23
diff -c -r1.23 multixact.c
*** src/backend/access/transam/multixact.c 5 Jan 2007 22:19:23 -0000 1.23
--- src/backend/access/transam/multixact.c 5 Apr 2007 21:51:23 -0000
***************
*** 418,423 ****
--- 418,486 ----
}
/*
+ * MultiXactIdIsFlushed
+ * Returns whether a MultiXactId is "flushed".
+ *
+ * We return false if at least one member of the given MultiXactId is not yet
+ * flushed. Note that a "true" result is certain not to change,
+ * because it is not legal to add members to an existing MultiXactId.
+ */
+ bool
+ MultiXactIdIsFlushed(MultiXactId multi)
+ {
+ TransactionId *members;
+ int nmembers;
+ int i;
+
+ debug_elog3(DEBUG2, "IsFlushed %u?", multi);
+
+ nmembers = GetMultiXactIdMembers(multi, &members);
+
+ if (nmembers < 0)
+ {
+ debug_elog2(DEBUG2, "IsFlushed: no members");
+ return true;
+ }
+
+ /*
+ * Checking for myself is cheap compared to looking in shared memory,
+ * so first do the equivalent of MultiXactIdIsCurrent(). This is not
+ * needed for correctness, it's just a fast path.
+ */
+ for (i = 0; i < nmembers; i++)
+ {
+ if (TransactionIdIsCurrentTransactionId(members[i]))
+ {
+ debug_elog3(DEBUG2, "IsFlushed: I (%d) am running! So not flushed", i);
+ pfree(members);
+ return false;
+ }
+ }
+
+ /*
+ * This could be made faster by having another entry point in procarray.c,
+ * walking the flushed array only once for all the members. But in most
+ * cases nmembers should be small enough that it doesn't much matter.
+ */
+ for (i = 0; i < nmembers; i++)
+ {
+ if (!TransactionIdIsFlushed(members[i]))
+ {
+ debug_elog4(DEBUG2, "IsFlushed: member %d (%u) is not flushed",
+ i, members[i]);
+ pfree(members);
+ return false;
+ }
+ }
+
+ pfree(members);
+
+ debug_elog3(DEBUG2, "IsFlushed: %u is flushed", multi);
+
+ return true;
+ }
+
+ /*
* MultiXactIdIsCurrent
* Returns true if the current transaction is a member of the MultiXactId.
*
Index: src/backend/access/transam/slru.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/slru.c,v
retrieving revision 1.40
diff -c -r1.40 slru.c
*** src/backend/access/transam/slru.c 5 Jan 2007 22:19:23 -0000 1.40
--- src/backend/access/transam/slru.c 5 Apr 2007 21:51:24 -0000
***************
*** 161,166 ****
--- 161,167 ----
sz += MAXALIGN(nslots * sizeof(char *)); /* page_buffer[] */
sz += MAXALIGN(nslots * sizeof(SlruPageStatus)); /* page_status[] */
sz += MAXALIGN(nslots * sizeof(bool)); /* page_dirty[] */
+ sz += MAXALIGN(nslots * sizeof(XLogRecPtr)); /* page_lsn[] */
sz += MAXALIGN(nslots * sizeof(int)); /* page_number[] */
sz += MAXALIGN(nslots * sizeof(int)); /* page_lru_count[] */
sz += MAXALIGN(nslots * sizeof(LWLockId)); /* buffer_locks[] */
***************
*** 206,211 ****
--- 207,214 ----
offset += MAXALIGN(nslots * sizeof(SlruPageStatus));
shared->page_dirty = (bool *) (ptr + offset);
offset += MAXALIGN(nslots * sizeof(bool));
+ shared->page_lsn = (XLogRecPtr *) (ptr + offset);
+ offset += MAXALIGN(nslots * sizeof(XLogRecPtr));
shared->page_number = (int *) (ptr + offset);
offset += MAXALIGN(nslots * sizeof(int));
shared->page_lru_count = (int *) (ptr + offset);
***************
*** 219,224 ****
--- 222,229 ----
shared->page_buffer[slotno] = ptr;
shared->page_status[slotno] = SLRU_PAGE_EMPTY;
shared->page_dirty[slotno] = false;
+ shared->page_lsn[slotno].xlogid = 0;
+ shared->page_lsn[slotno].xrecoff = 0;
shared->page_lru_count[slotno] = 0;
shared->buffer_locks[slotno] = LWLockAssign();
ptr += BLCKSZ;
***************
*** 232,238 ****
* assume caller set PagePrecedes.
*/
ctl->shared = shared;
! ctl->do_fsync = true; /* default behavior */
StrNCpy(ctl->Dir, subdir, sizeof(ctl->Dir));
}
--- 237,244 ----
* assume caller set PagePrecedes.
*/
ctl->shared = shared;
! ctl->do_fsync = true; /* default behavior */
! ctl->do_wal_flush = false; /* default behavior */
StrNCpy(ctl->Dir, subdir, sizeof(ctl->Dir));
}
***************
*** 620,625 ****
--- 626,643 ----
int offset = rpageno * BLCKSZ;
char path[MAXPGPATH];
int fd = -1;
+ XLogRecPtr lsn = shared->page_lsn[slotno];
+
+ /*
+ * Honour the write-WAL-before-data rule if the integrity of this
+ * SLRU's pages must be protected across a crash. This will
+ * return almost immediately except in rare cases where we have
+ * unguaranteed transactions not yet flushed, because normal commits
+ * do an XLogFlush before updating clog. This is the same step as we do
+ * during FlushBuffer() in the main shared buffer manager.
+ */
+ if (ctl->do_wal_flush && !XLogRecPtrIsInvalid(lsn))
+ XLogFlush(lsn);
/*
* During a Flush, we may already have the desired file open.
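The write-WAL-before-data step added to the SLRU page write above follows a general pattern that can be sketched independently. The model below is an assumption-laden illustration, not the patch's implementation: WalModel, PageModel, wal_flush and write_page are hypothetical names, and LSNs are reduced to single integers with 0 meaning invalid. It shows that the flush is skipped when WAL is already durable past the page's LSN, which is why the extra call is normally cheap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of WAL state; LSNs are plain integers here. */
typedef struct
{
    uint64_t flushed_upto;  /* WAL is durable up to this position */
    int      flush_calls;   /* number of flushes that did real work */
} WalModel;

typedef struct
{
    uint64_t page_lsn;      /* latest WAL position the page depends on; 0 = invalid */
    bool     do_wal_flush;  /* mirrors ctl->do_wal_flush in the patch */
    bool     on_disk;       /* set once the page has been written */
} PageModel;

/* Make WAL durable up to lsn; a no-op if it already is. */
static void
wal_flush(WalModel *wal, uint64_t lsn)
{
    if (lsn > wal->flushed_upto)
    {
        wal->flushed_upto = lsn;
        wal->flush_calls++;
    }
}

/* Write a page, flushing WAL first when the page requires it. */
static void
write_page(WalModel *wal, PageModel *page)
{
    if (page->do_wal_flush && page->page_lsn != 0)
        wal_flush(wal, page->page_lsn);
    page->on_disk = true;
}
```

A page whose owner leaves do_wal_flush off, as SimpleLruInit() does by default in the patch, is written without consulting WAL at all.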
Index: src/backend/access/transam/transam.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/transam.c,v
retrieving revision 1.69
diff -c -r1.69 transam.c
*** src/backend/access/transam/transam.c 5 Jan 2007 22:19:23 -0000 1.69
--- src/backend/access/transam/transam.c 5 Apr 2007 21:51:25 -0000
***************
*** 27,33 ****
static XidStatus TransactionLogFetch(TransactionId transactionId);
static void TransactionLogUpdate(TransactionId transactionId,
! XidStatus status);
/* ----------------
* Single-item cache for results of TransactionLogFetch.
--- 27,33 ----
static XidStatus TransactionLogFetch(TransactionId transactionId);
static void TransactionLogUpdate(TransactionId transactionId,
! XidStatus status, XLogRecPtr lsn);
/* ----------------
* Single-item cache for results of TransactionLogFetch.
***************
*** 97,108 ****
*/
static void
TransactionLogUpdate(TransactionId transactionId, /* trans id to update */
! XidStatus status) /* new trans status */
{
/*
* update the commit log
*/
! TransactionIdSetStatus(transactionId, status);
}
/*
--- 97,109 ----
*/
static void
TransactionLogUpdate(TransactionId transactionId, /* trans id to update */
! XidStatus status, /* new trans status */
! XLogRecPtr lsn) /* lsn of transaction completion */
{
/*
* update the commit log
*/
! TransactionIdSetStatus(transactionId, status, lsn);
}
/*
***************
*** 112,125 ****
* Don't depend on this being atomic; it's not.
*/
static void
! TransactionLogMultiUpdate(int nxids, TransactionId *xids, XidStatus status)
{
int i;
Assert(nxids != 0);
for (i = 0; i < nxids; i++)
! TransactionIdSetStatus(xids[i], status);
}
/* ----------------------------------------------------------------
--- 113,126 ----
* Don't depend on this being atomic; it's not.
*/
static void
! TransactionLogMultiUpdate(int nxids, TransactionId *xids, XidStatus status, XLogRecPtr lsn)
{
int i;
Assert(nxids != 0);
for (i = 0; i < nxids; i++)
! TransactionIdSetStatus(xids[i], status, lsn);
}
/* ----------------------------------------------------------------
***************
*** 267,275 ****
* Assumes transaction identifier is valid.
*/
void
! TransactionIdCommit(TransactionId transactionId)
{
! TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED);
}
/*
--- 268,276 ----
* Assumes transaction identifier is valid.
*/
void
! TransactionIdCommit(TransactionId transactionId, XLogRecPtr lsn)
{
! TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
}
/*
***************
*** 280,288 ****
* Assumes transaction identifier is valid.
*/
void
! TransactionIdAbort(TransactionId transactionId)
{
! TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED);
}
/*
--- 281,289 ----
* Assumes transaction identifier is valid.
*/
void
! TransactionIdAbort(TransactionId transactionId, XLogRecPtr lsn)
{
! TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED, lsn);
}
/*
***************
*** 293,299 ****
void
TransactionIdSubCommit(TransactionId transactionId)
{
! TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED);
}
/*
--- 294,302 ----
void
TransactionIdSubCommit(TransactionId transactionId)
{
! XLogRecPtr lsn = {0,0}; /* Invalid XLogRecPtr */
!
! TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED, lsn);
}
/*
***************
*** 306,315 ****
* TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids)
{
if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED);
}
/*
--- 309,318 ----
* TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED, lsn);
}
/*
***************
*** 320,329 ****
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(int nxids, TransactionId *xids)
{
if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED);
}
/*
--- 323,332 ----
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED, lsn);
}
/*
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.29
diff -c -r1.29 twophase.c
*** src/backend/access/transam/twophase.c 3 Apr 2007 16:34:35 -0000 1.29
--- src/backend/access/transam/twophase.c 5 Apr 2007 21:51:26 -0000
***************
*** 1711,1719 ****
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid);
/* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
--- 1711,1719 ----
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid, recptr);
/* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children, recptr);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
***************
*** 1790,1797 ****
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1790,1797 ----
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbort(xid, recptr);
! TransactionIdAbortTree(nchildren, children, recptr);
END_CRIT_SECTION();
}
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.239
diff -c -r1.239 xact.c
*** src/backend/access/transam/xact.c 3 Apr 2007 16:34:35 -0000 1.239
--- src/backend/access/transam/xact.c 5 Apr 2007 21:51:31 -0000
***************
*** 36,41 ****
--- 36,42 ----
#include "pgstat.h"
#include "storage/fd.h"
#include "storage/lmgr.h"
+ #include "storage/pmsignal.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/combocid.h"
***************
*** 58,63 ****
--- 59,68 ----
int CommitDelay = 0; /* precommit delay in microseconds */
int CommitSiblings = 5; /* # concurrent xacts needed to sleep */
+ bool DefaultXactCommitGuarantee = true; /* USERSET GUC: what user wants */
+ static bool XactCommitGuarantee = true; /* the commit guarantee for the current xact */
+ bool trace_commit = false;
+ bool trace_bg_flush = true;
/*
* transaction states - transaction state from server perspective
***************
*** 203,208 ****
--- 208,308 ----
static SubXactCallbackItem *SubXact_callbacks = NULL;
+ /*
+ * DeferredFsyncCache (DFC) is a shared-memory array where we keep track
+ * of the transactions for which deferred fsync has been requested.
+ * The array is divided into chunks, each of which fits within 1-2
+ * cache lines so that both changes and lookups can be made quickly.
+ * A chunk has more than one dfc slot within it, with each dfc slot
+ * holding details about one deferred transaction.
+ *
+ * When we access a chunk we loop through all dfc slots in the chunk;
+ * the loop is written so that it can be unrolled.
+ * When we flush the DFC, we don't bother to remove transactions from it.
+ * When we insert new transactions we simply overwrite expired slots,
+ * so the bookkeeping never requires the lock to be held for any length
+ * of time.
+ *
+ * When a chunk is nearly full we signal the WALWriter to wake up and
+ * flush the DFC. When a chunk is full we flush the DFC while holding
+ * the lock.
+ *
+ * The DFC is striped so that consecutive transactions aren't in the same
+ * chunk, nor will transactions from the same backend always hit the
+ * same spot in the cache.
+ */
+ #define DFC_XACTS_PER_CHUNK 8
+ #define MAX_DFC_CHUNKS 128
+
+ #define MAX_DFC_XACTS 1024
+ #define TransactionIdToDFCChunk(xid) ((int)((xid) % (TransactionId) MAX_DFC_CHUNKS))
+ #define DFCChunkToDFCSlot(chunk) ((chunk) * DFC_XACTS_PER_CHUNK)
+
+ /* Deferred Fsync tuning parameters: */
+ #define DFC_SIGNAL_WALWRITER_THRESHOLD 6
+ #define BUSY_NUM_XACTS_THRESHOLD 16
+
+ /*
+ * The DFC tracks the LSN and xmin of deferred transactions.
+ *
+ * - lsn refers to xlog pointers
+ *
+ * - xmin refers to the oldest known TransactionIds. When we
+ * flush a transaction we know that all transactions prior
+ * to the RecentGlobalXmin seen by that backend will also
+ * be known flushed. So by keeping track of the latest
+ * RecentGlobalXmin we can have a TransactionId to test
+ * known flushed state against.
+ *
+ * Pointers behave similarly to the WAL buffer because both
+ * xmin and lsn continually advance, so that the request point
+ * is always ahead of or the same as the flush point.
+ * When we make a new request we advance the request point.
+ * When we flush we advance the flush point.
+ */
+ typedef struct DeferredFsyncTransactionData
+ {
+ TransactionId xid;
+ XLogRecPtr lsn;
+ char padding[4];
+ } DeferredFsyncXactData; /* 16 bytes */
+
+ typedef struct
+ {
+ XLogRecPtr request_lsn;
+ XLogRecPtr flushed_lsn;
+
+ TransactionId request_xmin;
+ TransactionId flushed_xmin;
+
+ DeferredFsyncXactData dfccache[MAX_DFC_XACTS];
+
+ /* auto-tuning info */
+ int numNewDeferredCommits;
+
+ /* trace info */
+ int numFlushes;
+
+ } DeferredFsyncShmemStruct;
+
+ struct
+ {
+ /* copies of global tuning info */
+ int numNewDeferredCommits;
+
+ /* number of xacts sharing this hash bucket */
+ int numValid;
+
+ /* copies of global trace info */
+ int numFlushes;
+
+ /* local trace info */
+ int flush_test_exit_local;
+ int flush_test_exit_search;
+ } trace_dfc;
+
+ static DeferredFsyncShmemStruct *dfc;
+ static TransactionId RecentFlushedXmin = InvalidTransactionId;
/* local function prototypes */
static void AssignSubTransactionId(TransactionState s);
***************
*** 244,249 ****
--- 344,356 ----
static const char *BlockStateAsString(TBlockState blockState);
static const char *TransStateAsString(TransState state);
+ static void TransactionDeferFsync(TransactionId xid, XLogRecPtr deferLSN);
+
+ static void reset_trace_dfc(void);
+ static void dfc_trace_chunk(int slot, TransactionId xid, XLogRecPtr deferLSN);
+ static void dfc_trace_commit(XLogRecPtr recptr);
+ static void get_trace_dfc(void);
+
/* ----------------------------------------------------------------
* transaction state accessors
***************
*** 794,814 ****
if (MyXactMadeXLogEntry)
{
/*
! * Sleep before flush! So we can flush more than one commit
! * records per single fsync. (The idea is some other backend may
! * do the XLogFlush while we're sleeping. This needs work still,
! * because on most Unixen, the minimum select() delay is 10msec or
! * more, which is way too long.)
! *
! * We do not sleep if enableFsync is not turned on, nor if there
! * are fewer than CommitSiblings other backends with active
! * transactions.
! */
! if (CommitDelay > 0 && enableFsync &&
! CountActiveBackends() >= CommitSiblings)
! pg_usleep(CommitDelay);
! XLogFlush(recptr);
}
/*
--- 901,934 ----
if (MyXactMadeXLogEntry)
{
/*
! * If we have chosen to use unguaranteed transactions and we're
! * not doing cleanup of any rels, then we can defer fsync.
! * The WAL writer acts to minimise the window of data loss,
! * and we rely on it to flush WAL soon, but not precisely now.
! */
! if (trace_commit)
! reset_trace_dfc();
! if (XactCommitGuarantee || nrels > 0)
! {
! /*
! * Sleep before flush! So we can flush more than one commit
! * records per single fsync. (The idea is some other backend may
! * do the XLogFlush while we're sleeping. This needs work still,
! * because on most Unixen, the minimum select() delay is 10msec or
! * more, which is way too long.)
! *
! * We do not sleep if enableFsync is not turned on, nor if there
! * are fewer than CommitSiblings other backends with active
! * transactions.
! */
! if (CommitDelay > 0 && enableFsync &&
! CountActiveBackends() >= CommitSiblings)
! pg_usleep(CommitDelay);
! XLogFlush(recptr);
! }
! else
! TransactionDeferFsync(xid, recptr);
}
/*
***************
*** 819,836 ****
* emitted an XLOG record for our commit, and so in the event of a
* crash the clog update might be lost. This is okay because no one
* else will ever care whether we committed.
*/
if (madeTCentries || MyXactMadeTempRelUpdate)
{
! TransactionIdCommit(xid);
/* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
}
/* Checkpoint can proceed now */
MyProc->inCommit = false;
END_CRIT_SECTION();
}
/* Break the chain of back-links in the XLOG records I output */
--- 939,962 ----
* emitted an XLOG record for our commit, and so in the event of a
* crash the clog update might be lost. This is okay because no one
* else will ever care whether we committed.
+ *
+ * The recptr here refers to the last xlog entry by this transaction,
+ * so it is the correct value to use for setting the clog.
*/
if (madeTCentries || MyXactMadeTempRelUpdate)
{
! TransactionIdCommit(xid, recptr);
/* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children, recptr);
}
/* Checkpoint can proceed now */
MyProc->inCommit = false;
END_CRIT_SECTION();
+
+ if (trace_commit && madeTCentries && WALWriterActive())
+ dfc_trace_commit(recptr);
}
/* Break the chain of back-links in the XLOG records I output */
***************
*** 1013,1018 ****
--- 1139,1145 ----
if (MyLastRecPtr.xrecoff != 0 || MyXactMadeTempRelUpdate || nrels > 0)
{
TransactionId xid = GetCurrentTransactionId();
+ XLogRecPtr recptr;
/*
* Catch the scenario where we aborted partway through
***************
*** 1040,1046 ****
XLogRecData rdata[3];
int lastrdata = 0;
xl_xact_abort xlrec;
- XLogRecPtr recptr;
xlrec.xtime = time(NULL);
xlrec.nrels = nrels;
--- 1167,1172 ----
***************
*** 1074,1079 ****
--- 1200,1207 ----
if (nrels > 0)
XLogFlush(recptr);
}
+ else
+ recptr = MyLastRecPtr;
/*
* Mark the transaction aborted in clog. This is not absolutely
***************
*** 1084,1091 ****
* subtransactions to aborted state from the point of view of
* concurrent TransactionIdDidAbort calls.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1212,1219 ----
* subtransactions to aborted state from the point of view of
* concurrent TransactionIdDidAbort calls.
*/
! TransactionIdAbort(xid, recptr);
! TransactionIdAbortTree(nchildren, children, recptr);
END_CRIT_SECTION();
}
***************
*** 1207,1212 ****
--- 1335,1342 ----
*/
if (MyLastRecPtr.xrecoff != 0 || MyXactMadeTempRelUpdate || nrels > 0)
{
+ XLogRecPtr recptr;
+
START_CRIT_SECTION();
/*
***************
*** 1218,1224 ****
XLogRecData rdata[3];
int lastrdata = 0;
xl_xact_abort xlrec;
- XLogRecPtr recptr;
xlrec.xtime = time(NULL);
xlrec.nrels = nrels;
--- 1348,1353 ----
***************
*** 1252,1265 ****
if (nrels > 0)
XLogFlush(recptr);
}
/*
* Mark the transaction aborted in clog. This is not absolutely
* necessary but XactLockTableWait makes use of it to avoid waiting
* for already-aborted subtransactions.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1381,1396 ----
if (nrels > 0)
XLogFlush(recptr);
}
+ else
+ recptr = MyLastRecPtr;
/*
* Mark the transaction aborted in clog. This is not absolutely
* necessary but XactLockTableWait makes use of it to avoid waiting
* for already-aborted subtransactions.
*/
! TransactionIdAbort(xid, recptr);
! TransactionIdAbortTree(nchildren, children, recptr);
END_CRIT_SECTION();
}
***************
*** 1389,1394 ****
--- 1520,1526 ----
FreeXactSnapshot();
XactIsoLevel = DefaultXactIsoLevel;
XactReadOnly = DefaultXactReadOnly;
+ SetXactCommitGuarantee(true);
/*
* reinitialize within-transaction counters
***************
*** 4094,4099 ****
--- 4226,4237 ----
return "UNRECOGNIZED";
}
+ void
+ SetXactCommitGuarantee(bool RequestedXactCommitGuarantee)
+ {
+ XactCommitGuarantee = RequestedXactCommitGuarantee;
+ }
+
/*
* xactGetCommittedChildren
*
***************
*** 4132,4137 ****
--- 4270,4279 ----
/*
* XLOG support routines
+ *
+ * LSN supplied for clog changes is invalid, so that we avoid
+ * WAL flushes while we are rebuilding clog. After recovery
+ * completes the next clog change will set the LSN correctly.
*/
static void
***************
*** 4140,4151 ****
TransactionId *sub_xids;
TransactionId max_xid;
int i;
! TransactionIdCommit(xid);
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4282,4294 ----
TransactionId *sub_xids;
TransactionId max_xid;
int i;
+ XLogRecPtr lsn = {0,0}; /* Invalid XLogRecPtr */
! TransactionIdCommit(xid, lsn);
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids, lsn);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
***************
*** 4175,4186 ****
TransactionId *sub_xids;
TransactionId max_xid;
int i;
! TransactionIdAbort(xid);
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4318,4330 ----
TransactionId *sub_xids;
TransactionId max_xid;
int i;
+ XLogRecPtr lsn = {0,0}; /* Invalid XLogRecPtr */
! TransactionIdAbort(xid, lsn);
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xlrec->nsubxacts, sub_xids, lsn);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
***************
*** 4347,4349 ****
--- 4491,4849 ----
else
appendStringInfo(buf, "UNKNOWN");
}
+
+
+ /*
+ * Initialize the deferred fsync cache at server start
+ */
+ void
+ DeferredFsyncShmemInit(void)
+ {
+ bool found;
+
+ dfc = ShmemInitStruct("Deferred Fsync Cache",
+ DeferredFsyncShmemSize(),
+ &found);
+
+ if (dfc == NULL)
+ ereport(FATAL,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("insufficient shared memory for deferred fsync cache")));
+
+ if (found)
+ return;
+
+ MemSet(dfc, 0, DeferredFsyncShmemSize());
+ }
+
+ /*
+ * Estimate amount of shmem space needed for deferred fsync cache
+ */
+ Size
+ DeferredFsyncShmemSize(void)
+ {
+ return sizeof(DeferredFsyncShmemStruct);
+ }
+
+ /*
+ * TransactionDeferFsync()
+ *
+ * Register that an fsync will be needed in the future for this xact.
+ * Also store the LSN of the commit record in xlog, so we know
+ * where to flush to in order to make this commit safe.
+ *
+ * Guaranteed transactions need not register here.
+ */
+ static void
+ TransactionDeferFsync(TransactionId deferXid, XLogRecPtr deferLSN)
+ {
+ int chunk = TransactionIdToDFCChunk(deferXid);
+ int slot = DFCChunkToDFCSlot(chunk);
+ bool signalWALWriter = false;
+ int numValid = 0;
+ bool found = false;
+ /* may_retry lives outside the loop so the single retry is not reset */
+ bool may_retry = true;
+
+ LWLockAcquire(DeferredFsyncLock, LW_EXCLUSIVE);
+
+ /*
+ * Set the global highest deferLSN and advance the request xmin
+ */
+ if (XLByteLT(dfc->request_lsn, deferLSN))
+ dfc->request_lsn = deferLSN;
+
+ if (TransactionIdPrecedes(dfc->request_xmin, RecentGlobalXmin))
+ dfc->request_xmin = RecentGlobalXmin;
+
+ /*
+ * Now look for a place to record this deferred transaction
+ */
+ for (;;)
+ {
+ for (numValid = 0; numValid < DFC_XACTS_PER_CHUNK; numValid++)
+ {
+ /*
+ * If we find an out-of-date entry, overwrite it
+ */
+ if (XLByteLE(dfc->dfccache[slot + numValid].lsn, dfc->flushed_lsn))
+ {
+ dfc->dfccache[slot + numValid].xid = deferXid;
+ dfc->dfccache[slot + numValid].lsn = deferLSN;
+
+ /* Keep track of how busy we are */
+ dfc->numNewDeferredCommits++;
+
+ get_trace_dfc();
+
+ found = true;
+
+ break;
+ }
+ }
+
+ /*
+ * If we couldn't find anywhere to store this deferXid,
+ * then we need to flush while holding the lock,
+ * then loop back around for another attempt. Only
+ * allow ourselves to retry once though.
+ */
+ if (numValid >= DFC_XACTS_PER_CHUNK && may_retry)
+ {
+ FlushAnyDeferredFsyncXacts(false, true);
+ may_retry = false;
+ }
+ else
+ {
+ if (numValid > DFC_SIGNAL_WALWRITER_THRESHOLD)
+ signalWALWriter = true;
+ break;
+ }
+ }
+
+ if (!found)
+ dfc_trace_chunk(slot, deferXid, deferLSN);
+
+ LWLockRelease(DeferredFsyncLock);
+
+ trace_dfc.numValid = numValid;
+
+ if (!found)
+ {
+ dfc_trace_commit(deferLSN);
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+ errmsg("unable to locate slot in deferred transaction cache for TransactionId=%u LSN=%X/%X",
+ deferXid, deferLSN.xlogid, deferLSN.xrecoff)));
+ }
+
+ if (signalWALWriter)
+ SendPostmasterSignal(PMSIGNAL_WAKEN_WALWRITER);
+ }
+
+ /*
+ * FlushAnyDeferredFsyncXacts()
+ *
+ * Gets the current high-water mark LSN and then flushes xlog
+ *
+ * Doesn't confirm that all deferred fsync transactions have been flushed,
+ * unless called with DeferredFsyncLock already held.
+ */
+ void
+ FlushAnyDeferredFsyncXacts(bool loop_if_busy, bool have_lock)
+ {
+ XLogRecPtr FlushLSN = {0,0}; /* InvalidXLogRecPtr */
+ TransactionId FlushXmin = 0;
+ int num_xacts_since_last_flush;
+ int num_xacts_while_flushing;
+ int num_flushes = 0;
+
+ /* Make sure we never loop when we have the lock */
+ Assert(!(loop_if_busy && have_lock));
+
+ for (;;)
+ {
+ /*
+ * Get the current request points, then reset the
+ * counter so we can see how busy we are after we flush.
+ */
+ if (!have_lock)
+ LWLockAcquire(DeferredFsyncLock, LW_EXCLUSIVE);
+
+ if (!XLByteEQ(dfc->flushed_lsn, dfc->request_lsn))
+ {
+ FlushLSN = dfc->request_lsn;
+ FlushXmin = dfc->request_xmin;
+
+ num_flushes = dfc->numFlushes;
+ }
+
+ num_xacts_since_last_flush = dfc->numNewDeferredCommits;
+ dfc->numNewDeferredCommits = 0;
+
+ if (!have_lock)
+ LWLockRelease(DeferredFsyncLock);
+
+ if (!XLogRecPtrIsInvalid(FlushLSN))
+ XLogFlush(FlushLSN);
+
+ /*
+ * Get the number of transactions added while we've been flushing.
+ * Decide whether to keep flushing if we are busy enough.
+ * Move the known-flushed-xmin forwards
+ */
+ if (!have_lock)
+ LWLockAcquire(DeferredFsyncLock, LW_EXCLUSIVE);
+
+ /*
+ * If this new FlushLSN is higher than the flushed_lsn
+ * then update that also, unless someone already did it
+ */
+ if (XLByteLT(dfc->flushed_lsn, FlushLSN))
+ dfc->flushed_lsn = FlushLSN;
+
+ /* Move the known flushed pointer forwards, unless already done */
+ if (TransactionIdPrecedes(dfc->flushed_xmin, FlushXmin))
+ dfc->flushed_xmin = FlushXmin;
+
+ num_xacts_while_flushing = dfc->numNewDeferredCommits;
+ dfc->numFlushes++;
+
+ if (!have_lock)
+ LWLockRelease(DeferredFsyncLock);
+
+ if (!loop_if_busy || num_xacts_while_flushing < BUSY_NUM_XACTS_THRESHOLD)
+ break;
+ }
+
+ /*
+ * Only report the background flush if it did something... otherwise we get
+ * floods of messages for no purpose. We still report the background flush
+ * even if XLogFlush() had already occurred because of another backend
+ */
+ if (trace_bg_flush && num_flushes > 0 && !have_lock)
+ ereport(LOG,
+ (errmsg("background flush: lsn=%X/%X xmin=%u flushId=%d commits=%d (while flushing=%d)",
+ FlushLSN.xlogid, FlushLSN.xrecoff,
+ FlushXmin,
+ num_flushes,
+ num_xacts_since_last_flush,
+ num_xacts_while_flushing)));
+ }
+
+ /*
+ * TransactionIdIsFlushed -- has transaction commit been flushed?
+ *
+ * Since no guaranteed transactions are stored in the DFC this
+ * should always return true for guaranteed ("normal") xacts.
+ * Deferred fsync transactions are placed in the cache by
+ * TransactionDeferFsync(), though they may later be expired by
+ * FlushAnyDeferredFsyncXacts().
+ */
+ bool
+ TransactionIdIsFlushed(TransactionId xid)
+ {
+ bool result = true;
+ TransactionId topxid = SubTransGetTopmostTransaction(xid);
+ int chunk;
+ int slot;
+ int i;
+
+ /*
+ * If xid is already locally known-flushed then exit quickly
+ * without grabbing the lock
+ */
+ if (TransactionIdPrecedes(xid, RecentFlushedXmin))
+ {
+ trace_dfc.flush_test_exit_local++;
+ return true;
+ }
+
+ chunk = TransactionIdToDFCChunk(topxid);
+ slot = DFCChunkToDFCSlot(chunk);
+
+ LWLockAcquire(DeferredFsyncLock, LW_SHARED);
+
+ /* Update local state - not worth effort to recheck */
+ RecentFlushedXmin = dfc->flushed_xmin;
+
+ /*
+ * Search through the chunk looking for the xid, if we find
+ * it, check whether its lsn is flushed yet or not
+ */
+ result = true;
+ for (i = 0; i < DFC_XACTS_PER_CHUNK; i++)
+ {
+ if (TransactionIdEquals(dfc->dfccache[slot + i].xid,xid))
+ {
+ if (!XLByteLE(dfc->dfccache[slot + i].lsn, dfc->flushed_lsn))
+ {
+ result = false;
+ break;
+ }
+ }
+ }
+
+ LWLockRelease(DeferredFsyncLock);
+ trace_dfc.flush_test_exit_search++;
+
+ /*
+ * If we couldn't find xid then it must have been either flushed
+ * and then subsequently overwritten, or it was never a
+ * deferred transaction at all.
+ */
+ return result;
+ }
+
+ /*
+ * Trace support functions for Deferred Fsync Cache
+ */
+
+ /*
+ * reset_trace_dfc()
+ *
+ * reset any trace information in this backend, prior to commit
+ */
+ static void
+ reset_trace_dfc(void)
+ {
+ trace_dfc.numValid = 0;
+ }
+
+ /*
+ * get_trace_dfc()
+ *
+ * Get trace information to allow this commit to be traced later.
+ * Call with DeferredFsyncLock held, then call dfc_trace_commit().
+ */
+ static void
+ get_trace_dfc(void)
+ {
+ trace_dfc.numFlushes = dfc->numFlushes;
+ trace_dfc.numNewDeferredCommits = dfc->numNewDeferredCommits;
+ }
+
+ /*
+ * dfc_trace_commit()
+ *
+ * Log commit trace information; call without holding DeferredFsyncLock.
+ */
+ static void
+ dfc_trace_commit(XLogRecPtr recptr)
+ {
+ if (XactCommitGuarantee)
+ ereport(LOG,
+ (errmsg(" safe commit: lsn %X/%X",
+ recptr.xlogid, recptr.xrecoff)));
+ else
+ ereport(LOG,
+ (errmsg("unsafe commit: lsn %X/%X slots=%d nFlushes=%d nCommits=%d flushTest=%d/%d",
+ recptr.xlogid, recptr.xrecoff,
+ trace_dfc.numValid,
+ trace_dfc.numFlushes,
+ trace_dfc.numNewDeferredCommits,
+ trace_dfc.flush_test_exit_local,
+ trace_dfc.flush_test_exit_search)));
+ }
+
+ /*
+ * dfc_trace_chunk()
+ *
+ * internal diagnostic or pre-error tracing, use with DeferredFsyncLock held
+ */
+ static void
+ dfc_trace_chunk(int slot, TransactionId xid, XLogRecPtr deferLSN)
+ {
+ int i;
+
+ for (i = 0; i < DFC_XACTS_PER_CHUNK; i++)
+ {
+ ereport(LOG,
+ (errmsg("dfc chunk %d: TransactionId=%u LSN=%X/%X %s",
+ slot,
+ dfc->dfccache[slot + i].xid,
+ dfc->dfccache[slot + i].lsn.xlogid,
+ dfc->dfccache[slot + i].lsn.xrecoff,
+ (XLByteLE(dfc->dfccache[slot + i].lsn, dfc->flushed_lsn) ? "flushed" : "current"))));
+ }
+ }
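The slot handling in TransactionDeferFsync() and TransactionIdIsFlushed() above can be illustrated with a small standalone sketch. This is not the patch's code: every name prefixed Sketch/sketch_ is hypothetical, a plain 64-bit integer stands in for XLogRecPtr, the xid-to-chunk mapping is an assumed modulo hash (the real TransactionIdToDFCChunk and DFCChunkToDFCSlot macros are not shown in this patch section), the chunk sizes are invented, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes; the patch's real constants are not shown here. */
#define SKETCH_XACTS_PER_CHUNK 4
#define SKETCH_NUM_CHUNKS      8

typedef struct
{
    uint32_t    xid;            /* deferred transaction id */
    uint64_t    lsn;            /* LSN of its commit record (simplified) */
} SketchEntry;

typedef struct
{
    uint64_t    flushed_lsn;    /* WAL is known flushed up to here */
    uint64_t    request_lsn;    /* highest LSN any deferred commit needs */
    SketchEntry cache[SKETCH_NUM_CHUNKS * SKETCH_XACTS_PER_CHUNK];
} SketchDfc;

/* Assumed mapping: xid -> first slot of its chunk (simple modulo hash). */
static int
sketch_slot_for_xid(uint32_t xid)
{
    return (int) (xid % SKETCH_NUM_CHUNKS) * SKETCH_XACTS_PER_CHUNK;
}

/* Stand-in for XLogFlush(request_lsn) followed by the flushed_lsn update. */
static void
sketch_flush(SketchDfc *dfc)
{
    if (dfc->flushed_lsn < dfc->request_lsn)
        dfc->flushed_lsn = dfc->request_lsn;
}

/*
 * Record a deferred commit.  An entry whose lsn <= flushed_lsn is stale
 * and may be overwritten.  If the chunk is full, flush once and retry;
 * return false only if the retry also finds no free slot.
 */
static bool
sketch_defer(SketchDfc *dfc, uint32_t xid, uint64_t lsn)
{
    int         slot = sketch_slot_for_xid(xid);
    bool        may_retry = true;
    int         i;

    assert(lsn > 0);            /* lsn 0 marks a never-used slot */

    if (dfc->request_lsn < lsn)
        dfc->request_lsn = lsn;

    for (;;)
    {
        for (i = 0; i < SKETCH_XACTS_PER_CHUNK; i++)
        {
            SketchEntry *e = &dfc->cache[slot + i];

            if (e->lsn <= dfc->flushed_lsn)
            {
                e->xid = xid;
                e->lsn = lsn;
                return true;
            }
        }

        if (!may_retry)
            return false;

        sketch_flush(dfc);
        may_retry = false;
    }
}

/* A transaction is unflushed only if it is cached with lsn > flushed_lsn. */
static bool
sketch_is_flushed(const SketchDfc *dfc, uint32_t xid)
{
    int         slot = sketch_slot_for_xid(xid);
    int         i;

    for (i = 0; i < SKETCH_XACTS_PER_CHUNK; i++)
    {
        const SketchEntry *e = &dfc->cache[slot + i];

        if (e->xid == xid && e->lsn > dfc->flushed_lsn)
            return false;
    }
    return true;
}
```

In this sketch a zero-initialized SketchDfc is a valid empty cache. A transaction that was never deferred reports as flushed, matching the comment at the end of TransactionIdIsFlushed(), and a commit that forces the flush-and-retry path is itself already covered by that flush.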
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.267
diff -c -r1.267 xlog.c
*** src/backend/access/transam/xlog.c 3 Apr 2007 16:34:35 -0000 1.267
--- src/backend/access/transam/xlog.c 5 Apr 2007 21:51:37 -0000
***************
*** 5393,5398 ****
--- 5393,5409 ----
checkPoint.ThisTimeLineID = ThisTimeLineID;
checkPoint.time = time(NULL);
+ /*
+ * Now confirm that all unguaranteed transactions are written to WAL
+ * before we proceed further. This may require WALWriteLock and possibly
+ * WALInsertLock if we need to flush.
+ */
+ if (WALWriterActive())
+ {
+ LWLockAcquire(DeferredFsyncLock, LW_EXCLUSIVE);
+ FlushAnyDeferredFsyncXacts(false, true);
+ }
+
/*
* We must hold WALInsertLock while examining insert state to determine
* the checkpoint REDO pointer.
***************
*** 5428,5433 ****
--- 5439,5446 ----
ControlFile->checkPointCopy.redo.xrecoff)
{
LWLockRelease(WALInsertLock);
+ if (WALWriterActive())
+ LWLockRelease(DeferredFsyncLock);
LWLockRelease(CheckpointLock);
END_CRIT_SECTION();
return;
***************
*** 5476,5481 ****
--- 5489,5496 ----
* while we are flushing disk buffers.
*/
LWLockRelease(WALInsertLock);
+ if (WALWriterActive())
+ LWLockRelease(DeferredFsyncLock);
if (!shutdown)
ereport(DEBUG2,
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.349
diff -c -r1.349 vacuum.c
*** src/backend/commands/vacuum.c 14 Mar 2007 18:48:55 -0000 1.349
--- src/backend/commands/vacuum.c 5 Apr 2007 21:51:41 -0000
***************
*** 1275,1280 ****
--- 1275,1289 ----
*/
vacpage = (VacPage) palloc(sizeof(VacPageData) + MaxOffsetNumber * sizeof(OffsetNumber));
+ /*
+ * VACUUM FULL assumes that all tuple states are well-known prior to moving
+ * tuples around; see the comment "known dead" in repair_frag(). So before
+ * we perform this initial scan of the heap we must ensure there are
+ * no unflushed deferred transactions with changes against this table.
+ */
+ if (WALWriterActive())
+ FlushAnyDeferredFsyncXacts(false, false);
+
for (blkno = 0; blkno < nblocks; blkno++)
{
Page page,
Index: src/backend/postmaster/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/Makefile,v
retrieving revision 1.22
diff -c -r1.22 Makefile
*** src/backend/postmaster/Makefile 20 Jan 2007 17:16:12 -0000 1.22
--- src/backend/postmaster/Makefile 5 Apr 2007 21:51:41 -0000
***************
*** 12,18 ****
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
! OBJS = bgwriter.o autovacuum.o pgarch.o pgstat.o postmaster.o syslogger.o \
fork_process.o
all: SUBSYS.o
--- 12,18 ----
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
! OBJS = bgwriter.o walwriter.o autovacuum.o pgarch.o pgstat.o postmaster.o syslogger.o \
fork_process.o
all: SUBSYS.o
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.527
diff -c -r1.527 postmaster.c
*** src/backend/postmaster/postmaster.c 22 Mar 2007 19:53:30 -0000 1.527
--- src/backend/postmaster/postmaster.c 5 Apr 2007 21:51:45 -0000
***************
*** 107,112 ****
--- 107,113 ----
#include "postmaster/pgarch.h"
#include "postmaster/postmaster.h"
#include "postmaster/syslogger.h"
+ #include "postmaster/walwriter.h"
#include "storage/fd.h"
#include "storage/ipc.h"
#include "storage/pg_shmem.h"
***************
*** 201,206 ****
--- 202,208 ----
/* PIDs of special child processes; 0 when not running */
static pid_t StartupPID = 0,
BgWriterPID = 0,
+ WALWriterPID = 0,
AutoVacPID = 0,
PgArchPID = 0,
PgStatPID = 0;
***************
*** 907,913 ****
* CAUTION: when changing this list, check for side-effects on the signal
* handling setup of child processes. See tcop/postgres.c,
* bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/autovacuum.c,
! * postmaster/pgarch.c, postmaster/pgstat.c, and postmaster/syslogger.c.
*/
pqinitmask();
PG_SETMASK(&BlockSig);
--- 909,916 ----
* CAUTION: when changing this list, check for side-effects on the signal
* handling setup of child processes. See tcop/postgres.c,
* bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/autovacuum.c,
! * postmaster/pgarch.c, postmaster/pgstat.c, postmaster/syslogger.c,
! * and postmaster/walwriter.c.
*/
pqinitmask();
PG_SETMASK(&BlockSig);
***************
*** 1250,1255 ****
--- 1253,1263 ----
start_autovac_launcher = false; /* signal successfully processed */
}
+ /* If we have lost the WAL writer, try to start a new one */
+ if (WALWriterActive() && WALWriterPID == 0 &&
+ StartupPID == 0 && !FatalError && Shutdown == NoShutdown)
+ WALWriterPID = StartWALWriter();
+
/* If we have lost the archiver, try to start a new one */
if (XLogArchivingActive() && PgArchPID == 0 &&
StartupPID == 0 && !FatalError && Shutdown == NoShutdown)
***************
*** 1822,1827 ****
--- 1830,1837 ----
signal_child(BgWriterPID, SIGHUP);
if (AutoVacPID != 0)
signal_child(AutoVacPID, SIGHUP);
+ if (WALWriterPID != 0)
+ signal_child(WALWriterPID, SIGHUP);
if (PgArchPID != 0)
signal_child(PgArchPID, SIGHUP);
if (SysLoggerPID != 0)
***************
*** 1891,1896 ****
--- 1901,1909 ----
/* And tell it to shut down */
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGUSR2);
+ /* Tell WALWriter to shut down too; nothing left for it to do */
+ if (WALWriterPID != 0)
+ signal_child(WALWriterPID, SIGQUIT);
/* Tell pgarch to shut down too; nothing left for it to do */
if (PgArchPID != 0)
signal_child(PgArchPID, SIGQUIT);
***************
*** 1950,1955 ****
--- 1963,1971 ----
/* And tell it to shut down */
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGUSR2);
+ /* Tell WALWriter to shut down too; nothing left for it to do */
+ if (WALWriterPID != 0)
+ signal_child(WALWriterPID, SIGQUIT);
/* Tell pgarch to shut down too; nothing left for it to do */
if (PgArchPID != 0)
signal_child(PgArchPID, SIGQUIT);
***************
*** 1978,1983 ****
--- 1994,2001 ----
signal_child(StartupPID, SIGQUIT);
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGQUIT);
+ if (WALWriterPID != 0)
+ signal_child(WALWriterPID, SIGQUIT);
if (AutoVacPID != 0)
signal_child(AutoVacPID, SIGQUIT);
if (PgArchPID != 0)
***************
*** 2079,2086 ****
/*
* Go to shutdown mode if a shutdown request was pending.
! * Otherwise, try to start the archiver, stats collector and
! * autovacuum launcher.
*/
if (Shutdown > NoShutdown && BgWriterPID != 0)
signal_child(BgWriterPID, SIGUSR2);
--- 2097,2104 ----
/*
* Go to shutdown mode if a shutdown request was pending.
! * Otherwise, try to start the archiver, stats collector,
! * autovacuum launcher and WALWriter.
*/
if (Shutdown > NoShutdown && BgWriterPID != 0)
signal_child(BgWriterPID, SIGUSR2);
***************
*** 2090,2095 ****
--- 2108,2115 ----
PgArchPID = pgarch_start();
if (PgStatPID == 0)
PgStatPID = pgstat_start();
+ if (WALWriterPID == 0)
+ WALWriterPID = StartWALWriter();
if (AutoVacuumingActive() && AutoVacPID == 0)
AutoVacPID = StartAutoVacLauncher();
***************
*** 2150,2155 ****
--- 2170,2189 ----
}
/*
+ * Was it the WALWriter? Normal exit can be ignored; we'll
+ * start a new one at the next iteration of the postmaster's main loop,
+ * if necessary. Any other exit condition is treated as a crash.
+ */
+ if (WALWriterPID != 0 && pid == WALWriterPID)
+ {
+ WALWriterPID = 0;
+ if (!EXIT_STATUS_0(exitstatus))
+ HandleChildCrash(pid, exitstatus,
+ _("WALWriter process"));
+ continue;
+ }
+
+ /*
* Was it the autovacuum launcher? Normal exit can be ignored; we'll
* start a new one at the next iteration of the postmaster's main loop,
* if necessary. Any other exit condition is treated as a crash.
***************
*** 2245,2250 ****
--- 2279,2287 ----
/* And tell it to shut down */
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGUSR2);
+ /* Tell WALWriter to shut down too; nothing left for it to do */
+ if (WALWriterPID != 0)
+ signal_child(WALWriterPID, SIGQUIT);
/* Tell pgarch to shut down too; nothing left for it to do */
if (PgArchPID != 0)
signal_child(PgArchPID, SIGQUIT);
***************
*** 2396,2401 ****
--- 2433,2449 ----
signal_child(AutoVacPID, (SendStop ? SIGSTOP : SIGQUIT));
}
+ /* Force a power-cycle of the WALWriter process too */
+ /* (Shouldn't be necessary, but just for luck) */
+ if (WALWriterPID != 0 && !FatalError)
+ {
+ ereport(DEBUG2,
+ (errmsg_internal("sending %s to process %d",
+ "SIGQUIT",
+ (int) WALWriterPID)));
+ signal_child(WALWriterPID, SIGQUIT);
+ }
+
/* Force a power-cycle of the pgarch process too */
/* (Shouldn't be necessary, but just for luck) */
if (PgArchPID != 0 && !FatalError)
***************
*** 3488,3493 ****
--- 3536,3558 ----
AutoVacWorkerMain(argc - 2, argv + 2);
proc_exit(0);
}
+ if (strcmp(argv[1], "--forkwalwriter") == 0)
+ {
+ /* Close the postmaster's sockets */
+ ClosePostmasterPorts(false);
+
+ /* Restore basic shared memory pointers */
+ InitShmemAccess(UsedShmemSegAddr);
+
+ /* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+ InitProcess();
+
+ /* Attach process to shared data structures */
+ CreateSharedMemoryAndSemaphores(false, 0);
+
+ WALWriterMain(argc, argv);
+ proc_exit(0);
+ }
if (strcmp(argv[1], "--forkarch") == 0)
{
/* Close the postmaster's sockets */
***************
*** 3582,3587 ****
--- 3647,3661 ----
signal_child(PgArchPID, SIGUSR1);
}
+ if (CheckPostmasterSignal(PMSIGNAL_WAKEN_WALWRITER) &&
+ WALWriterPID != 0 && Shutdown == NoShutdown)
+ {
+ /*
+ * Send SIGUSR1 to WALWriter process, to wake it up and begin fsyncing WAL
+ */
+ signal_child(WALWriterPID, SIGUSR1);
+ }
+
if (CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE) &&
SysLoggerPID != 0)
{
Index: src/backend/storage/ipc/ipci.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/ipc/ipci.c,v
retrieving revision 1.91
diff -c -r1.91 ipci.c
*** src/backend/storage/ipc/ipci.c 15 Feb 2007 23:23:23 -0000 1.91
--- src/backend/storage/ipc/ipci.c 5 Apr 2007 21:51:45 -0000
***************
*** 19,24 ****
--- 19,25 ----
#include "access/nbtree.h"
#include "access/subtrans.h"
#include "access/twophase.h"
+ #include "access/xact.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
***************
*** 101,106 ****
--- 102,108 ----
size = add_size(size, ProcGlobalShmemSize());
size = add_size(size, XLOGShmemSize());
size = add_size(size, CLOGShmemSize());
+ size = add_size(size, DeferredFsyncShmemSize());
size = add_size(size, SUBTRANSShmemSize());
size = add_size(size, TwoPhaseShmemSize());
size = add_size(size, MultiXactShmemSize());
***************
*** 177,182 ****
--- 179,185 ----
*/
XLOGShmemInit();
CLOGShmemInit();
+ DeferredFsyncShmemInit();
SUBTRANSShmemInit();
TwoPhaseShmemInit();
MultiXactShmemInit();
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.530
diff -c -r1.530 postgres.c
*** src/backend/tcop/postgres.c 29 Mar 2007 19:10:10 -0000 1.530
--- src/backend/tcop/postgres.c 5 Apr 2007 21:51:49 -0000
***************
*** 2266,2271 ****
--- 2266,2273 ----
ereport(DEBUG3,
(errmsg_internal("CommitTransactionCommand")));
+ SetXactCommitGuarantee(DefaultXactCommitGuarantee);
+
CommitTransactionCommand();
#ifdef MEMORY_CONTEXT_CHECKING
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.383
diff -c -r1.383 guc.c
*** src/backend/utils/misc/guc.c 19 Mar 2007 23:38:30 -0000 1.383
--- src/backend/utils/misc/guc.c 5 Apr 2007 21:51:55 -0000
***************
*** 53,58 ****
--- 53,59 ----
#include "postmaster/bgwriter.h"
#include "postmaster/postmaster.h"
#include "postmaster/syslogger.h"
+ #include "postmaster/walwriter.h"
#include "storage/fd.h"
#include "storage/freespace.h"
#include "tcop/tcopprot.h"
***************
*** 102,107 ****
--- 103,111 ----
extern int CommitSiblings;
extern char *default_tablespace;
extern bool fullPageWrites;
+ extern bool trace_commit;
+ extern bool trace_bg_flush;
+
#ifdef TRACE_SORT
extern bool trace_sort;
***************
*** 149,154 ****
--- 153,159 ----
static bool assign_stage_log_stats(bool newval, bool doit, GucSource source);
static bool assign_log_stats(bool newval, bool doit, GucSource source);
static bool assign_transaction_read_only(bool newval, bool doit, GucSource source);
+ static bool assign_transaction_guarantee(bool newval, bool doit, GucSource source);
static const char *assign_canonical_path(const char *newval, bool doit, GucSource source);
static const char *assign_backslash_quote(const char *newval, bool doit, GucSource source);
static const char *assign_timezone_abbreviations(const char *newval, bool doit, GucSource source);
***************
*** 317,322 ****
--- 322,329 ----
gettext_noop("Write-Ahead Log"),
/* WAL_SETTINGS */
gettext_noop("Write-Ahead Log / Settings"),
+ /* WAL_COMMITS */
+ gettext_noop("Write-Ahead Log / Commit Behavior"),
/* WAL_CHECKPOINTS */
gettext_noop("Write-Ahead Log / Checkpoints"),
/* QUERY_TUNING */
***************
*** 573,578 ****
--- 580,601 ----
false, NULL, NULL
},
{
+ {"trace_commit", PGC_SIGHUP, DEVELOPER_OPTIONS,
+ gettext_noop("Shows details of commits, for use with transaction_guarantee."),
+ NULL
+ },
+ &trace_commit,
+ false, NULL, NULL
+ },
+ {
+ {"trace_bg_flush", PGC_SIGHUP, DEVELOPER_OPTIONS,
+ gettext_noop("Shows details of WAL Writer, for use with transaction_guarantee."),
+ NULL
+ },
+ &trace_bg_flush,
+ true, NULL, NULL
+ },
+ {
{"log_connections", PGC_BACKEND, LOGGING_WHAT,
gettext_noop("Logs each successful connection."),
NULL
***************
*** 883,888 ****
--- 906,919 ----
true, assign_phony_autocommit, NULL
},
{
+ {"transaction_guarantee", PGC_USERSET, WAL_COMMITS,
+ gettext_noop("Sets whether commit waits for its WAL record to be flushed to disk."),
+ NULL
+ },
+ &DefaultXactCommitGuarantee,
+ true, assign_transaction_guarantee, NULL
+ },
+ {
{"default_transaction_read_only", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Sets the default read-only status of new transactions."),
NULL
***************
*** 1165,1171 ****
NULL
},
&ReservedBackends,
! 3, 0, INT_MAX / 4, NULL, NULL
},
{
--- 1196,1202 ----
NULL
},
&ReservedBackends,
! 5, 0, INT_MAX / 4, NULL, NULL
},
{
***************
*** 1457,1463 ****
},
{
! {"commit_delay", PGC_USERSET, WAL_CHECKPOINTS,
gettext_noop("Sets the delay in microseconds between transaction commit and "
"flushing WAL to disk."),
NULL
--- 1488,1494 ----
},
{
! {"commit_delay", PGC_USERSET, WAL_COMMITS,
gettext_noop("Sets the delay in microseconds between transaction commit and "
"flushing WAL to disk."),
NULL
***************
*** 1467,1473 ****
},
{
! {"commit_siblings", PGC_USERSET, WAL_CHECKPOINTS,
gettext_noop("Sets the minimum concurrent open transactions before performing "
"commit_delay."),
NULL
--- 1498,1504 ----
},
{
! {"commit_siblings", PGC_USERSET, WAL_COMMITS,
gettext_noop("Sets the minimum concurrent open transactions before performing "
"commit_delay."),
NULL
***************
*** 1477,1482 ****
--- 1508,1523 ----
},
{
+ {"wal_writer_delay", PGC_SIGHUP, WAL_COMMITS,
+ gettext_noop("Sets the delay in milliseconds between regular flushing of WAL "
+ "to disk by the WALWriter."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &WALWriterDelay,
+ 0, 0, 1000, NULL, NULL
+ },
+ {
{"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE,
gettext_noop("Sets the number of digits displayed for floating-point values."),
gettext_noop("This affects real, double precision, and geometric data types. "
***************
*** 6472,6477 ****
--- 6513,6537 ----
return true;
}
+ static bool
+ assign_transaction_guarantee(bool newval, bool doit, GucSource source)
+ {
+ /*
+ * Transaction guarantee can only be disabled if the
+ * WALWriter has been activated. This is important since it allows
+ * us to place a sensible time limit on the extent of the data loss
+ * window for deferred fsync transactions.
+ */
+ if (newval == false && !WALWriterActive())
+ {
+ if (source >= PGC_S_INTERACTIVE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot disable transaction_guarantee when wal_writer_delay = 0")));
+ }
+ return true;
+ }
+
static const char *
assign_canonical_path(const char *newval, bool doit, GucSource source)
{
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.213
diff -c -r1.213 postgresql.conf.sample
*** src/backend/utils/misc/postgresql.conf.sample 19 Mar 2007 23:38:30 -0000 1.213
--- src/backend/utils/misc/postgresql.conf.sample 5 Apr 2007 21:51:56 -0000
***************
*** 150,156 ****
# - Settings -
! #fsync = on # turns forced synchronization on or off
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
--- 150,156 ----
# - Settings -
! #wal_buffers = 64kB # min 32kB
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
***************
*** 159,169 ****
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
- #wal_buffers = 64kB # min 32kB
# (change requires restart)
! #commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
--- 159,173 ----
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
# (change requires restart)
!
! #wal_writer_delay = 0 # range 0-1000, in milliseconds
! #transaction_guarantee = on # default: immediate fsync at commit
!
! #commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
+
# - Checkpoints -
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
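As a rough illustration of how the two new settings fit together, a server that wants to allow deferred-fsync commits might carry something like the fragment below. The value 200 is an arbitrary example within the documented 0-1000 range, and the assumption that a nonzero wal_writer_delay is what makes WALWriterActive() true is inferred from the assign_transaction_guarantee() error message, not from code shown in this patch section.

```
# Illustrative values only, not a recommendation
wal_writer_delay = 200          # nonzero: WALWriter flushes WAL about every 200 ms
transaction_guarantee = on      # server default stays fully guaranteed
```

Because transaction_guarantee is declared PGC_USERSET, an individual session could then turn the guarantee off for itself while other sessions keep the default.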
Index: src/backend/utils/time/tqual.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.102
diff -c -r1.102 tqual.c
*** src/backend/utils/time/tqual.c 25 Mar 2007 19:45:14 -0000 1.102
--- src/backend/utils/time/tqual.c 5 Apr 2007 21:51:57 -0000
***************
*** 78,83 ****
--- 78,85 ----
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
+ static void HeapTupleSetVisibilityInfo(HeapTupleHeader tuple,
+ Buffer buffer, SetTupleVisibilityAction action, uint16 infomask);
/*
***************
*** 122,133 ****
{
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
--- 124,133 ----
{
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
***************
*** 139,152 ****
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 139,148 ----
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
else
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 164,171 ****
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
--- 160,166 ----
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX_SUBTRANS, HEAP_XMAX_INVALID);
return true;
}
***************
*** 176,190 ****
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 171,181 ----
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_COMMITTED);
else
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 221,228 ****
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
--- 212,218 ----
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return true;
}
***************
*** 230,242 ****
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
! tuple->t_infomask |= HEAP_XMAX_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
--- 220,230 ----
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return true;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_COMMITTED);
return false;
}
***************
*** 299,310 ****
{
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
--- 287,296 ----
{
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
***************
*** 316,329 ****
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 302,311 ----
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
else
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 344,351 ****
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
--- 326,332 ----
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX_SUBTRANS, HEAP_XMAX_INVALID);
return true;
}
***************
*** 359,373 ****
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 340,350 ----
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_COMMITTED);
else
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 407,414 ****
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
--- 384,390 ----
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return true;
}
***************
*** 416,428 ****
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
! tuple->t_infomask |= HEAP_XMAX_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
--- 392,402 ----
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return true;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_COMMITTED);
return false;
}
***************
*** 469,480 ****
{
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
--- 443,452 ----
{
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
***************
*** 486,499 ****
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 458,467 ----
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
else
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 550,561 ****
{
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleInvisible;
}
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
--- 518,527 ----
{
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return HeapTupleInvisible;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
***************
*** 567,580 ****
if (TransactionIdIsInProgress(xvac))
return HeapTupleInvisible;
if (TransactionIdDidCommit(xvac))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleInvisible;
}
}
--- 533,542 ----
if (TransactionIdIsInProgress(xvac))
return HeapTupleInvisible;
if (TransactionIdDidCommit(xvac))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
else
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return HeapTupleInvisible;
}
}
***************
*** 595,602 ****
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleMayBeUpdated;
}
--- 557,563 ----
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX_SUBTRANS, HEAP_XMAX_INVALID);
return HeapTupleMayBeUpdated;
}
***************
*** 610,624 ****
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return HeapTupleInvisible;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleInvisible;
}
}
--- 571,581 ----
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return HeapTupleInvisible;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_COMMITTED);
else
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_INVALID);
return HeapTupleInvisible;
}
}
***************
*** 642,649 ****
if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
return HeapTupleBeingUpdated;
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleMayBeUpdated;
}
--- 599,605 ----
if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
return HeapTupleBeingUpdated;
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMULTI, HEAP_XMAX_INVALID);
return HeapTupleMayBeUpdated;
}
***************
*** 663,670 ****
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleMayBeUpdated;
}
--- 619,625 ----
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return HeapTupleMayBeUpdated;
}
***************
*** 672,684 ****
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleMayBeUpdated;
}
! tuple->t_infomask |= HEAP_XMAX_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
return HeapTupleUpdated; /* updated by other */
}
--- 627,637 ----
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return HeapTupleMayBeUpdated;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_COMMITTED);
return HeapTupleUpdated; /* updated by other */
}
***************
*** 723,734 ****
{
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
--- 676,685 ----
{
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
***************
*** 740,753 ****
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 691,700 ----
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
else
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 765,772 ****
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
--- 712,718 ----
/* deleting subtransaction aborted? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX_SUBTRANS, HEAP_XMAX_INVALID);
return true;
}
***************
*** 781,795 ****
return true; /* in insertion by other */
}
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 727,737 ----
return true; /* in insertion by other */
}
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_COMMITTED);
else
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 829,836 ****
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
--- 771,777 ----
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return true;
}
***************
*** 838,850 ****
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
! tuple->t_infomask |= HEAP_XMAX_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
return false; /* updated by other */
}
--- 779,789 ----
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return true;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_COMMITTED);
return false; /* updated by other */
}
***************
*** 888,899 ****
{
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
--- 827,836 ----
{
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
***************
*** 905,918 ****
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 842,851 ----
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
else
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 934,941 ****
/* FIXME -- is this correct w.r.t. the cmax of the tuple? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
--- 867,873 ----
/* FIXME -- is this correct w.r.t. the cmax of the tuple? */
if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX_SUBTRANS, HEAP_XMAX_INVALID);
return true;
}
***************
*** 949,963 ****
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return false;
}
}
--- 881,891 ----
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_COMMITTED);
else
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_INVALID);
return false;
}
}
***************
*** 998,1011 ****
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return true;
}
/* xmax transaction committed */
! tuple->t_infomask |= HEAP_XMAX_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
/*
--- 926,937 ----
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return true;
}
/* xmax transaction committed */
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_COMMITTED);
}
/*
***************
*** 1054,1065 ****
return HEAPTUPLE_DELETE_IN_PROGRESS;
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HEAPTUPLE_DEAD;
}
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
--- 980,989 ----
return HEAPTUPLE_DELETE_IN_PROGRESS;
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return HEAPTUPLE_DEAD;
}
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
***************
*** 1071,1083 ****
return HEAPTUPLE_INSERT_IN_PROGRESS;
if (TransactionIdDidCommit(xvac))
{
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HEAPTUPLE_DEAD;
}
}
--- 995,1005 ----
return HEAPTUPLE_INSERT_IN_PROGRESS;
if (TransactionIdDidCommit(xvac))
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_COMMITTED);
! }
else
{
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XVAC, HEAP_XMIN_INVALID);
return HEAPTUPLE_DEAD;
}
}
***************
*** 1091,1107 ****
return HEAPTUPLE_DELETE_IN_PROGRESS;
}
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! {
! tuple->t_infomask |= HEAP_XMIN_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
/*
* Not in Progress, Not Committed, so either Aborted or crashed
*/
! tuple->t_infomask |= HEAP_XMIN_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HEAPTUPLE_DEAD;
}
/* Should only get here if we set XMIN_COMMITTED */
--- 1013,1025 ----
return HEAPTUPLE_DELETE_IN_PROGRESS;
}
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_COMMITTED);
else
{
/*
* Not in Progress, Not Committed, so either Aborted or crashed
*/
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMIN, HEAP_XMIN_INVALID);
return HEAPTUPLE_DEAD;
}
/* Should only get here if we set XMIN_COMMITTED */
***************
*** 1143,1150 ****
* We know that xmax did lock the tuple, but it did not and will
* never actually update it.
*/
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
}
return HEAPTUPLE_LIVE;
}
--- 1061,1067 ----
* We know that xmax did lock the tuple, but it did not and will
* never actually update it.
*/
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMULTI, HEAP_XMAX_INVALID);
}
return HEAPTUPLE_LIVE;
}
***************
*** 1161,1177 ****
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return HEAPTUPLE_DELETE_IN_PROGRESS;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
! {
! tuple->t_infomask |= HEAP_XMAX_COMMITTED;
! SetBufferCommitInfoNeedsSave(buffer);
! }
else
{
/*
* Not in Progress, Not Committed, so either Aborted or crashed
*/
! tuple->t_infomask |= HEAP_XMAX_INVALID;
! SetBufferCommitInfoNeedsSave(buffer);
return HEAPTUPLE_LIVE;
}
/* Should only get here if we set XMAX_COMMITTED */
--- 1078,1090 ----
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return HEAPTUPLE_DELETE_IN_PROGRESS;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_COMMITTED);
else
{
/*
* Not in Progress, Not Committed, so either Aborted or crashed
*/
! HeapTupleSetVisibilityInfo(tuple, buffer, TUPLE_XMAX, HEAP_XMAX_INVALID);
return HEAPTUPLE_LIVE;
}
/* Should only get here if we set XMAX_COMMITTED */
***************
*** 1205,1210 ****
--- 1118,1185 ----
return HEAPTUPLE_DEAD;
}
+ /*
+ * HeapTupleSetVisibilityInfo()
+ *
+ * Set the visibility info on a tuple, but only if doing so is allowable
+ * at this point in time.
+ *
+ * We're able to set this info when we are looking at one of our own
+ * transaction's aborted subtransactions, or when we are examining
+ * the xvac field, since a VACUUM FULL is always a guaranteed transaction.
+ *
+ * Otherwise we can only set visibility information for a tuple when
+ * the transaction commit has been flushed, which may not yet be the
+ * case for unguaranteed transactions, so we check. Note that if we
+ * do have to check, then we have already confirmed that the
+ * TransactionId is not in progress (see comments in this file header).
+ * There is no need to check aborts, since those are never deferred.
+ *
+ * A MultiXactId cannot be marked until all of the transactions that
+ * are members of it have been flushed.
+ */
+ static void
+ HeapTupleSetVisibilityInfo(HeapTupleHeader tuple,
+ Buffer buffer, SetTupleVisibilityAction action, uint16 infomask)
+ {
+ if (WALWriterActive())
+ {
+ switch (action)
+ {
+ case TUPLE_XMIN:
+ if (infomask == HEAP_XMIN_COMMITTED &&
+ !TransactionIdIsFlushed(HeapTupleHeaderGetXmin(tuple)))
+ return;
+ break;
+
+ case TUPLE_XMAX:
+ if (infomask == HEAP_XMAX_COMMITTED &&
+ !TransactionIdIsFlushed(HeapTupleHeaderGetXmax(tuple)))
+ return;
+ break;
+
+ /* A MultiXactId is always stored in xmax */
+ case TUPLE_XMULTI:
+ if (!MultiXactIdIsFlushed(HeapTupleHeaderGetXmax(tuple)))
+ return;
+ break;
+
+ case TUPLE_XVAC:
+ case TUPLE_XMAX_SUBTRANS:
+ break;
+
+ default:
+ elog(ERROR, "invalid action for HeapTupleSetVisibilityInfo");
+ break;
+ }
+ }
+
+ /*
+ * We're allowed to set the info bits and mark the buffer dirty
+ */
+ tuple->t_infomask |= infomask;
+ SetBufferCommitInfoNeedsSave(buffer);
+ }
/*
* GetTransactionSnapshot
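The gating rule that HeapTupleSetVisibilityInfo() applies can be tried out in isolation. The following is only a minimal sketch under stated assumptions, not the patch's code: every Demo*/demo_* name is hypothetical, the flag values are arbitrary, and the "is flushed" test is a plain integer comparison standing in for TransactionIdIsFlushed() (real XID comparison is wraparound-aware). Unlike the function above, it returns a bool so that the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types live in the PostgreSQL headers. */
typedef uint32_t DemoXid;

typedef enum
{
	DEMO_XMIN,					/* hint refers to the tuple's xmin */
	DEMO_XMAX,					/* hint refers to the tuple's xmax */
	DEMO_XVAC					/* hint refers to xvac; never deferred */
} DemoAction;

/* Arbitrary flag values chosen for the sketch only. */
#define DEMO_XMIN_COMMITTED 0x0100
#define DEMO_XMIN_INVALID   0x0200
#define DEMO_XMAX_COMMITTED 0x0400
#define DEMO_XMAX_INVALID   0x0800

typedef struct
{
	DemoXid		xmin;
	DemoXid		xmax;
	uint16_t	infomask;
} DemoTuple;

/* Assumption: the gate only applies while a WAL writer is active. */
static bool demo_wal_writer_active = true;

/* Assumption: commits of every xid below this value have reached disk. */
static DemoXid demo_flushed_upto = 100;

static bool
demo_xid_is_flushed(DemoXid xid)
{
	return xid < demo_flushed_upto;
}

/*
 * Set a hint bit unless it would record a commit that is not yet flushed.
 * Returns true if the bit was set, false if the hint was skipped.
 */
static bool
demo_set_hint(DemoTuple *tuple, DemoAction action, uint16_t infomask)
{
	assert(tuple != NULL);

	if (demo_wal_writer_active)
	{
		switch (action)
		{
			case DEMO_XMIN:
				if (infomask == DEMO_XMIN_COMMITTED &&
					!demo_xid_is_flushed(tuple->xmin))
					return false;
				break;

			case DEMO_XMAX:
				if (infomask == DEMO_XMAX_COMMITTED &&
					!demo_xid_is_flushed(tuple->xmax))
					return false;
				break;

			case DEMO_XVAC:
				break;
		}
	}

	tuple->infomask |= infomask;
	return true;
}
```

In this sketch a skipped hint is not an error: the caller has already decided visibility from the commit log, and a later visit to the same tuple can set the bit once the commit record has been flushed. Abort hints (the *_INVALID bits) pass through unconditionally, matching the comment that aborts are never deferred.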
Index: src/include/access/clog.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
retrieving revision 1.19
diff -c -r1.19 clog.h
*** src/include/access/clog.h 5 Jan 2007 22:19:50 -0000 1.19
--- src/include/access/clog.h 5 Apr 2007 21:51:57 -0000
***************
*** 32,38 ****
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
extern XidStatus TransactionIdGetStatus(TransactionId xid);
extern Size CLOGShmemSize(void);
--- 32,38 ----
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid);
extern Size CLOGShmemSize(void);
Index: src/include/access/multixact.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/multixact.h,v
retrieving revision 1.12
diff -c -r1.12 multixact.h
*** src/include/access/multixact.h 5 Jan 2007 22:19:51 -0000 1.12
--- src/include/access/multixact.h 5 Apr 2007 21:51:58 -0000
***************
*** 45,50 ****
--- 45,51 ----
extern MultiXactId MultiXactIdCreate(TransactionId xid1, TransactionId xid2);
extern MultiXactId MultiXactIdExpand(MultiXactId multi, TransactionId xid);
extern bool MultiXactIdIsRunning(MultiXactId multi);
+ extern bool MultiXactIdIsFlushed(MultiXactId multi);
extern bool MultiXactIdIsCurrent(MultiXactId multi);
extern void MultiXactIdWait(MultiXactId multi);
extern bool ConditionalMultiXactIdWait(MultiXactId multi);
Index: src/include/access/slru.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/slru.h,v
retrieving revision 1.20
diff -c -r1.20 slru.h
*** src/include/access/slru.h 5 Jan 2007 22:19:51 -0000 1.20
--- src/include/access/slru.h 5 Apr 2007 21:51:58 -0000
***************
*** 14,19 ****
--- 14,20 ----
#define SLRU_H
#include "storage/lwlock.h"
+ #include "access/xlogdefs.h"
/*
***************
*** 47,52 ****
--- 48,54 ----
char **page_buffer;
SlruPageStatus *page_status;
bool *page_dirty;
+ XLogRecPtr *page_lsn; /* only set if do_wal_flush is true */
int *page_number;
int *page_lru_count;
LWLockId *buffer_locks;
***************
*** 74,92 ****
/*
* SlruCtlData is an unshared structure that points to the active information
! * in shared memory.
*/
typedef struct SlruCtlData
{
SlruShared shared;
/*
! * This flag tells whether to fsync writes (true for pg_clog, false for
! * pg_subtrans).
*/
bool do_fsync;
/*
* Decide which of two page numbers is "older" for truncation purposes. We
* need to use comparison of TransactionIds here in order to do the right
* thing with wraparound XID arithmetic.
--- 76,101 ----
/*
* SlruCtlData is an unshared structure that points to the active information
! * in shared memory. To be clear: this information is accessible even
! * when you do not hold the control lock for the SLRU.
*/
typedef struct SlruCtlData
{
SlruShared shared;
/*
! * This flag tells whether to fsync writes
! * (true for pg_clog and pg_multixact, false for pg_subtrans).
*/
bool do_fsync;
/*
+ * This flag tells whether to flush WAL before writing pages
+ * (true for pg_clog, false for pg_multixact and pg_subtrans).
+ */
+ bool do_wal_flush;
+
+ /*
* Decide which of two page numbers is "older" for truncation purposes. We
* need to use comparison of TransactionIds here in order to do the right
* thing with wraparound XID arithmetic.
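The new page_lsn array and do_wal_flush flag suggest a WAL-before-data rule for SLRU pages: before a dirty page goes to disk, WAL is flushed up to the latest record that changed it. The following is a minimal sketch of that idea, assuming each page remembers the highest LSN of any change made to it. All Demo*/demo_* names are hypothetical, and the real logic in slru.c may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-field layout as XLogRecPtr in xlogdefs.h. */
typedef struct
{
	uint32_t	xlogid;			/* log file number */
	uint32_t	xrecoff;		/* byte offset within the log file */
} DemoRecPtr;

static bool
demo_ptr_lt(DemoRecPtr a, DemoRecPtr b)
{
	return a.xlogid < b.xlogid ||
		(a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

typedef struct
{
	bool		dirty;
	DemoRecPtr	page_lsn;		/* highest LSN that touched this page */
} DemoPage;

/* Hypothetical model of how far WAL has been flushed so far. */
static DemoRecPtr demo_wal_flushed = {0, 0};
static int	demo_wal_flush_calls = 0;

/* Flush WAL up to 'upto'; a no-op if WAL is already flushed that far. */
static void
demo_wal_flush(DemoRecPtr upto)
{
	if (demo_ptr_lt(demo_wal_flushed, upto))
	{
		demo_wal_flushed = upto;
		demo_wal_flush_calls++;
	}
}

/* Record that a status change logged at 'lsn' modified this page. */
static void
demo_page_note_lsn(DemoPage *page, DemoRecPtr lsn)
{
	assert(page != NULL);
	page->dirty = true;
	if (demo_ptr_lt(page->page_lsn, lsn))
		page->page_lsn = lsn;
}

/* Write a dirty page, flushing WAL first when the SLRU requires it. */
static void
demo_page_write(DemoPage *page, bool do_wal_flush)
{
	assert(page != NULL);
	if (!page->dirty)
		return;
	if (do_wal_flush)
		demo_wal_flush(page->page_lsn);
	/* the physical page write would happen here */
	page->dirty = false;
}
```

Keeping only the maximum LSN per page is enough in this sketch, because flushing WAL up to the newest change on the page necessarily covers every older change as well.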
Index: src/include/access/transam.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/transam.h,v
retrieving revision 1.60
diff -c -r1.60 transam.h
*** src/include/access/transam.h 5 Jan 2007 22:19:51 -0000 1.60
--- src/include/access/transam.h 5 Apr 2007 21:51:58 -0000
***************
*** 14,19 ****
--- 14,20 ----
#ifndef TRANSAM_H
#define TRANSAM_H
+ #include "access/xlogdefs.h"
/* ----------------
* Special transaction ID values
***************
*** 114,124 ****
*/
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
! extern void TransactionIdCommit(TransactionId transactionId);
! extern void TransactionIdAbort(TransactionId transactionId);
extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
! extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
--- 115,125 ----
*/
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
! extern void TransactionIdCommit(TransactionId transactionId, XLogRecPtr lsn);
! extern void TransactionIdAbort(TransactionId transactionId, XLogRecPtr lsn);
extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
Index: src/include/access/xact.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/xact.h,v
retrieving revision 1.85
diff -c -r1.85 xact.h
*** src/include/access/xact.h 13 Mar 2007 00:33:42 -0000 1.85
--- src/include/access/xact.h 5 Apr 2007 21:51:58 -0000
***************
*** 16,21 ****
--- 16,22 ----
#include "access/xlog.h"
#include "nodes/pg_list.h"
+ #include "postmaster/walwriter.h"
#include "storage/relfilenode.h"
#include "utils/timestamp.h"
***************
*** 41,46 ****
--- 42,50 ----
extern bool DefaultXactReadOnly;
extern bool XactReadOnly;
+ /* Deferred Fsync */
+ extern bool DefaultXactCommitGuarantee;
+ extern void SetXactCommitGuarantee(bool RequestedXactCommitGuarantee);
/*
* start- and end-of-transaction callbacks for dynamically loaded modules
*/
***************
*** 145,150 ****
--- 149,155 ----
extern void SetCurrentStatementStartTimestamp(void);
extern int GetCurrentTransactionNestLevel(void);
extern bool TransactionIdIsCurrentTransactionId(TransactionId xid);
+ extern bool TransactionIdIsFlushed(TransactionId xid);
extern void CommandCounterIncrement(void);
extern void StartTransactionCommand(void);
extern void CommitTransactionCommand(void);
***************
*** 179,182 ****
--- 184,192 ----
extern void xact_redo(XLogRecPtr lsn, XLogRecord *record);
extern void xact_desc(StringInfo buf, uint8 xl_info, char *rec);
+ extern void DeferredFsyncShmemInit(void);
+ extern Size DeferredFsyncShmemSize(void);
+ extern void FlushAnyDeferredFsyncXacts(bool loop_if_busy, bool have_lock);
+ extern bool TransactionIdIsFlushed(TransactionId xid);
+
#endif /* XACT_H */
Index: src/include/access/xlogdefs.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/xlogdefs.h,v
retrieving revision 1.17
diff -c -r1.17 xlogdefs.h
*** src/include/access/xlogdefs.h 14 Feb 2007 05:00:40 -0000 1.17
--- src/include/access/xlogdefs.h 5 Apr 2007 21:51:58 -0000
***************
*** 33,38 ****
--- 33,40 ----
uint32 xrecoff; /* byte offset of location in log file */
} XLogRecPtr;
+ #define XLogRecPtrIsInvalid(p) \
+ (((p).xlogid == 0 && (p).xrecoff == 0) ? true : false)
/*
* Macros for comparing XLogRecPtrs
Index: src/include/storage/lwlock.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/storage/lwlock.h,v
retrieving revision 1.35
diff -c -r1.35 lwlock.h
*** src/include/storage/lwlock.h 3 Apr 2007 16:34:36 -0000 1.35
--- src/include/storage/lwlock.h 5 Apr 2007 21:51:59 -0000
***************
*** 61,66 ****
--- 61,67 ----
BtreeVacuumLock,
AddinShmemInitLock,
AutovacuumLock,
+ DeferredFsyncLock,
/* Individual lock IDs end here */
FirstBufMappingLock,
FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
Index: src/include/storage/pmsignal.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/storage/pmsignal.h,v
retrieving revision 1.17
diff -c -r1.17 pmsignal.h
*** src/include/storage/pmsignal.h 15 Feb 2007 23:23:23 -0000 1.17
--- src/include/storage/pmsignal.h 5 Apr 2007 21:51:59 -0000
***************
*** 25,30 ****
--- 25,31 ----
PMSIGNAL_PASSWORD_CHANGE, /* pg_auth file has changed */
PMSIGNAL_WAKEN_CHILDREN, /* send a SIGUSR1 signal to all backends */
PMSIGNAL_WAKEN_ARCHIVER, /* send a NOTIFY signal to xlog archiver */
+ PMSIGNAL_WAKEN_WALWRITER, /* send a NOTIFY signal to WAL Writer */
PMSIGNAL_ROTATE_LOGFILE, /* send SIGUSR1 to syslogger to rotate logfile */
PMSIGNAL_START_AUTOVAC_LAUNCHER, /* start an autovacuum launcher */
PMSIGNAL_START_AUTOVAC_WORKER, /* start an autovacuum worker */
Index: src/include/utils/guc_tables.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/guc_tables.h,v
retrieving revision 1.32
diff -c -r1.32 guc_tables.h
*** src/include/utils/guc_tables.h 13 Mar 2007 14:32:25 -0000 1.32
--- src/include/utils/guc_tables.h 5 Apr 2007 21:51:59 -0000
***************
*** 51,56 ****
--- 51,57 ----
RESOURCES_KERNEL,
WAL,
WAL_SETTINGS,
+ WAL_COMMITS,
WAL_CHECKPOINTS,
QUERY_TUNING,
QUERY_TUNING_METHOD,
Index: src/include/utils/tqual.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/tqual.h,v
retrieving revision 1.66
diff -c -r1.66 tqual.h
*** src/include/utils/tqual.h 25 Mar 2007 19:45:14 -0000 1.66
--- src/include/utils/tqual.h 5 Apr 2007 21:52:00 -0000
***************
*** 125,130 ****
--- 125,140 ----
HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */
} HTSV_Result;
+ /* Action codes for HeapTupleSetVisibilityInfo */
+ typedef enum
+ {
+ TUPLE_XMIN, /* check the tuple's xmin as a TransactionId */
+ TUPLE_XMAX, /* check the tuple's xmax as a TransactionId */
+ TUPLE_XMULTI, /* check the tuple's xmax as a MultitransactionId */
+ TUPLE_XVAC, /* looking at xvac */
+ TUPLE_XMAX_SUBTRANS /* looking at xmax of an aborted subtrans */
+ } SetTupleVisibilityAction;
+
/* These are the "satisfies" test routines for the various snapshot types */
extern bool HeapTupleSatisfiesMVCC(HeapTupleHeader tuple,
Snapshot snapshot, Buffer buffer);