Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.130
diff -c -r1.130 config.sgml
*** doc/src/sgml/config.sgml 30 Jun 2007 19:12:01 -0000 1.130
--- doc/src/sgml/config.sgml 13 Jul 2007 11:17:36 -0000
***************
*** 1412,1417 ****
--- 1412,1457 ----
+
+
+ wal_writer_delay (integer)
+
+ wal_writer_delay> configuration parameter
+
+
+
+ Specifies the delay between activity rounds for the WAL Writer. In each
+ round the writer will flush WAL to disk. It then sleeps for
+ wal_writer_delay milliseconds, and repeats. The default value is 200
+ milliseconds (200ms). Note that on many systems, the effective
+ resolution of sleep delays is 10 milliseconds; setting wal_writer_delay
+ to a value that is not a multiple of 10 might have the same results as
+ setting it to the next higher multiple of 10. This parameter can only
+ be set in the postgresql.conf file or on the server command line.
+
+
+
+
+
+ synchronous_commit (boolean)
+
+ synchronous_commit> configuration parameter
+
+
+
+ Specifies whether explicit or implicit commit commands will wait
+ for WAL to be written to disk before the command returns success.
+ Turning this parameter off will give you asynchronous commits,
+ which will increase performance for some workloads, though
+ introduces risk of data loss, see .
+
+
+ This parameter can be set for individual transactions or sessions,
+ so it is advisable to use asynchronous commits only for those
+ transactions for which risk of data loss is acceptable.
+
+
+
wal_buffers (integer)
Index: doc/src/sgml/wal.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/wal.sgml,v
retrieving revision 1.44
diff -c -r1.44 wal.sgml
*** doc/src/sgml/wal.sgml 28 Jun 2007 00:02:37 -0000 1.44
--- doc/src/sgml/wal.sgml 13 Jul 2007 11:17:38 -0000
***************
*** 23,29 ****
ordinarily meets this requirement. In fact, even if a computer is
fatally damaged, if the disk drives survive they can be moved to
another computer with similar hardware and all committed
! transactions will remain intact.
--- 23,33 ----
ordinarily meets this requirement. In fact, even if a computer is
fatally damaged, if the disk drives survive they can be moved to
another computer with similar hardware and all committed
! transactions will remain intact. We refer to this mode of operation
! as synchronous commit, since the user issuing the commit waits for
! the writing of the WAL to permanent storage.
! Asynchronous commit is also possible as a performance option,
! described in
***************
*** 394,397 ****
--- 398,586 ----
seem to be a problem in practice.
+
+
+ Asynchronous Commit
+
+
+ synchronous commit
+
+
+
+ asynchronous commit
+
+
+
+ Asynchronous Commit allows a commit command to complete faster, at the
+ cost that the most recent transactions will be lost if the database
+ should crash. This feature is particularly useful for real-time
+ sensor data collection applications, such as RFID tags or other
+ monitoring applications.
+
+
+
+ Normal commits, or as we now refer to them, synchronous commits, wait
+ for the writing of the WAL to permanent storage
+ before returning control to the user. An asynchronous commit will
+ return control to the user before> WAL
+ data has been written and flushed to disk, which gives a significant
+ performance boost if there has been few other disk accesses over the course
+ of this transaction. From an SQL perspective, this makes asynchronous
+ commits particularly useful for shorter, mainly INSERT transactions or
+ writes on tables small enough to reside mainly in memory.
+
+
+
+ Asynchronous commits introduce the risk of data loss. There
+ is a time window between the time that a commit has returned
+ successfully to the user and the time of the WAL write during which
+ data will certainly be lost if> the server crashes.
+ The data loss is deterministic: if the server stays up then the
+ commits will be durable, while if the server crashes there will be
+ certain data loss of the transactions that have most recently
+ committed.
+
+
+
+ The user can select the commit mode of their transactions, so that
+ it is possible to have both synchronous and asynchronous commit
+ transactions concurrently. The proof that this is safe is presented
+ later in this section. The commit mode is
+ controlled by the user settable parameter
+ .
+ synchronous_commit may be set in
+ postgresql.conf, or for a specific session using
+
+ SET synchronous_commit = off
+
+ which will provide faster, asynchronous commits.
+ synchronous_commit can be set at any point right
+ up to the final COMMIT statement. This parameter may also be set for
+ just one individual transaction using
+
+ SET LOCAL synchronous_commit = off
+
+
+
+
+ Whatever the setting of synchronous_commit,
+ commits will always be synchronous for utility commands, such
+ as VACUUM, as well as for any transaction that created or removed
+ files.
+
+
+
+ Risk of Data Loss
+
+
+ If the database crashes during the risk window between the
+ asynchronous commit and the writing of the WAL
+ then data written during that transaction will> be lost.
+ Only data written by asynchronous transactions will be lost, just as
+ if the transaction had never actually completed. Transactions are
+ atomic, so this data loss does not propagate and any already
+ written data is completely safe.
+
+
+
+ The window of data loss is limited because the WAL Writer process
+ regularly writes WAL to disk every
+ milliseconds.
+ The actual maximum duration of the risk window is twice the
+ wal_writer_delay because the writes are optimised
+ to favor writing whole pages at a time during busy periods.
+
+
+
+ The extent of data loss for asynchronous commit transactions will be
+ limited to the number of transactions that normally complete in the
+ duration of the maximum time window. You can calculate this for your
+ specific application/system, though typically this would be in the
+ ballpark of 1000 transactions. This aspect means that asynch commits
+ are a useful technique for applications such as sensor measurements or
+ web statistics where a steady stream of data must be processed, yet
+ individual measurements are not particularly important. Data loss
+ would not be acceptable when the data processed is financial
+ transactions or other valuable customer commitments.
+
+
+
+ Asynchronous commit provides different behaviour to setting
+ fsync = off, since that is a server-wide
+ setting that will alter the behaviour of all transactions,
+ overriding the setting of synchronous_commit,
+ as well as risking much wider data loss. With fsync
+ = off the WAL written but not fsynced, so data is lost only in case
+ of a system crash. With asynchronous commit the WAL is not written
+ at all by the user, so data is lost if there is a database server crash.
+
+
+
+ commit_delay also sounds very similar to
+ asynchronous commit, but it is actually a synchronous commit
+ with an additional wait that allows a technique known as group
+ commit. commit_delay is also a server-wide setting.
+
+
+
+
+
+ Proof of safety for concurrent use
+
+
+ It is useful that we can run both synchronous and asynchronous
+ commit transactions concurrently, though there must be a clear proof
+ to ensure we minimise the possibility of technical error.
+
+
+
+ We have two transactions, T1 and T2. The Log Sequence Number (LSN)
+ is the point in the WAL sequence where a transaction commit is
+ recorded, so LSN1 and LSN2 are the commit records of those transactions.
+ If T2 can see changes made by T1 then when T2 commits it
+ must be true that LSN2 > LSN1. Thus when T2 commits it is certain
+ that all of the changes made by T1 are also now recorded in the WAL.
+ This situation is true whether or not T1 was asynchronous or
+ synchronous. As a result, it is safe for asynchronous commits and
+ synchronous commits to work concurrently without endangering data
+ written by synchronous commits. Sub-transactions are not important
+ here since the final write to disk only occurs at the commit of
+ the top level transaction.
+
+
+
+ Changes to data blocks cannot reach disk unless WAL is flushed up
+ to the point of the LSN of the data blocks. Any attempt to write
+ unsafe data to disk will trigger a write which ensures the safety
+ of all data written by that and prior transactions. Data blocks
+ and clog pages are both protected by LSNs.
+
+
+
+ Non-WAL-logged changes to a temp table are also safe. Those changes
+ could reach disk in advance of T2's commit, but we don't care since
+ temp table contents don't survive crashes anyway.
+
+
+
+ Non-WAL-logged change made via one of the paths we have introduced
+ to avoid WAL overhead for bulk updates. In these cases it's entirely
+ possible for the data to reach disk before T1's commit, because T2 will
+ fsync it down to disk without any sort of interlock, as soon as it
+ finishes the bulk update. However, all these paths are designed to
+ write data that no other transaction can see until after T2 commits.
+ That commit must follow T1's in the WAL log, so until it has reached
+ disk, the contents of the bulk-updated file are unimportant after a
+ crash.
+
+
+
+ Transaction status hint bits are normally set for data rows changed
+ by a transaction. When we attempt to set a hint bit, we check the
+ LSN for a transaction and if this has already been written then we
+ will set the status hint bit. If we haven't yet flushed WAL up to
+ that LSN then we will defer writing the status hint bit for that row.
+
+
+