Index: doc/src/sgml/config.sgml =================================================================== RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v retrieving revision 1.130 diff -c -r1.130 config.sgml *** doc/src/sgml/config.sgml 30 Jun 2007 19:12:01 -0000 1.130 --- doc/src/sgml/config.sgml 13 Jul 2007 11:17:36 -0000 *************** *** 1412,1417 **** --- 1412,1457 ---- + + + wal_writer_delay (integer) + + wal_writer_delay configuration parameter + + + + Specifies the delay between activity rounds for the WAL Writer. In each + round the writer will flush WAL to disk. It then sleeps for + wal_writer_delay milliseconds, and repeats. The default value is 200 + milliseconds (200ms). Note that on many systems, the effective + resolution of sleep delays is 10 milliseconds; setting wal_writer_delay + to a value that is not a multiple of 10 might have the same results as + setting it to the next higher multiple of 10. This parameter can only + be set in the postgresql.conf file or on the server command line. + + + + + + synchronous_commit (boolean) + + synchronous_commit configuration parameter + + + + Specifies whether explicit or implicit commit commands will wait + for WAL to be written to disk before the command returns success. + Turning this parameter off will give you asynchronous commits, + which will increase performance for some workloads, though it + introduces a risk of data loss, see . + + + This parameter can be set for individual transactions or sessions, + so it is advisable to use asynchronous commits only for those + transactions for which risk of data loss is acceptable. 
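The note above about sleep resolution under wal_writer_delay can be illustrated with a small calculation. This is only a sketch: the function name `effective_delay_ms` and the fixed 10 ms resolution are assumptions taken from the wording "on many systems", not server behaviour that is guaranteed anywhere.

```python
def effective_delay_ms(requested_ms: int, resolution_ms: int = 10) -> int:
    """Round a requested sleep up to the next multiple of the timer resolution.

    Hypothetical helper: models the statement that a delay which is not a
    multiple of 10 ms might behave like the next higher multiple of 10 ms.
    """
    # Ceiling division using integers only, then scale back up.
    return -(-requested_ms // resolution_ms) * resolution_ms


if __name__ == "__main__":
    for requested in (200, 205, 199):
        print(requested, "->", effective_delay_ms(requested))
```

On a system with a finer timer resolution the requested value would simply be honoured, so treat the result as an upper estimate.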
+ + + wal_buffers (integer) Index: doc/src/sgml/wal.sgml =================================================================== RCS file: /projects/cvsroot/pgsql/doc/src/sgml/wal.sgml,v retrieving revision 1.44 diff -c -r1.44 wal.sgml *** doc/src/sgml/wal.sgml 28 Jun 2007 00:02:37 -0000 1.44 --- doc/src/sgml/wal.sgml 13 Jul 2007 11:17:38 -0000 *************** *** 23,29 **** ordinarily meets this requirement. In fact, even if a computer is fatally damaged, if the disk drives survive they can be moved to another computer with similar hardware and all committed ! transactions will remain intact. --- 23,33 ---- ordinarily meets this requirement. In fact, even if a computer is fatally damaged, if the disk drives survive they can be moved to another computer with similar hardware and all committed ! transactions will remain intact. We refer to this mode of operation ! as synchronous commit, since the user issuing the commit waits for ! the writing of the WAL to permanent storage. ! Asynchronous commit is also possible as a performance option, ! described in *************** *** 394,397 **** --- 398,586 ---- seem to be a problem in practice. + + + Asynchronous Commit + + + synchronous commit + + + + asynchronous commit + + + + Asynchronous Commit allows a commit command to complete faster, at the + cost that the most recent transactions will be lost if the database + should crash. This feature is particularly useful for real-time + sensor data collection applications, such as RFID tags or other + monitoring applications. + + + + Normal commits, or as we now refer to them, synchronous commits, wait + for the writing of the WAL to permanent storage + before returning control to the user. An asynchronous commit will + return control to the user before WAL + data has been written and flushed to disk, which gives a significant + performance boost if there have been few other disk accesses over the course + of this transaction. 
From an SQL perspective, this makes asynchronous + commits particularly useful for shorter, mainly INSERT transactions or + writes on tables small enough to reside mainly in memory. + + + + Asynchronous commits introduce the risk of data loss. There + is a time window between the time that a commit has returned + successfully to the user and the time of the WAL write during which + data will certainly be lost if the server crashes. + The data loss is deterministic: if the server stays up then the + commits will be durable, while if the server crashes there will be + certain data loss of the transactions that have most recently + committed. + + + + The user can select the commit mode of their transactions, so that + it is possible to have both synchronous and asynchronous commit + transactions concurrently. The proof that this is safe is presented + later in this section. The commit mode is + controlled by the user settable parameter + . + synchronous_commit may be set in + postgresql.conf, or for a specific session using + + SET synchronous_commit = off + + which will provide faster, asynchronous commits. + synchronous_commit can be set at any point right + up to the final COMMIT statement. This parameter may also be set for + just one individual transaction using + + SET LOCAL synchronous_commit = off + + + + + Whatever the setting of synchronous_commit, + commits will always be synchronous for utility commands, such + as VACUUM, as well as for any transaction that created or removed + files. + + + + Risk of Data Loss + + + If the database crashes during the risk window between the + asynchronous commit and the writing of the WAL + then data written during that transaction will be lost. + Only data written by asynchronous transactions will be lost, just as + if the transaction had never actually completed. Transactions are + atomic, so this data loss does not propagate and any already + written data is completely safe. 
+ + + The window of data loss is limited because the WAL Writer process + regularly writes WAL to disk every + milliseconds. + The actual maximum duration of the risk window is twice the + wal_writer_delay because the writes are optimised + to favor writing whole pages at a time during busy periods. + + + + The extent of data loss for asynchronous commit transactions will be + limited to the number of transactions that normally complete in the + duration of the maximum time window. You can calculate this for your + specific application/system, though typically this would be in the + ballpark of 1000 transactions. This aspect means that asynchronous commits + are a useful technique for applications such as sensor measurements or + web statistics where a steady stream of data must be processed, yet + individual measurements are not particularly important. Data loss + would not be acceptable when the data processed is financial + transactions or other valuable customer commitments. + + + + Asynchronous commit behaves differently from setting + fsync = off, since that is a server-wide + setting that will alter the behaviour of all transactions, + overriding the setting of synchronous_commit, + as well as risking much wider data loss. With fsync + = off the WAL is written but not fsynced, so data is lost only in case + of a system crash. With asynchronous commit the WAL is not written + at all by the user's session before the commit returns, so data is + lost if there is a database server crash. + + + + commit_delay also sounds very similar to + asynchronous commit, but it is actually a synchronous commit + with an additional wait that allows a technique known as group + commit. commit_delay is also a server-wide setting. + + + + + + Proof of safety for concurrent use + + + It is useful that we can run both synchronous and asynchronous + commit transactions concurrently, though there must be a clear proof + to ensure we minimise the possibility of technical error. 
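As a worked example of the risk-window estimate given under Risk of Data Loss, the calculation can be sketched as follows. The function names and the figure of 2500 commits per second are assumptions chosen for illustration. Only the default of 200 ms and the factor of two come from the text.

```python
def max_risk_window_ms(wal_writer_delay_ms: int) -> int:
    """Upper bound on the risk window: twice wal_writer_delay, as stated in the text."""
    return 2 * wal_writer_delay_ms


def max_transactions_at_risk(wal_writer_delay_ms: int, commits_per_second: float) -> float:
    """Estimate how many asynchronous commits could be lost in one crash.

    commits_per_second is a workload figure you would measure yourself;
    it is not something the server reports through this function.
    """
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return commits_per_second * max_risk_window_ms(wal_writer_delay_ms) / 1000


if __name__ == "__main__":
    delay_ms = 200            # default wal_writer_delay from the text
    assumed_rate = 2500       # hypothetical commits per second
    print("window (ms):", max_risk_window_ms(delay_ms))
    print("transactions at risk:", max_transactions_at_risk(delay_ms, assumed_rate))
```

With these assumed inputs the estimate lands at the "ballpark of 1000 transactions" mentioned above. A lower commit rate or a shorter wal_writer_delay shrinks the figure proportionally.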
+ + + We have two transactions, T1 and T2. The Log Sequence Number (LSN) + is the point in the WAL sequence where a transaction commit is + recorded, so LSN1 and LSN2 are the commit records of those transactions. + If T2 can see changes made by T1 then when T2 commits it + must be true that LSN2 > LSN1. Thus when T2 commits it is certain + that all of the changes made by T1 are also now recorded in the WAL. + This holds whether T1 was asynchronous or + synchronous. As a result, it is safe for asynchronous commits and + synchronous commits to work concurrently without endangering data + written by synchronous commits. Sub-transactions are not important + here since the final write to disk only occurs at the commit of + the top level transaction. + + + + Changes to data blocks cannot reach disk unless WAL is flushed up + to the point of the LSN of the data blocks. Any attempt to write + unsafe data to disk will trigger a write which ensures the safety + of all data written by that and prior transactions. Data blocks + and clog pages are both protected by LSNs. + + + + Non-WAL-logged changes to a temp table are also safe. Those changes + could reach disk in advance of T2's commit, but we don't care since + temp table contents don't survive crashes anyway. + + + + Non-WAL-logged changes made via one of the paths we have introduced + to avoid WAL overhead for bulk updates are also safe. In these cases it's entirely + possible for the data to reach disk before T1's commit, because T2 will + fsync it down to disk without any sort of interlock, as soon as it + finishes the bulk update. However, all these paths are designed to + write data that no other transaction can see until after T2 commits. + That commit must follow T1's in the WAL, so until it has reached + disk, the contents of the bulk-updated file are unimportant after a + crash. + + + + Transaction status hint bits are normally set for data rows changed + by a transaction. 
When we attempt to set a hint bit, we check the + LSN of the transaction's commit record, and if WAL has already been + flushed up to that LSN then we will set the status hint bit. If we + haven't yet flushed WAL up to that LSN then we will defer setting the + status hint bit for that row. + + +
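The hint-bit rule in the last paragraph can be modelled with a few lines of code. This is a minimal sketch, not the server's implementation: `WalState`, `Row`, `try_set_hint_bit` and the dictionary of commit LSNs are all hypothetical names introduced here, and LSNs are simplified to plain integers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WalState:
    """Hypothetical view of the WAL: the highest LSN known to be flushed to disk."""
    flushed_up_to: int = 0


@dataclass
class Row:
    """Hypothetical data row carrying the id of the transaction that changed it."""
    xid: int
    hint_committed: bool = False


def try_set_hint_bit(row: Row, commit_lsn_by_xid: Dict[int, int], wal: WalState) -> bool:
    """Set the row's hint bit only if the commit record is already flushed.

    Returns True if the hint bit was set, False if setting it was deferred.
    """
    commit_lsn = commit_lsn_by_xid[row.xid]
    if commit_lsn <= wal.flushed_up_to:
        # The commit is durable, so recording it on the row is safe.
        row.hint_committed = True
        return True
    # Commit record not yet on disk: leave the row untouched and retry later.
    return False


if __name__ == "__main__":
    wal = WalState(flushed_up_to=100)
    commits = {7: 150}                 # transaction 7 committed at LSN 150
    row = Row(xid=7)
    print(try_set_hint_bit(row, commits, wal))   # deferred: WAL flushed only to 100
    wal.flushed_up_to = 200                      # a later WAL flush passes LSN 150
    print(try_set_hint_bit(row, commits, wal))   # now the hint bit can be set
```

Deferring rather than forcing a flush keeps the asynchronous commit cheap. The cost is that a later reader of the row has to repeat the check.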