Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.130
diff -r1.130 config.sgml
1414a1415,1454
> 
>      <varlistentry id="guc-wal-writer-delay" xreflabel="wal_writer_delay">
>       <term><varname>wal_writer_delay</varname> (<type>integer</type>)</term>
>       <indexterm>
>        <primary><varname>wal_writer_delay</> configuration parameter</primary>
>       </indexterm>
>       <listitem>
>        <para>
> 		Specifies the delay between activity rounds for the WAL Writer. In each
> 		round the writer will flush WAL to disk. It then sleeps for 
> 		wal_writer_delay milliseconds, and repeats. The default value is 200 
> 		milliseconds (200ms). Note that on many systems, the effective 
> 		resolution of sleep delays is 10 milliseconds; setting wal_writer_delay 
> 		to a value that is not a multiple of 10 might have the same results as 
> 		setting it to the next higher multiple of 10. This parameter can only 
> 		be set in the postgresql.conf file or on the server command line.
>        </para>
>       </listitem>
>      </varlistentry>
>      
>      <varlistentry id="guc-synchronous-commit" xreflabel="synchronous_commit">
>       <term><varname>synchronous_commit</varname> (<type>boolean</type>)</term>
>       <indexterm>
>        <primary><varname>synchronous_commit</> configuration parameter</primary>
>       </indexterm>
>       <listitem>
>        <para>
> 		Specifies whether explicit or implicit commit commands will wait 
> 		for WAL to be written to disk before the command returns success.
> 		Turning this parameter off will give you asynchronous commits, 
> 		which will increase performance for some workloads, though 
> 		introduces risk of data loss, see <xref linkend="wal-asynch-commit">.
>        </para>
>        <para>
> 		This parameter can be set for individual transactions or sessions,
> 		so it is advisable to use asynchronous commits only for those
> 		transactions for which risk of data loss is acceptable.
>        </para>
>       </listitem>
>      </varlistentry>
Index: doc/src/sgml/wal.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/wal.sgml,v
retrieving revision 1.44
diff -r1.44 wal.sgml
26c26,30
<    transactions will remain intact.
---
>    transactions will remain intact. We refer to this mode of operation 
>    as synchronous commit, since the user issuing the commit waits for 
>    the writing of the <xref linkend="wal-intro"> to permanent storage. 
>    Asynchronous commit is also possible as a performance option, 
>    described in <xref linkend="wal-asynch-commit">
396a401,594
> 
>  <sect1 id="wal-asynch-commit">
>   <title>Asynchronous Commit</title>
> 
>    <indexterm>
>     <primary>synchronous commit</primary>
>    </indexterm>
> 
>    <indexterm>
>     <primary>asynchronous commit</primary>
>    </indexterm>
> 
>   <para>
>    Asynchronous Commit allows a commit command to complete faster, at the 
>    cost that the most recent transactions will be lost if the database 
>    should crash. This feature is particularly useful for real-time
>    sensor data collection applications, such as RFID tags or other
>    monitoring applications.
>   </para>
> 
>   <para>
>    Normal commits, or as we now refer to them, synchronous commits, wait
>    for the writing of the <acronym>WAL</acronym> to permanent storage
>    before returning control to the user.  An asynchronous commit will
>    return control to the user <emphasis>before</> <acronym>WAL</acronym> 
>    data has been written and flushed to disk, which gives a significant 
>    performance boost if there has been few other disk accesses over the course 
>    of this transaction.  From an SQL perspective, this makes asynchronous 
>    commits particularly useful for shorter, mainly INSERT transactions or 
>    writes on tables small enough to reside mainly in memory.
>   </para>
> 
>   <para>
>    Asynchronous commits introduce the risk of data loss. There
>    is a time window between the time that a commit has returned
>    successfully to the user and the time of the WAL write during which
>    data will certainly be lost <emphasis>if</> the server crashes.  
>    The data loss is deterministic: if the server stays up then the 
>    commits will be durable, while if the server crashes there will be 
>    certain data loss of the transactions that have most recently
>    committed.
>   </para>
> 
>   <para>
>    The user can select the commit mode of their transactions, so that 
>    it is possible to have both synchronous and asynchronous commit 
>    transactions concurrently.  The proof that this is safe is presented 
>    later in this section.  The commit mode is 
>    controlled by the user settable parameter 
>    <xref linkend="guc-synchronous-commit">. 
>    <varname>synchronous_commit</varname> may be set in 
>    <filename>postgresql.conf</filename>, or for a specific session using 
> <programlisting>
> SET synchronous_commit = off
> </programlisting>
>    which will provide faster, asynchronous commits. 
>    <varname>synchronous_commit</varname> can be set at any point right 
>    up to the final COMMIT statement.  This parameter may also be set for
>    just one individual transaction using 
> <programlisting>
> SET LOCAL synchronous_commit = off
> </programlisting>
>   </para>
> 
>   <para>
>    Whatever the setting of <varname>synchronous_commit</varname>, 
>    commits will always be synchronous for utility commands, such 
>    as VACUUM, as well as for any transaction that created or removed 
>    files.
>   </para>
> 
>   <sect2 id="wal-asynch-commit-risk">
>    <title>Risk of Data Loss</title>
> 
>    <para>
> 	If the database crashes during the risk window between the 
> 	asynchronous commit and the writing of the <acronym>WAL</acronym>
>     then data written during that transaction <emphasis>will</> be lost.
> 	Only data written by asynchronous transactions will be lost, just as 
> 	if the transaction had never actually completed.  Transactions are
> 	atomic, so this data loss does not propagate and any already 
> 	written data is completely safe.
>    </para>
> 
>    <para>
> 	The window of data loss is limited because the WAL Writer process 
> 	regularly writes <acronym>WAL</acronym> to disk every 
> 	<xref linkend="guc-wal-writer-delay"> milliseconds.
> 	The actual maximum duration of the risk window is twice the 
> 	<varname>wal_writer_delay</varname> because the writes are optimised
> 	to favor writing whole pages at a time during busy periods.
>    </para>
> 
>    <para>
> 	The extent of data loss for asynchronous commit transactions will be
> 	limited to the number of transactions that normally complete in the
> 	duration of the maximum time window.  You can calculate this for your 
> 	specific application/system, though typically this would be in the 
> 	ballpark of 1000 transactions.  This aspect means that asynch commits
> 	are a useful technique for applications such as sensor measurements or
> 	web statistics where a steady stream of data must be processed, yet
> 	individual measurements are not particularly important.  Data loss 
> 	would not be acceptable when the data processed is financial
> 	transactions or other valuable customer commitments.
>    </para>
> 
>    <para>
>     Asynchronous commit provides different behaviour to setting 
>     <varname>fsync</varname> = off, since that is a server-wide
>     setting that will alter the behaviour of all transactions,
>     overriding the setting of <varname>synchronous_commit</varname>,
>     as well as risking much wider data loss.  With <varname>fsync</varname> 
> 	= off the WAL written but not fsynced, so data is lost only in case
> 	of a system crash. With asynchronous commit the WAL is not written
> 	to disk at all by the user, so data is lost if there is a database 
> 	server crash, as well as when the system crashes.
>    </para>
> 
>    <para>
>     <varname>commit_delay</varname> also sounds very similar to 
> 	asynchronous commit, but it is actually a synchronous commit 
> 	with an additional wait that allows a technique known as group 
> 	commit. <varname>commit_delay</varname> is also a server-wide setting.
>    </para>
> 
>   </sect2>
> 
>   <sect2 id="wal-asynch-commit-proof">
>    <title>Proof of safety for concurrent use</title>
> 
>    <para>
> 	It is useful that we can run both synchronous and asynchronous 
> 	commit transactions concurrently, though there must be a clear proof
> 	to ensure we minimise the possibility of technical error.
>    </para>
> 
>    <para>
> 	We have two transactions, T1 and T2.  The 
> 	<firstterm>Log Sequence Number</firstterm> (<acronym>LSN</acronym>)
> 	is the point in the WAL sequence where a transaction commit is  
> 	recorded, so LSN1 and LSN2 are the commit records of those transactions. 
> 	If T2 can see changes made by T1 then when T2 commits it
> 	must be true that LSN2 follows LSN1.  Thus when T2 commits it is certain
> 	that all of the changes made by T1 are also now recorded in the WAL.
> 	This situation is true whether or not T1 was asynchronous or
> 	synchronous.  As a result, it is safe for asynchronous commits and 
> 	synchronous commits to work concurrently without endangering data 
> 	written by synchronous commits.  Sub-transactions are not important 
> 	here since the final write to disk only occurs at the commit of 
> 	the top level transaction.
>    </para>
> 
>    <para>
> 	Changes to data blocks cannot reach disk unless <acronym>WAL</acronym> 
> 	is flushed up to the point of the <acronym>LSN</acronym> of the data 
> 	blocks.  Any attempt to write unsafe data to disk will trigger a write 
> 	which ensures the safety of all data written by that and prior 
> 	transactions. Data blocks and clog pages are both protected by 
> 	<acronym>LSN</acronym>s.
>    </para>
> 
>    <para>
> 	Changes to a temp table are also safe.  Those changes could reach disk
> 	in advance of T2's commit, but we don't care since temp table contents 
> 	don't survive crashes anyway.
>    </para>
> 
>    <para>
> 	Database writes made any of the paths we have introduced
> 	to avoid <acronym>WAL</acronym> overhead for bulk updates are also safe.
> 	In these cases it's entirely possible for the data to reach disk before 
> 	T1's commit, because T2 will fsync it down to disk without any sort of 
> 	interlock, as soon as it finishes the bulk update.  However, all these 
> 	paths are designed to write data that no other transaction can see until 
> 	after T2 commits.  LSN2 follows LSN1, so until it has reached disk
> 	the contents of the bulk-updated file are unimportant after a crash.
>    </para>
> 
>    <para>
> 	Each row version contains the transaction numbers of the transactions
> 	that inserted or deleted them, to allow <acronym>MVCC</acronym> 
> 	visibility checks. To reduce accesses to the Commit Log we normally
> 	set transaction status hint bits are normally set on each row on the 
> 	first visibility check after the commit or abort of that transaction.
> 	When we attempt to set a commit hint bit for a normal transaction, we 
> 	check the <acronym>LSN</acronym> for a transaction and if this has already 
> 	been written then we will set the status hint bit.  If we haven't yet 
> 	flushed <acronym>WAL</acronym> up to that <acronym>LSN</acronym> then 
> 	we will defer writing the status hint bit for that row. A later 
> 	visibility test will be able to set it correctly within a short time,
> 	as already described for the WAL writer process.
>    </para>
>   </sect2>
>  </sect1>