Discussion: [RFC] extended txid docs
Although the new txid functions are a very clean 1:1 interface
to the internal MVCC info and they don't need much docs
in that respect, their "killer" usage comes from the possibility
to query txids committed between 2 snapshots.

But how to do that (efficiently) is far from obvious when just
looking at the API. So with the attached docs patch I try to fill
the gap. Here I also show 2 variants for the common query helper
function.

But I'm pretty bad at SGML, English and writing docs, so please
review it. In addition to English/typos/SGML the suspicious
aspects are:

- code style
- writing style
- used mostly PgQ terminology (ticks), could there be something better?
- giving two variants of helper function may be too much

Even the realistic code may be too much for general docs,
but considering this is not a functionality covered
by general SQL textbooks, I think it is worth having.

I also put rendered pages up here:

http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html
http://skytools.projects.postgresql.org/txid/functions-txid.html

--
marko
Attachments
markokr@gmail.com ("Marko Kreen") writes:
> Even the realistic code may be too much for general docs,
> but considering this is not a functionality covered
> by general SQL textbooks, I think it is worth having.
>
> I also put rendered pages up here:
>
> http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html

"The data type txid_snapshot stores info about what transaction ids
are visible in a particular moment of time. Components are described
in..."

I'd suggest instead:

"The data type txid_snapshot stores info about transaction ID
visibility at a particular moment in time. The components are
described in..."

"Smallest txid that may be active. Below it all txids are visible."

I'd suggest instead:

"Earliest transaction ID that is still active. All earlier
transactions will either be committed and visible, or rolled back and
dead."

"Next unassigned txid. Above it all txids are unassigned, thus
invisible."

I'd suggest instead:

"Next unassigned txid. All txids later than this one are unassigned,
and thus invisible."

> http://skytools.projects.postgresql.org/txid/functions-txid.html

"The main use of the functions comes from the fact that user can query
txids that were committed between 2 snapshots. As this is slightly
tricky, it is described here in details on the example of simple queue
table."

I'd suggest instead:

"The main use of the functions is to determine which transactions were
committed between 2 snapshots. As this is somewhat tricky, a
demonstration of their use with a simple queue table is provided."

"Then let there be table for snapshots, into which a separate process
inserts a row with current snapshot after each 5 seconds (for
example). Lets call it 'ticks' table:"

I'd suggest instead:

"We define a table to store snapshots, called 'ticks', into which a
separate process inserts a row indicating a current transaction
snapshot every 5 seconds."

"Now if someone wants to read events from the queue table, then at
first he needs to get 2 rows with snapshots from ticks table, then
query for txids that were committed between those 2 snapshots on
events table. Because the txids and snapshots are tied to PostgreSQL
internal MVCC mechanism, the reader can be certain that the txid range
queried stays constant."

I'd suggest instead:

"In order to consistently read event data for a particular period,
the user must first read 2 rows from the 'ticks' table that indicate,
between them, transaction visibility information, and then search the
event table for the txids that were committed between those 2
snapshots. Since the txid and snapshot values are tied to PostgreSQL's
internal MVCC mechanism, the reader may be certain that the txid range
queried is consistent."

"But it will have problems if there are long transactions running.
That means the snap1.xmin will stay at the position of running
transaction and the range will get very large. This can be fixed by
fetching only [snap1.xmax..snap2.xmax] by range and fetching possible
txids below snap1.xmax explicitly:"

I'd suggest instead:

"But the query may be processed inefficiently if there are
long-running transactions during the period. That would have the
result that the snap1.xmin value would continue to refer to the
elderly running transaction, and the range will grow very large. This
may be rectified by fetching only [snap1.xmax..snap2.xmax] by range,
and fetching candidate txids earlier than snap1.xmax explicitly:"

"But that is also slightly inefficient as long transactions can be
open during several snapshots. So it would be good to pick out exact
transactions that were open at the time of snap1 and committed before
snap2. That can be done with following query:"

I'd suggest instead:

"But that query is also somewhat inefficient because long-running
transactions may be open across multiple snapshots.
As a result, it may be more efficient to pick out exact transactions
that were open at the time of snap1 and committed before snap2. That
can be done with the following query:"

"As txids returned by last query are certainly interesting, their
visibility does not need additional checks. That means the final query
can be in form:"

I'd suggest instead:

"As txids returned by that last query are certainly of interest,
visibility checking does not require additional checks. That means the
final query may be of the form:"

"Although the above queries are technically correct, PostgreSQL fails
to plan them efficiently. The actual query should always be made with
actual values written in."

I'd suggest instead:

"Although the above queries are all technically correct, PostgreSQL
will not plan them efficiently unless specific values are used. The
actual query should always be executed using specific values."

I believe that those suggested texts describe what you intended, and
they should represent better English text for this.

--
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
"What you said you want to do is roughly equivalent to nailing
horseshoes to the tires of your Buick." -- danceswithcrows@usa.net on
the question "Why can't Linux use Windows Drivers?"
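Editorial sketch: the selection rule discussed in this message can be modelled outside the database to check the reasoning. The Python below is only an illustration under stated assumptions, not the queries from the patch (which are not reproduced in this thread). `Snapshot`, `visible`, `finished_between` and `select_finished_between` are hypothetical names; inside PostgreSQL the corresponding pieces would come from `txid_snapshot_xmin()`, `txid_snapshot_xmax()` and `txid_snapshot_xip()`. It assumes snap2 was taken after snap1. Note that a snapshot alone cannot tell a committed transaction from a rolled-back one; it only says the transaction has finished.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """In-memory stand-in for a txid_snapshot value (hypothetical helper)."""
    xmin: int            # earliest txid that is still active
    xmax: int            # next unassigned txid
    xip: FrozenSet[int]  # txids in progress when the snapshot was taken


def visible(txid: int, snap: Snapshot) -> bool:
    """True if txid had already finished (committed or rolled back) at snap."""
    if txid < snap.xmin:
        return True
    if txid >= snap.xmax:
        return False
    return txid not in snap.xip


def finished_between(txid: int, snap1: Snapshot, snap2: Snapshot) -> bool:
    """Naive rule: not yet visible in snap1, but visible in snap2."""
    return not visible(txid, snap1) and visible(txid, snap2)


def select_finished_between(
    txids: Iterable[int], snap1: Snapshot, snap2: Snapshot
) -> List[int]:
    """Two-part selection: explicit txids plus a bounded range."""
    # Part 1: transactions that were open at snap1 and have finished by
    # snap2. These need no further visibility check.
    explicit = {t for t in snap1.xip if visible(t, snap2)}
    # Part 2: txids in [snap1.xmax, snap2.xmax). They cannot be visible in
    # snap1, so only the snap2 check remains.
    return sorted(
        t for t in txids
        if t in explicit
        or (snap1.xmax <= t < snap2.xmax and visible(t, snap2))
    )


if __name__ == "__main__":
    snap1 = Snapshot(10, 20, frozenset({10, 14, 15}))
    snap2 = Snapshot(15, 25, frozenset({15, 22}))
    events = [9, 10, 14, 15, 17, 20, 21, 22, 24, 25]  # made-up event txids
    print(select_finished_between(events, snap1, snap2))
    print([t for t in events if finished_between(t, snap1, snap2)])
```

Both print statements give the same list for this made-up data. The point of the two-part form is that the range part no longer starts at snap1.xmin, so a long-running transaction does not widen the scanned range.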
On 10/16/07, Chris Browne <cbbrowne@acm.org> wrote:
> markokr@gmail.com ("Marko Kreen") writes:
> > Even the realistic code may be too much for general docs,
> > but considering this is not a functionality covered
> > by general SQL textbooks, I think it is worth having.
> >
> > I also put rendered pages up here:
> > http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html
> > http://skytools.projects.postgresql.org/txid/functions-txid.html
>
> I believe that those suggested texts describe what you intended, and
> they should represent better English text for this.

Thanks. Here is a version with your changes applied, plus
minor code cleanup and example output.

I uploaded full docs to above urls, should be easier to browse.

--
marko
Attachments
"Marko Kreen" <markokr@gmail.com> writes:
> Thanks. Here is a version with your changes applied, plus
> minor code cleanup and example output.

I can't really see the reasoning for putting this into the PG
documentation. It's tremendously complicated and doesn't seem like
something very many people would want to read about. In any case
it seems rather out of place where it is --- we don't have large
code examples elsewhere in func.sgml.

It almost looks like something that should be turned into a pgfoundry
or contrib module.

regards, tom lane
On 10/17/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > Thanks. Here is a version with your changes applied, plus
> > minor code cleanup and example output.
>
> I can't really see the reasoning for putting this into the PG
> documentation. It's tremendously complicated and doesn't seem like
> something very many people would want to read about. In any case
> it seems rather out of place where it is --- we don't have large
> code examples elsewhere in func.sgml.
>
> It almost looks like something that should be turned into a pgfoundry
> or contrib module.

The whole point of the functions is to allow doing snapshot-based
queries. It is indeed tricky, but that increases the need for
documentation, no?

I think the last "more realistic code" section can be dropped:
it shows a more user-friendly function but adds nothing new,
and the code is rather unreadable.

--
marko
Marko Kreen wrote:
> On 10/16/07, Chris Browne <cbbrowne@acm.org> wrote:
> > markokr@gmail.com ("Marko Kreen") writes:
> > > Even the realistic code may be too much for general docs,
> > > but considering this is not a functionality covered
> > > by general SQL textbooks, I think it is worth having.
> > >
> > > I also put rendered pages up here:
> > > http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html
> > > http://skytools.projects.postgresql.org/txid/functions-txid.html
> >
> > I believe that those suggested texts describe what you intended, and
> > they should represent better English text for this.
>
> Thanks. Here is a version with your changes applied, plus
> minor code cleanup and example output.
>
> I uploaded full docs to above urls, should be easier to browse.

I have applied part of your patch that documents the txid components in
the datatype section. I didn't apply any of your example usage. I just
added the mention that:

    The main use of these functions is to determine which transactions
    were committed between two snapshots.

If you want to put those examples on a web site or pgfoundry, we can
link to it from the documentation.

Applied patch attached.

--
  Bruce Momjian <bruce@momjian.us> http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.211
diff -c -c -r1.211 datatype.sgml
*** doc/src/sgml/datatype.sgml 21 Oct 2007 20:04:37 -0000 1.211
--- doc/src/sgml/datatype.sgml 5 Nov 2007 14:35:49 -0000
***************
*** 3437,3442 ****
--- 3437,3513 ----
  </sect1>

+ <sect1 id="datatype-txid-snapshot">
+ <title>Transaction Snapshot Type</title>
+
+ <indexterm zone="datatype-txid-snapshot">
+ <primary>txid_snapshot</primary>
+ </indexterm>
+
+ <para>
+ The data type <type>txid_snapshot</type> stores info about transaction ID
+ visibility at a particular moment in time. The components are
+ described in <xref linkend="datatype-txid-snapshot-parts">.
+ </para>
+
+ <table id="datatype-txid-snapshot-parts">
+ <title>Snapshot components</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Query Function</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+
+ <row>
+ <entry><type>xmin</type></entry>
+ <entry>txid_snapshot_xmin()</entry>
+ <entry>
+ Earliest transaction ID that is still active. All earlier
+ transactions will either be committed and visible, or rolled
+ back and dead.
+ </entry>
+ </row>
+
+ <row>
+ <entry><type>xmax</type></entry>
+ <entry>txid_snapshot_xmax()</entry>
+ <entry>
+ Next unassigned txid. All txids later than this one are
+ unassigned, and thus invisible.
+ </entry>
+ </row>
+
+ <row>
+ <entry><type>xip_list</type></entry>
+ <entry>txid_snapshot_xip()</entry>
+ <entry>
+ Active txids at the time of snapshot. All of them are between
+ xmin and xmax. A txid that is <literal>xmin <= txid <
+ xmax</literal> and not in this list is visible.
+ </entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ Snapshot's textual representation is <literal>[xmin]:[xmax]:[xip_list]</literal>
+ for example <literal>10:20:10,14,15</literal> means
+ <literal>xmin=10 xmax=20 xip_list=10,14,15</literal>.
+ </para>
+
+ <para>
+ Functions for getting and querying transaction ids and snapshots are
+ described in <xref linkend="functions-txid">.
+ </para>
+ </sect1>
+
  <sect1 id="datatype-uuid">
  <title><acronym>UUID</acronym> Type</title>

Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.406
diff -c -c -r1.406 func.sgml
*** doc/src/sgml/func.sgml 30 Oct 2007 19:06:56 -0000 1.406
--- doc/src/sgml/func.sgml 5 Nov 2007 14:35:50 -0000
***************
*** 11490,11495 ****
--- 11490,11500 ----
  as well.
  </para>

+ </sect1>
+
+ <sect1 id="functions-txid">
+ <title>Transaction ID and Snapshot Functions</title>
+
  <indexterm>
  <primary>txid_current</primary>
  </indexterm>
***************
*** 11562,11581 ****
  </table>

  <para>
! The internal transaction ID type (<type>xid</>) is 32 bits wide and so
! it wraps around every 4 billion transactions. However, these functions
! export a 64-bit format that is extended with an <quote>epoch</> counter
! so that it will not wrap around for the life of an installation.
  </para>

  </sect1>

! <sect1 id="functions-admin">
! <title>System Administration Functions</title>

! <para>
! <xref linkend="functions-admin-set-table"> shows the functions
! available to query and alter run-time configuration parameters.
! </para>

  <table id="functions-admin-set-table">
  <title>Configuration Settings Functions</title>
--- 11567,11589 ----
  </table>

  <para>
! The internal transaction ID type (<type>xid</>) is 32 bits wide and
! so it wraps around every 4 billion transactions. However, these
! functions export a 64-bit format that is extended with an
! <quote>epoch</> counter so that it will not wrap around for the life
! of an installation. The main use of these functions is to determine
! which transactions were committed between two snapshots.
  </para>

+
  </sect1>

! <sect1 id="functions-admin">
! <title>System Administration Functions</title>

! <para>
! <xref linkend="functions-admin-set-table"> shows the functions
! available to query and alter run-time configuration parameters.
! </para>

  <table id="functions-admin-set-table">
  <title>Configuration Settings Functions</title>
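Editorial sketch: two details from the applied documentation text can be worked through by hand, namely the textual form `[xmin]:[xmax]:[xip_list]` and the 64-bit format extended with an epoch counter. The Python below is a rough illustration, not PostgreSQL code. `TxidSnapshot`, `parse_snapshot` and `extend_xid` are hypothetical names, and the sketch assumes that the epoch occupies the bits above the 32-bit xid and that an empty in-progress list is written as an empty string after the second colon.

```python
from typing import FrozenSet, NamedTuple


class TxidSnapshot(NamedTuple):
    """Parsed form of a txid_snapshot text value (hypothetical helper)."""
    xmin: int
    xmax: int
    xip_list: FrozenSet[int]


def parse_snapshot(text: str) -> TxidSnapshot:
    """Split 'xmin:xmax:xip_list' into its three components."""
    xmin_s, xmax_s, xip_s = text.split(":")
    # An empty third field is treated as "no transactions in progress"
    # (assumption, see the lead-in).
    xip = frozenset(int(x) for x in xip_s.split(",") if x)
    return TxidSnapshot(int(xmin_s), int(xmax_s), xip)


def extend_xid(epoch: int, xid: int) -> int:
    """Combine an epoch counter with a 32-bit xid into one 64-bit value.

    Assumption: the epoch sits in the high bits, the xid in the low 32.
    """
    if not 0 <= xid < 2 ** 32:
        raise ValueError("xid must fit in 32 bits")
    return (epoch << 32) | xid


if __name__ == "__main__":
    snap = parse_snapshot("10:20:10,14,15")
    print(snap.xmin, snap.xmax, sorted(snap.xip_list))  # 10 20 [10, 14, 15]
    # After one wraparound the same 32-bit xid maps to a larger 64-bit value.
    print(extend_xid(0, 5), extend_xid(1, 5))  # 5 4294967301
```

Because the extended value keeps growing across wraparounds, comparisons such as `txid < xmax` stay meaningful for the life of an installation, which is what the paragraph in the patch relies on.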