Re: Add a new table for Transaction Isolation?

From: Kevin Grittner
Subject: Re: Add a new table for Transaction Isolation?
Date:
Msg-id: 1691752982.3879494.1429908040587.JavaMail.yahoo@mail.yahoo.com
In reply to: Re: Add a new table for Transaction Isolation?  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses: Re: Add a new table for Transaction Isolation?
List: pgsql-docs
David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Fri, Apr 24, 2015 at 9:57 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 4/17/15 7:36 PM, David G. Johnston wrote:

>>> +  <para>
>>> +   The concepts covered in this section are
>>> +   presented without examples of the behaviors described.  The internet,
>>> +   including and espcially the <productname>PostgreSQL</productname> Wiki, is
>>> +   an excellent resource to learn more about circumstances under which these
>>> +   data phenomena occur, and what the results look like when they do.
>>> +  </para>
>>
>> I don't think our documentation should go out of its way to say, "our
>> documentation is bad, look elsewhere".  If we think examples are
>> necessary, then we should add some.  Otherwise, it's implied that
>> improvement is always possible.
>
> I'm not - I am explicitly listing the assumptions the
> documentation makes regarding reader experience (and ease of
> documenting) - and pointing out where the reader can go if their
> experience is lacking in those areas.  It seems unproductive to
> move all of the SSI content on our Wiki into the documentation and
> so, lacking such, we should point out where else content can be
> found.

There have been suggestions before that some or all of the Wiki's
SSI page be brought into the docs, or that the docs reference it.
Bringing all of it in does seem like quite a lot for a single
feature like this.  I'm not sure what the best course is.

>>>      <table tocentry="1" id="mvcc-isolevel-table">
>>> -     <title>Standard <acronym>SQL</acronym> Transaction Isolation Levels</title>
>>> +     <title><acronym>SQL</acronym> Standard Transaction Isolation Levels</title>
>>>       <tgroup cols="4">
>>>        <thead>
>>>         <row>
>>
>> Why this change?
>
> The new table reads "PostgreSQL ..." and the corresponding noun is
> the "SQL Standard".  Writing "Standard SQL" can be read as implying
> the existence of "Non-Standard SQL..." which is not correct.  Just
> saying "SQL..." seems to be too generic - though after reading the
> conclusion and pondering that "SQL Standard" could also imply "SQL
> Non-Standard..." I'm not so sure whether just saying SQL wouldn't
> be best.  "Here are the four words that can be used with SET
> TRANSACTION ISOLATION LEVEL..." - and then show/describe the
> minimum required non-behaviors and the non-behaviors as implemented
> in PostgreSQL.
>
> Maybe possessive would work: "PostgreSQL's ..." and "SQL Standard's..."

Personally I think that "standard SQL" means "SQL, as defined by
international standards documents."  I see no benefit to changes
along the lines suggested here.

>>>     <para>
>>> +    The three <productname>PostgreSQL</productname> transaction isolation levels, and their corresponding
>>> +    behaviors, are described in <xref linkend="mvcc-pgsql-isolevel-table">.
>>> +   </para>
>>
>> This isn't really correct.  The PostgreSQL isolation levels were
>> described in the paragraph above.  The table is really just a summary of
>> the previous explanation.
>
> "[...], are summarized in <xref...>" ?

The problem with tables like this is that sometimes people just
look at the table and assume that it is the *definition* of the
isolation levels.  At *no* point did *any* version of the SQL
standard *ever* define the serializable transaction isolation level
in terms of the phenomena shown in the table.  The definition has
always been:

| The execution of concurrent SQL-transactions at isolation level
| SERIALIZABLE is guaranteed to be serializable. A serializable
| execution is defined to be an execution of the operations of
| concurrently executing SQL-transactions that produces the same
| effect as some serial execution of those same SQL-transactions. A
| serial execution is one in which each SQL-transaction executes to
| completion before the next SQL-transaction begins.
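
As a rough illustration of that definition, the following sketch checks
whether an observed final state matches the result of at least one
serial ordering of the same transactions.  It is a toy model only: the
function names and the two sample transactions are hypothetical, and it
compares final states only, whereas "the same effect" in the standard
also covers what each transaction itself sees and returns.

```python
from itertools import permutations


def run_serially(initial, transactions):
    """Run each transaction to completion before the next one begins."""
    state = dict(initial)  # copy, so the caller's initial state is untouched
    for txn in transactions:
        txn(state)
    return state


def serial_outcomes(initial, transactions):
    """Final state produced by every possible serial ordering."""
    return [run_serially(initial, order) for order in permutations(transactions)]


def is_serializable_outcome(initial, transactions, observed):
    """True if the observed final state matches some serial execution."""
    return any(observed == outcome
               for outcome in serial_outcomes(initial, transactions))


# Two hypothetical transactions operating on a single value "x".
def add_ten(state):
    state["x"] += 10


def double(state):
    state["x"] *= 2


initial = {"x": 1}
txns = [add_ten, double]

# add_ten then double gives 22; double then add_ten gives 12.
print(is_serializable_outcome(initial, txns, {"x": 22}))  # True
print(is_serializable_outcome(initial, txns, {"x": 12}))  # True
# Both read x = 1 and add_ten's write lands last: a lost update.
print(is_serializable_outcome(initial, txns, {"x": 11}))  # False
```

Note that the check makes no reference to any list of phenomena; it
asks only whether some one-at-a-time ordering explains the result.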

Serializable transactions have been included in the table of which
phenomena are allowed to occur at which isolation levels; but the
table has always been followed by this note:

| The exclusion of these phenomena for SQL-transactions executing
| at isolation level SERIALIZABLE is a consequence of the
| requirement that such transactions be serializable.

Yet so many people have not looked beyond the table to see the
actual definition of "serializable" in the standard that the
absence of these three phenomena has often been mistakenly
considered adequate for compliance with the standard.  A 1995 paper
titled "A Critique of ANSI SQL Isolation Levels" by Berenson et
al. notes this, saying:

| Subclause 4.28, “SQL-transactions”, in [ANSI] notes that the
| SERIALIZABLE isolation level must provide what is “commonly known
| as fully serializable execution.” The prominence of the table
| compared to this extra proviso leads to a common misconception
| that disallowing the three phenomena implies serializability.

... and later observes:

| It would have been simpler [...] to drop [references to phantom
| reads] and just use Subclause 4.28 to define ANSI SERIALIZABLE.

I tend to agree.  Not only would it have been simpler, I think it
would have prevented a lot of misunderstanding of the requirements
of the standard.  Tables like this can do a lot more to promote
confusion and misunderstanding than clarity.  If we're going to
make a change here, I think rather than doubling down on the
standard's questionable inclusion of such a table by providing
*two* tables, we should consider removing the existing table.
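
To make the misconception concrete, here is a minimal sketch of a toy
snapshot-isolation model.  It is not PostgreSQL's implementation, and
the class names and the on-call scenario are assumptions chosen for
illustration.  Each transaction reads only from a committed snapshot
taken when it starts, so none of the three phenomena in the table can
occur, yet the concurrent run ends in a state that no serial execution
produces (the anomaly usually called write skew).

```python
class Database:
    """Toy store: committed values plus a per-key version counter."""

    def __init__(self, data):
        self.committed = dict(data)
        self.versions = {key: 0 for key in data}

    def begin(self):
        return SnapshotTxn(self)


class SnapshotTxn:
    """Reads come from a snapshot fixed at start; writes apply at commit."""

    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.committed)       # committed data only
        self.start_versions = dict(db.versions)  # used for conflict checks
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: fail if a key we wrote changed since we began.
        for key in self.writes:
            if self.db.versions[key] != self.start_versions[key]:
                raise RuntimeError("write-write conflict on " + key)
        for key, value in self.writes.items():
            self.db.committed[key] = value
            self.db.versions[key] += 1


def go_off_call(txn, me, other):
    """Business rule: only leave if the other doctor is still on call."""
    if txn.read(other):
        txn.write(me, False)


def run_concurrently():
    db = Database({"alice": True, "bob": True})
    t1, t2 = db.begin(), db.begin()  # both snapshots see both on call
    go_off_call(t1, "alice", "bob")
    go_off_call(t2, "bob", "alice")
    t1.commit()
    t2.commit()  # different keys written, so no write-write conflict
    return db.committed


def run_serially(order):
    db = Database({"alice": True, "bob": True})
    for me, other in order:
        txn = db.begin()
        go_off_call(txn, me, other)
        txn.commit()
    return db.committed


concurrent = run_concurrently()
serial = [run_serially([("alice", "bob"), ("bob", "alice")]),
          run_serially([("bob", "alice"), ("alice", "bob")])]
print(concurrent)            # {'alice': False, 'bob': False}
print(concurrent in serial)  # False
```

In either serial order the second transaction sees that the other
doctor has already left and stays on call, so the concurrent result
fails the standard's definition of serializable even though the table
of phenomena, read alone, would be satisfied.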

> Then why not write the entire section relative to the
> standard and point out the differences between the standard and our
> implementation on the command definition page in the compatibility
> section?

Many people don't have access to the standard, the standard is
confusing to many, and the standard is specifically written to
specify minimum required behaviors rather than anything that is
dependent on implementation.  The standard does not say that the
READ UNCOMMITTED transaction isolation level allows other
transactions to see the uncommitted work of a transaction; it
merely says that no other transaction isolation level may do so.
The same is true with all the phenomena -- our implementation does
not "differ" from the standard on those points; it is in full
compliance with it.

> Otherwise, a summary table describing our implementation seems
> like a self-evident need.  We are already going to great lengths to
> describe everything in the table anyway and we already are using a
> table to describe the standard's definitions.  Placing said table
> here seems easiest and if summarizing what is already present in
> the text somehow makes the section more confusing I posit that it
> must already be confusing without the table.  At least this way the
> confusing stuff is summarized and is readily available for lookup
> by those who know what they are looking for.

But the table, by its nature, does not provide the full set of
information, and too many people just look at the table because
"it's easy".  The question seems to me to be whether providing an
easy way to get an inaccurate understanding of the topic has value;
I submit that the confusion caused by the table in the standard (in
spite of a note immediately after the table to try to prevent that)
shows that it is not.

> Two separate patches here:
>
> 1) pointing out that additional information is available on the
> wiki and the internet

That and/or bringing in one or more of the Wiki examples.

> 2) summarizing the PostgreSQL implementation into a table similar
> to that already present for the Standard
>
> #2 can be implemented in the MVCC section or a more extensive patch
> can also update the SQL command SET TRANSACTION section - which
> will mean someone feels strongly enough that the status quo is
> better than updating MVCC while waiting for someone to write the
> more invasive patch.

And, for reasons given above, I really question whether such a
table doesn't do more harm than good.  Even those citing the paper
by Berenson, et al., often miss the text in *that* paper about what
the actual definition of serializable transactions in the standard
is, and instead focus on the quick-to-read tables of how the
misinterpretation of serializable transactions based on the
standard's table of phenomena (which the paper dubs "ANOMALY
SERIALIZABLE") differs from truly serializable behavior.

People do love tables like this, which makes providing them
tempting; but when a short, clean table is available they often
seem less inclined to take the trouble to read the real information
the table summarizes -- and they come away with distorted and
incorrect ideas about the subject matter.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
