Thread: MVCC snapshot timing


MVCC snapshot timing

From
Bruce Momjian
Date:
I received a private email report that our introductory MVCC
documentation is unclear about when a snapshot is taken.  I have
adjusted the wording in the attached patch to be less precise about
snapshot timing. Snapshot timing is controlled by the session isolation
level, which I don't think we want to cover in this introductory
paragraph.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


Attachments

Re: MVCC snapshot timing

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I received a private email report that our introductory MVCC
> documentation is unclear about when a snapshot is taken.  I have
> adjusted the wording in the attached patch to be less precise about
> snapshot timing. Snapshot timing is controlled by the session isolation
> level, which I don't think we want to cover in this introductory
> paragraph.

I'm not really seeing the point of s/transaction/session/ here.
The phrasing is a bit awkward and maybe could be improved, but I think you
should keep it referring to transactions.

            regards, tom lane


Re: MVCC snapshot timing

From
Bruce Momjian
Date:
On Mon, Nov 11, 2013 at 03:39:45PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I received a private email report that our introductory MVCC
> > documentation is unclear about when a snapshot is taken.  I have
> > adjusted the wording in the attached patch to be less precise about
> > snapshot timing. Snapshot timing is controlled by the session isolation
> > level, which I don't think we want to cover in this introductory
> > paragraph.
>
> I'm not really seeing the point of s/transaction/session/ here.
> The phrasing is a bit awkward and maybe could be improved, but I think you
> should keep it referring to transactions.

Well, the problem with the original wording is that we don't take a new
snapshot for every transaction in the default read-committed mode.
Would you prefer I refer to statements, e.g.:

     This means that while querying a database each statement sees

This is our default behavior.
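
As a rough illustration of the difference between one snapshot per transaction and one snapshot per statement, here is a toy Python model. It is not PostgreSQL code: ToyDatabase, ToyTransaction and their methods are names invented for this sketch, and a "snapshot" is simply a copy of the committed data.

```python
# Toy model only: a "snapshot" is just a copy of the committed data.
class ToyDatabase:
    def __init__(self):
        self.committed = {}              # key -> last committed value

    def commit(self, changes):
        """Publish another transaction's changes to future snapshots."""
        self.committed.update(changes)

    def snapshot(self):
        return dict(self.committed)


class ToyTransaction:
    def __init__(self, db, isolation="read committed"):
        self.db = db
        self.isolation = isolation
        self._snapshot = None            # reused only outside read committed

    def select(self, key):
        """Run one 'SQL statement' that reads a single key."""
        if self.isolation == "read committed":
            snap = self.db.snapshot()    # fresh snapshot for every statement
        else:
            if self._snapshot is None:   # first statement takes the snapshot
                self._snapshot = self.db.snapshot()
            snap = self._snapshot
        return snap.get(key)


db = ToyDatabase()
db.commit({"balance": 100})

rc = ToyTransaction(db, "read committed")
rr = ToyTransaction(db, "repeatable read")
first = (rc.select("balance"), rr.select("balance"))

db.commit({"balance": 250})              # a concurrent transaction commits

second = (rc.select("balance"), rr.select("balance"))
print(first, second)
```

In this sketch the read-committed transaction sees the new balance in its second statement, while the repeatable-read transaction keeps reading from the snapshot taken by its first statement.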



Re: MVCC snapshot timing

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Nov 11, 2013 at 03:39:45PM -0500, Tom Lane wrote:
>> I'm not really seeing the point of s/transaction/session/ here.

> Well, the problem with the original wording is that we don't take a new
> snapshot for every transaction in the default read-committed mode.

We take at least one snapshot per transaction, in any mode.  Referring
to sessions makes it even further away from being a useful concept.

> Would you prefer I refer to statements, e.g.:

'Statement' might work.

            regards, tom lane


Re: MVCC snapshot timing

From
Bruce Momjian
Date:
On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Nov 11, 2013 at 03:39:45PM -0500, Tom Lane wrote:
> >> I'm not really seeing the point of s/transaction/session/ here.
>
> > Well, the problem with the original wording is that we don't take a new
> > snapshot for every transaction in the default read-committed mode.
>
> We take at least one snapshot per transaction, in any mode.  Referring
> to sessions makes it even further away from being a useful concept.
>
> > Would you prefer I refer to statements, e.g.:
>
> 'Statement' might work.

OK, updated patch attached.  Is "statement" too vague here?  SQL
statement?  query?


Attachments

Re: MVCC snapshot timing

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
>> 'Statement' might work.

> OK, updated patch attached.  Is "statement" too vague here?  SQL
> statement?  query?

"SQL statement" might be a good idea in the first sentence, but
I don't think you need to repeat it in the second.

What's bothering me about this wording is that you're talking about
statements and then suddenly reference transactions (as being "those
other things messing with your data").  This seems weirdly asymmetric,
since after all you could equally well be the one messing with their
data.

            regards, tom lane


Re: MVCC snapshot timing

From
Bruce Momjian
Date:
On Mon, Nov 11, 2013 at 09:27:15PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
> >> 'Statement' might work.
>
> > OK, updated patch attached.  Is "statement" too vague here?  SQL
> > statement?  query?
>
> "SQL statement" might be a good idea in the first sentence, but
> I don't think you need to repeat it in the second.
>
> What's bothering me about this wording is that you're talking about
> statements and then suddenly reference transactions (as being "those
> other things messing with your data").  This seems weirdly asymmetric,
> since after all you could equally well be the one messing with their
> data.

Yes, that bugged me too, but then I realized that you only see the
changes from a transaction when it completes, not from each statement,
e.g. you can never see changes between statements of a multi-statement
transaction.
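
A minimal sketch of that point, assuming a toy in-memory model rather than anything PostgreSQL actually does internally (ToyDatabase, ToyTransaction and the account keys are hypothetical): a writer's statements accumulate privately and only become visible to other transactions at commit.

```python
# Toy model: uncommitted writes live in the writing transaction only.
class ToyDatabase:
    def __init__(self, initial):
        self.committed = dict(initial)


class ToyTransaction:
    def __init__(self, db):
        self.db = db
        self.pending = {}                # this transaction's uncommitted writes

    def update(self, key, value):
        self.pending[key] = value

    def select(self, key):
        # A transaction sees its own earlier statements...
        if key in self.pending:
            return self.pending[key]
        # ...but only committed data from everyone else.
        return self.db.committed.get(key)

    def commit(self):
        self.db.committed.update(self.pending)
        self.pending = {}


db = ToyDatabase({"a": 100, "b": 0})
writer = ToyTransaction(db)
reader = ToyTransaction(db)

writer.update("a", 70)                   # first statement of a transfer
mid_reader = (reader.select("a"), reader.select("b"))
mid_writer = (writer.select("a"), writer.select("b"))

writer.update("b", 30)                   # second statement
writer.commit()
after_reader = (reader.select("a"), reader.select("b"))
print(mid_reader, mid_writer, after_reader)
```

The reader never observes the half-finished transfer: it sees the totals before the transaction or after it, never the state between the two statements.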

I used "SQL statement" in the updated, attached patch.


Attachments

Re: MVCC snapshot timing

From
David Johnston
Date:
This reads badly to my ears:


> This means that while querying a database each SQL statement sees a
> snapshot of data (a database version) as it was some time ago, regardless
> of the current state of the underlying data.

How about something closer to:


> This means for each SQL statement the user can specify a relative
> point-in-time snapshot (database version) of the database against which to
> query.  These snapshot options are 1) the most recent committed data
> currently available database-wide - including implicit commits (see note),
> or 2) the committed data as-of the beginning of the current transaction -
> including any changes made in the same.
>
> Note: an implicit commit occurs only within a multi-statement transaction.
> For the purpose of determining if data has been committed any prior
> statements in the same transaction are deemed to have been committed when
> viewed by later statements.

I know this is an introduction paragraph so the broad concept is being
focused on rather than how such a user would in fact make this choice.

I don't know that the term "implicit commit" is used elsewhere, likely not,
but in effect that is what a statement in a transaction is seeing with
respect to prior statements in the same transaction.  Naming this behavior
in the introduction would allow for somewhat less verbose descriptions to be
used in detail sections.

The above could be better integrated into the intro but I wanted to get
opinions on the approach first.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/MVCC-snapshot-timing-tp5777759p5777852.html
Sent from the PostgreSQL - docs mailing list archive at Nabble.com.


Re: MVCC snapshot timing

From
David Johnston
Date:
David Johnston wrote
> This reads badly to my ears:
>> This means that while querying a database each SQL statement sees a
>> snapshot of data (a database version) as it was some time ago, regardless
>> of the current state of the underlying data.
> How about something closer to:
>> This means for each SQL statement the user can specify a relative
>> point-in-time snapshot (database version) of the database against which
>> to query.  These snapshot options are 1) the most recent committed data
>> currently available database-wide - including implicit commits (see
>> note), or 2) the committed data as-of the beginning of the current
>> transaction - including any changes made in the same.
>>
>> Note: an implicit commit occurs only within a multi-statement
>> transaction.  For the purpose of determining if data has been committed
>> any prior statements in the same transaction are deemed to have been
>> committed when viewed by later statements.
> I know this is an introduction paragraph so the broad concept is being
> focused on rather than how such a user would in fact make this choice.
>
> I don't know that the term "implicit commit" is used elsewhere, likely
> not, but in effect that is what a statement in a transaction is seeing
> with respect to prior statements in the same transaction.  Naming this
> behavior in the introduction would allow for somewhat less verbose
> descriptions to be used in detail sections.
>
> The above could be better integrated into the intro but I wanted to get
> opinions on the approach first.
>
> David J.

So with the comment about implicit commits the phrase "including any changes
made in the same." can be dropped since that is what I was trying to imply
before I devised the new term.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/MVCC-snapshot-timing-tp5777759p5777854.html
Sent from the PostgreSQL - docs mailing list archive at Nabble.com.


Re: MVCC snapshot timing

From
Bruce Momjian
Date:
On Mon, Nov 11, 2013 at 07:25:59PM -0800, David Johnston wrote:
> This reads badly to my ears:
>
>
> > This means that while querying a database each SQL statement sees a
> > snapshot of data (a database version) as it was some time ago, regardless
> > of the current state of the underlying data.
>
> How about something closer to:
>
>
> > This means for each SQL statement the user can specify a relative
> > point-in-time snapshot (database version) of the database against which to
> > query.  These snapshot options are 1) the most recent committed data
> > currently available database-wide - including implicit commits (see note),
> > or 2) the committed data as-of the beginning of the current transaction -
> > including any changes made in the same.
> >
> > Note: an implicit commit occurs only within a multi-statement transaction.
> > For the purpose of determining if data has been committed any prior
> > statements in the same transaction are deemed to have been committed when
> > viewed by later statements.
>
> I know this is an introduction paragraph so the broad concept is being
> focused on rather than how such a user would in fact make this choice.
>
> I don't know that the term "implicit commit" is used elsewhere, likely not,
> but in effect that is what a statement in a transaction is seeing with
> respect to prior statements in the same transaction.  Naming this behavior
> in the introduction would allow for somewhat less verbose descriptions to be
> used in detail sections.
>
> The above could be better integrated into the intro but I wanted to get
> opinions on the approach first.

We just want to get across the MVCC concept in the intro --- we cover
the snapshots later in the document.



Re: MVCC snapshot timing

From
David Johnston
Date:
Bruce Momjian wrote
> We just want to get across the MVCC concept in the intro --- we cover
> the snapshots later in the document.

I just think we're being too vague here; and we are covering them in the
intro with the use of "some point in the past".

IMO, the main point regarding MVCC is that every change in the system
creates a new record and causes a prior record to be invalidated at a
point-in-time.  The combination of these two things increases concurrency
since you can create new records while people are still using the old ones.
One consequence, though, is that it is necessary for the user to decide at
what point in the timeline they want to view the database.

Does this sound right?
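
To make the versioning idea in the paragraph above concrete, here is a toy sketch. It assumes integer logical timestamps in place of transaction ids, and RowVersion, VersionedRow and read_as_of are names invented for the illustration, not PostgreSQL structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    created_at: int                       # logical time the version became valid
    invalidated_at: Optional[int] = None  # logical time it was superseded


class VersionedRow:
    def __init__(self, value, now):
        self.versions = [RowVersion(value, now)]

    def update(self, value, now):
        # An update never overwrites: it invalidates the current version
        # and appends a new one, so old readers can keep using the old data.
        self.versions[-1].invalidated_at = now
        self.versions.append(RowVersion(value, now))

    def read_as_of(self, t):
        """Return the version that was valid at logical time t, if any."""
        for v in self.versions:
            still_valid = v.invalidated_at is None or t < v.invalidated_at
            if v.created_at <= t and still_valid:
                return v.value
        return None


row = VersionedRow("draft", now=1)
row.update("final", now=5)

old_reader = row.read_as_of(3)            # reader whose view predates the update
new_reader = row.read_as_of(5)            # reader whose view includes it
print(old_reader, new_reader)
```

Both readers are served without blocking the writer, which is the concurrency gain described above.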

The current (and modified) intro indeed covers these two points so it really
comes down to style.

My current gut feel is the documentation (generally speaking) does a good
job of describing the mechanics of the system but, in some areas, could use
more detail as to why and also the various implications of those mechanics
[1].  Bringing those up in the intro gives the reader additional context so
that when they get into the "how" detail sections they can more quickly link
the mechanics with the problem they are meant to solve.  Thus introducing
the more specific "snapshot" concept in the intro provides the context for
when they are reading why isolation levels exist.

[1] Not that I'm proactively looking, but when questions arise regarding the
docs I try to put myself in the person's shoes and find not that the docs
are incorrect but that they could be improved (which is just a part of our
reality).

My $0.02

David J.





--
View this message in context: http://postgresql.1045698.n5.nabble.com/MVCC-snapshot-timing-tp5777759p5778016.html
Sent from the PostgreSQL - docs mailing list archive at Nabble.com.


Re: MVCC snapshot timing

From
Bruce Momjian
Date:
On Tue, Nov 12, 2013 at 03:36:01PM -0800, David Johnston wrote:
> Bruce Momjian wrote
> > We just want to get across the MVCC concept in the intro --- we cover
> > the snapshots later in the document.
>
> I just think we're being too vague here; and we are covering them in the
> intro with the use of "some point in the past".
>
> IMO, the main point regarding MVCC is that every change in the system
> creates a new record and causes a prior record to be invalidated at a
> point-in-time.  The combination of these two things increases concurrency
> since you can create new records while people are still using the old ones.
> One consequence, though, is that it is necessary for the user to decide at
> what point in the timeline they want to view the database.
>
> Does this sound right?

I still do not see how this fits appropriately in the introduction.



Re: MVCC snapshot timing

From
David Johnston
Date:
Bruce Momjian wrote
> On Tue, Nov 12, 2013 at 03:36:01PM -0800, David Johnston wrote:
>> Bruce Momjian wrote
>> > We just want to get across the MVCC concept in the intro --- we cover
>> > the snapshots later in the document.
>>
>> I just think we're being too vague here; and we are covering them in the
>> intro with the use of "some point in the past".
>>
>> IMO, the main point regarding MVCC is that every change in the system
>> creates a new record and causes a prior record to be invalidated at a
>> point-in-time.  The combination of these two things increases concurrency
>> since you can create new records while people are still using the old
>> ones.
>> One consequence, though, is that it is necessary for the user to decide
>> at
>> what point in the timeline they want to view the database.
>>
>> Does this sound right?
>
> I still do not see how this fits appropriately in the introduction.

The concept or the actual wording?

The intended question was whether my understanding (and simplification) of
the concept is correct.

My specific wording is incoherent mostly because it really belongs to a
larger corpus that currently exists only in my head.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/MVCC-snapshot-timing-tp5777759p5778033.html
Sent from the PostgreSQL - docs mailing list archive at Nabble.com.


Re: MVCC snapshot timing

From
Bruce Momjian
Date:
On Tue, Nov 12, 2013 at 05:35:23PM -0800, David Johnston wrote:
> Bruce Momjian wrote
> > On Tue, Nov 12, 2013 at 03:36:01PM -0800, David Johnston wrote:
> >> Bruce Momjian wrote
> >> > We just want to get across the MVCC concept in the intro --- we cover
> >> > the snapshots later in the document.
> >>
> >> I just think we're being too vague here; and we are covering them in the
> >> intro with the use of "some point in the past".
> >>
> >> IMO, the main point regarding MVCC is that every change in the system
> >> creates a new record and causes a prior record to be invalidated at a
> >> point-in-time.  The combination of these two things increases concurrency
> >> since you can create new records while people are still using the old
> >> ones.
> >> One consequence, though, is that it is necessary for the user to decide
> >> at
> >> what point in the timeline they want to view the database.
> >>
> >> Does this sound right?
> >
> > I still do not see how this fits appropriately in the introduction.
>
> The concept or the actual wording?
>
> The intended question was whether my understanding (and simplification) of
> the concept is correct.
>
> My specific wording is incoherent mostly because it really belongs to a
> larger corpus that currently exists only in my head.

Oh, OK, it sounds fine.  The user really doesn't choose what timeline to
see --- rather, it is the current xid at the time they take their
snapshot and other running xids that controls that.  You can control
your transaction isolation level, but that only controls how often you
take snapshots.
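
A simplified sketch of that visibility rule, loosely modeled on the idea of a snapshot as "the next xid to be assigned plus the set of xids still running". ToySnapshot and its fields are assumptions made for this illustration and omit many details of the real implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySnapshot:
    xmax: int                  # first xid not yet assigned at snapshot time
    in_progress: frozenset     # xids that were still running at snapshot time

    def sees(self, xid, committed_xids):
        """Did transaction `xid` commit before this snapshot was taken?"""
        if xid >= self.xmax:           # started after the snapshot
            return False
        if xid in self.in_progress:    # was still running at snapshot time
            return False
        return xid in committed_xids   # aborted xids are never visible


# xid 10 has committed, xid 11 is still running, xid 12 is next to be assigned.
committed = {10}
snap = ToySnapshot(xmax=12, in_progress=frozenset({11}))

# Later, xid 11 commits and a new transaction 12 starts and commits.
committed |= {11, 12}

visible_old = [x for x in (10, 11, 12) if snap.sees(x, committed)]

# Taking a new snapshot (as each statement does in read committed) moves the view.
fresh = ToySnapshot(xmax=13, in_progress=frozenset())
visible_new = [x for x in (10, 11, 12) if fresh.sees(x, committed)]
print(visible_old, visible_new)
```

Under these assumptions the isolation level does not pick a point in time directly; it only decides whether a later statement reuses `snap` or takes something like `fresh`.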



Re: MVCC snapshot timing

From
David Johnston
Date:
Bruce Momjian wrote
> Oh, OK, it sounds fine.  The user really doesn't choose what timeline to
> see --- rather, it is the current xid at the time they take their
> snapshot and other running xids that controls that.  You can control
> your transaction isolation level, but that only controls how often you
> take snapshots.

^ This kind of makes my point.

You've described perfectly the mechanics of the system but from a mostly
black-box perspective a user-decision (choosing the isolation level)
directly impacts which point-in-time (xid) is chosen for the snapshot behind
a particular SQL statement.  The fact they can only choose between two
pre-defined and relative points-in-time is a detail to explain later but the
fact is they are required to make such a choice (one is provided by default
but still one chooses - even if through ignorance - to use the default) as
an outcome of MVCC should be included - in some form - in the introduction.
In the current, proposed, and my revisions it is indeed covered but to
various degrees of detail and low/high level focus.

David J.






--
View this message in context: http://postgresql.1045698.n5.nabble.com/MVCC-snapshot-timing-tp5777759p5778039.html
Sent from the PostgreSQL - docs mailing list archive at Nabble.com.


Re: MVCC snapshot timing

From
Bruce Momjian
Date:
On Mon, Nov 11, 2013 at 09:46:09PM -0500, Bruce Momjian wrote:
> On Mon, Nov 11, 2013 at 09:27:15PM -0500, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
> > >> 'Statement' might work.
> >
> > > OK, updated patch attached.  Is "statement" too vague here?  SQL
> > > statement?  query?
> >
> > "SQL statement" might be a good idea in the first sentence, but
> > I don't think you need to repeat it in the second.
> >
> > What's bothering me about this wording is that you're talking about
> > statements and then suddenly reference transactions (as being "those
> > other things messing with your data").  This seems weirdly asymmetric,
> > since after all you could equally well be the one messing with their
> > data.
>
> Yes, that bugged me too, but then I realized that you only see the
> changes from a transaction when it completes, not from each statement,
> e.g. you can never see changes between statements of a multi-statement
> transaction.
>
> I used "SQL statement" in the updated, attached patch.

Applied.
