Thread: lazy snapshots?


lazy snapshots?

From
Robert Haas
Date:
I had the following idea for an optimization.  Feel free to tell me I'm nuts.

Would it be possible to postpone the operation of taking a snapshot
until we encounter an in-doubt tuple - that is, a tuple whose XMIN or
XMAX is committed but not all-visible?  It seems to me that there are
many transactions that probably never look at any recently-modified
data, and that the overhead (and contention) of scanning the ProcArray
could be avoided for such transactions.  At the time when we currently
take a snapshot, we could instead record an estimate of the oldest XID
still running; I'll call this value the threshold XID.  Ideally, this
would be something we could read from shared memory in O(1) time.
Subsequently, when we examine XMIN or XMAX, we may find that it's
aborted (in which case we don't need a snapshot to decide what to do)
or that the XID we're examining precedes the threshold XID (in which
case we don't need a snapshot to decide what to do) or that the XID
we're examining is our own (in which case we again don't need a
snapshot to decide what to do).  If none of those conditions hold, we
take a snapshot.  (Possibly, we could try rereading the threshold XID
from shared memory, because it might have advanced far enough to get
us out of the woods.)
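The decision rule above can be sketched as follows. This is a minimal illustration, not actual Postgres code; the type and struct names here (LazySnapshotState, decidable_without_snapshot) are hypothetical, and the aborted-XID check is passed in as a flag rather than consulting clog:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;    /* stand-in for Postgres's XID type */

/* Hypothetical per-backend state: no real snapshot yet, only a cheap
 * estimate of the oldest XID still running (the "threshold XID"),
 * ideally readable from shared memory in O(1). */
typedef struct
{
    TransactionId threshold_xid;   /* estimate of oldest running XID */
    TransactionId my_xid;          /* our own transaction's XID */
} LazySnapshotState;

/* Returns true if the visibility of 'xid' can be decided without taking
 * a snapshot: it aborted, it precedes the threshold XID, or it is our
 * own.  Otherwise the caller must fall back to taking a real snapshot
 * (possibly after rereading the threshold XID from shared memory). */
static bool
decidable_without_snapshot(const LazySnapshotState *state,
                           TransactionId xid, bool xid_aborted)
{
    if (xid_aborted)
        return true;               /* aborted: tuple treated the same either way */
    if (xid < state->threshold_xid)
        return true;               /* committed before any running xact: all-visible */
    if (xid == state->my_xid)
        return true;               /* our own changes: no snapshot needed */
    return false;                  /* in doubt: must take a snapshot now */
}
```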

It's necessary to convince ourselves not only that this has some
performance benefit but that it's actually correct.  It's easy to see
that, if we never take a snapshot, all the tuple visibility decisions
we make will be exactly identical to the ones that we would have made
with a snapshot; the choice of snapshot in that case is arbitrary.
But if we do eventually take a snapshot, we'll likely make different
tuple visibility decisions than we would have made had we taken the
snapshot earlier. However, the decisions that we make prior to taking
the snapshot will be consistent with the snapshot, and we will
certainly see the effects of all transactions that committed before we
started.  We may also see the effects of some transactions that commit
after we started, but that is OK: it is just as if our whole
transaction had been started slightly later and then executed more
quickly thereafter.  It would be bad if we saw the effect of
transaction A but not transaction B where transaction B committed
after transaction A, but the way snapshots are taken prevents that
regardless of exactly when we do it.

VACUUM can't remove any tuples with committed XMINs unless their XMAX
precedes our threshold XID, but I think that's not any worse under
this proposal than it is anyway.  If we took a full snapshot instead
of just writing down a threshold XID, we'd have the same problem.

OK, that's it.  Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: lazy snapshots?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> It's necessary to convince ourselves not only that this has some
> performance benefit but that it's actually correct.  It's easy to see
> that, if we never take a snapshot, all the tuple visibility decisions
> we make will be exactly identical to the ones that we would have made
> with a snapshot; the choice of snapshot in that case is arbitrary.
> But if we do eventually take a snapshot, we'll likely make different
> tuple visibility decisions than we would have made had we taken the
> snapshot earlier. However, the decisions that we make prior to taking
> the snapshot will be consistent with the snapshot, and we will
> certainly see the effects of all transactions that committed before we
> started.  We may also see the effects of some transactions that commit
> after we started, but that is OK: it is just as if our whole
> transaction had been started slightly later and then executed more
> quickly thereafter.

I don't think this is going to be acceptable at all.  You're assuming
that clients have no independent means of determining what order
transactions execute in, which isn't the case.  It would be quite
possible, for example, for a query submitted to one backend to see the
effects of a transaction that was submitted to another backend long
after the first query started.  If the two clients involved interact
at all, they're not going to be happy.  Even if they just compare
transaction timestamps, they're not going to be happy.

I'm less than convinced by the hypothesis that most transactions would
avoid taking snapshots in this regime, anyway.  It would only hold up
if there's little locality of reference in terms of which tuples are
getting examined/modified by concurrent transactions, and that's a
theory that seems much more likely to be wrong than right.

I wonder whether we could do something involving WAL properties --- the
current tuple visibility logic was designed before WAL existed, so it's
not exploiting that resource at all.  I'm imagining that the kernel of a
snapshot is just a WAL position, ie the end of WAL as of the time you
take the snapshot (easy to get in O(1) time).  Visibility tests then
reduce to "did this transaction commit with a WAL record located before
the specified position?".  You'd need some index datastructure that made
it reasonably cheap to find out the commit locations of recently
committed transactions, where "recent" means "back to recentGlobalXmin".
That seems possibly do-able, though I don't have a concrete design in
mind.
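[Editor's sketch: the visibility test described above reduces to one lookup plus one LSN comparison. The fixed-size array here is only a stand-in for the unspecified "index datastructure" covering XIDs back to recentGlobalXmin; all names are hypothetical.]

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;     /* WAL position (LSN) */
typedef uint32_t TransactionId;

/* Hypothetical lookup table mapping recently committed XIDs to the LSN
 * of their commit record; 0 means "no commit record (yet)".  A real
 * design would need a structure covering all XIDs back to
 * recentGlobalXmin, not a small modular array. */
#define XID_TABLE_SIZE 1024
static XLogRecPtr commit_lsn_of[XID_TABLE_SIZE];

/* The kernel of a snapshot is just the end-of-WAL position captured at
 * snapshot time.  A transaction is visible iff it committed with a WAL
 * record located before that position. */
static bool
xid_visible_in_snapshot(TransactionId xid, XLogRecPtr snapshot_lsn)
{
    XLogRecPtr commit_lsn = commit_lsn_of[xid % XID_TABLE_SIZE];

    return commit_lsn != 0 && commit_lsn < snapshot_lsn;
}
```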
        regards, tom lane


Re: lazy snapshots?

From
Robert Haas
Date:
On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> It's necessary to convince ourselves not only that this has some
>> performance benefit but that it's actually correct.  It's easy to see
>> that, if we never take a snapshot, all the tuple visibility decisions
>> we make will be exactly identical to the ones that we would have made
>> with a snapshot; the choice of snapshot in that case is arbitrary.
>> But if we do eventually take a snapshot, we'll likely make different
>> tuple visibility decisions than we would have made had we taken the
>> snapshot earlier. However, the decisions that we make prior to taking
>> the snapshot will be consistent with the snapshot, and we will
>> certainly see the effects of all transactions that committed before we
>> started.  We may also see the effects of some transactions that commit
>> after we started, but that is OK: it is just as if our whole
>> transaction had been started slightly later and then executed more
>> quickly thereafter.
>
> I don't think this is going to be acceptable at all.  You're assuming
> that clients have no independent means of determining what order
> transactions execute in, which isn't the case.  It would be quite
> possible, for example, for a query submitted to one backend to see the
> effects of a transaction that was submitted to another backend long
> after the first query started.  If the two clients involved interact
> at all, they're not going to be happy.  Even if they just compare
> transaction timestamps, they're not going to be happy.

I'm not sure they're entitled to rely on any other behavior.  Couldn't
the exact same thing happen in a non-MVCC database based on SS2PL?

> I'm less than convinced by the hypothesis that most transactions would
> avoid taking snapshots in this regime, anyway.  It would only hold up
> if there's little locality of reference in terms of which tuples are
> getting examined/modified by concurrent transactions, and that's a
> theory that seems much more likely to be wrong than right.

There will certainly be workloads where most transactions acquire a
snapshot, but just to take one obvious example, suppose we have a data
warehouse where every night we bulk load the day's data, and then we
run reporting queries all day.  Except during the overnight bulk
loads, there's no concurrent write activity at all, and thus no need
for snapshots.  Or imagine a database where we store monitoring data.
There's a continuous flow of monitoring data from multiple sources;
and then people run reports.  The users running reports will need
snapshots, but the processes updating the monitoring data will
presumably be touching discrete sets of tuples.  They may be
INSERT-only, and even if they do updates, the process monitoring
resource A only needs to look at the rows for resource A, not the rows
for resource B.  If the tables are large enough that index scans are
used and the threshold XID is updated sufficiently frequently, you
might get away without snapshots.  This isn't quite so clear a win as
the first one but maybe it's worth thinking about.

One thing we could do is instrument the current code to track whether
any field of the snapshot other than snapshot->xmin is ever used, and
then run some benchmarks to see how often that happens.
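[Editor's sketch of what such instrumentation might look like: route accesses to fields beyond xmin through accessors that set a flag, then count flagged snapshots during a benchmark. The struct and function names are invented for illustration, not taken from the Postgres source.]

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stripped-down snapshot with a hypothetical instrumentation flag
 * recording whether any field other than xmin was ever consulted. */
typedef struct
{
    TransactionId xmin;
    TransactionId xmax;
    bool          used_beyond_xmin;   /* instrumentation only */
} InstrumentedSnapshot;

/* Accessor for xmax; any call means a full snapshot was really needed,
 * so a lazy scheme would not have been able to skip taking one. */
static TransactionId
snapshot_get_xmax(InstrumentedSnapshot *snap)
{
    snap->used_beyond_xmin = true;
    return snap->xmax;
}
```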

> I wonder whether we could do something involving WAL properties --- the
> current tuple visibility logic was designed before WAL existed,

Wow.

> so it's
> not exploiting that resource at all.  I'm imagining that the kernel of a
> snapshot is just a WAL position, ie the end of WAL as of the time you
> take the snapshot (easy to get in O(1) time).  Visibility tests then
> reduce to "did this transaction commit with a WAL record located before
> the specified position?".  You'd need some index datastructure that made
> it reasonably cheap to find out the commit locations of recently
> committed transactions, where "recent" means "back to recentGlobalXmin".
> That seems possibly do-able, though I don't have a concrete design in
> mind.

Interesting.  O(1) snapshots would be great.  I need to think about
this more before commenting on it, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: lazy snapshots?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm less than convinced by the hypothesis that most transactions would
>> avoid taking snapshots in this regime, anyway.  It would only hold up
>> if there's little locality of reference in terms of which tuples are
>> getting examined/modified by concurrent transactions, and that's a
>> theory that seems much more likely to be wrong than right.

> There will certainly be workloads where most transactions acquire a
> snapshot, but just to take one obvious example, suppose we have a data
> warehouse where every night we bulk load the day's data, and then we
> run reporting queries all day.  Except during the overnight bulk
> loads, there's no concurrent write activity at all, and thus no need
> for snapshots.

Well, yeah, but in this scenario there's also no contention involved in
taking snapshots --- there are only readers of ProcArray and (IIRC) they
only need shared locks on the array.  If you want to make any meaningful
improvement in this area, you need something that solves the ProcArray
access contention caused by a heavy mixed read/write transaction load.
        regards, tom lane


Re: lazy snapshots?

From
Robert Haas
Date:
On Wed, Oct 20, 2010 at 8:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> so it's
>> not exploiting that resource at all.  I'm imagining that the kernel of a
>> snapshot is just a WAL position, ie the end of WAL as of the time you
>> take the snapshot (easy to get in O(1) time).  Visibility tests then
>> reduce to "did this transaction commit with a WAL record located before
>> the specified position?".  You'd need some index datastructure that made
>> it reasonably cheap to find out the commit locations of recently
>> committed transactions, where "recent" means "back to recentGlobalXmin".
>> That seems possibly do-able, though I don't have a concrete design in
>> mind.
>
> Interesting.  O(1) snapshots would be great.  I need to think about
> this more before commenting on it, though.

I spent a bunch of time thinking about this, and I don't see any way
to get the memory usage requirements down to something reasonable.
The problem is that RecentGlobalXmin might be arbitrarily far back in
XID space, and you'll need to know the LSN of every commit from that
point forward; whereas the ProcArray requires only constant space.  To
put this another way, for any given snapshot, the "interesting" XIDs
are the top-level XIDs between its xmin and xmax; the number of
those that are invisible to the snapshot is bounded, but the total
number is unbounded.  To make this work, you'd need an inexpensive way
of finding the set of transactions which committed after a given LSN
with XIDs less than the snapshot's xmax.  Another idea for making that
work is to keep a table of the <min-LSN, max-xmax> pairs for every
snapshot in the system and only track exactly those commits that are
relevant, but it seems hard to think about how to maintain that data
structure, and hard also to prune the table of tracked commits as
snapshots are released.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: lazy snapshots?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 20, 2010 at 8:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I'm imagining that the kernel of a
>>> snapshot is just a WAL position, ie the end of WAL as of the time you
>>> take the snapshot (easy to get in O(1) time).  Visibility tests then
>>> reduce to "did this transaction commit with a WAL record located before
>>> the specified position?".

> I spent a bunch of time thinking about this, and I don't see any way
> to get the memory usage requirements down to something reasonable.
> The problem is that RecentGlobalXmin might be arbitrarily far back in
> XID space, and you'll need to know the LSN of every commit from that
> point forward; whereas the ProcArray requires only constant space.

That's like arguing that clog is no good because it doesn't fit in
constant space.  ISTM it would be entirely practical to remember the
commit LSN positions of every transaction back to RecentGlobalXmin,
using a data structure similar to pg_subtrans --- in fact, it'd require
exactly twice as much working space as pg_subtrans, ie 64 bits per XID
instead of 32.  Now, it might be that access contention problems would
make this unworkable (I think pg_subtrans works largely because we don't
have to access it often) but it's not something that can be dismissed
on space grounds.

[ thinks for a bit... ]  But actually this probably ends up being a
wash or a loss as far as contention goes.  We're talking about a data
structure that has to be updated during each commit, and read pretty
frequently, and it's not obvious how that's any better than getting
commit info from the ProcArray.  Although neither commit nor reading
would require a *global* lock, so maybe there's a way ...
        regards, tom lane


Re: lazy snapshots?

From
Robert Haas
Date:
On Thu, Nov 4, 2010 at 2:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Oct 20, 2010 at 8:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> I'm imagining that the kernel of a
>>>> snapshot is just a WAL position, ie the end of WAL as of the time you
>>>> take the snapshot (easy to get in O(1) time).  Visibility tests then
>>>> reduce to "did this transaction commit with a WAL record located before
>>>> the specified position?".
>
>> I spent a bunch of time thinking about this, and I don't see any way
>> to get the memory usage requirements down to something reasonable.
>> The problem is that RecentGlobalXmin might be arbitrarily far back in
>> XID space, and you'll need to know the LSN of every commit from that
>> point forward; whereas the ProcArray requires only constant space.
>
> That's like arguing that clog is no good because it doesn't fit in
> constant space.  ISTM it would be entirely practical to remember the
> commit LSN positions of every transaction back to RecentGlobalXmin,
> using a data structure similar to pg_subtrans --- in fact, it'd require
> exactly twice as much working space as pg_subtrans, ie 64 bits per XID
> instead of 32.  Now, it might be that access contention problems would
> make this unworkable (I think pg_subtrans works largely because we don't
> have to access it often) but it's not something that can be dismissed
> on space grounds.

Maybe I didn't explain that very well.  The point is not so much how
much memory you're using in an absolute sense as how much of it you
have to look at to construct a snapshot.  If you store a giant array
indexed by XID whose value is an LSN, you have to read a potentially
unbounded number of entries from that array.  You can either make a
single read through the relevant portion of the array (snapshot xmin
to snapshot xmax) or you can check each XID as you see it and try to
build up a local cache, but either way there's no fixed limit on how
many bytes must be read from the shared data structure.  That compares
unfavorably with the current design, where you do a one-time read of a
bounded amount of data and you're done.  I suspect your theory about
pg_subtrans is correct.

> [ thinks for a bit... ]  But actually this probably ends up being a
> wash or a loss as far as contention goes.  We're talking about a data
> structure that has to be updated during each commit, and read pretty
> frequently, and it's not obvious how that's any better than getting
> commit info from the ProcArray.  Although neither commit nor reading
> would require a *global* lock, so maybe there's a way ...

Well, if we're talking about the "giant XID array" design, or some
variant of that, I would expect that nearly all of the contention
would be on the last page or two, so I don't think it would be much
better than a global lock.  You might be able to get around that by
not using an LWLock, and instead using something like lock xchg or
LL/SC to atomically update entries, but I'm not sure there's a
portable way to do such operations on anything larger than a 4-byte
word.  At any rate, I think the problem described in the preceding
paragraph is the more serious one.
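[Editor's note: C11 atomics, which postdate this 2010 thread, now give a portable spelling of the lock-free 8-byte update discussed above; whether the target actually supports lock-free 64-bit atomics remains platform-dependent. A sketch with hypothetical names:]

```c
#include <stdatomic.h>
#include <stdint.h>

/* One 64-bit commit-LSN slot updated without any LWLock.  The release
 * store on commit pairs with the acquire load on read, so a reader that
 * sees the new LSN also sees everything the committer wrote before it. */
static _Atomic uint64_t commit_slot;

static void
publish_commit_lsn(uint64_t lsn)
{
    atomic_store_explicit(&commit_slot, lsn, memory_order_release);
}

static uint64_t
read_commit_lsn(void)
{
    return atomic_load_explicit(&commit_slot, memory_order_acquire);
}
```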

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: lazy snapshots?

From
Robert Haas
Date:
On Wed, Oct 20, 2010 at 8:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm less than convinced by the hypothesis that most transactions would
>>> avoid taking snapshots in this regime, anyway.  It would only hold up
>>> if there's little locality of reference in terms of which tuples are
>>> getting examined/modified by concurrent transactions, and that's a
>>> theory that seems much more likely to be wrong than right.
>
>> There will certainly be workloads where most transactions acquire a
>> snapshot, but just to take one obvious example, suppose we have a data
>> warehouse where every night we bulk load the day's data, and then we
>> run reporting queries all day.  Except during the overnight bulk
>> loads, there's no concurrent write activity at all, and thus no need
>> for snapshots.
>
> Well, yeah, but in this scenario there's also no contention involved in
> taking snapshots --- there are only readers of ProcArray and (IIRC) they
> only need shared locks on the array.  If you want to make any meaningful
> improvement in this area, you need something that solves the ProcArray
> access contention caused by a heavy mixed read/write transaction load.

There is certainly some improvement to be had even in eliminating
shared ProcArrayLocks, and the work that must be done while holding
them.  For example, you have to hold an exclusive lock to end a
transaction, so that's going to compete with any shared lockers.

But suppose we do have a heavy mixed read/write transaction load.  Who
is to say that requires a snapshot?  It will require a snapshot if one
transactions reads data that has been recently written by another
transaction, but I'm not convinced that'll necessarily happen that
often.  For example, suppose you have a table with a million records
and you have 200 concurrent database connections.  Each connection
repeatedly starts a transaction where it reads 10 records and then
writes 10 records.  When a new transaction starts, it overlaps 199
other transactions; if all of those transactions have done all of
their reads and writes already, there are 3,980 records in the table
for which we'll need a snapshot to determine tuple visibility;
assuming we look at no other tuples (all the accesses using index
scans rather than sequential scans), the chances that we'll need to
take a snapshot are only 1-((1-(3980/1000000))^20) = ~7.7%.  That's
pretty good, and of course if you have fewer overlapping transactions
or fewer operations per transaction it gets better very quickly.

Now, of course, if you have a lot of locality of reference, things
aren't going to be nearly so good.  If we assume that the accesses are
spread across only 100,000 records instead of 1,000,000, then each
transaction has a better-than-even chance of needing a snapshot.
However, in that situation, you're going to have other contention
problems, too: there's very significant probability that two backends
will actually try to update the same tuple, and one will sleep until
the other commits.   So maybe the cost of taking snapshots won't be
the biggest problem in that case anyway (he said hopefully).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company