Обсуждение: On the need for a snapshot in exec_bind_message()

Поиск
Список
Период
Сортировка

On the need for a snapshot in exec_bind_message()

От
Daniel Wood
Дата:

In particular:
    exec_bind_message()
        PushActiveSnapshot(GetTransactionSnapshot());


Suppressing this I've achieved over 1.9 M TXN's a second on select only pgbench on a 48 core box.  It is about 50% faster with this change.  The cpu usage of GetSnapshotData drops from about 22% to 4.5%.

If there were no input functions, that needed this, nor reparsing or reanalyzing needed, and we knew this up front, it'd be a huge win.  We could test for a number of conditions on the first parse/optimization of the query and set a flag to suppress this for subsequent executions.


NOTE:

In GetSnapshotData because pgxact, is declared volatile, the compiler will not reduce the following two IF tests into a single test:


    if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
        continue;


    if (pgxact->vacuumFlags & PROC_IN_VACUUM)
        continue;


You can reduce the code path in the inner loop by coding this as:

    if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING|PROC_IN_VACUUM))
        continue;


I'm still working on quantifying any gain.  Note it isn't just one L1 cache

fetch and one conditional branch eliminated.  Due to the update frequency of the pgxact cache line, for single statement TXN's, there are a certain number of full cache misses, due to invalidation, that occurs when given pgxact is updated between the first fetch of vacuumFlags and the 2nd fetch.

Re: On the need for a snapshot in exec_bind_message()

От
Tom Lane
Дата:
Daniel Wood <hexexpert@comcast.net> writes:
> In particular:
>     exec_bind_message()
>         PushActiveSnapshot(GetTransactionSnapshot());

> If there were no input functions, that needed this, nor reparsing or
> reanalyzing needed, and we knew this up front, it'd be a huge win.

Unfortunately, that's not the case, so I think trying to get rid of
this call is a nonstarter.

What we have kicked around a bit is trying to get rid of the additional
snapshot-taking at the start of execution, so that the snapshot taken
at BIND time serves all the way through the query.  That'd require a
fair amount of refactoring I think, but at least it's not a broken idea
on its face.

> In GetSnapshotData because pgxact, is declared volatile, the compiler will not reduce the following two IF tests into
asingle test: 
>     if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
>         continue;
>     if (pgxact->vacuumFlags & PROC_IN_VACUUM)
>         continue;

That, on the other hand, is just crummy coding :-(

> I'm still working on quantifying any gain.

It'll be interesting to see if you can show visible improvement from
merging those.  It's enough of a hotspot that I wouldn't be surprised
to find some, but actual numbers would be nice.

            regards, tom lane


Re: On the need for a snapshot in exec_bind_message()

От
Andres Freund
Дата:
Hi,

On 2018-09-05 12:31:04 -0700, Daniel Wood wrote:
> NOTE:
> 
> In GetSnapshotData because pgxact, is declared volatile, the compiler will not reduce the following two IF tests into
asingle test:
 
> 
> 
>     if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
>         continue;
> 
> 
>     if (pgxact->vacuumFlags & PROC_IN_VACUUM)
>         continue;
> 
> 
> You can reduce the code path in the inner loop by coding this as:
> 
>     if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING|PROC_IN_VACUUM))
>         continue;
> 
> 
> I'm still working on quantifying any gain.  Note it isn't just one L1 cache
> 
> fetch and one conditional branch eliminated.  Due to the update frequency of the pgxact cache line, for single
statementTXN's, there are a certain number of full cache misses, due to invalidation, that occurs when given pgxact is
updatedbetween the first fetch of vacuumFlags and the 2nd fetch.
 

These two obviously could be combined, but I think we should just get
rid of the volatiles. With a small bit of work they shouldn't be
required anymore these days.

Greetings,

Andres Freund


Re: On the need for a snapshot in exec_bind_message()

От
Daniel Wood
Дата:
> >     exec_bind_message()
> >         PushActiveSnapshot(GetTransactionSnapshot());
> 
> > If there were no input functions, that needed this, nor reparsing or
> > reanalyzing needed, and we knew this up front, it'd be a huge win.
> 
> Unfortunately, that's not the case, so I think trying to get rid of
> this call is a nonstarter.

Queries stop getting re-optimized after 5 times, unless better plans are to be had.  In the absence of schema changes
orchanging search path why is the snapshot needed?
 

- Dan


Re: On the need for a snapshot in exec_bind_message()

От
Tom Lane
Дата:
Daniel Wood <hexexpert@comcast.net> writes:
>>> exec_bind_message()
>>>   PushActiveSnapshot(GetTransactionSnapshot());
>>>
>>> If there were no input functions, that needed this, nor reparsing or
>>> reanalyzing needed, and we knew this up front, it'd be a huge win.

>> Unfortunately, that's not the case, so I think trying to get rid of
>> this call is a nonstarter.

> Queries stop getting re-optimized after 5 times, unless better plans are to be had.  In the absence of schema changes
orchanging search path why is the snapshot needed? 

The snapshot has little to do with the query plan, usually.  It's about
what view of the database the executed query will see, and particularly
about what view the parameter input functions will see, if they look.

You could maybe argue that immutable input functions shouldn't need
snapshots, but there are enough not-immutable ones that I don't think
that gets you very far.  In any case, we'd still need a snapshot for
query execution.  The set of queries that could possibly never need
a snapshot at all is probably not large enough to be interesting.

            regards, tom lane


Re: On the need for a snapshot in exec_bind_message()

От
Andres Freund
Дата:
On 2018-09-05 18:45:52 -0400, Tom Lane wrote:
> Daniel Wood <hexexpert@comcast.net> writes:
> >>> exec_bind_message()
> >>>   PushActiveSnapshot(GetTransactionSnapshot());
> >>> 
> >>> If there were no input functions, that needed this, nor reparsing or
> >>> reanalyzing needed, and we knew this up front, it'd be a huge win.
> 
> >> Unfortunately, that's not the case, so I think trying to get rid of
> >> this call is a nonstarter.
> 
> > Queries stop getting re-optimized after 5 times, unless better plans are to be had.  In the absence of schema
changesor changing search path why is the snapshot needed?
 
> 
> The snapshot has little to do with the query plan, usually.  It's about
> what view of the database the executed query will see, and particularly
> about what view the parameter input functions will see, if they look.

The snapshot in exec_bind_message() shouldn't be about what the executed
query sees, no?  Sure, we'll evaluate input params and might replan etc,
but other than that?


> You could maybe argue that immutable input functions shouldn't need
> snapshots, but there are enough not-immutable ones that I don't think
> that gets you very far.  In any case, we'd still need a snapshot for
> query execution.  The set of queries that could possibly never need
> a snapshot at all is probably not large enough to be interesting.

It'd not be insane to rejigger things so there's a version of
PushActiveSnapshot() doesn't eagerly compute the snapshot, but instead
does so on first access.  Obviously we couldn't use that everywhere, but
the exec_bind_message() seems like a prime case for it.

Greetings,

Andres Freund


Re: On the need for a snapshot in exec_bind_message()

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-05 18:45:52 -0400, Tom Lane wrote:
>> The snapshot has little to do with the query plan, usually.  It's about
>> what view of the database the executed query will see, and particularly
>> about what view the parameter input functions will see, if they look.

> The snapshot in exec_bind_message() shouldn't be about what the executed
> query sees, no?  Sure, we'll evaluate input params and might replan etc,
> but other than that?

As things stand, yeah, it's not used for execution.

> It'd not be insane to rejigger things so there's a version of
> PushActiveSnapshot() doesn't eagerly compute the snapshot, but instead
> does so on first access.  Obviously we couldn't use that everywhere, but
> the exec_bind_message() seems like a prime case for it.

I dislike this proposal, because it introduces an element of uncertainty
into when the snapshot is taken.  It doesn't seem a lot different from
saying that we'll try to save a gettimeofday call by postponing
calculation of the statement_timestamp until somebody asks for it.
Then it's no longer the statement start time, but some hard-to-define
time within execution.

I think the other idea of trying to keep the bind-time snapshot in use
throughout execution is more worth pursuing.  The main problem with it,
now that I think harder, is that we need the execution snapshot to be
taken after we've acquired whatever locks the query needs.  But we
actually know the set of required locks, if we have a cached plan
at hand.  (It's *possible* that that set would change, if we're forced
to replan, but it's unlikely.)  So you could imagine rejiggering things
so that we do the acquire-locks bit before doing parameter input, and
unless the lock list changes, we can keep the parameter-input snapshot
in force throughout execution.  Not quite sure how to make that play
with custom plans though.

            regards, tom lane


Re: On the need for a snapshot in exec_bind_message()

От
Daniel Wood
Дата:
> > Queries stop getting re-optimized after 5 times, unless better plans are to be had.  In the absence of schema
changesor changing search path why is the snapshot needed?
 
> 
> The snapshot has little to do with the query plan, usually.  It's about
> what view of the database the executed query will see, and particularly
> about what view the parameter input functions will see, if they look.
> 
> You could maybe argue that immutable input functions shouldn't need
> snapshots, but there are enough not-immutable ones that I don't think
> that gets you very far.  In any case, we'd still need a snapshot for
> query execution.  The set of queries that could possibly never need
> a snapshot at all is probably not large enough to be interesting.

I would think that the vast majority of bind variables in customer queries would involve built-in data types whose
inputfunctions are immutable.  As far as the "view of the database the executed query will see" goes, either the
databaseitself changes, which should trigger catcache invalidations, that can be accounted for, or the "view" changes
asin searchpath which can be tested for.  I would think the optimization would be applicable most of the time.  It
isn'tthe number of obscure user created non-immutable input functions that exist.  It is the frequency with which
immutableinput functions are used in real app's.  This is a 50% performance improvement in point lookups on larger
boxes.


Re: On the need for a snapshot in exec_bind_message()

От
Andres Freund
Дата:
On 2018-09-05 19:11:56 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-09-05 18:45:52 -0400, Tom Lane wrote:
> >> The snapshot has little to do with the query plan, usually.  It's about
> >> what view of the database the executed query will see, and particularly
> >> about what view the parameter input functions will see, if they look.
> 
> > The snapshot in exec_bind_message() shouldn't be about what the executed
> > query sees, no?  Sure, we'll evaluate input params and might replan etc,
> > but other than that?
> 
> As things stand, yeah, it's not used for execution.
> 
> > It'd not be insane to rejigger things so there's a version of
> > PushActiveSnapshot() doesn't eagerly compute the snapshot, but instead
> > does so on first access.  Obviously we couldn't use that everywhere, but
> > the exec_bind_message() seems like a prime case for it.
> 
> I dislike this proposal, because it introduces an element of uncertainty
> into when the snapshot is taken.  It doesn't seem a lot different from
> saying that we'll try to save a gettimeofday call by postponing
> calculation of the statement_timestamp until somebody asks for it.
> Then it's no longer the statement start time, but some hard-to-define
> time within execution.

I don't think that really is comparable. The timestamp really exists on
a (well, for our purposes) absolute scale. That's much less clearer for
snapshots in read committed: Without access to the database or returning
results the moment the snapshot has been taken really doesn't mean that
much - the session could really just have been scheduled out just before
taking the snapshot (or waited for the procarray lock, to be more on
topic).

Additionally, this isn't the snapshot for the actual query execution -
it's the one for the parameter evaluation. And we *already* take a
separate snapshot for that from the one actually executing the query, so
skew between the two really can't be meaningful.


> I think the other idea of trying to keep the bind-time snapshot in use
> throughout execution is more worth pursuing.

Hm, that seems to have its own set of problems. Some of them
nontrivial. It's e.g. perfectly legal to bind multiple statements and
then execute them - and I think it's quite possible that pgjdbc would
generate something like that.

And then there's:
> The main problem with it, now that I think harder, is that we need the
> execution snapshot to be taken after we've acquired whatever locks the
> query needs.

Greetings,

Andres Freund