Обсуждение: Hot Standby query cancellation and Streaming Replication integration

Поиск
Список
Период
Сортировка

Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
I'm happy to see we've crossed the point where the worst of the Hot
Standby and Streaming Replication issues are sorted out.  A look at the
to-do lists:  http://wiki.postgresql.org/wiki/Hot_Standby_TODO
http://wiki.postgresql.org/wiki/Streaming_Replication show no Must-fix
items and 5 Serious Issues for Hot Standby left; there are 9 Streaming
Replication items there, which aren't as clearly prioritized yet.
Correct me if I'm wrong here, but those read to me like tweaks and
polishing rather than major architecture issues at this point, so I
believe that code is the same position as HS:  some serious issues, but
no really terrible parts.

The job Simon asked me to take a look at starting last week is which of
the listed HS "Serious Issues" might be promoted into must-fix items
after seeing how easy they were to encounter.  There are a number of HS
tunables that interact with one another, and depending your priorities a
few ways you can try to optimize the configuration for what I expect to
be common use cases for this feature.  I've written a blog entry at
http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html
that tries to explain all that background clearly, and relate the
implementation details to how I expect DBAs will perceive them.  That
was a bit much to also cover here, and had a broader audience that might
appreciate it than just this list.

Attached is a tar file with some test case demo scripts that demonstrate
the worst of the problems here IMHO.  A README in there outlines how to
set the problem demo up (presuming you've already gotten a HS pair
going).  What this does is execute the following sequence continuously
on the master:

UPDATE pgbench_tellers SET tbalance = tbalance + <delta> WHERE tid =
<tid>; (several times)
VACUUM pgbench_tellers;

Meanwhile, on the standby, the following long query runs on a few
sessions at once, again looping constantly:

SELECT sum(abalance) FROM pgbench_accounts;

It took a bit of testing to get the database scale and iteration times
here to easily encounter the issue here on my system, I hope this shows
up easily enough for others with the values used.  (I have a similar
work in progress demo that tries to trigger the b-tree deletion problem
too, will follow up once the storm of messages about this topic dies
down, as I think this is a pre-requisite for it anyway)

I'm not sure what you might be expecting from the above combination, but
what actually happens is that many of the SELECT statements on the table
*that isn't even being updated* are canceled.  You see this in the logs:

LOG:  restored log file "0000000100000000000000A5" from archive
ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be
removed.
STATEMENT:  SELECT sum(abalance) FROM pgbench_accounts;

Basically, every time a WAL segment appears that wipes out a tuple that
SELECT expects should still be visible, because the dead row left behind
by the update has been vacuumed away, the query is canceled.  This
happens all the time the way I've set this up, and I don't feel like
this is a contrived demo.  Having a long-running query on the standby
while things get updated and then periodically autovacuumed on the
primary is going to be extremely common in the sorts of production
systems I expect want HS the most.

Now, as explained on the blog entry and in the documentation, there are
all sorts of ways you can work around this issue by tweaking parameters
or doing fun tricks with dblink.  You can prioritize any two of keeping
the standby current, letting long-running queries execute on the
standby, and keeping xid advances on the master moving forward as fast
as possible.  But you can't get all three at once.  The choices
available are really about the best you can do given a system that's
basically the old warm-standby approach, improved with adding just Hot
Standby to the mix.  Sure, you might make the conflict resolution a bit
smarter or make the UI for setting the parameters more friendly, and
there's already been plenty of argument and patching over all of that.
I don't want to belittle that work because it's been important to make
HS a useful standalone feature, but I feel like that's all triage rather
than looking for the most robust fix possible.

If you're running a system that also is using Streaming Replication,
there is a much better approach possible.  This idea has been floating
around for a while and I am not taking credit for inventing it (too busy
tonight to dig into the archives to figure out exactly when this popped
up initially and who deserves credit for it).  I'm just pointing out
that now is the time where it's actually possible to implement.  The HS
TODO already includes the following action item, to resolve a serious
issue you can run into (that itself would be great to eliminate):

"Requires keep-alives with timestamps to be added to sync rep feature"

If those keep-alives flowed in both directions, and included both
timestamps *and* xid visibility information, the master could easily be
configured to hold open xid snapshots needed for long running queries on
the standby when that was necessary. I might be missing an
implementation detail here, but from a high level it seems like you
could make the walreceiver on the master publish the information about
where the standby has advanced to as a bit of ProcArray xmin data.  Then
the master could only advance past where the standby says it cannot need
visibility behind anymore.

This is a much more elegant solution than any of the hacks available so
far.  It would turn Hot Standby + Streaming Replication into a system
that stepped out of the way of the worst of the technical limitations of
HS alone.  The master would easily and automatically avoid advancing
past where the queries running on the standby needed visibility back to,
essentially the same way cleanup is blocked during a long-running query
on the primary--except with the actual main query work offloaded to the
standby, the idea all along.

I don't know how difficult the keepalive feature was expected to be, and
there's certainly plenty of potential landmines in this whole xid export
idea.  How to handle situations where the standby goes away for a while,
such as a network outage, so that it doesn't block the master from ever
cleaning up dead tuples is a concern.  I wouldn't expect that to be too
serious of a blocker, given that if the standby isn't talking it
probably isn't running queries you need to worry about canceling
either.  Not sure where else this can fall down, and unfortunately I
don't know nearly enough about the SR code to help myself with
implementing this feature.  (I think Simon is in a similar
position--it's just not what we've been staring at the last few months).

But I do know that the current Hot Standby implementation is going to be
frustrating to configure correctly for people.  If it's possible to make
most of that go away just by doing some final integration between it and
Streaming Replication that just wasn't practical to accomplish until
now, I think it's worth considering how to make that happen before the
final 9.0 release.

I really hope this discussion can say focused on if and how it's
possible to improve this area, with the goal being to deliver a product
everyone can be proud of with the full feature set that makes this next
release a killer one. The features that have managed to all get into
this release already are fantastic, everyone who contributed should be
proud of that progress, and it's encouraging that the alpha4 date was
nailed.  It would be easy to descend into finger-pointing for why
exactly this particular problem is only getting more visibility now, or
into schedule-oriented commentary suggesting it must be ignored because
it's too late to do anything about it.  I hope everyone appreciates
wandering that way will not help make PostgreSQL 9.0 a better release.
This issue is so easy to encounter, and looks so bad when it happens,
that I feel it could easily lead to an embarrassing situation for the
community if something isn't done about it before release.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Вложения

Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Fri, Feb 26, 2010 at 8:33 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>
> I'm not sure what you might be expecting from the above combination, but
> what actually happens is that many of the SELECT statements on the table
> *that isn't even being updated* are canceled.  You see this in the logs:

Well I proposed that the default should be to wait forever when
applying WAL logs that conflict with a query. Precisely because I
think the expectation is that things will "just work" and queries not
fail unpredictably. Perhaps in your test a larger max_standby_delay
would have prevented the cancellations but then as soon as you try a
query which lasts longer you would have to raise it again. There's no
safe value which will be right for everyone.

> If you're running a system that also is using Streaming Replication, there
> is a much better approach possible.

So I think one of the main advantages of a log shipping system over
the trigger-based systems is precisely that it doesn't require the
master to do anything it wasn't doing already. There's nothing the
slave can do which can interfere with the master's normal operation.

This independence is really a huge feature. It means you can allow
users on the slave that you would never let near the master. The
master can continue running production query traffic while users run
all kinds of crazy queries on the slave and drive it into the ground
and the master will continue on blithely unaware that anything's
changed.

In the model you describe any long-lived queries on the slave cause
tables in the master to bloat with dead records.

I think this model is on the roadmap but it's not appropriate for
everyone and I think one of the benefits of having delayed it is that
it forces us to get the independent model right before throwing in
extra complications. It would be too easy to rely on the slave
feedback as an answer for hard questions about usability if we had it
and just ignore the question of what to do when it's not the right
solution for the user.

--
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Greg Smith wrote:
> Attached is a tar file with some test case demo scripts that demonstrate
> the worst of the problems here IMHO.

Thanks for that! We've been discussing this for ages, so it's nice to
have a concrete example.

> I don't want to belittle that work because it's been important to make
> HS a useful standalone feature, but I feel like that's all triage rather
> than looking for the most robust fix possible.

Ideally the standby would stash away the old pages or tuples somewhere
so that it can still access them even after replaying the WAL records
that remove them from the main storage. I realize that's not going to
happen any time soon because it's hard to do, but that would really be
the most robust fix possible.

> I don't know how difficult the keepalive feature was expected to be, and
> there's certainly plenty of potential landmines in this whole xid export
> idea.

One such landmine is that the keepalives need to flow from client to
server while the WAL records are flowing from server to client. We'll
have to crack that problem for synchronous replication too, but I think
that alone is a big enough problem to make this 9.1 material.

> How to handle situations where the standby goes away for a while,
> such as a network outage, so that it doesn't block the master from ever
> cleaning up dead tuples is a concern.

Yeah, that's another issue that needs to be dealt with. You'd probably
need some kind of a configurable escape valve in the master, to let it
ignore a standby's snapshot once it gets too old.

> But I do know that the current Hot Standby implementation is going to be
> frustrating to configure correctly for people.

Perhaps others who are not as deep into the code as I am will have a
better view on this, but I seriously don't think that's such a big
issue. I think the max_standby_delay setting is quite intuitive and easy
to explain. Sure, it would better if there was no tradeoff between
killing queries and stalling recovery, but I don't think it'll be that
hard to understand the tradeoff.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby query cancellation and Streaming Replication integration

От
Richard Huxton
Дата:
On 26/02/10 08:33, Greg Smith wrote:
>  There are a number of HS
> tunables that interact with one another, and depending your priorities a
> few ways you can try to optimize the configuration for what I expect to
> be common use cases for this feature.
> I've written a blog entry at
> http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html
> that tries to explain all that background clearly,

It did too. Thanks for the nice summary people can be pointed at.

> I'm not sure what you might be expecting from the above combination, but
> what actually happens is that many of the SELECT statements on the table
> *that isn't even being updated* are canceled. You see this in the logs:

Hmm - this I'd already figured out for myself. It's just occurred to me 
that this could well be the case between databases too. Database A gets 
vacuumed, B gets its queries kicked off on the standby. Granted lots of 
people just have the one main DB, but even so...

> LOG: restored log file "0000000100000000000000A5" from archive
> ERROR: canceling statement due to conflict with recovery
> DETAIL: User query might have needed to see row versions that must be
> removed.
> STATEMENT: SELECT sum(abalance) FROM pgbench_accounts;
>
> Basically, every time a WAL segment appears that wipes out a tuple that
> SELECT expects should still be visible, because the dead row left behind
> by the update has been vacuumed away, the query is canceled. This
> happens all the time the way I've set this up, and I don't feel like
> this is a contrived demo. Having a long-running query on the standby
> while things get updated and then periodically autovacuumed on the
> primary is going to be extremely common in the sorts of production
> systems I expect want HS the most.

I can pretty much everyone wanting HS+SR. Thousands of small DBs running 
on VMs for a start. Free mostly-live backup? Got to be a winner.

Dumb non-hacker question: why do we cancel all transactions rather than 
just those with "ACCESS SHARE" on the vacuumed table in question? Is it 
the simple fact that we don't know what table this particular section of 
WAL affects, or is it the complexity of tracking all this info?

> If you're running a system that also is using Streaming Replication,
> there is a much better approach possible.

> "Requires keep-alives with timestamps to be added to sync rep feature"
>
> If those keep-alives flowed in both directions, and included both
> timestamps *and* xid visibility information, the master could easily be
> configured to hold open xid snapshots needed for long running queries on
> the standby when that was necessary.

Presumably meaning we need *another* config setting to prevent excessive 
bloat on a heavily updated table on the master.

--   Richard Huxton  Archonet Ltd


Re: Hot Standby query cancellation and Streaming Replication integration

От
Richard Huxton
Дата:
On 26/02/10 14:10, Heikki Linnakangas wrote:
>
> Ideally the standby would stash away the old pages or tuples somewhere
> so that it can still access them even after replaying the WAL records
> that remove them from the main storage. I realize that's not going to
> happen any time soon because it's hard to do, but that would really be
> the most robust fix possible.

Something like snapshotting a filesystem, so updates continue while 
you're still looking at a static version.

--   Richard Huxton  Archonet Ltd


Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Richard Huxton wrote:
> On 26/02/10 08:33, Greg Smith wrote:
>> I'm not sure what you might be expecting from the above combination, but
>> what actually happens is that many of the SELECT statements on the table
>> *that isn't even being updated* are canceled. You see this in the logs:
> 
> Hmm - this I'd already figured out for myself. It's just occurred to me
> that this could well be the case between databases too. Database A gets
> vacuumed, B gets its queries kicked off on the standby.

No, it's per-database already. Only queries in the same database are
canceled.

> Dumb non-hacker question: why do we cancel all transactions rather than
> just those with "ACCESS SHARE" on the vacuumed table in question? Is it
> the simple fact that we don't know what table this particular section of
> WAL affects, or is it the complexity of tracking all this info?

The problem is that even if transaction X doesn't have an (access share)
lock on the vacuumed table at the moment, it might take one in the
future. Simon proposed mechanisms for storing the information about
vacuumed tables in shared memory, so that if X takes the lock later on
it will get canceled at that point, but that's 9.1 material.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby query cancellation and Streaming Replication integration

От
Richard Huxton
Дата:
On 26/02/10 14:45, Heikki Linnakangas wrote:
> Richard Huxton wrote:
>> On 26/02/10 08:33, Greg Smith wrote:
>>> I'm not sure what you might be expecting from the above combination, but
>>> what actually happens is that many of the SELECT statements on the table
>>> *that isn't even being updated* are canceled. You see this in the logs:
>>
>> Hmm - this I'd already figured out for myself. It's just occurred to me
>> that this could well be the case between databases too. Database A gets
>> vacuumed, B gets its queries kicked off on the standby.
>
> No, it's per-database already. Only queries in the same database are
> canceled.

That's a relief.

>> Dumb non-hacker question: why do we cancel all transactions rather than
>> just those with "ACCESS SHARE" on the vacuumed table in question? Is it
>> the simple fact that we don't know what table this particular section of
>> WAL affects, or is it the complexity of tracking all this info?
>
> The problem is that even if transaction X doesn't have an (access share)
> lock on the vacuumed table at the moment, it might take one in the
> future. Simon proposed mechanisms for storing the information about
> vacuumed tables in shared memory, so that if X takes the lock later on
> it will get canceled at that point, but that's 9.1 material.

I see - we'd need to age the list of vacuumed tables too, so when the 
oldest transactions complete the correct flags get cleared.

Can we not wait to cancel the transaction until *any* new lock is 
attempted though? That should protect all the single-statement 
long-running transactions that are already underway. Aggregates etc.

--   Richard Huxton  Archonet Ltd


Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Richard Huxton wrote:
> Can we not wait to cancel the transaction until *any* new lock is
> attempted though? That should protect all the single-statement
> long-running transactions that are already underway. Aggregates etc.

Hmm, that's an interesting thought. You'll still need to somehow tell
the victim backend "you have to fail if you try to acquire any more
locks", but a single per-backend flag in the procarray would suffice.

You could also clear the flag whenever you free the last snapshot in the
transaction (ie. between each query in read committed mode).

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby query cancellation and Streaming Replication integration

От
Richard Huxton
Дата:
Replying to my own post - first sign of madness...

Let's see if I've got the concepts clear here, and hopefully my thinking 
it through will help others reading the archives.

There are two queues:
1. Cleanup on the master
2. Replay on the slave

Running write queries on the master adds to both queues.
Running (read-only) queries on the slave prevents you removing from both 
queues.


There are two interesting measurements of "age"/"size":
1. Oldest item in / length of queue (knowable)
2. How long will it take to clear the queue (estimable at best)

You'd like to know #2 to keep up with your workload. Unfortunately, you 
can't for certain unless you have control over new incoming queries (on 
both master and slave).

You might want four separate GUCs for the two measurements on the two 
queues. We currently have two that (sort of) match #1 "Oldest item" 
(vacuum_defer_cleanup_age, max_standby_delay).


Delaying replay on a slave has no effect on the master. If a slave falls 
too far behind it's responsible for catch-up (via normal WAL archives).

There is no point in delaying cleanup on the master unless it's going to 
help one or more slaves. In fact, you don't want to start delaying 
cleanup until you have to, otherwise you're wasting your delay time. 
This seems to be the case with vacuum_defer_cleanup_age. If I have a 
heavily-updated table and I defer vacuuming then when any given query 
starts on the slave it's going to be half used up already.

There's also no point in deferring cleanup on the master if the standby 
is already waiting on a conflict that will cause its queries to be 
cancelled anyway. Not only won't it help, but it might make things worse 
since transactions will be cancelled, the conflict will be replayed and 
(presumably) queries will be re-submitted only to be cancelled again.

This is what Greg Smith's discussion of the keep-alives was about. 
Giving the master enough information to be smarter about cleanup (and 
making the conflicts more fine-grained).

The situation with deferring on one or both ends of process just gets 
more complicated with multiple slaves. There's all sorts of unpleasant 
feedback loops I can envisage there.

For the case of single slave being used to run long reporting queries 
the ideal scenario would be the following. Master starts deferring 
vacuum activity just before the query starts. When that times out, the 
slave will receive the cleanup info, refuse to replay it and start its 
delay. This gives you a total available query time of: natural time between vacuums + vacuum delay + WAL transfer time
+
 
standby delay


I can think of five useful things we should be doing (and might be 
already - don't know).

1. On the master, deduce whether the slave is already waiting on a 
query. If so, don't bother delaying cleanup. Clearly you don't want to 
be signalling hundreds of times a second though. Does the slave pause 
fetching via streaming replication if replay is blocked on a query? 
Could we signal "half-way to max-age" or some such?

2. Perhaps simpler than trying to make the master smarter, just allow 
SET this_transaction_is_probably_a_long_one=true on the slave. That (a) 
clears the queue on the slave and (b) sends the signal to the master 
which then starts deferring vacuum.

3. Do a burst of cleanup activity on the master after blocking. This 
should concentrate conflicts together when they reach the slave. Perhaps 
vacuum_defer_cleanup_age should be vacuum_deferred_queue_size and 
measure the amount of work to do, rather than the max age of the oldest 
cleanup (if I've understood correctly).

4. Do a burst of replay on the slave after blocking. Perhaps every time 
it cancels a transaction it should replay at least half the queued WAL 
before letting new transactions start. Or perhaps it replays any vacuum 
activity it comes across and then stops. That should sync with #2 
assuming the slave doesn't lag the master too much.

5. I've been mixing "defer" and "delay", as do the docs. We should 
probably settle on one or the other. I think defer conveys the meaning 
more precisely, but what about non-native English speakers?

--   Richard Huxton  Archonet Ltd


Re: Hot Standby query cancellation and Streaming Replication integration

От
Robert Haas
Дата:
On Fri, Feb 26, 2010 at 10:21 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Richard Huxton wrote:
>> Can we not wait to cancel the transaction until *any* new lock is
>> attempted though? That should protect all the single-statement
>> long-running transactions that are already underway. Aggregates etc.
>
> Hmm, that's an interesting thought. You'll still need to somehow tell
> the victim backend "you have to fail if you try to acquire any more
> locks", but a single per-backend flag in the procarray would suffice.
>
> You could also clear the flag whenever you free the last snapshot in the
> transaction (ie. between each query in read committed mode).

Wow, that seems like it would help a lot.  Although I'm not 100% sure
I follow all the details of how this works.

...Robert


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Fri, Feb 26, 2010 at 4:43 PM, Richard Huxton <dev@archonet.com> wrote:
> Let's see if I've got the concepts clear here, and hopefully my thinking it
> through will help others reading the archives.
>
> There are two queues:

I don't see two queues. I only see the one queue of operations which
have been executed on the master but not replayed yet on the slave.
Every write operation on the master enqueues an operation to it and
every operation replayed on the slave dequeues from it. Only a subset
of operations create conflicts with concurrent transactions on the
slave, namely vacuums and a few similar operations (HOT pruning and
btree index pruning).

There's no question we need to make sure users have good tools to
monitor this queue and are aware of these tools. You can query each
slave for its currently replayed log position and hopefully you can
find out how long it's been delayed (ie, if it's looking at a log
record and waiting for a conflict to clear how long ago that log
record was generated). You can also find out what the log position is
on the master.




-- 
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Heikki Linnakangas wrote:
> > How to handle situations where the standby goes away for a while,
> > such as a network outage, so that it doesn't block the master from ever
> > cleaning up dead tuples is a concern.
> 
> Yeah, that's another issue that needs to be dealt with. You'd probably
> need some kind of a configurable escape valve in the master, to let it
> ignore a standby's snapshot once it gets too old.
> 
> > But I do know that the current Hot Standby implementation is going to be
> > frustrating to configure correctly for people.
> 
> Perhaps others who are not as deep into the code as I am will have a
> better view on this, but I seriously don't think that's such a big
> issue. I think the max_standby_delay setting is quite intuitive and easy
> to explain. Sure, it would better if there was no tradeoff between
> killing queries and stalling recovery, but I don't think it'll be that
> hard to understand the tradeoff.

Let's look at the five documented cases of query conflict (from our manual):
1     Access Exclusive Locks from primary node, including both explicit      LOCK commands and various DDL actions 2
Dropping tablespaces on the primary while standby queries are      using those tablespaces for temporary work files
(work_mem     overflow) 3     Dropping databases on the primary while users are connected to      that database on the
standby. 4     The standby waiting longer than max_standby_delay to acquire a      buffer cleanup lock.  5     Early
cleanupof data still visible to the current query's      snapshot
 

We might have a solution to #1 by only cancelling queries that try to
take locks.

#2 and #3 seem like rare occurances.

#4 can be controlled by max_standby_delay, where a large value only
delays playback during crash recovery --- again, a rare occurance.

#5 could be handled by using vacuum_defer_cleanup_age on the master.

Why is vacuum_defer_cleanup_age not listed in postgresql.conf?

In summary, I think passing snapshots to the master is not something
possible for 9.0, and ideally we will never need to add that feature.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> In the model you describe any long-lived queries on the slave cause
> tables in the master to bloat with dead records.

Yup, same as they would do on the master.

> I think this model is on the roadmap but it's not appropriate for
> everyone and I think one of the benefits of having delayed it is that
> it forces us to get the independent model right before throwing in
> extra complications. It would be too easy to rely on the slave
> feedback as an answer for hard questions about usability if we had it
> and just ignore the question of what to do when it's not the right
> solution for the user.

I'm going to make an unvarnished assertion here.  I believe that the
notion of synchronizing the WAL stream against slave queries is
fundamentally wrong and we will never be able to make it work.
The information needed isn't available in the log stream and can't be
made available without very large additions (and consequent performance
penalties).  As we start getting actual beta testing we are going to
uncover all sorts of missed cases that are not going to be fixable
without piling additional ugly kluges on top of the ones Simon has
already crammed into the system.  Performance and reliability will both
suffer.

I think that what we are going to have to do before we can ship 9.0
is rip all of that stuff out and replace it with the sort of closed-loop
synchronization Greg Smith is pushing.  It will probably be several
months before everyone is forced to accept that, which is why 9.0 is
not going to ship this year.
        regards, tom lane


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Tom Lane wrote:
> I'm going to make an unvarnished assertion here.  I believe that the
> notion of synchronizing the WAL stream against slave queries is
> fundamentally wrong and we will never be able to make it work.
> The information needed isn't available in the log stream and can't be
> made available without very large additions (and consequent performance
> penalties).  As we start getting actual beta testing we are going to
> uncover all sorts of missed cases that are not going to be fixable
> without piling additional ugly kluges on top of the ones Simon has
> already crammed into the system.  Performance and reliability will both
> suffer.
> 
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing.  It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

Wow, can I have some varnish with that.  :-O

You are right that we need to go down the road a bit before we know what
we need for 9.0 or 9.1.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
On 2/26/10 6:57 AM, Richard Huxton wrote:
> 
> Can we not wait to cancel the transaction until *any* new lock is
> attempted though? That should protect all the single-statement
> long-running transactions that are already underway. Aggregates etc.

I like this approach.  Is it fragile in some non-obvious way?

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
On 2/26/10 10:53 AM, Tom Lane wrote:
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing.  It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

I don't think that publishing visibility info back to the master ... and
subsequently burdening the master substantially for each additional
slave ... are what most users want.  Certainly for use cases like NTT's,
it is, but not for most of our users.

In fact, I seem to remember specifically discussing the approach of
trying to publish snapshots back to the master, and rejecting it on this
list during the development of SR.

Does anyone know how Oracle solves these issues?  Does their structure
(separate rollback log) make it easier for them?

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> On 2/26/10 10:53 AM, Tom Lane wrote:
>> I think that what we are going to have to do before we can ship 9.0
>> is rip all of that stuff out and replace it with the sort of closed-loop
>> synchronization Greg Smith is pushing.  It will probably be several
>> months before everyone is forced to accept that, which is why 9.0 is
>> not going to ship this year.

> I don't think that publishing visibility info back to the master ... and
> subsequently burdening the master substantially for each additional
> slave ... are what most users want.

I don't see a "substantial additional burden" there.  What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the walsender process publishes
that number as its transaction xmin in its PGPROC entry on the master.

I don't doubt that this approach will have its own gotchas that we
find as we get into it.  But it looks soluble.  I have no faith in
either the correctness or the usability of the approach currently
being pursued.
        regards, tom lane


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
> I don't see a "substantial additional burden" there.  What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

If the main purpose of the slave is long-running queries, though, this
could cause a lot of bloat on the master.  That's a special case, but a
reason why we would want to preserve the stop replication functionality.

> I don't doubt that this approach will have its own gotchas that we
> find as we get into it.  But it looks soluble.  I have no faith in
> either the correctness or the usability of the approach currently
> being pursued.

So, why not start working on it now, instead of arguing about it?  It'll
be easy to prove the approach once we have some test code.


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 2/26/10 10:53 AM, Tom Lane wrote:
>>> I think that what we are going to have to do before we can ship 9.0
>>> is rip all of that stuff out and replace it with the sort of closed-loop
>>> synchronization Greg Smith is pushing.  It will probably be several
>>> months before everyone is forced to accept that, which is why 9.0 is
>>> not going to ship this year.
> 
>> I don't think that publishing visibility info back to the master ... and
>> subsequently burdening the master substantially for each additional
>> slave ... are what most users want.
> 
> I don't see a "substantial additional burden" there.  What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

The additional burden comes from the old snapshot effect. It makes it
unusable for offloading reporting queries, for example. In general, it
is a very good architectural property that the master is not affected by
what happens in a standby, and a closed-loop synchronization would break
that.

I don't actually understand how tight synchronization on its own would
solve the problem. What if the connection to the master is lost? Do you
kill all queries in the standby before reconnecting?

One way to think about this is to first consider a simple a stop-and-go
system. Clearly the database must be consistent at any point in the WAL
sequence, if recovery was stopped and the database started up. So it is
always safe to pause recovery and run a read-only query against the
database as it is at that point in time (this assumes that the index
"cleanup" operations are not required for consistent query results BTW).
After the read-only transaction is finished, you can continue recovery.

The next step up is to relax that so that you allow replay of those WAL
records that are known to not cause trouble to the read-only queries.
For example, heap_insert records are very innocent, they only add rows
with a yet-uncommitted xmin.

Things get more complex when you allow the replay of commit records; all
the known-assigned-xids tracking is related to that, so that
transactions that are not committed when a snapshot is taken in the
standby to be considered uncommitted by the snapshot even after the
commit record is later replayed. If that feels too fragile, there might
be other methods to achieve that. One I once pondered is to not track
all in-progress transactions in shared memory like we do now, but only
OldestXmin. When a backend wants to take a snapshot in the slave, it
memcpy()s clog from OldestXmin to the latest committed XID, and includes
it in the snapshot. The visibility checks use the copy instead of the
actual clog, so they see the situation as it was when the snapshot was
taken. To keep track of the OldestXmin in the slave, the master can emit
that as a WAL record every now and then; it's ok if it lags behind.

Then there's the WAL record types that remove data that might still be
required by the read-only transactions. This includes vacuum and index
deletion records.


If you really think the current approach is unworkable, I'd suggest that
we fall back to a stop-and-go system, where you either let the recovery
to progress or allow queries to run, but not both at the same time. But
FWIW I don't think the situation is that grave.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't see a "substantial additional burden" there.  What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

And when we want to support cascading slaves?

Or when you want to bring up a new slave and it suddenly starts
advertising a new xmin that's older than the current oldestxmin?

But in any case if I were running a reporting database I would want it
to just stop replaying logs for a few hours while my big batch report
runs, not cause the master to be unable to vacuum any dead records for
hours. That defeats much of the purpose of running the queries on the
slave.

--
greg


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I don't actually understand how tight synchronization on its own would
> solve the problem. What if the connection to the master is lost? Do you
> kill all queries in the standby before reconnecting?

Sure.  So what?  They'd have been killed if they individually lost
connections to the master (or the slave), too.

> [ assorted analysis based on WAL contents ]

The problem is all the interactions that are not reflected (historically
anyway) to WAL.  We already know about btree page reclamation interlocks
and relcache init files.  How many others are there, and how messy and
expensive is it going to be to deal with them?

> If you really think the current approach is unworkable, I'd suggest that
> we fall back to a stop-and-go system, where you either let the recovery
> to progress or allow queries to run, but not both at the same time. But
> FWIW I don't think the situation is that grave.

I might be wrong.  I hope for the sake of the project schedule that I am
wrong.  But I'm afraid that we will spend several months beavering away
to try to make the current approach solid and user-friendly, and
eventually conclude that it's a dead end.  It would be prudent to have
a Plan B; and it looks to me like closed-loop synchronization is the
best Plan B.  Putting off all thought about it for the next release
cycle seems like a recipe for a scheduling disaster.
        regards, tom lane


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
"Joshua D. Drake"
Дата:
On Fri, 2010-02-26 at 12:02 -0800, Josh Berkus wrote:
> > I don't see a "substantial additional burden" there.  What I would
> > imagine is needed is that the slave transmits a single number back
> > --- its current oldest xmin --- and the walsender process publishes
> > that number as its transaction xmin in its PGPROC entry on the master.
>
> If the main purpose of the slave is long-running queries, though, this
> could cause a lot of bloat on the master.  That's a special case, but a
> reason why we would want to preserve the stop replication functionality.
>

Do we really think that users, using the slave to run long-running
queries is a special case? One of the number one things I can see this
being used for is reporting....

Sincerely,

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Aidan Van Dyk
Дата:
* Greg Stark <gsstark@mit.edu> [100226 15:10]:
> On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I don't see a "substantial additional burden" there.  What I would
> > imagine is needed is that the slave transmits a single number back
> > --- its current oldest xmin --- and the walsender process publishes
> > that number as its transaction xmin in its PGPROC entry on the master.
>
> And when we want to support cascading slaves?
>
> Or when you want to bring up a new slave and it suddenly starts
> advertising a new xmin that's older than the current oldestxmin?
>
> But in any case if I were running a reporting database I would want it
> to just stop replaying logs for a few hours while my big batch report
> runs, not cause the master to be unable to vacuum any dead records for
> hours. That defeats much of the purpose of running the queries on the
> slave.

*I* would be quite happy having the stop--and-go and the closed-loop be
the only 2 modes of operation, and I'ld even be quite happy if the were
both limited to separate method:

1) Running SR - then you are forced to use a closed-loop
2) Running HS from a backup/archive - forced to use stop-n-go


#1 still needs to deal ith a slave "disappearing" and not advancing xmin
for a period (TCP timeout)?

I'll note that until SR does synchronous streaming rep (which will
likely require some close-loop plan to allow the slave to be hot), I
want situation #2, and hopefully the knob to control how long it allows
a "stop" before going again can be a HUP'able knob so I can change it
occasionally without taking the server down...

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't see a "substantial additional burden" there. �What I would
>> imagine is needed is that the slave transmits a single number back
>> --- its current oldest xmin --- and the walsender process publishes
>> that number as its transaction xmin in its PGPROC entry on the master.

> And when we want to support cascading slaves?

So?  Fits right in.  The walsender on the first-level slave is
advertising an xmin from the second-level one, which will be included in
what's passed back up to the master.

> Or when you want to bring up a new slave and it suddenly starts
> advertising a new xmin that's older than the current oldestxmin?

How's it going to do that, when it has no queries at the instant
of startup?

> But in any case if I were running a reporting database I would want it
> to just stop replaying logs for a few hours while my big batch report
> runs, not cause the master to be unable to vacuum any dead records for
> hours. That defeats much of the purpose of running the queries on the
> slave.

Well, as Heikki said, a stop-and-go WAL management approach could deal
with that use-case.  What I'm concerned about here is the complexity,
reliability, maintainability of trying to interlock WAL application with
slave queries in any sort of fine-grained fashion.
        regards, tom lane


Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
> Well, as Heikki said, a stop-and-go WAL management approach could deal
> with that use-case.  What I'm concerned about here is the complexity,
> reliability, maintainability of trying to interlock WAL application with
> slave queries in any sort of fine-grained fashion.

This sounds a bit brute-force, but what about simply having some form of
automatic query retry on the slave?

--Josh Berkus



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Fri, Feb 26, 2010 at 8:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> How's it going to do that, when it has no queries at the instant
> of startup?
>

Why shouldn't it have any queries at walreceiver startup? It has any
xlog segments that were copied from the master and any it can find in
the archive, it could easily reach a consistent point long before it
needs to connect to the master. If you really want to protect your
master from any additional overhead you don't currently need to
configure a streaming connection at all, you can just use the file
shipping interface.

-- 
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> Why shouldn't it have any queries at walreceiver startup? It has any
> xlog segments that were copied from the master and any it can find in
> the archive, it could easily reach a consistent point long before it
> needs to connect to the master. If you really want to protect your
> master from any additional overhead you don't currently need to
> configure a streaming connection at all, you can just use the file
> shipping interface.

There's *definitely* not going to be enough information in the WAL
stream coming from a master that doesn't think it has HS slaves.
We can't afford to record all that extra stuff in installations for
which it's just useless overhead.  BTW, has anyone made any attempt
to measure the performance hit that the patch in its current form is
creating via added WAL entries?
        regards, tom lane


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There's *definitely* not going to be enough information in the WAL
> stream coming from a master that doesn't think it has HS slaves.
> We can't afford to record all that extra stuff in installations for
> which it's just useless overhead.  BTW, has anyone made any attempt
> to measure the performance hit that the patch in its current form is
> creating via added WAL entries?

What extra entries?


--
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
Dimitri Fontaine
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Well, as Heikki said, a stop-and-go WAL management approach could deal
> with that use-case.  What I'm concerned about here is the complexity,
> reliability, maintainability of trying to interlock WAL application with
> slave queries in any sort of fine-grained fashion.

Some admin functions for Hot Standby were removed from the path to ease
its integration, there was a pause() and resume() feature.

I think that offering this explicit control to the user would allow them
to choose between HA setup and reporting setup easily enough: just pause
the replay when running the reporting, resume it to get fresh data
again. If you don't pause, any query can get killed, replay is the
priority.

Now as far as the feedback loop is concerned, I guess the pause()
function would cause the slave to stop publishing any xmin in the
master's procarray so that it's free to vacuum and archive whatever it
wants to.

Should the slave accumulate too much lag, it will resume from the
archive rather than live from the SR link.

How much that helps?

Regards,
-- 
dim


Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's *definitely* not going to be enough information in the WAL
>> stream coming from a master that doesn't think it has HS slaves.
>> We can't afford to record all that extra stuff in installations for
>> which it's just useless overhead. �BTW, has anyone made any attempt
>> to measure the performance hit that the patch in its current form is
>> creating via added WAL entries?

> What extra entries?

Locks, just for starters.  I haven't read enough of the code yet to know
what else Simon added.  In the past it's not been necessary to record
any transient information in WAL, but now we'll have to.
        regards, tom lane


Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Dimitri Fontaine wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Well, as Heikki said, a stop-and-go WAL management approach could deal
> > with that use-case.  What I'm concerned about here is the complexity,
> > reliability, maintainability of trying to interlock WAL application with
> > slave queries in any sort of fine-grained fashion.
> 
> Some admin functions for Hot Standby were removed from the path to ease
> its integration, there was a pause() and resume() feature.
> 
> I think that offering this explicit control to the user would allow them
> to choose between HA setup and reporting setup easily enough: just pause
> the replay when running the reporting, resume it to get fresh data
> again. If you don't pause, any query can get killed, replay is the
> priority.

Doesn't the system already adjust the delay based on the length of slave
transactions, e.g. max_standby_delay.  It seems there is no need for a
user switch --- just max_standby_delay really high.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Hot Standby query cancellation and Streaming Replication integration

От
Dimitri Fontaine
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay.  It seems there is no need for a
> user switch --- just max_standby_delay really high.

Well that GUC looks like it allows to set a compromise between HA and
reporting, not to say "do not ever give the priority to the replay while
I'm running my reports". At least that's how I understand it.

The feedback loop might get expensive on master server when running
reporting queries on the slave, unless you can "pause" it explicitly I
think. I don't see how the system will guess that you're running a
reporting server rather than a HA node, and max_standby_delay is just a
way to tell the standby to please be nice in case of abuse.

Regards,
-- 
dim


Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Dimitri Fontaine wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Doesn't the system already adjust the delay based on the length of slave
> > transactions, e.g. max_standby_delay.  It seems there is no need for a
> > user switch --- just max_standby_delay really high.
> 
> Well that GUC looks like it allows to set a compromise between HA and
> reporting, not to say "do not ever give the priority to the replay while
> I'm running my reports". At least that's how I understand it.

Well, if you set it high, it effectively is that.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Tom Lane wrote:
> I don't see a "substantial additional burden" there.  What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
>   

That is exactly the core idea I was trying to suggest in my rambling 
message.  Just that small additional bit of information transmitted and 
published to the master via that route, and it's possible to optimize 
this problem in a way not available now.  And it's a way that I believe 
will feel more natural to some users who may not be well served by any 
of the existing tuning possibilities.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Bruce Momjian wrote:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay.  It seems there is no need for a
> user switch --- just max_standby_delay really high.
>   

The first issue is that you're basically saying "I don't care about high 
availability anymore" when you increase max_standby_delay to a high 
value.  Want to offload an 8 hour long batch report every day to the 
standby?  You can do it with max_standby_delay=8 hours.  But the day 
your master crashes 7 hours into that, you're in for a long wait before 
your standby is available while it replays all the queued up segments.  
Your 'hot standby' has actually turned into the old form of 'cold 
standby' just when you need it to be responsive.

This is also the reason why the whole "pause recovery" idea is a 
fruitless path to wander down.  The whole point of this feature is that 
people have a secondary server available for high-availability, *first 
and foremost*, but they'd like it to do something more interesting that 
leave it idle all the time.  The idea that you can hold off on applying 
standby updates for long enough to run seriously long reports is 
completely at odds with the idea of high-availability.

The second major problem is that the day the report actually takes 8.1 
hours instead, because somebody else ran another report that slowed you 
down a little, you're screwed if that's something you depend on being 
available--it just got canceled only *after* wasting 8 hours of 
reporting resource time.

max_standby_delay is IMHO only useful for allowing non-real-time web-app 
style uses of HS (think "Facebook status updates"), where you say "I'm 
OK giving people slightly out of date info sometimes if it lets me split 
the query load over two systems".  Set max_standby_delay to a few 
seconds or maybe a minute, enough time to service a typical user query, 
make your app tolerate the occasional failed query and only send big 
ones to the master, and you've just scaled up all the small ones.  
Distributed queries with "eventual consistency" on all nodes is where 
many of the web app designs are going, and this feature is a reasonable 
match for that use case.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Bruce Momjian wrote:
>     5     Early cleanup of data still visible to the current query's
>           snapshot
>
> #5 could be handled by using vacuum_defer_cleanup_age on the master.
> Why is vacuum_defer_cleanup_age not listed in postgresql.conf?
>   

I noticed that myself and fired off a corrective patch to Simon 
yesterday, he said it was intentional but not sure why that is yet. 
We'll sort that out.

You are correct that my suggestion is targeting primarily #5 on this 
list. There are two problems with the possible solutions using that 
parameter though:

-vacuum_defer_cleanup_age is set in a unit that people cannot be 
expected to work in--transactions ids. The UI is essentially useless, 
and there's no obvious way how to make a better one. The best you can do 
will still be really fragile.

-If you increase vacuum_defer_cleanup_age, it's active all the time. 
You're basically making every single transaction that could be cleaned 
up pay for the fact that a query *might* be running on the standby it 
needs to avoid.

You can think of the idea of passing an xmin back from the standby as 
being like an auto-tuning vacuum_defer_cleanup_age. It's 0 when no 
standby queries are running, but grows in size to match longer ones. And 
you don't have to have to know anything to set it correctly; just toggle 
on the proposed "feedback xid from the standby" feature and you're safe.

Expecting that anyone will ever set vacuum_defer_cleanup_age correctly 
in the field in its current form is pretty unreasonable I think. Since 
there's no timestamp-based memory of past xid activity, it's difficult 
to convert it to that form instead, and I think something in terms of 
time is what people would like to set this in.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
> That is exactly the core idea I was trying to suggest in my rambling
> message.  Just that small additional bit of information transmitted and
> published to the master via that route, and it's possible to optimize
> this problem in a way not available now.  And it's a way that I believe
> will feel more natural to some users who may not be well served by any
> of the existing tuning possibilities.

Well, if both you and Tom think it would be relatively easy (or at least
easier that continuing to pursue query cancel troubleshooting), then
please start coding it.  It was always a possible approach, we just
collectively thought that query cancel would be easier.

--Josh Berkus



Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Smith wrote:
> Bruce Momjian wrote:
> > Doesn't the system already adjust the delay based on the length of slave
> > transactions, e.g. max_standby_delay.  It seems there is no need for a
> > user switch --- just max_standby_delay really high.
> >   
> 
> The first issue is that you're basically saying "I don't care about high 
> availability anymore" when you increase max_standby_delay to a high 
> value.  Want to offload an 8 hour long batch report every day to the 
> standby?  You can do it with max_standby_delay=8 hours.  But the day 
> your master crashes 7 hours into that, you're in for a long wait before 
> your standby is available while it replays all the queued up segments.  
> Your 'hot standby' has actually turned into the old form of 'cold 
> standby' just when you need it to be responsive.

Well, I think the choice is either you delay vacuum on the master for 8
hours or pile up 8 hours of WAL files on the slave, and delay
application, and make recovery much slower.  It is not clear to me which
option a user would prefer because the bloat on the master might be
permanent.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Fri, Feb 26, 2010 at 11:56 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> This is also the reason why the whole "pause recovery" idea is a fruitless
> path to wander down.  The whole point of this feature is that people have a
> secondary server available for high-availability, *first and foremost*, but
> they'd like it to do something more interesting that leave it idle all the
> time.  The idea that you can hold off on applying standby updates for long
> enough to run seriously long reports is completely at odds with the idea of
> high-availability.

Well you can go sit in the same corner as Simon with your high
availability servers.

I want my ability to run large batch queries without any performance
or reliability impact on the primary server.

You can have one or the other but you can't get both. If you set
max_standby_delay low then you get your high availability server, if
you set it high you get a useful report server.

If you build sync replication which we don't have today and which will
open another huge can of usability worms when we haven't even finish
bottling the two we've already opened then you lose the lack of impact
on the primary. Suddenly the queries you run on the slaves cause your
production database to bloat. Plus you have extra network connections
which take resources on your master and have to be kept up at all
times or you lose your slaves.

I think the design constraint of not allowing any upstream data flow
is actually very valuable. Eventually we'll have it for sync
replication but it's much better that we've built things incrementally
and can be sure that nothing really depends on it for basic
functionality. This is what allows us to know that the slave imposes
no reliability impact on the master. It's what allows us to know that
everything will work identically regardless of whether you have a
walreceiver running or are running off archived log files.

Remember I wanted to entirely abstract away the walreciever and allow
multiple wal communication methods. I think it would make more sense
to use something like Spread to distribute the logs so the master only
has to send them once and as many slaves as you want can pick them up.
The current architecture doesn't scale very well if you want to have
hundreds of slaves for one master.


--
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Smith wrote:
> You can think of the idea of passing an xmin back from the standby as 
> being like an auto-tuning vacuum_defer_cleanup_age. It's 0 when no 
> standby queries are running, but grows in size to match longer ones. And 
> you don't have to have to know anything to set it correctly; just toggle 
> on the proposed "feedback xid from the standby" feature and you're safe.

Yes, there is no question there is value in passing the xid back to the
master.  My point is that it is like your blog entry:
http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html

you can't have all the options:
1.  agressive vacuum on master2.  fast recovery of slave3.  no query cancel on slave

Again, pick two.  Passing the xid back to the master gives you #2 and
#3, and that might be good for some people, but not others.  Do we have
any idea which options most administrators would want?  If we get xid
passback to the master, do we keep the other configuration options?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Greg Stark wrote:
> Well you can go sit in the same corner as Simon with your high
> availability servers.
>
> I want my ability to run large batch queries without any performance
> or reliability impact on the primary server.
>   

Thank you for combining a small personal attack with a selfish 
commentary about how yours is the only valid viewpoint.  Saves me a lot 
of trouble replying to your messages, can just ignore them instead if 
this is how you're going to act.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Sat, Feb 27, 2010 at 1:53 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Greg Stark wrote:
>>
>> Well you can go sit in the same corner as Simon with your high
>> availability servers.
>>
>> I want my ability to run large batch queries without any performance
>> or reliability impact on the primary server.
>>
>
> Thank you for combining a small personal attack with a selfish commentary
> about how yours is the only valid viewpoint.  Saves me a lot of trouble
> replying to your messages, can just ignore them instead if this is how
> you're going to act.

Eh? That's not what I meant at all. Actually it's kind of the exact
opposite of what I meant.

What I meant was that your description of the "High Availability first
and foremost" is only one possible use case. Simon in the past
expressed the same single-minded focus on that use case. It's a
perfectly valid use case and I would probably agree if we had to
choose just one it would be the most important.

But we don't have to choose just one. There are other valid use cases
such as load balancing and isolating your large batch queries from
your production systems. I don't want us to throw out all these other
use cases because we only consider high availability as the only use
case we're interested in.


--
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Bruce Momjian wrote:
> Well, I think the choice is either you delay vacuum on the master for 8
> hours or pile up 8 hours of WAL files on the slave, and delay
> application, and make recovery much slower.  It is not clear to me which
> option a user would prefer because the bloat on the master might be
> permanent.
>   

But if you're running the 8 hour report on the master right now, aren't 
you already exposed to a similar pile of bloat issues while it's going?  
If I have the choice between "sometimes queries will get canceled" vs. 
"sometimes the master will experience the same long-running transaction 
bloat issues as in earlier versions even if the query runs on the 
standby", I feel like leaning toward the latter at least leads to a 
problem people are used to. 

This falls into the principle of least astonishment category to me.  
Testing the final design for how transactions get canceled here led me 
to some really unexpected situations, and the downside for a mistake is 
"your query is lost".  Had I instead discovered that sometimes 
long-running transactions on the standby can ripple back to cause a 
maintenance slowdown on the master, that's not great.  But it would not 
have been so surprising, and it won't result in lost query results. 

I think people will expect that their queries cancel because of things 
like DDL changes.  And the existing knobs allow inserting some slack for 
things like locks taking a little bit of time to acquire sometimes.  
What I don't think people will see coming is that a routine update on an 
unrelated table is going to kill a query they might have been waiting 
hours for the result of, just because that update crossed an autovacuum 
threshold for the other table and introduced a dead row cleanup.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Heikki Linnakangas wrote:
> One such landmine is that the keepalives need to flow from client to
> server while the WAL records are flowing from server to client. We'll
> have to crack that problem for synchronous replication too, but I think
> that alone is a big enough problem to make this 9.1 material.
>   

This seems to be the real sticking point then, given that the 
xmin/PGPROC side on the master seems logically straightforward.  For 
some reason I thought the sync rep feature had the reverse message flow 
already going, and that some other sort of limitation just made it 
impractical to merge into the main codebase this early.  My hope was 
that just this particular part could get cherry-picked out of there, and 
that it might even have been thought about already in that context given 
the known HS keepalive "serious issue".  If there was a solution or 
partial solution in progress to that floating around, my thought was 
that just piggybacking this extra xid info on top of it would be easy 
enough.

If there's not already a standby to primary communications backchannel 
implementation available that can be harvested from that work, your 
suggestion that this may not be feasible at all for 9.0 seems like a 
more serious concern than I had thought it was going to be.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Smith wrote:
> Heikki Linnakangas wrote:
> > One such landmine is that the keepalives need to flow from client to
> > server while the WAL records are flowing from server to client. We'll
> > have to crack that problem for synchronous replication too, but I think
> > that alone is a big enough problem to make this 9.1 material.
> >   
> 
> This seems to be the real sticking point then, given that the 
> xmin/PGPROC side on the master seems logically straightforward.  For 
> some reason I thought the sync rep feature had the reverse message flow 
> already going, and that some other sort of limitation just made it 
> impractical to merge into the main codebase this early.  My hope was 
> that just this particular part could get cherry-picked out of there, and 
> that it might even have been thought about already in that context given 
> the known HS keepalive "serious issue".  If there was a solution or 
> partial solution in progress to that floating around, my thought was 
> that just piggybacking this extra xid info on top of it would be easy 
> enough.
> 
> If there's not already a standby to primary communications backchannel 
> implementation available that can be harvested from that work, your 
> suggestion that this may not be feasible at all for 9.0 seems like a 
> more serious concern than I had thought it was going to be.

I suspect the master could connect to the slave to pull an xid.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Greg Stark wrote:
> Eh? That's not what I meant at all. Actually it's kind of the exact
> opposite of what I meant.
>   

Sorry about that--I think we just hit one of those language usage drift 
bits of confusion.  "Sit in the corner" has a very negative tone to it 
in US English and I interpreted your message badly as a result.  A 
Google search for images using that phrase will quickly show you what I 
mean.

> What I meant was that your description of the "High Availability first
> and foremost" is only one possible use case. Simon in the past
> expressed the same single-minded focus on that use case. It's a
> perfectly valid use case and I would probably agree if we had to
> choose just one it would be the most important.
>   

Sure, there are certainly others, and as much as possible more 
flexibility here is a good thing.  What I was suggesting is that if the 
only good way to handle long-running queries has no choice but to 
sacrifice high-availability, which is is the situation if 
max_standby_delay is the approach you use, then the most obvious users 
for this feature are not being well served by that situation.  I would 
guess a large portion of the users looking forward to Hot Standby are in 
the "have an underutilized high-availability standby I'd like to use for 
offloading long running reports", and if there is no way to serve them 
well this feature is missing the mark a bit. 

You really can't do any better without better master/standby integration 
though, and as pointed out a couple of times here that was considered 
and just not followed through on yet.  I'm increasingly concerned that 
nothing else will really do though.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and StreamingReplication integration

От
"Joshua D. Drake"
Дата:
On Fri, 2010-02-26 at 12:02 -0800, Josh Berkus wrote:
> > I don't see a "substantial additional burden" there.  What I would
> > imagine is needed is that the slave transmits a single number back
> > --- its current oldest xmin --- and the walsender process publishes
> > that number as its transaction xmin in its PGPROC entry on the master.
> 
> If the main purpose of the slave is long-running queries, though, this
> could cause a lot of bloat on the master.  That's a special case, but a
> reason why we would want to preserve the stop replication functionality.
> 

Do we really think that users, using the slave to run long-running
queries is a special case? One of the number one things I can see this
being used for is reporting....

Sincerely,

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Sat, Feb 27, 2010 at 2:43 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>
> But if you're running the 8 hour report on the master right now, aren't you
> already exposed to a similar pile of bloat issues while it's going?  If I
> have the choice between "sometimes queries will get canceled" vs. "sometimes
> the master will experience the same long-running transaction bloat issues as
> in earlier versions even if the query runs on the standby", I feel like
> leaning toward the latter at least leads to a problem people are used to.

If they move from running these batch queries on the master to running
them on the slave then sure, the situation will be no worse than
before.

But if they move from having a plain old PITR warm standby to having
one they can run queries on they might well assume that the big
advantage of having the standby to play with is precisely that they
can do things there that they have never been able to do on the master
previously without causing damage.

I agree that having queries randomly and unpredictably canceled is
pretty awful. My argument was that max_standby_delay should default to
infinity on the basis that any other value has to be picked based on
actual workloads and SLAs.

My feeling is that there are probably only two types of configurations
that make sense, a HA replica with a low max_standby_delay or a
reporting replica with a high max_standby_delay. Any attempt to
combine the two on the same system will only work if you know your
application well and can make compromises with both.

I would also like to be able to handle load balancing read-only
queries but that will be really tricky. You want up-to-date data and
you want to be able to run moderately complex queries. That kind of
workload may well require synchronous replication to really work
properly.

--
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
"Joshua D. Drake"
Дата:
On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark@mit.edu> wrote:
> On Fri, Feb 26, 2010 at 11:56 PM, Greg Smith <greg@2ndquadrant.com>
wrote:
>> This is also the reason why the whole "pause recovery" idea is a
>> fruitless
>> path to wander down.  The whole point of this feature is that people
>> have a
>> secondary server available for high-availability, *first and foremost*,
>> but
>> they'd like it to do something more interesting that leave it idle all
>> the
>> time.  The idea that you can hold off on applying standby updates for
>> long
>> enough to run seriously long reports is completely at odds with the
idea
>> of
>> high-availability.

> I want my ability to run large batch queries without any performance
> or reliability impact on the primary server.

+1

I can use any number of other technologies for high availability.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
 


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Fri, Feb 26, 2010 at 9:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>
>> What extra entries?
>
> Locks, just for starters.  I haven't read enough of the code yet to know
> what else Simon added.  In the past it's not been necessary to record
> any transient information in WAL, but now we'll have to.

Haven't we been writing locks to the WAL since two-phase commit?

--
greg


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Joshua D. Drake wrote:
> On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark@mit.edu> wrote:
>   
>> I want my ability to run large batch queries without any performance
>> or reliability impact on the primary server.
>>     
>
> +1
>
> I can use any number of other technologies for high availability.
>   

Remove "must be an instant-on failover at the same time" from the 
requirements and you don't even need 9.0 to handle that, this has been a 
straightforward to solve problem since 8.2.  It's the combination of HA 
and queries that make things hard to do.

If you just want batch queries on another system without being concerned 
about HA at the same time, the first option is to just fork the base 
backup and WAL segment delivery to another server and run queries there.

Some simple filesystem snapshot techniques will also suffice to handle 
it all on the same standby.  Stop warm standby recovery, snapshot, 
trigger the server, run your batch job; once finished, rollback to the 
snapshot, grab the latest segment files, and resume standby catchup.  
Even the lame Linux LVM snapshot features can handle that job--one of my 
coworkers has the whole thing scripted even this is so common.

And if you have to go live because there's a failover, you're back to 
the same "cold standby" situation a large max_standby_delay puts you at, 
so it's not even very different from what you're going to get in 9.0 if 
this is your priority mix.  The new version is just lowering the 
operational complexity involved.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Greg Stark wrote:
> But if they move from having a plain old PITR warm standby to having
> one they can run queries on they might well assume that the big
> advantage of having the standby to play with is precisely that they
> can do things there that they have never been able to do on the master
> previously without causing damage.
>   

Just not having the actual query running on the master is such a 
reduction in damage that I think it's delivering the essence of what 
people are looking for regardless.  That it might be possible in some 
cases to additionally avoid the overhead that comes along with any 
long-running query is a nice bonus, and it's great the design allows for 
that possibility.  But if that's only possible with risk, heavy 
tweaking, and possibly some hacks, I'm not sure that's making the right 
trade-offs for everyone.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Aidan Van Dyk
Дата:
* Greg Smith <greg@2ndquadrant.com> [100226 23:39]:

> Just not having the actual query running on the master is such a  
> reduction in damage that I think it's delivering the essence of what  
> people are looking for regardless.  That it might be possible in some  
> cases to additionally avoid the overhead that comes along with any  
> long-running query is a nice bonus, and it's great the design allows for  
> that possibility.  But if that's only possible with risk, heavy  
> tweaking, and possibly some hacks, I'm not sure that's making the right  
> trade-offs for everyone.

Would we (ya, the royal we) be willing to say that if you want the
benifit of removing the MVCC overhead of long-running queries you need
to run PITR backup/archive recovery, and if you want SR, you get a
closed-loop master-follows-save-xmin behaviour?

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Aidan Van Dyk wrote:
> Would we (ya, the royal we) be willing to say that if you want the
> benifit of removing the MVCC overhead of long-running queries you need
> to run PITR backup/archive recovery, and if you want SR, you get a
> closed-loop master-follows-save-xmin behaviour?
>   

To turn that question around a little, I think it's reasonable to say 
that closed-loop master-follows-slave-xmin behavior is only practical to 
consider implementing with SR--and even there, it should be optional 
rather than required until there's more field experience on the whole 
thing.  Whether it's the default or not could take a bit of debate to 
sort out too.

If you think of it in those terms, the idea that "you need to run PITR 
backup/archive recovery" to not get that behavior isn't an important 
distinction anymore.  If you run SR with the option enabled you could 
get it, any other setup and you won't.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Greg Stark wrote:
> On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's *definitely* not going to be enough information in the WAL
>> stream coming from a master that doesn't think it has HS slaves.
>> We can't afford to record all that extra stuff in installations for
>> which it's just useless overhead.  BTW, has anyone made any attempt
>> to measure the performance hit that the patch in its current form is
>> creating via added WAL entries?
> 
> What extra entries?

* An xact-assignment record is written every PGPROC_MAX_CACHED_SUBXIDS
(= 64) subtransaction ids assigned to a single top-level transaction.

* A running-xacts record is written at every online checkpoint

* A btree-reuse-page record is written whenever a dead b-tree page is
recycled

* A vacuum cleanup-info record is written once per VACUUM of a table

* A standby-lock record is written for each AccessExclusiveLock acquired.

Am I missing something?

I doubt any of these are noticeable, though I don't think anyone has
measured it.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Josh Berkus wrote:
>> That is exactly the core idea I was trying to suggest in my rambling
>> message.  Just that small additional bit of information transmitted and
>> published to the master via that route, and it's possible to optimize
>> this problem in a way not available now.  And it's a way that I believe
>> will feel more natural to some users who may not be well served by any
>> of the existing tuning possibilities.
> 
> Well, if both you and Tom think it would be relatively easy (or at least
> easier that continuing to pursue query cancel troubleshooting), then
> please start coding it.  It was always a possible approach, we just
> collectively thought that query cancel would be easier.

You still need query cancels. A feedback loop just makes it happen less
frequently.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Dimitri Fontaine wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Doesn't the system already adjust the delay based on the length of slave
>> transactions, e.g. max_standby_delay.  It seems there is no need for a
>> user switch --- just max_standby_delay really high.
> 
> Well that GUC looks like it allows to set a compromise between HA and
> reporting, not to say "do not ever give the priority to the replay while
> I'm running my reports". At least that's how I understand it.

max_standby_delay=-1 does that. The documentation needs to be updated to
reflect that, it currently says:

> There is no wait-forever setting because of the potential for deadlock which that setting would introduce. This
parametercan only be set in the postgresql.conf  file or on the server command line. 
 

but that is false, -1 means wait forever. Simon removed that option at
one point, but it was later put back and apparently the documentation
was never updated.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby query cancellation and Streaming Replication integration

От
Heikki Linnakangas
Дата:
Heikki Linnakangas wrote:
> Dimitri Fontaine wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Doesn't the system already adjust the delay based on the length of slave
>>> transactions, e.g. max_standby_delay.  It seems there is no need for a
>>> user switch --- just max_standby_delay really high.
>> Well that GUC looks like it allows to set a compromise between HA and
>> reporting, not to say "do not ever give the priority to the replay while
>> I'm running my reports". At least that's how I understand it.
> 
> max_standby_delay=-1 does that. The documentation needs to be updated to
> reflect that, it currently says:
> 
>> There is no wait-forever setting because of the potential for deadlock which that setting would introduce. This
parametercan only be set in the postgresql.conf  file or on the server command line. 
 
> 
> but that is false, -1 means wait forever. Simon removed that option at
> one point, but it was later put back and apparently the documentation
> was never updated.

I've put back the mention of max_standby_delay=-1 option in the docs.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Smith wrote:
> Joshua D. Drake wrote:
> > On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark@mit.edu> wrote:
> >   
> >> I want my ability to run large batch queries without any performance
> >> or reliability impact on the primary server.
> >>     
> >
> > +1
> >
> > I can use any number of other technologies for high availability.
> >   
> 
> Remove "must be an instant-on failover at the same time" from the 
> requirements and you don't even need 9.0 to handle that, this has been a 
> straightforward to solve problem since 8.2.  It's the combination of HA 
> and queries that make things hard to do.
> 
> If you just want batch queries on another system without being concerned 
> about HA at the same time, the first option is to just fork the base 
> backup and WAL segment delivery to another server and run queries there.

That is a lot of administrative overhead. It is hard to say it is
equivalent to HS.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.comPG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard
drive,Christ can be your backup. +
 


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Tatsuo Ishii
Дата:
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing.  It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

Oh, 9.0 will not be shipped in 2010? You guys share same opinion as
Tom?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
Greg,

> If you think of it in those terms, the idea that "you need to run PITR
> backup/archive recovery" to not get that behavior isn't an important
> distinction anymore.  If you run SR with the option enabled you could
> get it, any other setup and you won't.

+1.

I always expected that we'd get this kind of behavior with "synch" in
9.1.  I can see that there are two desired modes of behaviour depending
on what the replication config is:

1) Master full-speed, degraded operation on slaves: this is the current
wal_standby_delay approach.  It has the advantage of supporting possibly
hundreds of slaves, and certainly dozens.

2) Master burdened, full operation on slaves:  this is the
publish-xmin-back-to-master approach, which IIRC the core team first
discussed at pgCon 2008 before Simon started work, and which you and Tom
seem to think can be done soon.

I can see people wanting to use either mode depending on their use-case.Or, for that matter, using both modes to
differentslaves.
 

Now that I think about it, the xmin thing really doesn't seem
conceptually difficult.  If the slave just opens a 2nd, special query
connection back to the master and publishes its oldest xmin there, as
far as the master is concerned, it's just another query backend.
Could it be that easy?

Also, I'm serious about what I suggested earlier for "delay" mode.  We
should have an option for cancelled queries to be immediately retried,
if that's feasible.  It would turn something which is now a major
application design issue (lots of cancelled queries) into just degrated
performance.

Overall, though, I'd say that getting 9.0 out the door relatively
on-time is more important than getting it perfect.  "Release early,
release often" isn't just a mantra; it's a very good idea if you want
your project to keep improving and not bog down and fall apart.

--Josh Berkus


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Josh Berkus wrote:

> Now that I think about it, the xmin thing really doesn't seem
> conceptually difficult.  If the slave just opens a 2nd, special query
> connection back to the master and publishes its oldest xmin there, as
> far as the master is concerned, it's just another query backend.
> Could it be that easy?
>   

Something just like that is in fact already suggested as a workaround in 
the Hot Standby manual:

"The first option is to connect to the primary server and keep a query 
active for as long as needed to run queries on the standby. This 
guarantees that a WAL cleanup record is never generated and query 
conflicts do not occur, as described above. This could be done using 
contrib/dblink  and pg_sleep(), or via other mechanisms."

And the idea of doing it mainly in client land has its attractions. 

The main reason I wandered toward asking about it in the context of SR 
is that there's already this open "Standby delay on idle system" issue 
with Hot Standby, and the suggested resolution for that problem involves 
publishing keep-alive data with timestamps over SR.  While all these 
problems and potential solutions have been floating around for a long 
time, as you pointed out, the little flash of insight I had here was 
that it's possible to bundle these two problems together with a combined 
keep-alive timestamp+xmin message that goes in both directions.  That 
removes one serious Hot Standby issue altogether, and adds an additional 
conflict avoidance mechanism for people who want to enable it, all with 
something that needs to get done sooner or later anyway for sync rep.

The part I still don't have good visibility on is how much of the 
necessary SR infrastructure needed to support this communications 
channel is already available in some form.  I had though the walsender 
on the master was already receiving messages sometimes from the 
walreceiver on the standby, but I'm getting the impression from Heikki's 
comments that this not the case at all yet.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
> The part I still don't have good visibility on is how much of the
> necessary SR infrastructure needed to support this communications
> channel is already available in some form.  I had though the walsender
> on the master was already receiving messages sometimes from the
> walreceiver on the standby, but I'm getting the impression from Heikki's
> comments that this not the case at all yet.

I don't think asking for a 2nd connection back from the standby to the
master would be horrible for 9.0.  I think it would be quite reasonable,
actually; even with 2 connections per slave, you could still have quite
a few slaves before the # of connections bogged down the master.

--Josh


Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
> Thank you for combining a small personal attack with a selfish
> commentary about how yours is the only valid viewpoint.  Saves me a lot
> of trouble replying to your messages, can just ignore them instead if
> this is how you're going to act.

Hey, take it easy!  I read Stark's post as tongue-in-cheek, which I
think it was.

Though, Stark, if you're going to be flip, I'd suggest using a smiley
next time.

--Josh



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Josh Berkus wrote:
> Hey, take it easy!  I read Stark's post as tongue-in-cheek, which I
> think it was.
>   

Yeah, I didn't get that.  We've already exchanged mutual off-list 
apologies for the misunderstanding in both directions, I stopped just 
short of sending flowers.

I did kick off this discussion with noting a clear preference this not 
wander into any personal finger-pointing.  And I am far too displeased 
with the technical situation here to have much of a sense of humor left 
about it either.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Robert Haas
Дата:
On Fri, Feb 26, 2010 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> In the model you describe any long-lived queries on the slave cause
>> tables in the master to bloat with dead records.
>
> Yup, same as they would do on the master.
>
>> I think this model is on the roadmap but it's not appropriate for
>> everyone and I think one of the benefits of having delayed it is that
>> it forces us to get the independent model right before throwing in
>> extra complications. It would be too easy to rely on the slave
>> feedback as an answer for hard questions about usability if we had it
>> and just ignore the question of what to do when it's not the right
>> solution for the user.
>
> I'm going to make an unvarnished assertion here.  I believe that the
> notion of synchronizing the WAL stream against slave queries is
> fundamentally wrong and we will never be able to make it work.
> The information needed isn't available in the log stream and can't be
> made available without very large additions (and consequent performance
> penalties).  As we start getting actual beta testing we are going to
> uncover all sorts of missed cases that are not going to be fixable
> without piling additional ugly kluges on top of the ones Simon has
> already crammed into the system.  Performance and reliability will both
> suffer.
>
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing.  It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

Somewhat unusually for me, I haven't been able to keep up with my
email over the last few days, so I'm weighing in on this one a bit
late.  It seems to me that if we're forced to pass the xmin from the
slave back to the master, that would be a huge step backward in terms
of both scalability and performance, so I really hope it doesn't come
to that.  I wish I understood better exactly what you mean by "the
notion of synchronizing the WAL stream against slave queries" and why
you don't think it will work.  Can you elaborate?

...Robert


Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Smith wrote:
> Josh Berkus wrote:
> 
> > Now that I think about it, the xmin thing really doesn't seem
> > conceptually difficult.  If the slave just opens a 2nd, special query
> > connection back to the master and publishes its oldest xmin there, as
> > far as the master is concerned, it's just another query backend.
> > Could it be that easy?
> >   
> 
> Something just like that is in fact already suggested as a workaround in 
> the Hot Standby manual:
> 
> "The first option is to connect to the primary server and keep a query 
> active for as long as needed to run queries on the standby. This 
> guarantees that a WAL cleanup record is never generated and query 
> conflicts do not occur, as described above. This could be done using 
> contrib/dblink  and pg_sleep(), or via other mechanisms."

I am unclear how you would easily advance the snapshot as each query
completes on the slave.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Bruce Momjian wrote:
>> "The first option is to connect to the primary server and keep a query 
>> active for as long as needed to run queries on the standby. This 
>> guarantees that a WAL cleanup record is never generated and query 
>> conflicts do not occur, as described above. This could be done using 
>> contrib/dblink  and pg_sleep(), or via other mechanisms."
>>     
>
> I am unclear how you would easily advance the snapshot as each query
> completes on the slave.
>   

The idea of the workaround is that if you have a single long-running 
query to execute, and you want to make sure it doesn't get canceled 
because of a vacuum cleanup, you just have it connect back to the master 
to keep an open snapshot the whole time.  That's basically the same idea 
that vacuum_defer_cleanup_age implements, except you don't have to 
calculate a value--you just hold open the snapshot to do it.

When that query ended, its snapshot would be removed, and then the 
master would advance to whatever the next latest one is.  Nothing 
fancier than that.  The only similarity is that if you made every query 
that happened on the standby do that, it would effectively be the same 
behavior I'm suggesting could be available via the standby->master xmin 
publication.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> The idea of the workaround is that if you have a single long-running query
> to execute, and you want to make sure it doesn't get canceled because of a
> vacuum cleanup, you just have it connect back to the master to keep an open
> snapshot the whole time.  That's basically the same idea that
> vacuum_defer_cleanup_age implements, except you don't have to calculate a
> value--you just hold open the snapshot to do it.

This sounds like it would require a separate connection for each
client on the replica. That would be a pretty big burden for the
master.

Also, I'm not sure this actually works. When your client makes this
additional connection to the master it's connecting at some
transaction in the future from the slave's point of view. The master
could have already vacuumed away some record which the snapshot the
client gets on the slave will have in view. Even if you defer taking
the snapshot on the slave until after connecting to the master it's
still possibly "in the past" compared to the xmin on the master.

I think to make this work you would have to connect to the master,
establish a snapshot, then fetch pg_current_xlog_location(), then poll
the slave and wait until it reaches that same position -- and only
then perform your query taking care to establish a fresh snapshot for
it such as by starting a new transaction on the slave. That's a lot of
effort to go to.

Still it's a handy practical trick even if it isn't 100% guaranteed to
work. But I don't think it provides the basis for something we can
bake in.


--
greg


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Robert Haas wrote:
> It seems to me that if we're forced to pass the xmin from the
> slave back to the master, that would be a huge step backward in terms
> of both scalability and performance, so I really hope it doesn't come
> to that.

Not forced to--have the option of.  There are obviously workloads where 
you wouldn't want this.  At the same time, I think there are some pretty 
common ones people are going to expect HS+SR to work on transparently 
where this would obviously be the preferred trade-off to make, were it 
available as one of the options.  The test case I put together shows an 
intentionally pathological but not completely unrealistic example of 
such a workload.

> I wish I understood better exactly what you mean by "the
> notion of synchronizing the WAL stream against slave queries" and why
> you don't think it will work.  Can you elaborate?
>   

There's this constant WAL stream coming in from the master to the 
slave.  Each time the slave is about to apply a change from that stream, 
it considers "will this disrupt one of the queries I'm already 
executing?".  If so, it has to make a decision about what to do; that's 
where the synchronization problem comes from.

The current two options are "delay applying the change", at which point 
the master and standby will drift out of sync until the query ends and 
it can catch back up, or "cancel the query".  There are tunables for 
each of these, and they all seem to work fine (albeit without too much 
testing in the field yet).  My concern is that the tunable that tries to 
implement the other thing you might want to optimize for--"avoid letting 
the master generate WAL entires that are the most likely ones to 
conflict"--just isn't very usable in its current form.

Tom and I don't see completely eye to eye on this, in that I'm not so 
sure the current behaviors are "fundamentally wrong and we will never be 
able to make [them] work".  If that's really the case, you may not ever 
get the scalability/performance results you're hoping for from this 
release, and really we're all screwed if those are the only approaches 
available.

What I am sure of is that a SR-based xmin passing approach is simpler, 
easier to explain, more robust for some common workloads, and less 
likely to give surprised "wow, I didn't think *that* would cancel my 
standby query" reports from the field than any way you can configure Hot 
Standby alone right now.  And since I never like to bet against Tom's 
gut feel, having it around as a "plan B" in case he's right about an 
overwhelming round of bug reports piling up against the 
max_standby_delay etc. logic doesn't hurt either.

I spent a little time today seeing if there was any interesting code I 
might steal from the early "synchrep" branch at 
http://git.postgresql.org/gitweb?p=users/fujii/postgres.git;a=summary , 
but sadly when I tried to rebase that against the master to separate out 
just the parts unique to it the merge conflicts were overwhelming.  I 
hate getting beaten by merge bitrot even when Git is helping.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Greg Stark wrote:
> On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>   
>> The idea of the workaround is that if you have a single long-running query
>> to execute, and you want to make sure it doesn't get canceled because of a
>> vacuum cleanup, you just have it connect back to the master to keep an open
>> snapshot the whole time.
> Also, I'm not sure this actually works. When your client makes this
> additional connection to the master it's connecting at some
> transaction in the future from the slave's point of view. The master
> could have already vacuumed away some record which the snapshot the
> client gets on the slave will have in view.

Right, and there was an additional comment in the docs alluding to some 
sleep time on the master that intends to try and improve thins.  If you 
knew how long archive_timeout was you could try to sleep longer than it 
to try and increase your odds of avoiding an ugly spot.  But there are 
race conditions galore possible here, particularly if your archiver or 
standby catchup is backlogged.

> Still it's a handy practical trick even if it isn't 100% guaranteed to
> work. But I don't think it provides the basis for something we can
> bake in.
>   

Agreed on both counts, which is why it's in the current docs as a 
workaround people can consider, but not what I've been advocating as the 
right way to proceed.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Sun, Feb 28, 2010 at 6:07 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Not forced to--have the option of.  There are obviously workloads where you
> wouldn't want this.  At the same time, I think there are some pretty common
> ones people are going to expect HS+SR to work on transparently where this
> would obviously be the preferred trade-off to make, were it available as one
> of the options.  The test case I put together shows an intentionally
> pathological but not completely unrealistic example of such a workload.

Well if we're forced to eventually have both then it kind of takes the
wind out of Tom's arguments. We had better get both features working
so it becomes only a question of which is worth doing first and which
can be held off. Since there aren't any actual bugs in evidence for
the current setup and we already have it that's a pretty easy
decision.

> What I am sure of is that a SR-based xmin passing approach is simpler,
> easier to explain, more robust for some common workloads, and less likely to
> give surprised "wow, I didn't think *that* would cancel my standby query"
> reports from the field

Really? I think we get lots of suprised wows from the field from the
idea that a long-running read-only query can cause your database to
bloat. I think the only reason that's obvious to us is that we've been
grappling with that problem for so long.


> And since I never like to bet against Tom's gut feel, having it
> around as a "plan B" in case he's right about an overwhelming round of bug
> reports piling up against the max_standby_delay etc. logic doesn't hurt
> either.

Agreed. Though I think it'll be bad in that case even if we have a
plan B. It'll mean no file-based log shipping replicas and no
guarantee that what you run on the standby can't affect the master --
which is a pretty nice guarantee. It'll also mean it'll be much more
fragile against network interruptions.



--
greg


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Joachim Wieland
Дата:
On Sun, Feb 28, 2010 at 2:54 PM, Greg Stark <gsstark@mit.edu> wrote:
> Really? I think we get lots of suprised wows from the field from the
> idea that a long-running read-only query can cause your database to
> bloat. I think the only reason that's obvious to us is that we've been
> grappling with that problem for so long.

It seems to me that the scenario that you are looking at is one where
people run different queries with and without HS, i.e. that they will
run longer read-only queries than now once they have HS. I don't think
that is the case. If it isn't you cannot really speak of a master
"bloat".

Instead, I assume that most people who will grab 9.0 and use HS+SR do
already have a database with a certain query profile. Now with HS+SR
they will try to put the most costly and longest read-only queries to
the standby but in the end will run the same number of queries with
the same overall complexity.

Now let's take a look at both scenarios from the administrators' point of view:

1) With the current implementation they will see better performance on
the master and more aggressive vacuum (!), since they have less
long-running queries now on the master and autovacuum can kick in and
clean up with less delay than before. On the other hand their queries
on the standby might fail and they will start thinking that this HS+SR
feature is not as convincing as they thought it was... Next step for
them is to take the documentation and study it for a few days to learn
all about vacuum, different delays, transaction ids and age parameters
and experiment a few weeks until no more queries fail - for a while...
But they can never be sure... In the end they might also modify the
parameters in the wrong direction or overshoot because of lack of time
to experiment and lose another important property without noticing
(like being as close as possible to the master).

2) On the other hand if we could ship 9.0 with the xmin-propagation
feature, people would still see a better performance and have a hot
standby system but this time without query cancellations. Again: the
read-only queries that will be processed by the HS in the future are
being processed by the master today anyway, so why should it get
worse? The first impression will be that it just works nicely out of
the box, is easy to set up and has no negative effect (query
cancellation) that has not already shown up before (vacuum lag).

I guess that most people will just run fine with this setup and never
get to know about the internals. Of course we should still offer an
expert mode where you can turn all kinds of knobs and where you can
avoid the vacuum dependency but it would be nice if this could be the
expert mode only. Tuning this is highly installation specific and you
need to have a deep understanding of how PostgreSQL and HS work
internally and what you actually want to achieve...


> Agreed. Though I think it'll be bad in that case even if we have a
> plan B. It'll mean no file-based log shipping replicas and no
> guarantee that what you run on the standby can't affect the master --
> which is a pretty nice guarantee. It'll also mean it'll be much more
> fragile against network interruptions.

Regarding the network interruptions... in reality if you have network
interruptions of several minutes between your primary and your
standby, you have worse problems anyway... If the standby does not
renew its xmin for n seconds, log a message and just go on...


Joachim


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
All,

First, from the nature of the arguments, we need to eventually have both
versions of SR: delay-based and xmin-pub.  And it would be fantastic if
Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
ready as well.

I also think, based on the discussion and Greg's test case, that we
could do two things which would make the shortcomings of delay-based SR
a vastly better experience for users:

1) Automated retry of cancelled queries on the slave.  I have no idea
how hard this would be to implement, but it makes the difference between
writing lots of exception-handling code for slave connections
(unacceptable) to just slow response times on the slave (acceptable).

2) A more usable vacuum_defer_cleanup_age.  If it was feasible for a
user to configure the master to not vacuum records less than, say, 5
minutes dead, then that would again offer the choice to the user of
slightly degraded performance on the master (acceptable) vs. lots of
query cancel (unacceptable).  I'm going to test Greg's case with
vacuum_cleanup_age used fairly liberally to see if this approach has merit.

Why do I say that "lots of query cancel" is "unacceptable"?  For the
simple reason that one cannot run the same application code against an
HS+SR cluster with lots of query cancel as one runs against a standalone
database.  And if that's true, then the main advantage of HS+SR over
Slony and Londiste is gone.  MySQL took great pains to make sure that
you could run the same code against replicated MySQL as standalone, and
that was based on having a fairly intimate relationship with their users
(at the time, anyway).

Another thing to keep in mind in these discussions is the
inexpensiveness of servers today.  This means that, if slaves have poor
performance, that's OK; one can always spin up more slaves.  But if each
slave imposes a large burden on the master, then that limits your
scalability.

--Josh Berkus



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Joachim Wieland wrote:
> Instead, I assume that most people who will grab 9.0 and use HS+SR do
> already have a database with a certain query profile. Now with HS+SR
> they will try to put the most costly and longest read-only queries to
> the standby but in the end will run the same number of queries with
> the same overall complexity.
>   

This is a nice summary of the primary use-case I am trying to optimize 
usability for, because I know for a fact there's a stack of pent-up 
requests for exactly this form of improvement from existing warm standby 
users.  And your subsequent discussion of how administrators will react 
in each of the possible configurations here matches my own concerns.  I 
would highly recommend anyone who feels this is not a critical feature 
to fix carefully read Joachim's message from an advocacy perspective, 
that's a better user-oriented prediction than mine of exactly how this 
is going to play out in the field post-release.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Josh Berkus wrote:
> First, from the nature of the arguments, we need to eventually have both
> versions of SR: delay-based and xmin-pub.  And it would be fantastic if
> Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
> ready as well.
>   

As I see it, the main technical obstacle here is that a subset of a 
feature already on the SR roadmap needs to get built earlier than 
expected to pull this off.  I don't know about Tom, but I have no 
expectation it's possible for me to get up to speed on that code fast 
enough to contribute anything there.  I expect the thing I'd be most 
productive at as far as moving the release forward is to continue 
testing this pair of features looking for rough edges, which is what I 
have planned for the next month. 

I'm not even close to finished with generating test cases specifically 
probing for bad behavior suspected after a look the implementation 
details--this is just what I came up with in my first week of that.  
Count me in for more testing, but out for significant development here.  
It's not what I've got my time allocated for because it's not where I 
think I'll be most productive.

> 2) A more usable vacuum_defer_cleanup_age.  If it was feasible for a
> user to configure the master to not vacuum records less than, say, 5
> minutes dead, then that would again offer the choice to the user of
> slightly degraded performance on the master (acceptable) vs. lots of
> query cancel (unacceptable).  I'm going to test Greg's case with
> vacuum_cleanup_age used fairly liberally to see if this approach has merit.
>   

I've been down that road and it leads quickly to the following 
question:  "how can I tell how old in time-based units an xid is?"  If 
there were an easy answer to that question, vacuum_defer_cleanup_age 
would already be set in time units.  It's the obvious UI to want, it's 
just not obvious how to build it internally.  Maybe I missed something, 
but my guess is that vacuum_defer_cleanup_age is already as good as it's 
going to get.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Dimitri Fontaine
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> 2) A more usable vacuum_defer_cleanup_age.  If it was feasible for a
> user to configure the master to not vacuum records less than, say, 5
> minutes dead, then that would again offer the choice to the user of
> slightly degraded performance on the master (acceptable) vs. lots of
> query cancel (unacceptable).  I'm going to test Greg's case with
> vacuum_cleanup_age used fairly liberally to see if this approach has
> merit.

I think that to associate any time based interval notion with the XID
flow, you need a ticker. We already took the txid and txid_snapshot
types and functions from Skytools, which took them from Slony.

Maybe we could consider borrowing pgqd, the C version of the ticker, for
being able to specify in human time how long a dead transaction is
allowed to remain in the heap?
 http://github.com/markokr/skytools-dev/tree/master/sql/ticker/

Regards,
-- 
dim


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Joachim Wieland
Дата:
On Sun, Feb 28, 2010 at 8:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
> 1) Automated retry of cancelled queries on the slave.  I have no idea
> how hard this would be to implement, but it makes the difference between
> writing lots of exception-handling code for slave connections
> (unacceptable) to just slow response times on the slave (acceptable).

We're not only canceling queries, we are effectively canceling
transactions. It seems quite impossible to repeat all queries from a
transaction that has started in the past. One query might be or
include the result of a previous query and as the data we see now has
changed since then, the client might now want to execute a different
query when it gets a different result out of a previous query...

And even if it was possible, how often would you retry? You still have
no guarantee that your query succeeds the second time. I'd claim that
if a query failed once, chances are even higher that it fails again
than that it succeeds the second time. Moreover if you continue to
repeat the query and if queries come in at a certain rate, you need to
process more and more queries on the slave which will not really help
other queries to finish in time nor will it be beneficial for the
throughput of the system as a whole...


I fully agree with what you say about user expectations: We need to
assume that many programs are not prepared for failures of "simple"
read-only queries because in the past they have always worked...


> Another thing to keep in mind in these discussions is the
> inexpensiveness of servers today.  This means that, if slaves have poor
> performance, that's OK; one can always spin up more slaves.  But if each
> slave imposes a large burden on the master, then that limits your
> scalability.

The burden of the xmin-publication feature is not the number of
slaves, it's just the longest running queries on whatever slave they
are. So your argument applies to both cases... To minimize the burden
on the master, get additional slaves so that you can run your most
expensive queries in a shorter time :-)


Joachim


Re: Hot Standby query cancellation and Streaming Replication integration

От
Simon Riggs
Дата:
On Fri, 2010-02-26 at 16:44 -0500, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> There's *definitely* not going to be enough information in the WAL
> >> stream coming from a master that doesn't think it has HS slaves.
> >> We can't afford to record all that extra stuff in installations for
> >> which it's just useless overhead. BTW, has anyone made any attempt
> >> to measure the performance hit that the patch in its current form is
> >> creating via added WAL entries?
> 
> > What extra entries?
> 
> Locks, just for starters.  I haven't read enough of the code yet to know
> what else Simon added.  In the past it's not been necessary to record
> any transient information in WAL, but now we'll have to.

There is room for technical confusion here, so I'll just add some info.

There was/is potential for performance hit because of the volume of
additional WAL *and* the processing overhead from that additional WAL.
As Heikki points out these turn out to be minimal, though this has been
by careful design.

There is also potential for a performance hit because incoming cleanup
records may conflict with currently executing queries. If we knew for
certain that the master was not sending any cleanup records that would
effect current standby queries we could avoid that overhead altogether.
That's a good reason for integrating a solution.

AccessExclusiveLock lock records are nothing at all to do with that.
They exist simply to prevent obvious correctness issues such as somebody
reading a file while it is being deleted.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby query cancellation and Streaming Replication integration

От
Simon Riggs
Дата:
On Fri, 2010-02-26 at 03:33 -0500, Greg Smith wrote:

> I really hope this discussion can say focused on if and how it's 
> possible to improve this area, with the goal being to deliver a product 
> everyone can be proud of with the full feature set that makes this next 
> release a killer one. The features that have managed to all get into 
> this release already are fantastic, everyone who contributed should be 
> proud of that progress, and it's encouraging that the alpha4 date was 
> nailed.  It would be easy to descend into finger-pointing for why 
> exactly this particular problem is only getting more visibility now, or 
> into schedule-oriented commentary suggesting it must be ignored because 
> it's too late to do anything about it.  I hope everyone appreciates 
> wandering that way will not help make PostgreSQL 9.0 a better release.  
> This issue is so easy to encounter, and looks so bad when it happens, 
> that I feel it could easily lead to an embarrassing situation for the 
> community if something isn't done about it before release.

Thanks Greg. It's a great relief for me to hear someone else say this
and to watch a discussion about this important issue unfold.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
Greg, Joachim,

> As I see it, the main technical obstacle here is that a subset of a
> feature already on the SR roadmap needs to get built earlier than
> expected to pull this off.  I don't know about Tom, but I have no
> expectation it's possible for me to get up to speed on that code fast
> enough to contribute anything there.  I expect the thing I'd be most
> productive at as far as moving the release forward is to continue
> testing this pair of features looking for rough edges, which is what I
> have planned for the next month.

That's OK with me.  I thought you were saying that xmin-pub was going to
be easier than expected.  Per my other e-mails, I think that we should
be shooting for "good enough, on time" for 9.0., rather than "perfect".We can't ever get to "perfect" if we don't
releasesoftware.
 

Quite frankly, simply telling people that "long-running queries on the
slave tend not to be effective, wait for 9.1" is a possibility.  If you
consider the limitations and silent failures associated with MySQL
replication, let alone the issues with other Postgres solutions or the
replication of some of the nosql databases, "no long-running queries" is
a positively straightforwards restriction.

HS+SR is still a tremendous improvement over the options available
previously.  We never thought it was going to work for everyone
everywhere, and shouldn't let our project's OCD tendencies run away from us.

> I've been down that road and it leads quickly to the following
> question:  "how can I tell how old in time-based units an xid is?"  If
> there were an easy answer to that question, vacuum_defer_cleanup_age
> would already be set in time units.  It's the obvious UI to want, it's
> just not obvious how to build it internally.  Maybe I missed something,
> but my guess is that vacuum_defer_cleanup_age is already as good as it's
> going to get.

Well, we could throw this on the user if we could get them some
information on how to calculate that number.  For example, some way for
them to calculate the number of XIDs per minute via a query, and then
set vacuum_defer_cleanup_age appropriately on the master.  Sure, it's
clunky, but we've already warned people that 9.0 will be clunky and hard
to administer.  And it's no worse than setting FSM_pages used to be.

However, first we need to test that setting vacuum_defer_cleanup_age
actually benefits query cancel issues.

> We're not only canceling queries, we are effectively canceling
> transactions. It seems quite impossible to repeat all queries from a
> transaction that has started in the past. One query might be or
> include the result of a previous query and as the data we see now has
> changed since then, the client might now want to execute a different
> query when it gets a different result out of a previous query...

Sure, except that I don't expect people to be using explicit
transactions as much on the slaves, since they are read-only anyway and
can't even create temp tables. So having the retry not retry if there is
an explicit transaction would be an OK option.

> And even if it was possible, how often would you retry? You still have
> no guarantee that your query succeeds the second time. I'd claim that
> if a query failed once, chances are even higher that it fails again
> than that it succeeds the second time. Moreover if you continue to
> repeat the query and if queries come in at a certain rate, you need to
> process more and more queries on the slave which will not really help
> other queries to finish in time nor will it be beneficial for the
> throughput of the system as a whole...

Well, we'd need to have a limited number of retries, which means a GUC
in recovery.conf:

query_cancel_retry = #

This might default to, say, 2.

However, I'd still like to hear from someone with the requisite
technical knowledge whether capturing and retrying the current query in
a query cancel is even possible.

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Josh Berkus wrote:
> Well, we could throw this on the user if we could get them some
> information on how to calculate that number.  For example, some way for
> them to calculate the number of XIDs per minute via a query, and then
> set vacuum_defer_cleanup_age appropriately on the master.  Sure, it's
> clunky, but we've already warned people that 9.0 will be clunky and hard
> to administer.  And it's no worse than setting FSM_pages used to be.
>
> However, first we need to test that setting vacuum_defer_cleanup_age
> actually benefits query cancel issues.
>   

Proving that setting works as expected is already on my test case grid, 
seems fine in my limited testing so far.  I've started looking into ways 
to monitor XID churn in a way for setting it better.  I'll take care of 
providing all that in my next test case update.  My intent here is to 
take the ideas outlined in my "Hot Standby Tradeoffs" blog post and turn 
that into a new documentation section making it more clear where the 
problem steps are, regardless of what else happens here.  And I need 
some concrete example of XID burn rate measurement to finish that job.

The main problem with setting vacuum_defer_cleanup_age high isn't 
showing it works, it's a pretty simple bit of code.  It's when you 
recognize that it penalizes all cleanup all the time, whether or not the 
standby is actually executing a long-running query or not, that you note 
the second level of pain in increasing it.  Returning to the idea of 
"how is this different from a site already in production?", it may very 
well be the case that a site that sets vacuum_defer_cleanup_age high 
enough to support off-peak batch reporting cannot tolerate how that will 
impact vacuums during their peak time of day.  The XID export 
implementation sidesteps that issue by only making the vacuum delay 
increase when queries that require it are running, turning this back 
into a standard "what's the best time of day to run my big reports?" 
issue that people understand how to cope with already.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Robert Haas
Дата:
On Sun, Feb 28, 2010 at 5:38 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Greg, Joachim,
>
>> As I see it, the main technical obstacle here is that a subset of a
>> feature already on the SR roadmap needs to get built earlier than
>> expected to pull this off.  I don't know about Tom, but I have no
>> expectation it's possible for me to get up to speed on that code fast
>> enough to contribute anything there.  I expect the thing I'd be most
>> productive at as far as moving the release forward is to continue
>> testing this pair of features looking for rough edges, which is what I
>> have planned for the next month.
>
> That's OK with me.  I thought you were saying that xmin-pub was going to
> be easier than expected.  Per my other e-mails, I think that we should
> be shooting for "good enough, on time" for 9.0., rather than "perfect".
>  We can't ever get to "perfect" if we don't release software.

I agree.  It seems to me that the right long term fix for the problem
of query cancellations on the slave is going to be to give the slave
the ability to save multiple versions of relation pages where
necessary so that older snapshots can continue to be used even after
the conflicting WAL has been applied.  However, I'm pretty sure that's
going to be a very difficult project which is unlikely to be coded by
anyone any time soon, let alone merged.  Until it does, we're going to
force people to pick from a fairly unappealing menu of options:
postpone WAL replay for long periods of time, cancel queries (perhaps
even seemingly unrelated to what changed on the master), bloat the
master.  All of those options are seriously unpleasant.

I think, though, that we have to think of this as being like the
Windows port, or maybe even more significant than that, as an
architectural change.  I think it is going to take several releases
for this feature to be well-understood and stable and have all the
options we'd like it to have.  It wouldn't surprise me if we get to
10.0 before we really have truly seamless replication.  I don't expect
Slony or Londiste or any of the other solutions that are out there now
to get kicked to the curb by PG 9.0.  Still, a journey of a thousand
miles begins with the first step.  Simon and many others have put a
great deal of time and energy into getting us to the point where we
are now, and if we let the fact that we haven't reached our ultimate
goal keep us from putting what we have out there in front of our
customers, I think we're going to regret that.

I think the thing to do is to reposition our PR around these features.We should maybe even go so far as to call them
"beta"or 
"experimental".  We shouldn't tell people - this is going to be
totally awesome.  We should tell people - this is a big improvement,
and it's still got some pretty significant limitations, but it's good
stuff and it's going in a good direction.  Overhyping what we have
today is not going to be good for the project, and I'm frankly quite
afraid that nothing we can possibly code between now and the release
is going to measure up to what people are hoping for.  We need to set
our own expectations, and those of our customers, at a level at which
they can be met.

> Quite frankly, simply telling people that "long-running queries on the
> slave tend not to be effective, wait for 9.1" is a possibility.

Yep.

> HS+SR is still a tremendous improvement over the options available
> previously.  We never thought it was going to work for everyone
> everywhere, and shouldn't let our project's OCD tendencies run away from us.

Yep.

> However, I'd still like to hear from someone with the requisite
> technical knowledge whether capturing and retrying the current query in
> a query cancel is even possible.

I'm not sure who you want to hear from here, but I think that's a dead end.

...Robert


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
On 2/28/10 7:00 PM, Greg Smith wrote:
> The main problem with setting vacuum_defer_cleanup_age high isn't
> showing it works, it's a pretty simple bit of code.  It's when you
> recognize that it penalizes all cleanup all the time, whether or not the
> standby is actually executing a long-running query or not, that you note
> the second level of pain in increasing it.  Returning to the idea of
> "how is this different from a site already in production?", it may very
> well be the case that a site that sets vacuum_defer_cleanup_age high
> enough to support off-peak batch reporting cannot tolerate how that will
> impact vacuums during their peak time of day.  The XID export
> implementation sidesteps that issue by only making the vacuum delay
> increase when queries that require it are running, turning this back
> into a standard "what's the best time of day to run my big reports?"
> issue that people understand how to cope with already.

I don't think that defer_cleanup_age is a long-term solution.  But we
need *a* solution which does not involve delaying 9.0.

And I think we can measure bloat in a pgbench test, no?  When I get a
chance, I'll run one for a couple hours and see the difference that
cleanup_age makes.

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Josh Berkus wrote:
> And I think we can measure bloat in a pgbench test, no?  When I get a
> chance, I'll run one for a couple hours and see the difference that
> cleanup_age makes.
>   

The test case I attached at the start of this thread runs just the 
UPDATE to the tellers table.  Running something similar that focuses 
just on UPDATEs to the pgbench_accounts table, without the rest of the 
steps done by the standard test, is the fastest route to bloat.  The 
standard test will do it too, just does a lot of extra stuff too that 
doesn't impact results (SELECT, INSERT) so it wastes some resources 
compared to a targeted bloater script.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> I don't think that defer_cleanup_age is a long-term solution.  But we
> need *a* solution which does not involve delaying 9.0.

So I think the primary solution currently is to raise max_standby_age.

However there is a concern with max_standby_age. If you set it to,
say, 300s. Then run a 300s query on the slave which causes the slave
to fall 299s behind. Now you start a new query on the slave -- it gets
a snapshot based on the point in time that the slave is currently at.
If it hits a conflict it will only have 1s to finish before the
conflict causes the query to be cancelled.

In short in the current setup I think there is no safe value of
max_standby_age which will prevent query cancellations short of -1. If
the slave has a constant stream of queries and always has at least one
concurrent query running then it's possible that the slave will run
continuously max_standby_age-epsilon behind the master and cancel
queries left and right, regardless of how large max_standby_age is.

To resolve this I think you would have to introduce some chance for
the slave to catch up. Something like refusing to use a snapshot older
than max_standby_age/2  and instead wait until the existing queries
finish and the slave gets a chance to catch up and see a more recent
snapshot. The problem is that this would result in very unpredictable
and variable response times from the slave. A single long-lived query
could cause replay to pause for a big chunk of max_standby_age and
prevent any new query from starting.

Does anyone see any way to guarantee that the slave gets a chance to
replay and new snapshots will become visible without freezing out new
queries for extended periods of time?

--
greg


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
> So I think the primary solution currently is to raise max_standby_age.
> 
> However there is a concern with max_standby_age. If you set it to,
> say, 300s. Then run a 300s query on the slave which causes the slave
> to fall 299s behind. Now you start a new query on the slave -- it gets
> a snapshot based on the point in time that the slave is currently at.
> If it hits a conflict it will only have 1s to finish before the
> conflict causes the query to be cancelled.

Completely aside from that, how many users are going to be happy with a
slave server which is constantly 5 minutes behind?

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Completely aside from that, how many users are going to be happy with a
> slave server which is constantly 5 minutes behind?
>

Uhm, well all the ones who are happy with our current warm standby
setup for one?

And all the ones who are looking for a standby reporting server rather
than a high availability DR site.

For what it's worth Oracle has an option to have your standby
intentionally hold back n minutes behind and I've seen that set to 5
minutes.

-- 
greg


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Stefan Kaltenbrunner
Дата:
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Completely aside from that, how many users are going to be happy with a
>> slave server which is constantly 5 minutes behind?
>>
> 
> Uhm, well all the ones who are happy with our current warm standby
> setup for one?
> 
> And all the ones who are looking for a standby reporting server rather
> than a high availability DR site.
> 
> For what it's worth Oracle has an option to have your standby
> intentionally hold back n minutes behind and I've seen that set to 5
> minutes.

yeah a lot of people are doing that intentionally...


Stefan


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Greg Stark wrote:
>> For what it's worth Oracle has an option to have your standby
>> intentionally hold back n minutes behind and I've seen that set to 5
>> minutes.

> yeah a lot of people are doing that intentionally...

It's the old DBA screwup safety valve ... drop the main accounts table,
you have five minutes to stop replication before it's dropped on the
standby.  Speaking of which, does the current HS+SR code have a
provision to force the standby to stop tracking WAL and come up live,
even when there's more WAL available?  Because that's what you'd need
in order for such a thing to be helpful in that scenario.
        regards, tom lane


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
On 3/1/10 11:43 AM, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> Greg Stark wrote:
>>> For what it's worth Oracle has an option to have your standby
>>> intentionally hold back n minutes behind and I've seen that set to 5
>>> minutes.
> 
>> yeah a lot of people are doing that intentionally...
> 
> It's the old DBA screwup safety valve ... drop the main accounts table,
> you have five minutes to stop replication before it's dropped on the
> standby.  Speaking of which, does the current HS+SR code have a
> provision to force the standby to stop tracking WAL and come up live,
> even when there's more WAL available?  Because that's what you'd need
> in order for such a thing to be helpful in that scenario.

the "fast" recovery option should do this.  You'd need some fast
reaction times, though.

However, this leaves aside Greg's point about snapshot age and
successive queries; does anyone dispute his analysis?  Simon?

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
On 2/28/10 7:12 PM, Robert Haas wrote:
>> However, I'd still like to hear from someone with the requisite
>> > technical knowledge whether capturing and retrying the current query in
>> > a query cancel is even possible.
> 
> I'm not sure who you want to hear from here, but I think that's a dead end.

"dead end" as in "too hard to implement"?  Or for some other reason?

It's undeniable that auto-retry would be better from a user's
perspective than a user-visible cancel.  So if it's *reasonable* to
implement, I think we should be working on it.  I'm also very puzzled as
to why nobody else wants to even discuss it; it's like some wierd blackout.

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
"Kevin Grittner"
Дата:
Josh Berkus <josh@agliodbs.com> wrote:
> It's undeniable that auto-retry would be better from a user's
> perspective than a user-visible cancel.  So if it's *reasonable*
> to implement, I think we should be working on it.  I'm also very
> puzzled as to why nobody else wants to even discuss it; it's like
> some wierd blackout.
Well, at least for serializable transactions past the first
statement, you'd need to have the complete *logic* for the
transaction in order to do a retry.  Not that this is a bad idea --
our application framework does this automatically -- but unless you
only support this for a transaction which is wrapped up as a
function, I don't see how the database itself could handle it.  It
might be *possible* to do it outside of a single-function
transaction in a read committed transaction, but you'd have to be
careful about locks.  I remember suggesting automatic query retry
(rather than continuing in a mixed-snapshot mode) for update
conflicts in read committed mode and Tom had objections; you might
want to check the archives for that.
-Kevin


Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
<p>josh, nobody is talking about it because it doesn't make sense. you could only retry if it was the first query in
thetransaction and only if you could prove there were no side-effects outside the database and then you would have no
reasonto think the retry would be any more likely to work. <p>greg<p><blockquote type="cite">On 1 Mar 2010 22:32, "Josh
Berkus"<<a href="mailto:josh@agliodbs.com">josh@agliodbs.com</a>> wrote:<br /><br /><p><font color="#500050">On
2/28/107:12 PM, Robert Haas wrote:<br />>> However, I'd still like to hear from someone with the requ...</font>
"deadend" as in "too hard to implement"?  Or for some other reason?<br /><br /> It's undeniable that auto-retry would
bebetter from a user's<br /> perspective than a user-visible cancel.  So if it's *reasonable* to<br /> implement, I
thinkwe should be working on it.  I'm also very puzzled as<br /> to why nobody else wants to even discuss it; it's like
somewierd blackout.<br /><font color="#888888"><br /> --Josh Berkus<br /></font><p><font color="#500050"><br />-- <br
/>Sentvia pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/>To make changes to your
subs...</font></blockquote>

Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> josh, nobody is talking about it because it doesn't make sense. you could
> only retry if it was the first query in the transaction and only if you
> could prove there were no side-effects outside the database and then you
> would have no reason to think the retry would be any more likely to work.

But it's hot standby, so there are no data-modifying transactions.
Volatile functions could be a problem, though.  A bigger problem is
we might have already shipped partial query results to the client.

I agree it ain't easy, but it might not be completely out of the
question.  Definitely won't be happening for 9.0 though.
        regards, tom lane


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Josh Berkus wrote:
> However, this leaves aside Greg's point about snapshot age and
> successive queries; does anyone dispute his analysis?  Simon?
>   

There's already a note on the Hot Standby TODO about unexpectly bad 
max_standby_delay behavior being possible on an idle system, with no 
suggested resolution for it besides better SR integration.  The issue 
Greg Stark has noted is another variation on that theme.  It's already 
on my list of theorized pathological but as yet undemonstrated concerns 
that Simon and I identified, the one I'm working through creating a test 
cases to prove/disprove.  I'm past "it's possible..." talks at this 
point though as not to spook anyone unnecessarily, and am only raising 
things I can show concrete examples of in action.  White box testing at 
some point does require pausing one's investigation of what's in the box 
and getting on with the actual testing instead.

The only real spot where my opinion diverges here that I have yet to 
find any situation where 'max_standby_delay=-1' makes any sense to me.  
When I try running my test cases with that setting, the whole system 
just reacts far too strangely.  My first patch here is probably going to 
be adding more visibility into the situation when queries are blocking 
replication forever, because I think the times I find myself at "why is 
the system hung right now?" are when that happens and it's not obvious 
as an admin what's going on.

Also, the idea that a long running query on the standby could cause an 
unbounded delay in replication is so foreign to my sensibilities that I 
don't ever include it in the list of useful solutions to the problems 
I'm worried about.  The option is there, not disputing that it makes 
sense for some people because there seems some demand for it, just can't 
see how it fits into any of the use-cases I'm concerned about.

I haven't said anything about query retry mainly because I can't imagine 
any way it's possible to build it in time for this release, so whether 
it's eventually feasible or not doesn't enter into what I'm worried 
about right now.  In any case, I would prioritize that behind work on 
preventing the most common situations that cause cancellations in the 
first place, until those are handled so well that retry is the most 
effective improvement left to consider.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Robert Haas
Дата:
On Mon, Mar 1, 2010 at 5:32 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 2/28/10 7:12 PM, Robert Haas wrote:
>>> However, I'd still like to hear from someone with the requisite
>>> > technical knowledge whether capturing and retrying the current query in
>>> > a query cancel is even possible.
>>
>> I'm not sure who you want to hear from here, but I think that's a dead end.
>
> "dead end" as in "too hard to implement"?  Or for some other reason?

I think it's probably too hard to implement for the extremely limited
set of circumstances in which it can work.  See the other responses
for some of the problems.  There are others, too.  Suppose that the
plan for some particular query is to read a table with a hundred
million records, sort it, and then do whatever with the results.
After reading the first 99 million records, the transaction is
cancelled and we have to start over.  Maybe someone will say, fine, no
problem - but it's certainly going to be user-visible.  Especially if
we retry more than once.

I think we should focus our efforts initially on reducing the
frequency of spurious cancels.  What we're essentially trying to do
here is refute the proposition "the WAL record I just replayed might
change the result of this query".  It's possibly equivalent to the
halting problem (and certainly impossibly hard) to refute this
proposition in every case where it is in fact false, but it sounds
like what we have in place right now doesn't come close to doing as
well as can be done.

I just read through the current documentation and it doesn't really
seem to explain very much about how HS decides which queries to kill.
Can someone try to flesh that out a bit?  It also uses the term
"buffer cleanup lock", which doesn't seem to be used anywhere else in
the documentation (though it does appear in the source tree, including
README.HOT).

...Robert


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Aidan Van Dyk
Дата:
* Tom Lane <tgl@sss.pgh.pa.us> [100301 20:04]:
> Greg Stark <stark@mit.edu> writes:
> > josh, nobody is talking about it because it doesn't make sense. you could
> > only retry if it was the first query in the transaction and only if you
> > could prove there were no side-effects outside the database and then you
> > would have no reason to think the retry would be any more likely to work.
> 
> But it's hot standby, so there are no data-modifying transactions.
> Volatile functions could be a problem, though.  A bigger problem is
> we might have already shipped partial query results to the client.

But, since we know its a slave and that the reason the query was
cancelled was because it's got a backlog of updates to apply, it's very
likely that the data that the earlier parts of the transaction would be
different...

And then you have no idea if just blindly replaying all statements of
the transaction successively is a good idea...


a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Joachim Wieland wrote:
> 1) With the current implementation they will see better performance on
> the master and more aggressive vacuum (!), since they have less
> long-running queries now on the master and autovacuum can kick in and
> clean up with less delay than before. On the other hand their queries
> on the standby might fail and they will start thinking that this HS+SR
> feature is not as convincing as they thought it was... Next step for
> them is to take the documentation and study it for a few days to learn
> all about vacuum, different delays, transaction ids and age parameters
> and experiment a few weeks until no more queries fail - for a while...
> But they can never be sure... In the end they might also modify the
> parameters in the wrong direction or overshoot because of lack of time
> to experiment and lose another important property without noticing
> (like being as close as possible to the master).

I assumed they would set max_standby_delay = -1 and be happy.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Josh Berkus wrote:
> HS+SR is still a tremendous improvement over the options available
> previously.  We never thought it was going to work for everyone
> everywhere, and shouldn't let our project's OCD tendencies run away from us.

OCD (Obsessive-Compulsive Disorder) --- good one.  :-)

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Bruce Momjian wrote: <blockquote cite="mid:201003020454.o224s4601113@momjian.us" type="cite"><pre wrap="">Joachim
Wielandwrote: </pre><blockquote type="cite"><pre wrap="">1) With the current implementation they will see better
performanceon
 
the master and more aggressive vacuum (!), since they have less
long-running queries now on the master and autovacuum can kick in and
clean up with less delay than before. On the other hand their queries
on the standby might fail and they will start thinking that this HS+SR
feature is not as convincing as they thought it was...   </pre></blockquote><pre wrap="">
I assumed they would set max_standby_delay = -1 and be happy. </pre></blockquote><br /> The admin in this situation
mightbe happy until the first time the primary fails and a failover is forced, at which point there is an unbounded
amountof recovery data to apply that was stuck waiting behind whatever long-running queries were active.  I don't know
ifyou've ever watched what happens to a pre-8.2 cold standby when you start it up with hundreds or thousands of backed
upWAL files to process before the server can start, but it's not a fast process.  I watched a production 8.1 standby
get>4000 files behind once due to an archive_command bug, and it's not something I'd like to ever chew my nails off
toagain.  If your goal was HA and you're trying to bring up the standby, the server is down the whole time that's going
on.<br/><br /> This is why no admin who prioritizes HA would consider 'max_standby_delay = -1' a reasonable setting,
andthose are the sort of users Joachim's example was discussing.  Only takes one rogue query that runs for a long time
tomake the standby so far behind it's useless for HA purposes.  And you also have to ask yourself "if recovery is
haltedwhile waiting for this query to run, how stale is the data on the standby getting?".  That's true for any large
settingfor this parameter, but using -1 for the unlimited setting also gives the maximum possible potential for such
staleness.<br/><br /> 'max_standby_delay = -1' is really only a reasonable idea if you are absolutely certain all
queriesare going to be short, which we can't dismiss as an unfounded use case so it has value.  I would expect you have
toalso combine it with a matching reasonable statement_timeout to enforce that expectation to make that situation
safer.<br/><br /> In any of the "offload batch queries to the failover standby" situations, it's unlikely an unlimited
valuefor this setting will be practical.  Perhaps you set max_standby_delay to some number of hours, to match your
expectedworst-case query run time and reduce the chance of cancellation.  Not putting a limit on it at all is a
situationno DBA with healthy paranoia is going to be happy with the potential downside of in a HA environment, given
thatboth unbounded staleness and recovery time are then both possible.  The potential of a failed long-running query is
muchless risky than either of those.<br /><br /><pre class="moz-signature" cols="72">-- 
 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
<a class="moz-txt-link-abbreviated" href="mailto:greg@2ndQuadrant.com">greg@2ndQuadrant.com</a>   <a
class="moz-txt-link-abbreviated"href="http://www.2ndQuadrant.us">www.2ndQuadrant.us</a>
 
</pre>

Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Robert Haas wrote:
> I just read through the current documentation and it doesn't really
> seem to explain very much about how HS decides which queries to kill.
> Can someone try to flesh that out a bit?

I believe it just launches on a mass killing spree once things like 
max_standby_delay expire.  This I want to confirm via testing (can 
simulate with a mix of long and short running pgbench queries) and then 
intend to update the docs to clarify.

> It also uses the term
> "buffer cleanup lock", which doesn't seem to be used anywhere else in
> the documentation (though it does appear in the source tree, including
> README.HOT).
>   

This loose end was already noted in my last docs update.  I wrote an 
initial description, but Bruce and I decided to leave out until 
something more thorough could be put together.  This is also on my docs 
cleanup list, will get to it somewhere along the beta timeline.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Smith wrote:
> > I assumed they would set max_standby_delay = -1 and be happy.
> >   
> 
> The admin in this situation might be happy until the first time the 
> primary fails and a failover is forced, at which point there is an 
> unbounded amount of recovery data to apply that was stuck waiting behind 
> whatever long-running queries were active.  I don't know if you've ever 
> watched what happens to a pre-8.2 cold standby when you start it up with 
> hundreds or thousands of backed up WAL files to process before the 
> server can start, but it's not a fast process.  I watched a production 
> 8.1 standby get >4000 files behind once due to an archive_command bug, 
> and it's not something I'd like to ever chew my nails off to again.  If 
> your goal was HA and you're trying to bring up the standby, the server 
> is down the whole time that's going on.
> 
> This is why no admin who prioritizes HA would consider 
> 'max_standby_delay = -1' a reasonable setting, and those are the sort of 
> users Joachim's example was discussing.  Only takes one rogue query that 
> runs for a long time to make the standby so far behind it's useless for 
> HA purposes.  And you also have to ask yourself "if recovery is halted 
> while waiting for this query to run, how stale is the data on the 
> standby getting?".  That's true for any large setting for this 
> parameter, but using -1 for the unlimited setting also gives the maximum 
> possible potential for such staleness.
> 
> 'max_standby_delay = -1' is really only a reasonable idea if you are 
> absolutely certain all queries are going to be short, which we can't 
> dismiss as an unfounded use case so it has value.  I would expect you 
> have to also combine it with a matching reasonable statement_timeout to 
> enforce that expectation to make that situation safer.

Well, as you stated in your blog, you are going to have one of these
downsides:
o  master bloato  delayed recoveryo  cancelled queries

Right now you can't choose "master bloat", but you can choose the other
two.  I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.

Our documentation should probably just come how and state that clearly.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Bruce Momjian wrote:
> > 'max_standby_delay = -1' is really only a reasonable idea if you are 
> > absolutely certain all queries are going to be short, which we can't 
> > dismiss as an unfounded use case so it has value.  I would expect you 
> > have to also combine it with a matching reasonable statement_timeout to 
> > enforce that expectation to make that situation safer.
> 
> Well, as you stated in your blog, you are going to have one of these
> downsides:
> 
>     o  master bloat
>     o  delayed recovery
>     o  cancelled queries
> 
> Right now you can't choose "master bloat", but you can choose the other
> two.  I think that is acceptable for 9.0, assuming the other two don't
> have the problems that Tom foresees.

I was wrong.  You can choose "master bloat" with
vacuum_defer_cleanup_age, but only crudely because it is measured in
xids and the master defers no matter what queries are running on the
slave, and there is still the possibility for query cancel for long
queries.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
On 3/2/10 10:30 AM, Bruce Momjian wrote:
> Right now you can't choose "master bloat", but you can choose the other
> two.  I think that is acceptable for 9.0, assuming the other two don't
> have the problems that Tom foresees.

Actually, if vacuum_defer_cleanup_age can be used, "master bloat" is an
option.  Hopefully I'll get some time for serious testing this weekend.

--Josh Berkus


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Smith
Дата:
Bruce Momjian wrote:<br /><blockquote cite="mid:201003021834.o22IYX529089@momjian.us" type="cite"><blockquote
type="cite"><prewrap="">Right now you can't choose "master bloat", but you can choose the other
 
two.  I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.   </pre></blockquote><pre wrap="">
I was wrong.  You can choose "master bloat" with
vacuum_defer_cleanup_age, but only crudely because it is measured in
xids and the master defers no matter what queries are running on the
slave...</pre></blockquote><br /> OK with you finding the situation acceptable, so long as it's an informed decision. 
Fromhow you're writing about this, I'm comfortable you (and everybody else still involved here) have absorbed the
issuesenough that we're all talking about the same thing now.  Since there are a couple of ugly user-space hacks
possiblefor prioritizing "master bloat", and nobody is stepping up to work on resolving this via my suggestion
involvingbetter SR integration, seems to me heated discussion of code changes has come to a resolution of sorts I (and
Simon,just checked) can live with.  Sounds like we have three action paths here:<br /><br /> -Tom already said he was
planninga tour through the HS/SR code, I wanted that to happen with him aware of this issue.<br /> -Josh will continue
doinghis testing, also better informed about this particular soft spot.<br /> -I'll continue test-case construction for
theproblems here there are still concerns about (pathologic max_standby_delay and b-tree split issues being the top two
onthat list), and keep sharing particularly interesting ones here to help everyone else's testing.  <br /><br /> If it
turnsout any of those paths leads to a must-fix problem that doesn't have an acceptable solution, at least the idea of
thisas a "plan B" is both documented and more widely understood then when I started ringing this particular bell.<br
/><br/> I just updated the Open Items list:  <a class="moz-txt-link-freetext"
href="http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items">http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items</a>
toofficially put myself on the hook for the following HS related documentation items that have come up recently, aiming
toget them all wrapped up in time before or during early beta:<br /><br /> -Update Hot Standby documentation: clearly
explainrelationships between the 3 major setup trade-offs, "buffer cleanup lock", notes on which queries are killed
oncemax_standby_delay is reached, measuring XID churn on master for setting vacuum_defer_cleanup_age<br /> -Clean up
archive_commanddocs related to recent "/bin/true" addition.  Given that's where I expect people who run into the
pg_stop_backupwarning message recently added will end up at, noting its value for escaping from that particular case
mightbe useful too.<br /><br /> To finish airing my personal 9.0 TODO list now that I've gone this far, I'm also still
workingon completing the following patches that initial versions have been submitted of, was close to finishing both
beforegetting side-tracked onto this larger issue:<br /><br /> -pgbench > 4000 scale bug fix:  <a
class="moz-txt-link-freetext"
href="http://archives.postgresql.org/message-id/4B621BA3.7090306@2ndquadrant.com">http://archives.postgresql.org/message-id/4B621BA3.7090306@2ndquadrant.com</a><br
/>-Improving the logging/error reporting/no timestamp issues in pg_standby re-raised recently by Selena:  <a
class="moz-txt-link-freetext"
href="http://archives.postgresql.org/message-id/2b5e566d1001250945oae17be8n6317f827e3bd7492@mail.gmail.com">http://archives.postgresql.org/message-id/2b5e566d1001250945oae17be8n6317f827e3bd7492@mail.gmail.com</a><br
/><br/> If nobody else claims them as something they're working on before, I suspect I'll then move onto building some
ofthe archiver UI improvements discussed most recently as part of the "pg_stop_backup does not complete" thread,
despiteHeikki having crushed my dreams of a simple solution to those by pointing out the shared memory memory
limitationinvolved.<br /><br /><pre class="moz-signature" cols="72">-- 
 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
<a class="moz-txt-link-abbreviated" href="mailto:greg@2ndQuadrant.com">greg@2ndQuadrant.com</a>   <a
class="moz-txt-link-abbreviated"href="http://www.2ndQuadrant.us">www.2ndQuadrant.us</a>
 

</pre>

Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Simon Riggs
Дата:
On Mon, 2010-03-01 at 12:04 -0800, Josh Berkus wrote:

> does anyone dispute his analysis?  Simon?

No dispute. I think I've discussed this before.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Simon Riggs
Дата:
On Mon, 2010-03-01 at 14:43 -0500, Tom Lane wrote:
> Speaking of which, does the current HS+SR code have a
> provision to force the standby to stop tracking WAL and come up live,
> even when there's more WAL available?

Yes, trigger file.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Simon Riggs
Дата:
On Sun, 2010-02-28 at 16:56 +0100, Joachim Wieland wrote:

> Now let's take a look at both scenarios from the administrators' point
> of view:

Well argued, agree with all of your points.

-- Simon Riggs           www.2ndQuadrant.com



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > I don't think that defer_cleanup_age is a long-term solution. ?But we
> > need *a* solution which does not involve delaying 9.0.
>
> So I think the primary solution currently is to raise max_standby_age.
>
> However there is a concern with max_standby_age. If you set it to,
> say, 300s. Then run a 300s query on the slave which causes the slave
> to fall 299s behind. Now you start a new query on the slave -- it gets
> a snapshot based on the point in time that the slave is currently at.
> If it hits a conflict it will only have 1s to finish before the
> conflict causes the query to be cancelled.
>
> In short in the current setup I think there is no safe value of
> max_standby_age which will prevent query cancellations short of -1. If
> the slave has a constant stream of queries and always has at least one
> concurrent query running then it's possible that the slave will run
> continuously max_standby_age-epsilon behind the master and cancel
> queries left and right, regardless of how large max_standby_age is.
>
> To resolve this I think you would have to introduce some chance for
> the slave to catch up. Something like refusing to use a snapshot older
> than max_standby_age/2  and instead wait until the existing queries
> finish and the slave gets a chance to catch up and see a more recent
> snapshot. The problem is that this would result in very unpredictable
> and variable response times from the slave. A single long-lived query
> could cause replay to pause for a big chunk of max_standby_age and
> prevent any new query from starting.

That is a good point. I have added the attached documentation patch to
mention that max_standby_delay increases the master/slave inconsistency,
and not to use it for xid-keepalive connections.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.256
diff -c -c -r1.256 config.sgml
*** doc/src/sgml/config.sgml    27 Feb 2010 14:46:05 -0000    1.256
--- doc/src/sgml/config.sgml    2 Mar 2010 21:03:14 -0000
***************
*** 1869,1875 ****
          this parameter makes sense only during replication, so when
          performing an archive recovery to recover from data loss a very high
          parameter setting or -1 which means wait forever is recommended.
!         The default is 30 seconds.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
--- 1869,1876 ----
          this parameter makes sense only during replication, so when
          performing an archive recovery to recover from data loss a very high
          parameter setting or -1 which means wait forever is recommended.
!         The default is 30 seconds.  Increasing this parameter can delay
!         master server changes from appearing on the standby.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.52
diff -c -c -r1.52 high-availability.sgml
*** doc/src/sgml/high-availability.sgml    27 Feb 2010 09:29:20 -0000    1.52
--- doc/src/sgml/high-availability.sgml    2 Mar 2010 21:03:14 -0000
***************
*** 1410,1416 ****
      that the primary and standby nodes are linked via the WAL, so the cleanup
      situation is no different from the case where the query ran on the primary
      node itself.  And you are still getting the benefit of off-loading the
!     execution onto the standby.
     </para>

     <para>
--- 1410,1418 ----
      that the primary and standby nodes are linked via the WAL, so the cleanup
      situation is no different from the case where the query ran on the primary
      node itself.  And you are still getting the benefit of off-loading the
!     execution onto the standby. <varname>max_standby_delay</> should
!     not be used in this case because delayed WAL files might already
!     contain entries that invalidate the current shapshot.
     </para>

     <para>

Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > I don't think that defer_cleanup_age is a long-term solution. ?But we
> > need *a* solution which does not involve delaying 9.0.
>
> So I think the primary solution currently is to raise max_standby_age.
>
> However there is a concern with max_standby_age. If you set it to,
> say, 300s. Then run a 300s query on the slave which causes the slave
> to fall 299s behind. Now you start a new query on the slave -- it gets
> a snapshot based on the point in time that the slave is currently at.
> If it hits a conflict it will only have 1s to finish before the
> conflict causes the query to be cancelled.
>
> In short in the current setup I think there is no safe value of
> max_standby_age which will prevent query cancellations short of -1. If
> the slave has a constant stream of queries and always has at least one
> concurrent query running then it's possible that the slave will run
> continuously max_standby_age-epsilon behind the master and cancel
> queries left and right, regardless of how large max_standby_age is.

This is sobering.  I have added the attached documentation so at least
this odd behavior is documented.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.257
diff -c -c -r1.257 config.sgml
*** doc/src/sgml/config.sgml    2 Mar 2010 21:18:59 -0000    1.257
--- doc/src/sgml/config.sgml    2 Mar 2010 23:34:38 -0000
***************
*** 1862,1879 ****
        <listitem>
         <para>
          When server acts as a standby, this parameter specifies a wait policy
!         for queries that conflict with data changes being replayed by recovery.
          If a conflict should occur the server will delay up to this number
!         of seconds before it begins trying to resolve things less amicably, as
!         described in <xref linkend="hot-standby-conflict">. Typically,
!         this parameter makes sense only during replication, so when
!         performing an archive recovery to recover from data loss a very high
!         parameter setting or -1 which means wait forever is recommended.
!         The default is 30 seconds.  Increasing this parameter can delay
!         master server changes from appearing on the standby.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
        </listitem>
       </varlistentry>

--- 1862,1892 ----
        <listitem>
         <para>
          When server acts as a standby, this parameter specifies a wait policy
!         for applying WAL entries that conflict with active queries.
          If a conflict should occur the server will delay up to this number
!         of seconds before it cancels conflicting queries, as
!         described in <xref linkend="hot-standby-conflict">.
!         Typically, this parameter is used only during replication.
!         The default is 30 seconds.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
+        <para>
+         A high value makes query cancel less likely, and -1
+         causes the standby to wait forever for a conflicting query to
+         complete.  Increasing this parameter might delay master server
+         changes from appearing on the standby.
+       </para>
+       <para>
+        While it is tempting to believe that <varname>max_standby_delay</>
+        is the maximum number of seconds a query can run before
+        cancellation is possible, this is not true.  When a long-running
+        query ends, there is a finite time required to apply backlogged
+        WAL logs.  If a second long-running query appears before the
+        WAL has caught up, the snapshot taken by the second query will
+        allow significantly less than <varname>max_standby_delay</>
+        before query cancellation is possible.
+       </para>
        </listitem>
       </varlistentry>


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Greg Smith wrote:
> Bruce Momjian wrote:
> >> Right now you can't choose "master bloat", but you can choose the other
> >> two.  I think that is acceptable for 9.0, assuming the other two don't
> >> have the problems that Tom foresees.
> >>     
> >
> > I was wrong.  You can choose "master bloat" with
> > vacuum_defer_cleanup_age, but only crudely because it is measured in
> > xids and the master defers no matter what queries are running on the
> > slave...
> 
> OK with you finding the situation acceptable, so long as it's an 
> informed decision.  From how you're writing about this, I'm comfortable 

Well, consider that in November we were not even sure SR or HS would be
in 9.0.  We got them both, so if it is a little kudgy in 9.0, that's OK.
We are much farther along than I ever expected.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
All,

I've been playing with vacuum_defer_cleanup_age in reference to the
query cancel problem.  It really seems to me that this is the way
forward in terms of dealing with query cancel for normal operation
rather than wal_standby_delay, or maybe in combination with it.

As a first test, I set up a deliberately pathological situation with
pgbench and a wal_standby_delay of 1 second.  This allowed me to trigger
query cancel on a relatively simple reporting query; in fact, to make it
impossible to complete.

Then I increased vacuum_defer_cleanup_age to 100000, which represents
about 5 minutes of transactions on the test system.  This eliminated all
query cancels for the reporting query, which takes an average of 10s.

Next is a database bloat test, but I'll need to do that on a system with
more free space than my laptop.

--Josh Berkus



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Greg Stark
Дата:
On Wed, Mar 10, 2010 at 6:29 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Then I increased vacuum_defer_cleanup_age to 100000, which represents
> about 5 minutes of transactions on the test system.  This eliminated all
> query cancels for the reporting query, which takes an average of 10s.
>
> Next is a database bloat test, but I'll need to do that on a system with
> more free space than my laptop.

Note that this will be heavily dependent on the use case. If you have
one of those counter records that keeps being updated and gets cleaned
up by HOT whenever the page fills up then you need to allow HOT to
clean it up before it overflows the page or else it'll bloat the table
and require a real vacuum. I think that means that a
vacuum_defer_cleanup of up to about 100 or so (it depends on the width
of your counter record) might be reasonable as a general suggestion
but anything higher will depend on understanding the specific system.

Another use case that might suprise people who are accustomed to the
current behaviour is massive updates. This is the main really pessimal
use case left in Postgres -- ideally they wouldn't bloat the table at
all but currently they double the size of the table. People may be
accustomed to the idea that they can then run vacuum and that will
limit the bloat to 50%, assuming they have no (other) long-lived
transactions. With vacuum_defer_cleanup that will no longer be true.
It will be as if you always have a query lasting n transactions in
your system at all times.

--
greg


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
On 3/10/10 3:38 AM, Greg Stark wrote:
> I think that means that a
> vacuum_defer_cleanup of up to about 100 or so (it depends on the width
> of your counter record) might be reasonable as a general suggestion
> but anything higher will depend on understanding the specific system.

100 wouldn't be useful at all.  It would increase bloat without doing
anything about query cancel except on a very lightly used system.

> With vacuum_defer_cleanup that will no longer be true.
> It will be as if you always have a query lasting n transactions in
> your system at all times.

Yep, but until we get XID-publish-to-master working in 9.1, I think it's
probably the best we can do.  At least it's no *worse* than having a
long-running query on the master at all times.

--Josh Berkus



Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Fujii Masao
Дата:
On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
> I've been playing with vacuum_defer_cleanup_age in reference to the
> query cancel problem.  It really seems to me that this is the way
> forward in terms of dealing with query cancel for normal operation
> rather than wal_standby_delay, or maybe in combination with it.

Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample?
Though I also tried to test the effect of it, I was unable to find it
in the conf file.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Bruce Momjian
Дата:
Fujii Masao wrote:
> On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > I've been playing with vacuum_defer_cleanup_age in reference to the
> > query cancel problem. ?It really seems to me that this is the way
> > forward in terms of dealing with query cancel for normal operation
> > rather than wal_standby_delay, or maybe in combination with it.
> 
> Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample?
> Though I also tried to test the effect of it, I was unable to find it
> in the conf file.

I asked about that last week and for some reason Simon didn't want it
added and Greg Smith was going to get this corrected.  Greg?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: Re: Hot Standby query cancellation and Streaming Replication integration

От
Josh Berkus
Дата:
> Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample?
> Though I also tried to test the effect of it, I was unable to find it
> in the conf file.

Using it has some bugs we need to clean up, apparently.

--Josh Berkus