Greg, Robert,
> Certainly that one particular case can be solved by making the
> servers be in time sync a prereq for HS working (in the traditional way).
> And by "prereq" I mean a "user beware" documentation warning.
>
Last I checked, you work with *lots* of web developers and web
companies. I'm sure you can see the issue with the above.
> Stephen's idea of a mode where we wait up to max_standby_delay for a
> lock but then kill everything in our path until we've caught up again
> is another possible way of approaching this problem, although it may
> lead to "kill storms".
Personally, I thought that the kill storms were exactly what was wrong
with max_standby_delay. That is, with MSD, no matter *what* your
settings or traffic are, you're going to get query cancel occasionally.
I don't see the issue with Tom's approach from a wait perspective. The
max wait becomes 1.001X max_standby_delay; there's no way I can think of
that replay would wait longer than that. I've yet to see an explanation
why it would be longer.
Simon's assertion that not all operations take a conventional lock is a
much more serious potential flaw.
-- -- Josh Berkus PostgreSQL Experts Inc.
http://www.pgexperts.com