Re: Issues with two-server Synch Rep

From: Josh Berkus
Subject: Re: Issues with two-server Synch Rep
Date:
Msg-id 4CB38436.308@agliodbs.com
In reply to: Re: Issues with two-server Synch Rep  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Issues with two-server Synch Rep  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
> Obviously.  I presume it'll be something like "update postgresql.conf
> or recovery.conf and run pg_ctl reload", but I haven't (yet, anyway)
> verified the actual behavior of the patches, but if the above isn't
> feasible then we have a problem.

Right.  That's why I asked the question.  Mind you, a superuser function
on the master would be even better ...

> What is your source for those numbers?  They could be right, but I
> simply don't know.

pgbench tests with asynch rep and standby_delay = 0.  Not rigorous, but
enough to show that there is a problem there.  Doing pgbench with a
small database

> It would be far better if we could decouple master cleanup from
> standby cleanup, so that only the machine that actually has the old
> query gets bloated.  However, no one seems excited about writing that
> code.

"not excited" == terrified of the amount of troubleshooting involved,
and likely believing it's impossible.

> A further grump about our current architecture is that it doesn't seem
> at all clear how to make it work for partial replication.  I have to
> wonder whether we are going down the wrong path completely and need to
> hit the reset button.  

The way to do partial replication is Slony, Londiste, Bucardo, etc.

> But neither this nor the pruning problem are
> things that we can reasonably expect the sync rep patch to solve, if
> we want it to get committed this release cycle.

>> It is not, given that I've seen several proposals for synch rep which
>> would make asynch rep even more complicated than it already is.
> 
> I'm not aware of any proposals on the table which would do that.

Standby registration?

> Do you have some ideas on how to simplify it?  How will we know
> whether a particular design for sync rep does this?

That's a good point, I'll have to think about this and do a write-up.

> Sure, that would be nice to have, and it's a good idea.  But I don't
> think that's going to be a common failure mode.  What I expect to
> happen is the standby to hum along with no problem for a long time and
> then either kick a disk or suffer a power outage.

That might be more common, but it's not an argument against monitoring
what we *can* monitor.  More importantly, if ACK response times and
similar metrics are not exposed by core postgres, there is no other way
to find them out.  We need to give DBAs the tools to do their jobs, even
if the tools are at a very low level.
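
To make the monitoring point concrete, here is a minimal sketch of what a DBA
could do with ACK response times if they were exposed.  Everything in it is an
assumption for illustration: the sample values, the function name
summarize_ack_latencies, and the 50 ms warning threshold are hypothetical, and
nothing here is an existing postgres interface.

```python
import statistics


def summarize_ack_latencies(samples_ms, warn_threshold_ms=50.0):
    """Summarize hypothetical standby ACK round-trip times, in milliseconds.

    samples_ms is assumed to come from some low-level metric that core
    would have to expose; no such interface is implied to exist.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile, computed with integer arithmetic:
    # rank = ceil(0.95 * n), clamped to at least 1.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    p95 = ordered[rank - 1]
    return {
        "count": len(ordered),
        "mean_ms": statistics.fmean(ordered),
        "p95_ms": p95,
        "max_ms": ordered[-1],
        # Flag the standby as degraded when the tail latency is high,
        # even if the mean still looks acceptable.
        "degraded": p95 > warn_threshold_ms,
    }


if __name__ == "__main__":
    # Hypothetical samples: mostly fast ACKs with one slow outlier.
    print(summarize_ack_latencies([2, 3, 2, 4, 3, 2, 3, 2, 3, 120]))
```

A tail-latency check like this is the sort of early warning that would catch a
standby that is slowing down before it fails outright.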

> No, it isn't at all.  What does your application do NOW if the master
> goes down after you've sent a commit and before you get an
> acknowledgment back?  Does it assume that the transaction is
> committed, or does it assume the transaction was aborted by a crash on
> the master?  Either is possible, right?

This problem certainly exists with async; it's just less likely, so
people are ignoring it.  With a high enough transaction rate, and a
standby in "apply" mode, it's *certain* to happen on synch rep.  So we
can't ignore it as a problem anymore.
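
A rough back-of-the-envelope sketch of why the transaction rate matters.  It
assumes commits arrive as a Poisson process and that each commit stays
unacknowledged for a fixed time (the ACK latency); under those assumptions the
number of unacknowledged commits at a random instant is Poisson-distributed
with mean rate * latency.  The rates and latencies used are hypothetical, not
measurements.

```python
import math


def expected_in_flight(commits_per_sec, ack_latency_sec):
    """Average number of commits sent but not yet acknowledged.

    This is Little's law: arrival rate multiplied by time in the system.
    """
    return commits_per_sec * ack_latency_sec


def prob_unacked_commit_at_failure(commits_per_sec, ack_latency_sec):
    """Probability that at least one commit is unacknowledged when the
    master fails at a random instant (Poisson-arrival assumption)."""
    return 1.0 - math.exp(-expected_in_flight(commits_per_sec, ack_latency_sec))


if __name__ == "__main__":
    # Hypothetical cases: a short local-commit window versus a longer
    # window while waiting for a standby to apply the commit.
    for rate, latency in [(1, 0.001), (1000, 0.005)]:
        p = prob_unacked_commit_at_failure(rate, latency)
        print(f"{rate} commits/s, {latency * 1000:.0f} ms window: P = {p:.4f}")
```

The longer the wait for an acknowledgment, the larger the product, which is
why a standby in "apply" mode makes the ambiguous case so much more likely.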

I don't have any brilliant ideas on a solution for this one.

>> So, your opinion is "it's out of scope to handle this issue" ?
> 
> What handling of it would you propose?  Consider the case where you

I was asking a question.  My original question was "do we need to handle
this?"  I'm taking your viewpoint as "there's no reasonable way to
handle it, so we shouldn't."  That's a fine answer.  What I want is for
-hackers to make a *decision* about a very real problem, and not just
fail to discuss it.

> I agree, but it's not something we can address in the first patch,
> which is hard enough without adding things that make it even harder.
> We need to get something simple committed first and then build on it.

The reason I posted the start of this thread is that I know that both
Fujii and Simon have thought about some of these questions, and even if
they don't have code for them, they have ideas.  I want to read those
ideas explained.  Further, the answers to these questions may tell the
rest of us which parts of each patch are the most valuable.

-- 
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com
 

