Re: logical changeset generation v5

From Robert Haas
Subject Re: logical changeset generation v5
Date
Msg-id CA+TgmoaHPnVBfyjcKrbWdgGMMtyftM5y1+zm+Od=w_+NNED4pw@mail.gmail.com
In reply to Re: logical changeset generation v5  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: logical changeset generation v5  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> To my way of thinking, it seems as though we ought to always begin
>> replay at a checkpoint, so the standby ought always to see one of
>> these records immediately.  Obviously that's not good enough, but why
>> not?
>
> We always see one after the checkpoint (well, actually before the
> checkpoint record, but ...), correct. The problem is just that reading a
> single xl_running_xacts record doesn't automatically make you consistent. If
> there's a single suboverflowed transaction running on the primary when
> the xl_running_xacts record is logged we won't be able to switch to
> consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
> and some optimizations.
> Since the only place where we currently have the information to
> potentially become consistent is ProcArrayApplyRecoveryInfo() we will
> have to wait checkpoint_timeout time till we get consistent. Which
> sucks as there are good arguments to set that to 1h.
> That especially sucks as you lose consistency every time you restart the
> standby...

Right, OK.

>> And why is every 15 seconds good enough?
>
> Waiting 15s to become consistent instead of checkpoint_timeout seems to
> be ok to me and to be a good tradeoff between overhead and waiting. We
> can certainly discuss other values or making it configurable. The latter
> seemed to be unnecessary to me, but I don't have a problem
> implementing it. I just don't want to document it :P

I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it.  For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot.  I realize that the overhead of a WAL record every 15
seconds is fairly small, but the load on some systems is all but
nonexistent.  It would be nice not to wake up the HD unnecessarily.

>> The WAL writer is supposed to call XLogBackgroundFlush() every time
>> WalWriterDelay expires.  Yeah, it can hibernate, but if it's
>> hibernating, then we should respect that decision for this WAL record
>> type also.
>
> Why should we respect it?

Because I don't see any reason to believe that this WAL record is any
more important or urgent than any other WAL record that might get
logged.

>> >> I understand why logical replication needs to connect to a database,
>> >> but I don't understand why any other walsender would need to connect
>> >> to a database.
>> >
>> > Well, logical replication actually streams out data using the walsender,
>> > so that's the reason why I want to add it there. But there have been
>> > cases in the past where we wanted to do stuff in the walsender that need
>> > database access, but we couldn't do so because you cannot connect to
>> > one.
>
>> Could you be more specific?
>
> I only remember 3959.1349384333@sss.pgh.pa.us but I think it has come up
> before.

It seems we need some more design there.  Perhaps entering replication
mode could be triggered by writing either dbname=replication or
replication=yes.  But then, do the replication commands simply become
SQL commands?  I've certainly seen hackers use them that way.  And I
can imagine that being a sensible approach, but this patch seems like
it's only covering a fairly small fraction of what really ought to be
a single commit.
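
As a rough illustration of the trigger I have in mind, the sketch below classifies a connection from its startup parameters. It is only a sketch: StartupParam, ConnMode and classify_connection are made-up names, and the accepted spellings are just the two floated above, not a statement of what the server accepts today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection modes for this illustration. */
typedef enum { CONN_NORMAL, CONN_REPLICATION } ConnMode;

/* Hypothetical key/value pair from the startup packet. */
typedef struct StartupParam {
    const char *key;
    const char *value;
} StartupParam;

/*
 * Enter replication mode if the client wrote either
 * "replication=yes" or "dbname=replication"; otherwise treat
 * the connection as a normal one.
 */
static ConnMode
classify_connection(const StartupParam *params, size_t nparams)
{
    assert(params != NULL || nparams == 0);

    for (size_t i = 0; i < nparams; i++)
    {
        const char *k = params[i].key;
        const char *v = params[i].value;

        if (strcmp(k, "replication") == 0 && strcmp(v, "yes") == 0)
            return CONN_REPLICATION;
        if (strcmp(k, "dbname") == 0 && strcmp(v, "replication") == 0)
            return CONN_REPLICATION;
    }
    return CONN_NORMAL;
}
```

Note that the replication=yes spelling leaves dbname free to name a real database, which is the combination logical replication would need; whether the replication commands then become ordinary SQL is the open design question.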

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company