Re: logical changeset generation v5

From Andres Freund
Subject Re: logical changeset generation v5
Msg-id 20130903165754.GC7018@awork2.anarazel.de
In response to Re: logical changeset generation v5  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: logical changeset generation v5  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2013-09-03 12:22:22 -0400, Robert Haas wrote:
> On Tue, Sep 3, 2013 at 12:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> 1. I think more comments are needed here to explain why we need this.
> >> I don't know if the comments should go into the functions modified by
> >> this patch or in some other location, but I don't find what's here now
> >> adequate for understanding.
> >
> > Hm. What information are you actually missing? I guess the
> > XLogSetAsyncXactLSN() needs a bit more context based on your question,
> > what else?
> > Not sure if it makes sense to explain in detail why it helps us to get
> > into a consistent state faster?

> Well, we must have had some idea in mind when the original Hot Standby
> patch went in that doing this once per checkpoint was good enough.
> Now we think we need it every 15 seconds, but not more or less often.
> So, why the change of heart?

I think the primary reason for that was that it was a pretty
complicated patchset and we needed to start somewhere. By now we do have
reports of standbys taking their time to get consistent.

> To my way of thinking, it seems as though we ought to always begin
> replay at a checkpoint, so the standby ought always to see one of
> these records immediately.  Obviously that's not good enough, but why
> not?

We always see one after the checkpoint (well, actually before the
checkpoint record, but ...), correct. The problem is just that reading a
single xl_running_xacts record doesn't automatically make you consistent.
If there's a single suboverflowed transaction running on the primary when
the xl_running_xacts is logged we won't be able to switch to
consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
and some optimizations.
Since the only place where we currently have the information to
potentially become consistent is ProcArrayApplyRecoveryInfo() we will
have to wait up to checkpoint_timeout till we get consistent. Which
sucks as there are good arguments to set that to 1h.
That especially sucks as you lose consistency every time you restart the
standby...
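
To make the waiting argument concrete, here is a minimal toy model in C.
It is a sketch under loud assumptions: ToyRunningXacts and
toy_time_to_consistency() are hypothetical names, not PostgreSQL code, and
the model ignores the optimizations in ProcArrayApplyRecoveryInfo(). It
only assumes that the standby can switch to consistent at the first
replayed running-xacts record that is not suboverflowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one logged running-xacts record.
 * Hypothetical type; NOT the real xl_running_xacts layout. */
typedef struct ToyRunningXacts
{
    int  replayed_at_s;    /* seconds after standby start when replayed */
    bool subxid_overflow;  /* a running xact had overflowed subxids */
} ToyRunningXacts;

/*
 * Return the time (in seconds) at which the toy standby first becomes
 * consistent, or -1 if no usable record was seen.
 * Simplifying assumption: only a non-overflowed record allows the switch.
 */
int
toy_time_to_consistency(const ToyRunningXacts *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (!recs[i].subxid_overflow)
            return recs[i].replayed_at_s;
    }
    return -1;
}
```

In this model, if the first record is suboverflowed and the next one only
arrives with the next checkpoint (checkpoint_timeout = 1h), the function
returns 3600; with a record every 15 seconds it returns 15.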

> And why is every 15 seconds good enough?

Waiting 15s to become consistent instead of checkpoint_timeout seems to
be ok to me and to be a good tradeoff between overhead and waiting. We
can certainly discuss other values or making it configurable. The latter
seemed to be unnecessary to me, but I don't have a problem
implementing it. I just don't want to document it :P
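
As an illustration of the fixed-interval idea, a time-based throttle could
look roughly like the following C sketch. The names
TOY_SNAPSHOT_INTERVAL_MS, ToySnapshotTimer and toy_snapshot_due() are
made up for this example and are not taken from the patch; the sketch only
shows "log at most once per 15 seconds".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: minimum gap between two logged snapshots. */
#define TOY_SNAPSHOT_INTERVAL_MS 15000

/* Hypothetical bookkeeping for the throttle. */
typedef struct ToySnapshotTimer
{
    int64_t last_logged_ms;   /* time the last snapshot was logged */
} ToySnapshotTimer;

/*
 * Return true (and remember the time) if at least the interval has
 * passed since the last logged snapshot; otherwise return false.
 */
bool
toy_snapshot_due(ToySnapshotTimer *t, int64_t now_ms)
{
    if (now_ms - t->last_logged_ms < TOY_SNAPSHOT_INTERVAL_MS)
        return false;

    t->last_logged_ms = now_ms;
    return true;
}
```

Making the interval configurable would amount to replacing the constant
with a setting, which is the part that would then need documenting.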

> >> 3. Why does LogCurrentRunningXacts() need to call
> >> XLogSetAsyncXactLSN()?  Hopefully any WAL record is going to get
> >> sync'd in a reasonably timely fashion; I can't see off-hand why this
> >> one should need special handling.
> >
> > No, we don't force writing out wal records in a timely fashion if
> > there's no pressure in wal_buffers, basically only on commits and
> > various XLogFlush()es. It doesn't make much of a difference if the
> > entire system is busy, but if it's not the wal writer will sleep. The
> > alternative would be to XLogFlush() the record, but that would actually
> > block, which isn't really what we want/need.

> The WAL writer is supposed to call XLogBackgroundFlush() every time
> WalWriterDelay expires.  Yeah, it can hibernate, but if it's
> hibernating, then we should respect that decision for this WAL record
> type also.

Why should we respect it? There is work to be done and the wal writer
has no way of knowing that without us telling it? Normally we rely on
commit records and XLogFlush()es to trigger the wal writer.
Alternatively we can start a transaction and set synchronous_commit =
off, but that seems like a complication to me.
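
The difference between blocking on a flush and merely telling the wal
writer that there is work can be sketched with a toy model. Everything
here (ToyLsn, ToyWalState, toy_set_async_lsn(), toy_walwriter_cycle()) is
hypothetical and only loosely mirrors the idea behind
XLogSetAsyncXactLSN(); it is not the real implementation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;      /* hypothetical WAL position type */

typedef struct ToyWalState
{
    ToyLsn flushed_upto;      /* WAL known to be on disk */
    ToyLsn async_request;     /* highest LSN requested to be flushed soon */
    bool   writer_asleep;     /* the toy wal writer is hibernating */
} ToyWalState;

/*
 * Non-blocking: record that WAL up to 'lsn' should be flushed soon and
 * wake the writer if it is asleep and has unflushed work.
 * Returns true if the writer was woken.
 */
bool
toy_set_async_lsn(ToyWalState *st, ToyLsn lsn)
{
    if (lsn > st->async_request)
        st->async_request = lsn;

    if (st->writer_asleep && st->async_request > st->flushed_upto)
    {
        st->writer_asleep = false;
        return true;
    }
    return false;
}

/* One writer cycle: flush what was requested, then go back to sleep. */
void
toy_walwriter_cycle(ToyWalState *st)
{
    if (st->async_request > st->flushed_upto)
        st->flushed_upto = st->async_request;
    st->writer_asleep = true;
}
```

The caller never waits in this model: it only publishes the LSN and, if
needed, wakes the sleeping writer, which is the property wanted for the
running-xacts record.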

> >> I understand why logical replication needs to connect to a database,
> >> but I don't understand why any other walsender would need to connect
> >> to a database.
> >
> > Well, logical replication actually streams out data using the walsender,
> > so that's the reason why I want to add it there. But there have been
> > cases in the past where we wanted to do stuff in the walsender that need
> > database access, but we couldn't do so because you cannot connect to
> > one.

> Could you be more specific?

I only remember 3959.1349384333@sss.pgh.pa.us but I think it has come up
before.

> >>  Absent a clear use case for such a thing, I don't
> >> think we should allow it.  Ignorant suggestion: perhaps the database
> >> name could be stored in the logical replication slot.
> >
> > The problem is that you need to InitPostgres() with a database. You
> > cannot do that again, after connecting with an empty database which we
> > do in a plain walsender.
> 
> Are you saying that the logical replication slot can't be read before
> calling InitPostgres()?

The slot can be read just fine, but we won't know that we should do
that. Walsender accepts commands via PostgresMain()'s mainloop, which has
already done an InitPostgres(dbname). We need to do that because we need
the environment it sets up.

The database is stored in the slots btw (as oid, not as a name though) ;)

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


