Re: logical replication empty transactions

From	Craig Ringer
Subject	Re: logical replication empty transactions
Date	13 March 2020, 09:39:43
Msg-id	CAMsr+YE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg@mail.gmail.com
In reply to	Re: logical replication empty transactions (Andres Freund <andres@anarazel.de>)
Responses	Re: logical replication empty transactions, Re: logical replication empty transactions
List	pgsql-hackers

On Tue, 10 Mar 2020 at 02:30, Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2020-03-06 13:53:02 +0800, Craig Ringer wrote:
> > On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > One thing that is not clear to me is how will we advance restart_lsn
> > > if we don't send any empty xact in a system where there are many such
> > > xacts?
> >
> > Same way we already do it for writes that are not replicated over
> > logical replication, like vacuum work etc. The upstream sends feedback
> > with reply-requested. The downstream replies. The upstream advances
> > confirmed_flush_lsn, and that lazily updates restart_lsn.
>
> It'll still delay it a bit.

Right, but we don't generally care because there's no sync rep txn waiting for confirmation. If we lose progress due to a crash it doesn't matter. It does delay removal of old WAL a little, but it hardly matters.

> Somewhat independent from the issue at hand: It'd be really good if we
> could evolve the syncrep framework to support per-database waiting... It
> shouldn't be that hard, and the current situation sucks quite a bit (and
> yes, I'm to blame).

Hardly, you just didn't get the chance to fix that on top of the umpteen other things you had to change to make all the logical stuff work. You didn't break it, just didn't implement every single possible enhancement all at once. Shocking, I tell you.

> I'm not quite sure what you mean by "poke the walsender"? Kinda sounds
> like sending a signal, but decoding happens inside the walsender,
> so there's no need for that. Do you just mean somehow requesting that
> walsender sends a feedback message?

Right. I had in mind something like sending a ProcSignal via our funky multiplexed signal mechanism to ask the walsender to immediately generate a keepalive message with a reply-requested flag, then set the walsender's latch so we wake it promptly.

> To address the volume we could:
>
> 1a) Introduce a pgoutput message type to indicate that the LSN has
> advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is
> 21 bytes, COMMIT is 26. But we really don't need that much here. A
> single message should do the trick.

It would. Is it worth caring though? Especially since it seems rather unlikely that the actual network data volume of begin/commit msgs will be much of a concern. It's not like we're PITRing logical streams, and if we did, we could just filter out empty commits on the receiver side.

That message pretty much already exists in the form of a walsender keepalive anyway so we might as well re-use that and not upset the protocol.
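
For reference, here is a rough sketch of what that keepalive looks like on the wire. It follows the primary keepalive layout from the streaming replication protocol docs: a 'k' byte, the current end of WAL, the server clock in microseconds since 2000-01-01, and the reply-requested flag, 18 bytes in total. The names encode_keepalive and put_be64 are made up for illustration and are not PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEEPALIVE_LEN 18        /* 1 + 8 + 8 + 1 bytes */

/* Store a 64-bit value in network (big-endian) byte order. */
static void
put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 7; i >= 0; i--)
    {
        dst[i] = (uint8_t) (v & 0xff);
        v >>= 8;
    }
}

/*
 * Hypothetical helper: build the payload of a primary keepalive message.
 * buf must have room for KEEPALIVE_LEN bytes. Returns the bytes written.
 */
size_t
encode_keepalive(uint8_t *buf, uint64_t wal_end, int64_t now_us,
                 int reply_requested)
{
    assert(buf != NULL);

    buf[0] = 'k';                           /* message type */
    put_be64(buf + 1, wal_end);             /* current end of WAL */
    put_be64(buf + 9, (uint64_t) now_us);   /* usec since 2000-01-01 */
    buf[17] = reply_requested ? 1 : 0;      /* ask for a prompt reply */
    return KEEPALIVE_LEN;
}
```

So one keepalive with reply-requested set is 18 bytes, against 21 + 26 for an empty BEGIN/COMMIT pair, before any CopyData framing is counted.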

> 1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and
> possibly rename) that indicates that we are intentionally "ignoring"
> WAL. For walsender that callback then could check if it could just
> forward the position of the client (if it was entirely caught up
> before), or if it should send a feedback request (if syncrep is
> enabled, or distance is big).

I can see something like that being very useful, because at present only the output plugin knows if a txn is "empty" as far as that particular slot and output plugin is concerned. The reorder buffering mechanism cannot do relation-level filtering before it sends the changes to the output plugin during ReorderBufferCommit, since it only knows about relfilenodes not relation oids. And the output plugin might be doing finer grained filtering using row-filter expressions or who knows what else.

But as described above, that will only help for txns done in DBs other than the one the logical slot is for, or for txns known to have an empty ReorderBuffer when the commit is seen.

If there's a txn in the slot's db with a non-empty reorderbuffer, the output plugin won't know if the txn is empty or not until it finishes processing all callbacks and sees the commit for the txn. So it will generally have emitted the Begin message on the wire by the time it knows it has nothing useful to say. And Pg won't know that this txn is empty as far as this output plugin with this particular slot, set of output plugin params, and current user-catalog state is concerned, so it won't have any way to call the output plugin's "update progress" callback instead of the usual begin/change/commit callbacks.

But I think we can already skip empty txns unless sync-rep is enabled with no core changes, and send empty txns as walsender keepalives instead, by altering only output plugins, like this:

* Stash the BEGIN data in the plugin's LogicalDecodingContext.output_plugin_private when the plugin's begin callback is called, and don't write anything to the outstream

* Write out the BEGIN message lazily when any other callback generates a message that does need to be written out

* If no BEGIN has been written by the time the COMMIT callback is called, discard the COMMIT too. Then check whether sync rep is enabled: if it is, call LogicalDecodingContext.update_progress from within the output plugin's commit handler (probably via OutputPluginUpdateProgress()); otherwise just ignore the commit entirely.
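
To make the three steps above concrete, here is a minimal standalone sketch of the state machine. It is not real output plugin code: LazyTxnState and the lazy_* functions are hypothetical stand-ins, the message counter replaces actual writes to the stream, and the progress counter replaces the call to OutputPluginUpdateProgress().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for what a plugin would keep in output_plugin_private. */
typedef struct LazyTxnState
{
    bool        in_txn;             /* begin callback seen, commit not yet */
    bool        begin_sent;         /* BEGIN actually written to the stream */
    uint32_t    xid;                /* stashed BEGIN data */
    uint64_t    final_lsn;
    int         msgs_written;       /* stand-in for the output stream */
    int         progress_updates;   /* stand-in for update_progress calls */
} LazyTxnState;

/* begin callback: stash the BEGIN data, write nothing yet. */
void
lazy_begin(LazyTxnState *st, uint32_t xid, uint64_t final_lsn)
{
    assert(!st->in_txn);
    st->in_txn = true;
    st->begin_sent = false;
    st->xid = xid;
    st->final_lsn = final_lsn;
}

/* Emit the stashed BEGIN the first time something needs to go out. */
static void
flush_begin(LazyTxnState *st)
{
    if (!st->begin_sent)
    {
        st->msgs_written++;         /* a real plugin would write BEGIN here */
        st->begin_sent = true;
    }
}

/* change callback: only changes that survive filtering produce output. */
void
lazy_change(LazyTxnState *st, bool passes_filter)
{
    assert(st->in_txn);
    if (!passes_filter)
        return;
    flush_begin(st);
    st->msgs_written++;             /* the change message itself */
}

/* commit callback: drop COMMIT for empty txns, report progress if needed. */
void
lazy_commit(LazyTxnState *st, bool syncrep_enabled)
{
    assert(st->in_txn);
    if (st->begin_sent)
        st->msgs_written++;         /* normal COMMIT */
    else if (syncrep_enabled)
        st->progress_updates++;     /* would call OutputPluginUpdateProgress() */
    st->in_txn = false;
}
```

A txn whose changes are all filtered out writes nothing, and only triggers a progress update when sync rep is on. A txn with at least one surviving change produces BEGIN, the change, and COMMIT as usual.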

> We could e.g. have a new LogicalDecodingContext callback that is
> called whenever WalSndWaitForWal() would wait. That'd check if there's
> a pending "need" to send out a 'empty transaction'/feedback request
> message. The "need" flag would get cleared whenever we send out data
> bearing an LSN for other reasons.

I can see that being handy, yes. But it won't necessarily help with the sync rep issue, since other sync rep txns may continue to generate WAL while others wait for commit-confirmations that won't come from the logical replica.

While we're speaking of adding output plugin hooks, I keep on trying to think of a sensible way to do a plugin-defined reply handler, so the downstream end can send COPY BOTH messages of some new msgkind back to the walsender, which will pass them to the output plugin if it implements the appropriate handle_reply_message (or whatever) callback. That much is trivial to implement. Where I keep getting a bit stuck is whether there's a sensible snapshot that can be set when calling the output plugin's reply handler. We wouldn't want to switch to a current non-historic snapshot because of all the cache flushes that'd cause, but there isn't necessarily a valid and safe historic snapshot to set when we're not within ReorderBufferCommit, is there?

I'd love to get rid of the need to "connect back" to a provider over plain libpq connections to communicate with it. The ability to run SQL on the walsender conn helps. But really, so much more would be possible if we could just have the downstream end *reply* on the same connection using COPY BOTH, much like it sends replay progress updates right now. It'd let us manage relation/attribute/type metadata caches better for example.

Thoughts?

Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
