RE: CDC/ETL system on top of logical replication with pgoutput, custom client

Поиск
Список
Период
Сортировка
От José Neves
Тема RE: CDC/ETL system on top of logical replication with pgoutput, custom client
Дата
Msg-id PR3P193MB049163E2247D8F87340BE34E8905A@PR3P193MB0491.EURP193.PROD.OUTLOOK.COM
обсуждение исходный текст
Ответ на Re: CDC/ETL system on top of logical replication with pgoutput, custom client  (Andres Freund <andres@anarazel.de>)
Ответы Re: CDC/ETL system on top of logical replication with pgoutput, custom client  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi Andres, thanks for your reply.

Ok, if I understood you correctly, I start to see where my logic is faulty. Just to make sure that I got it right, taking the following example again:

T-1
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT  LSN4-4000

T-2
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT  LSN4-5500

Where data will arrive in this order:

INSERT LSN1-500
INSERT LSN1-1000
UPDATE LSN2-1500
UPDATE LSN2-2000
UPDATE LSN3-2500
UPDATE LSN3-3000
COMMIT  LSN4-4000
COMMIT  LSN4-5500

You are saying that the LSN3-3000 will never be missing, either the entire connection will fail at that point, or all should be received in the expected order (which is different from the "numeric order" of LSNs). If the connection is down, upon restart, I will receive the entire T-1 transaction again (well, all example data again).
In addition to that, if I commit LSN4-4000, even tho that LSN has a "bigger numeric value" than the ones representing INSERT and UPDATE events on T-2, I will be receiving the entire T-2 transaction again, as the LSN4-5500 is still uncommitted. 
This makes sense to me, but just to be extra clear, I will never receive a transaction commit before receiving all other events for that transaction.
Are these statements correct?

>Are you using the 'streaming' mode / option to pgoutput?
No.

>Not sure what you mean with "unordered offsets"?
Ordered: EB53/E0D88188, EB53/E0D88189, EB53/E0D88190
Unordered: EB53/E0D88190, EB53/E0D88188, EB53/E0D88189

Extra question: When I get a begin message, I get a transaction starting at LSN-1000, and a transaction ending at LSN-2000. But as the example above shows, I can have data points from other transactions with LSNs in that interval. I have no way to identify to which transaction they belong, correct?

Thanks again. Regards,
José Neves

De: Andres Freund <andres@anarazel.de>
Enviado: 31 de julho de 2023 21:39
Para: José Neves <rafaneves3@msn.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>; pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Assunto: Re: CDC/ETL system on top of logical replication with pgoutput, custom client
 
Hi,

On 2023-07-31 14:16:22 +0000, José Neves wrote:
> Hi Amit, thanks for the reply.
>
> In our worker (custom pg replication client), we care only about INSERT,
> UPDATE, and DELETE operations, which - sure - may be part of the issue.

That seems likely. Postgres streams out changes in commit order, not in order
of the changes having been made (that'd not work due to rollbacks etc). If you
just disregard transactions entirely, you'll get something bogus after
retries.

You don't need to store the details for each commit in the target system, just
up to which LSN you have processed *commit records*. E.g. if you have received
and safely stored up to commit 0/1000, you need to remember that.


Are you using the 'streaming' mode / option to pgoutput?


> 1. We have no way to match LSN operations with the respective commit, as
> they have unordered offsets.

Not sure what you mean with "unordered offsets"?


> Assuming that all of them were received in order, we would commit all data with the commit message LSN4-4000 as other events would match the transaction start and end LSN interval of it.

Logical decoding sends out changes in a deterministic order and you won't see
out of order data when using TCP (the entire connection can obviously fail
though).

Andres

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: pgsql: Fix search_path to a safe value during maintenance operations.
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: pgsql: Fix search_path to a safe value during maintenance operations.