RE: CDC/ETL system on top of logical replication with pgoutput, custom client

Поиск

Список

Период

Сортировка

От	José Neves
Тема	RE: CDC/ETL system on top of logical replication with pgoutput, custom client
Дата	24 октября 2023 г. 18:21:33
Msg-id	PR3P193MB04918BB4D03CEF5F966AD73B89DFA@PR3P193MB0491.EURP193.PROD.OUTLOOK.COM обсуждение исходный текст
Ответ на	Re: CDC/ETL system on top of logical replication with pgoutput, custom client (Amit Kapila <amit.kapila16@gmail.com>)
Ответы	Re: CDC/ETL system on top of logical replication with pgoutput, custom client (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi there, hope to find you well.

I have a follow-up question to this already long thread.

Upon deploying my PostgreSQL logical replication fed application on a stale database, I ended up running out of space, as the replication slot is being held back till the next time that we receive a data-changing event, and we advance to that new LSN offset.
I think that the solution for this is to advance our LSN offset every time a keep-alive message is received ('k' // 107).

My doubt is, can the keep-alive messages be received in between open transaction events? I think not, but I would like to get your input to be extra sure as if this happens, and I commit that offset, I may introduce again faulty logic leading to data loss.

In sum, something like this wouldn't happen:

BEGIN LSN001
INSERT LSN002

KEEP LIVE LSN003
UPDATE LSN004
COMMIT LSN005

Correct? It has to be either:

KEEP LIVE LSN001

BEGIN LSN002
INSERT LSN003
UPDATE LSN004
COMMIT LSN005

Or:
BEGIN LSN001
INSERT LSN002
UPDATE LSN004
COMMIT LSN005

KEEP LIVE LSN006

LSNXXX are mere representations of LSN offsets.

Thank you again.
Regards,
José Neves

De: Amit Kapila <amit.kapila16@gmail.com>
Enviado: 8 de agosto de 2023 14:37
Para: José Neves <rafaneves3@msn.com>
Cc: Andres Freund <andres@anarazel.de>; pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Assunto: Re: CDC/ETL system on top of logical replication with pgoutput, custom client

On Mon, Aug 7, 2023 at 1:46 PM José Neves <rafaneves3@msn.com> wrote:
>
> Humm, that's... challenging. I faced some issues after "the fix" because I had a couple of transactions with 25k updates, and I had to split it to be able to push to our event messaging system, as our max message size is 10MB. Relying on commit time would mean that all transaction operations will have the same timestamp. If something goes wrong while my worker is pushing that transaction data chunks, I will duplicate some data in the next run, so... this wouldn't allow me to deal with data duplication.
> Is there any other way that you see to deal with it?
>
> Right now I only see an option, which is to store all processed LSNs on the other side of the ETL. I'm trying to avoid that overhead.
>

Sorry, I don't understand your system enough to give you suggestions
but if you have any questions related to how logical replication work
then I might be able to help.

--
With Regards,
Amit Kapila.

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 24 октября 2023 г., 18:16:54
Сообщение: Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

Следующее

От: Christoph Berg
Дата: 24 октября 2023 г., 18:31:43
Сообщение: Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

RE: CDC/ETL system on top of logical replication with pgoutput, custom client

Предыдущее

Следующее