Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

Поиск
Список
Период
Сортировка
От Peter Smith
Тема Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Дата
Msg-id CAHut+Ps2W7KP46zFQ9ybEckJoRKpT6zh6aPj31L2RV9qBcoLRg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-bugs
On Mon, Nov 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Peter, can you also please once test the attached and see if this
> fixes the problem for you as well?

I have reviewed the v3 patch code, and repeated the tablesync testing
(for a stream-enabled SUBSCRIPTION) using the debugging technique
previously described [1].
https://www.postgresql.org/message-id/CAHut%2BPt4PyKQCwqzQ%3DEFF%3DbpKKJD7XKt_S23F6L20ayQNxg77A%40mail.gmail.com

Now the test was successful. The stream start/stop/commit/abort are
all being handled by the tablesync worker without errors.

===

TESTING

I performed the tablesync stream test by multiple different ways of
inserting the en masse data:

1. Cause streaming (using a PREPARE TRANSACTION)
------------------------------------------------
psql -d test_pub -c "BEGIN;INSERT INTO test_tab SELECT i, md5(i::text)
FROM generate_series(3, 5000) s(i);PREPARE TRANSACTION
'test_prepared_tab';"
psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';"

tablesync does 'S' (stream start)
tablesync does 'R'
tablesync does 'I' (insert) x n
tablesync does 'E' (stream end)
tablesync does 'S'
tablesync does 'I' x n
tablesync does 'E'
tablesync does { 'S',  'I' x n, 'E' } repeats 13 more times
tablesync does 'c' (stream commit)
- In handle_stream_commit the tablesync worker processes the spool
file, calling apply_dispatch for all the messages
- 'R'
- { 'I' + should_apply_changes_for_rel = true } x 5000
- then calls process_syncing_tables_for_sync, which sets the state to
SUBREL_STATE_SYNCDONE
- then tablesync worker process exits

Now the "apply" worker catches up.
- it runs all the same messages again but does nothing because
should_apply_changes_for_rel = false


2. Cause streaming (not using a PREPARE TRANSACTION)
----------------------------------------------------
psql -d test_pub -c "INSERT INTO test_tab SELECT i, md5(i::text) FROM
generate_series(3, 5000) s(i);"

gives very similar results to above


3. Causing a stream abort to happen
-----------------------------------
psql -d test_pub -c "BEGIN; INSERT INTO test_tab SELECT i,
md5(i::text) FROM generate_series(3, 5000) s(i); INSERT INTO test_tab
VALUES(4999, 'already exists'); END;"

tablesync does 'S' (stream start)
tablesync does 'E' (stream end)
tablesync does 'A' (stream abort)
tablesync does 'B'
tablesync does 'C'
- calls process_syncing_tables
- tablesync worker process exits
apply worker continues...

===

REVIEW COMMENTS

Despite the tests working OK I thought there should be a couple small
changes made for this patch, as follows.

(1) process_sync_tables should be last

IMO the process_syncing_tables should be always the *last* thing any
apply handler does, because it seems a bad idea for the worker to
change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact
there is still a little bit more processing remaining. This is why I
made the same modification in the other WIP patch v26-0006 [2]
[2] https://www.postgresql.org/message-id/CAHut%2BPuQcLbjrBXM54%2By%2B%3DYsmEDFd3CCtp31K-_jS4Xka2vwbQ%40mail.gmail.com

So, I think the process_syncing_tables() call should be put *after*
the stream_cleanup_files() in function apply_handle_stream_commit().
But to do this means the process_syncing_tables has to be removed from
your new apply_handle_commit_internal function.


(2) misleading comment

/*
 * Start a transaction on stream start, this transaction will be committed
 * on the stream stop. We need the transaction for handling the buffile,
 * used for serializing the streaming data and subxact info.
 */

That above comment (in the apply_handle_stream_start function) may now
be inaccurate/misleading for the case of tablesync worker, because I
think for tablesync you are explicitly avoiding the tx commit within
the apply_handle_stream_stop().

---

Kind Regards,
Peter Smith.
Fujitsu Australia



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Manoj Kumar
Дата:
Сообщение: Re: BUG #16739: Temporary files not deleting from data folder on disk
Следующее
От: David Turoň
Дата:
Сообщение: Re: BUG #16743: psql doesn't show whole expression in stored column