RE: Perform streaming logical transactions by background workers and parallel apply

From: Zhijie Hou (Fujitsu)
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Msg-id: OS0PR01MB57164DF9FC5366024A1952D594659@OS0PR01MB5716.jpnprd01.prod.outlook.com
In reply to: Re: Perform streaming logical transactions by background workers and parallel apply  (Alexander Lakhin <exclusion@gmail.com>)
Replies: Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Please look at a new anomaly that can be observed starting from 216a7848.
> 
> The following script:
> echo "CREATE SUBSCRIPTION testsub CONNECTION 'dbname=nodb'
> PUBLICATION testpub WITH (connect = false);
> ALTER SUBSCRIPTION testsub ENABLE;" | psql
> 
> sleep 1
> rm $PGINST/lib/libpqwalreceiver.so
> sleep 15
> pg_ctl -D "$PGDB" stop -m immediate
> grep 'TRAP:' server.log
> 
> Leads to multiple assertion failures:
> CREATE SUBSCRIPTION
> ALTER SUBSCRIPTION
> waiting for server to shut down.... done
> server stopped
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", Line: 4439, PID: 2899323
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", Line: 4439, PID: 2899416
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", Line: 4439, PID: 2899427
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", Line: 4439, PID: 2899439
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", Line: 4439, PID: 2899538
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", Line: 4439, PID: 2899547
> 
> server.log contains:
> 2023-04-26 11:00:58.797 MSK [2899300] LOG:  database system is ready to accept connections
> 2023-04-26 11:00:58.821 MSK [2899416] ERROR:  could not access file "libpqwalreceiver": No such file or directory
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", Line: 4439, PID: 2899416
> postgres: logical replication apply worker for subscription 16385 (ExceptionalCondition+0x69)[0x558b2ac06d41]
> postgres: logical replication apply worker for subscription 16385 (VirtualXactLockTableCleanup+0xa4)[0x558b2aa9fd74]
> postgres: logical replication apply worker for subscription 16385 (LockReleaseAll+0xbb)[0x558b2aa9fe7d]
> postgres: logical replication apply worker for subscription 16385 (+0x4588c6)[0x558b2aa2a8c6]
> postgres: logical replication apply worker for subscription 16385 (shmem_exit+0x6c)[0x558b2aa87eb1]
> postgres: logical replication apply worker for subscription 16385 (+0x4b5faa)[0x558b2aa87faa]
> postgres: logical replication apply worker for subscription 16385 (proc_exit+0xc)[0x558b2aa88031]
> postgres: logical replication apply worker for subscription 16385 (StartBackgroundWorker+0x147)[0x558b2aa0b4d9]
> postgres: logical replication apply worker for subscription 16385 (+0x43fdc1)[0x558b2aa11dc1]
> postgres: logical replication apply worker for subscription 16385 (+0x43ff3d)[0x558b2aa11f3d]
> postgres: logical replication apply worker for subscription 16385 (+0x440866)[0x558b2aa12866]
> postgres: logical replication apply worker for subscription 16385 (+0x440e12)[0x558b2aa12e12]
> postgres: logical replication apply worker for subscription 16385 (BackgroundWorkerInitializeConnection+0x0)[0x558b2aa14396]
> postgres: logical replication apply worker for subscription 16385 (main+0x21a)[0x558b2a932e21]
> 
> I understand that removing libpqwalreceiver.so (or the whole pginst/) is not
> what happens in a production environment every day, but it's nonetheless a
> new failure mode, and it can produce many coredumps during testing.
> 
> IIUC, that assert will fail in case of any error raised between
> ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and
> ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeConnectionByOid()->InitPostgres().

Thanks for reporting the issue.

I think the problem is that logicalrep_worker_onexit() tries to release locks
even when the worker exits before its initialization is complete, because this
callback is registered before the init phase. So I think we can add a
conditional check before releasing the locks. Please find the attached patch.
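The fix described above amounts to guarding an exit callback that is registered before initialization finishes. A minimal C sketch of that pattern follows, assuming a simplified single-process model. The names `register_exit_callback`, `run_exit_callbacks`, `release_all_locks`, `worker_onexit`, `run_worker`, and the `initializing` flag are all hypothetical; they are not PostgreSQL functions, and this is not the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the worker lifecycle; none of these names are
 * PostgreSQL APIs. The exit callback is registered before the process
 * finishes initializing, as logicalrep_worker_onexit() is.
 */
typedef void (*exit_callback)(void);

static exit_callback callbacks[4];
static size_t n_callbacks = 0;

static bool backend_id_assigned = false; /* stands in for a valid backendId */
static bool initializing = false;        /* hypothetical guard flag */
static int lock_releases = 0;            /* how many times locks were released */

static void register_exit_callback(exit_callback cb)
{
    assert(n_callbacks < sizeof callbacks / sizeof callbacks[0]);
    callbacks[n_callbacks++] = cb;
}

/* Run callbacks in reverse order of registration, emptying the list. */
static void run_exit_callbacks(void)
{
    while (n_callbacks > 0)
        callbacks[--n_callbacks]();
}

/* Mirrors the failing Assert: releasing locks needs a fully set-up process. */
static void release_all_locks(void)
{
    assert(backend_id_assigned);
    lock_releases++;
}

/* Guarded exit callback: skip the lock release if init never completed. */
static void worker_onexit(void)
{
    if (!initializing)
        release_all_locks();
}

/*
 * Simulate one worker run. If fail_during_init is true, an error occurs
 * after the callback is registered but before initialization completes.
 */
static void run_worker(bool fail_during_init)
{
    backend_id_assigned = false;
    initializing = true;
    register_exit_callback(worker_onexit);

    if (fail_during_init)
    {
        run_exit_callbacks();   /* early-exit path after an ERROR */
        return;
    }

    backend_id_assigned = true; /* initialization finished */
    initializing = false;
    run_exit_callbacks();       /* normal exit */
}
```

Calling `run_worker(true)` exits through the callback while `initializing` is still set, so `release_all_locks()` and its assertion are skipped; `run_worker(false)` releases the locks exactly once. Dropping the `if (!initializing)` check makes the early-exit path hit the assertion, which is the shape of the reported failure.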

Best Regards,
Hou zj


Attachments
