Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From | Shlok Kyal |
---|---|
Subject | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date | |
Msg-id | CANhcyEW8UyMr_7idB580DT3bjtB=EKiHwecTx5KC3ggiVs9c+A@mail.gmail.com |
In response to | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-bugs |
On Wed, 21 May 2025 at 17:18, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > I think the problem here is that when we are distributing
> > invalidations to a concurrent transaction, in addition to queuing the
> > invalidations as a change, we also copy the distributed invalidations
> > along with the original transaction's invalidations via repalloc in
> > ReorderBufferAddInvalidations. So, when there are many in-progress
> > transactions, each would try to copy all its accumulated invalidations
> > to the remaining in-progress transactions. This could lead to such an
> > increase in allocation request size. However, after queuing the
> > change, we don't need to copy it along with the original transaction's
> > invalidations. This is because the copy is only required when we don't
> > process any changes in cases like ReorderBufferForget(). I have
> > analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> skipped to add in the list, just queued - repalloc can be skipped. Also, the function
> distributes messages only in the list, so received messages won't be sent again.
>
> Now a patch for PG17 is created for testing purpose. Duncan, can you apply this and
> confirms whether the issue can be solved?
>

Hi,

I was able to reproduce the issue with the following test:
1. First, begin 9 concurrent transactions (BEGIN; INSERT into t1 values(11);).
2. In a 10th concurrent transaction, perform 1000 DDL statements (ALTER PUBLICATION ADD/DROP TABLE).
3. For each of the 9 concurrent transactions, perform:
   i. Add 1000 DDL statements
   ii. COMMIT;
   iii. BEGIN; INSERT into t1 values(11);
4. Perform steps 2 and 3 in a loop.

These steps reproduced the error:
2025-05-22 19:03:35.111 JST [63150] sub1 ERROR: invalid memory alloc request size 1555752832
2025-05-22 19:03:35.111 JST [63150] sub1 STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', streaming 'parallel', origin 'any', publication_names '"pub1"')

I have also attached the test script for the same.
Also, I tried to run the test with Kuroda-san's patch, and it did not reproduce the issue.

Thanks and Regards,
Shlok Kyal
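
For readers following the thread, here is a rough toy model of why the request size can grow so quickly. It is only a sketch under stated assumptions, not the attached test script and not PostgreSQL code: transactions commit in round-robin order, each commit distributes the committer's whole invalidation list to every other in-progress transaction, and the `copy_received` flag switches between appending received messages to the receiver's own list (the behaviour described in the quoted analysis) and only queuing them (the PoC's approach). The function name `simulate_peak`, its parameters, and the 16-byte message size are assumptions made for illustration; the 1 GB - 1 allocation limit corresponds to PostgreSQL's MaxAllocSize.

```python
# Toy model only: counts invalidation messages per transaction, nothing else.
MAX_ALLOC_SIZE = 0x3FFFFFFF   # PostgreSQL's MaxAllocSize (1 GB - 1)
ASSUMED_MSG_BYTES = 16        # assumed size of one invalidation message


def simulate_peak(num_txns, msgs_per_commit, rounds, copy_received):
    """Return the largest per-transaction invalidation list seen (in messages).

    copy_received=True  : received messages are appended to the receiver's
                          own list, so they are distributed again later.
    copy_received=False : received messages are only queued as a change and
                          never join the receiver's own list.
    """
    sizes = [0] * num_txns  # sizes[i] = messages in transaction i's own list
    peak = 0
    for _ in range(rounds):
        for committer in range(num_txns):
            # The committing transaction ran some DDL of its own.
            sizes[committer] += msgs_per_commit
            distributed = sizes[committer]
            if copy_received:
                # Every other in-progress transaction copies the whole list.
                for other in range(num_txns):
                    if other != committer:
                        sizes[other] += distributed
            peak = max(peak, max(sizes))
            # The committer starts a fresh transaction with an empty list.
            sizes[committer] = 0
    return peak


if __name__ == "__main__":
    for copy_received in (True, False):
        peak = simulate_peak(num_txns=9, msgs_per_commit=1000,
                             rounds=3, copy_received=copy_received)
        over_limit = peak * ASSUMED_MSG_BYTES > MAX_ALLOC_SIZE
        print(f"copy_received={copy_received}: peak={peak} messages, "
              f"exceeds limit={over_limit}")
```

In this model the list size stays at `msgs_per_commit` when received messages are only queued, while copying them makes each commit redistribute everything the committer has ever received, so the lists compound from one commit to the next.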