Обсуждение: [HACKERS] Logical replication ApplyContext bloat
Hi, currently logical replication worker uses ApplyContext to decode received data and that context is never freed during transaction processing. Hence if publication side is performing something like 10M row inserts in single transaction, then subscription worker will occupy more than 10G of ram (and can be killed by OOM). Attached patch resets ApplyContext after each insert/update/delete/commit. I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter, but didn’t spotted any measurable difference in speed. Problem spotted by Mikhail Shurutov. Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 18/04/17 13:45, Stas Kelvich wrote: > Hi, > > currently logical replication worker uses ApplyContext to decode received data > and that context is never freed during transaction processing. Hence if publication > side is performing something like 10M row inserts in single transaction, then > subscription worker will occupy more than 10G of ram (and can be killed by OOM). > > Attached patch resets ApplyContext after each insert/update/delete/commit. > I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter, > but didn’t spotted any measurable difference in speed. > > Problem spotted by Mikhail Shurutov. > Hmm we also do replication protocol handling inside the ApplyContext (LogicalRepApplyLoop), are you sure this change is safe in that regard? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> On 19 Apr 2017, at 12:37, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > > On 18/04/17 13:45, Stas Kelvich wrote: >> Hi, >> >> currently logical replication worker uses ApplyContext to decode received data >> and that context is never freed during transaction processing. Hence if publication >> side is performing something like 10M row inserts in single transaction, then >> subscription worker will occupy more than 10G of ram (and can be killed by OOM). >> >> Attached patch resets ApplyContext after each insert/update/delete/commit. >> I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter, >> but didn’t spotted any measurable difference in speed. >> >> Problem spotted by Mikhail Shurutov. >> > > Hmm we also do replication protocol handling inside the ApplyContext > (LogicalRepApplyLoop), are you sure this change is safe in that regard? Memory is bloated by logicalrep_read_* when palloc happens inside. I’ve inserted context reset in ensure_transaction() which is only called in beginning of hande_insert/update/delete/commit when data from previous call of that function isn’t needed. So probably it is safe to clear memory there. At least i don’t see any state that can be accessed later in this functions. Do you have any specific concerns? Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 19/04/17 11:55, Stas Kelvich wrote: > >> On 19 Apr 2017, at 12:37, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >> >> On 18/04/17 13:45, Stas Kelvich wrote: >>> Hi, >>> >>> currently logical replication worker uses ApplyContext to decode received data >>> and that context is never freed during transaction processing. Hence if publication >>> side is performing something like 10M row inserts in single transaction, then >>> subscription worker will occupy more than 10G of ram (and can be killed by OOM). >>> >>> Attached patch resets ApplyContext after each insert/update/delete/commit. >>> I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter, >>> but didn’t spotted any measurable difference in speed. >>> >>> Problem spotted by Mikhail Shurutov. >>> >> >> Hmm we also do replication protocol handling inside the ApplyContext >> (LogicalRepApplyLoop), are you sure this change is safe in that regard? > > Memory is bloated by logicalrep_read_* when palloc happens inside. > I’ve inserted context reset in ensure_transaction() which is only called in beginning > of hande_insert/update/delete/commit when data from previous call of that > function isn’t needed. So probably it is safe to clear memory there. At least > i don’t see any state that can be accessed later in this functions. Do you > have any specific concerns? > I wanted to make sure you checked things that are happening outside of the apply_handle_* stuff. I checked myself now, the allocations that happen outside don't use postgres memory contexts (libpqrcv_receive) so that should not be problem. Other allocations that I see in neighboring code switch to permanent contexts anyway. So looks safe on first look indeed. Eventually we'll want to cache some of the executor stuff so this will be problem but not in v10. I'd still like you to add comment to the ApplyContext saying that it's cleaned after handling each message so nothing that needs to survive for example whole transaction can be allocated in it as that's not very well visible IMHO (and I know I will forget about it when writing next patch in that area). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 19 April 2017 at 11:24, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > I'd still like you to add comment to the ApplyContext saying that it's > cleaned after handling each message so nothing that needs to survive for > example whole transaction can be allocated in it as that's not very well > visible IMHO (and I know I will forget about it when writing next patch > in that area). It would be better to invent the contexts we want now, so we get the architecture right for future work. Otherwise we have problems with "can't backpatch this fix because that infrastructure doesn't exist in PG10" in a couple of years. So I suggest we have ApplyMessageContext - cleaned after every message ApplyTransactionContext - cleaned at EOXact ApplyContext - persistent -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 19 Apr 2017, at 13:34, Simon Riggs <simon@2ndquadrant.com> wrote: > > On 19 April 2017 at 11:24, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > >> I'd still like you to add comment to the ApplyContext saying that it's >> cleaned after handling each message so nothing that needs to survive for >> example whole transaction can be allocated in it as that's not very well >> visible IMHO (and I know I will forget about it when writing next patch >> in that area). > > It would be better to invent the contexts we want now, so we get the > architecture right for future work. Otherwise we have problems with > "can't backpatch this fix because that infrastructure doesn't exist in > PG10" in a couple of years. > > So I suggest we have > > ApplyMessageContext - cleaned after every message > ApplyTransactionContext - cleaned at EOXact > ApplyContext - persistent Right now ApplyContext cleaned after each transaction and by this patch I basically suggest to clean it after each command counter increment. So in your definitions that is ApplyMessageContext, most short-lived one. We can rename now ApplyContext -> ApplyMessageContext to make that clear and avoid possible name conflict when somebody will decide to keep something during whole transaction or worker lifespan. P.S. It also seems to me now that resetting context in ensure_transaction() isn’t that good idea, probably better just explicitly call reset at the end of each function involved. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 19/04/17 12:46, Stas Kelvich wrote: > >> On 19 Apr 2017, at 13:34, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> On 19 April 2017 at 11:24, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >> >>> I'd still like you to add comment to the ApplyContext saying that it's >>> cleaned after handling each message so nothing that needs to survive for >>> example whole transaction can be allocated in it as that's not very well >>> visible IMHO (and I know I will forget about it when writing next patch >>> in that area). >> >> It would be better to invent the contexts we want now, so we get the >> architecture right for future work. Otherwise we have problems with >> "can't backpatch this fix because that infrastructure doesn't exist in >> PG10" in a couple of years. >> >> So I suggest we have >> >> ApplyMessageContext - cleaned after every message >> ApplyTransactionContext - cleaned at EOXact >> ApplyContext - persistent > > Right now ApplyContext cleaned after each transaction and by this patch I basically > suggest to clean it after each command counter increment. > > So in your definitions that is ApplyMessageContext, most short-lived one. We can > rename now ApplyContext -> ApplyMessageContext to make that clear and avoid > possible name conflict when somebody will decide to keep something during whole > transaction or worker lifespan. Yeah we can rename, and then rename the ApplyCacheContext to ApplyContext. That would leave the ApplyTransactionContext from Simon's proposal which is not really need it anywhere right now and I am not sure it will be needed given there is still TopTransactionContext present. If we really want ApplyTransactionContext we can probably just assign it to TopTransactionContext in ensure_transaction. > > P.S. It also seems to me now that resetting context in ensure_transaction() isn’t that > good idea, probably better just explicitly call reset at the end of each function involved. > Well it would definitely improve clarity. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> On 19 Apr 2017, at 14:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > > On 19/04/17 12:46, Stas Kelvich wrote: >> >> Right now ApplyContext cleaned after each transaction and by this patch I basically >> suggest to clean it after each command counter increment. >> >> So in your definitions that is ApplyMessageContext, most short-lived one. We can >> rename now ApplyContext -> ApplyMessageContext to make that clear and avoid >> possible name conflict when somebody will decide to keep something during whole >> transaction or worker lifespan. > > Yeah we can rename, and then rename the ApplyCacheContext to > ApplyContext. That would leave the ApplyTransactionContext from Simon's > proposal which is not really need it anywhere right now and I am not > sure it will be needed given there is still TopTransactionContext > present. If we really want ApplyTransactionContext we can probably just > assign it to TopTransactionContext in ensure_transaction. > >> >> P.S. It also seems to me now that resetting context in ensure_transaction() isn’t that >> good idea, probably better just explicitly call reset at the end of each function involved. >> > > Well it would definitely improve clarity. > Okay, done. With patch MemoryContextStats() shows following hierarchy during slot operations in apply worker: TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 used ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Stas Kelvich wrote: > With patch MemoryContextStats() shows following hierarchy during slot operations in > apply worker: > > TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used > ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used > ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 used > ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used > ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used Please update src/backend/utils/mmgr/README to list the new contexts. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 19 Apr 2017, at 16:07, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Stas Kelvich wrote: > >> With patch MemoryContextStats() shows following hierarchy during slot operations in >> apply worker: >> >> TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used >> ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used >> ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 used >> ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used >> ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used > > Please update src/backend/utils/mmgr/README to list the new contexts. Thanks for noting. Added short description of ApplyContext and ApplyMessageContext to README. Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote: > Thanks for noting. > > Added short description of ApplyContext and ApplyMessageContext to README. +ApplyContext --- permanent during whole lifetime of apply worker. It is +possible to use TopMemoryContext here as well, but for simplicity of memory usage +analysys we spin up different context. Typo /analysys/analysis -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
> On 20 Apr 2017, at 17:01, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote: >> Thanks for noting. >> >> Added short description of ApplyContext and ApplyMessageContext to README. > > Typo > > /analysys/analysis > Fixed, thanks. Added this to open items list. Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Wed, May 03, 2017 at 12:02:41PM +0300, Stas Kelvich wrote: > > On 20 Apr 2017, at 17:01, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote: > >> Thanks for noting. > >> > >> Added short description of ApplyContext and ApplyMessageContext to README. > > > > Typo > > > > /analysys/analysis > > > > Fixed, thanks. > > Added this to open items list. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Peter, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
I'm not sure about some of the details. I think it would make more sense to reset ApplyMessageContext in apply_dispatch(), so that it is done consistently after every message (as the name implies), not only some messages. Also, perhaps ApplyMessageContext should be a child of TopTransactionContext. (You have it as a child of ApplyContext, which is under TopMemoryContext.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/5/17 03:09, Noah Misch wrote: > [Action required within three days. This is a generic notification.] > > The above-described topic is currently a PostgreSQL 10 open item. Peter, > since you committed the patch believed to have created it, you own this open > item. If some other commit is more relevant or if this does not belong as a > v10 open item, please let us know. Otherwise, please observe the policy on > open item ownership[1] and send a status update within three calendar days of > this message. Include a date for your subsequent status update. Testers may > discover new open items at any time, and I want to plan to get them all fixed > well in advance of shipping v10. Consequently, I will appreciate your efforts > toward speedy resolution. Thanks. The discussion is still ongoing. I will focus on it this week and report back on Wednesday. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/8/17 11:26, Peter Eisentraut wrote: > I think it would make more sense to reset ApplyMessageContext in > apply_dispatch(), so that it is done consistently after every message > (as the name implies), not only some messages. Committed that. > Also, perhaps ApplyMessageContext should be a child of > TopTransactionContext. (You have it as a child of ApplyContext, which > is under TopMemoryContext.) Left that as is. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 9 May 2017, at 21:53, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 5/8/17 11:26, Peter Eisentraut wrote: >> I think it would make more sense to reset ApplyMessageContext in >> apply_dispatch(), so that it is done consistently after every message >> (as the name implies), not only some messages. > > Committed that. > >> Also, perhaps ApplyMessageContext should be a child of >> TopTransactionContext. (You have it as a child of ApplyContext, which >> is under TopMemoryContext.) > > Left that as is. Thanks! Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company