Обсуждение: [HACKERS] Logical replication ApplyContext bloat

Поиск
Список
Период
Сортировка

[HACKERS] Logical replication ApplyContext bloat

От
Stas Kelvich
Дата:
Hi,

currently logical replication worker uses ApplyContext to decode received data
and that context is never freed during transaction processing. Hence if publication
side is performing something like 10M row inserts in single transaction, then
subscription worker will occupy more than 10G of ram (and can be killed by OOM).

Attached patch resets ApplyContext after each insert/update/delete/commit.
I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter,
but didn’t spotted any measurable difference in speed.

Problem spotted by Mikhail Shurutov.



Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Logical replication ApplyContext bloat

От
Petr Jelinek
Дата:
On 18/04/17 13:45, Stas Kelvich wrote:
> Hi,
> 
> currently logical replication worker uses ApplyContext to decode received data
> and that context is never freed during transaction processing. Hence if publication
> side is performing something like 10M row inserts in single transaction, then
> subscription worker will occupy more than 10G of ram (and can be killed by OOM).
> 
> Attached patch resets ApplyContext after each insert/update/delete/commit.
> I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter,
> but didn’t spotted any measurable difference in speed.
> 
> Problem spotted by Mikhail Shurutov.
> 

Hmm we also do replication protocol handling inside the ApplyContext
(LogicalRepApplyLoop), are you sure this change is safe in that regard?

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Stas Kelvich
Дата:
> On 19 Apr 2017, at 12:37, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
> On 18/04/17 13:45, Stas Kelvich wrote:
>> Hi,
>>
>> currently logical replication worker uses ApplyContext to decode received data
>> and that context is never freed during transaction processing. Hence if publication
>> side is performing something like 10M row inserts in single transaction, then
>> subscription worker will occupy more than 10G of ram (and can be killed by OOM).
>>
>> Attached patch resets ApplyContext after each insert/update/delete/commit.
>> I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter,
>> but didn’t spotted any measurable difference in speed.
>>
>> Problem spotted by Mikhail Shurutov.
>>
>
> Hmm we also do replication protocol handling inside the ApplyContext
> (LogicalRepApplyLoop), are you sure this change is safe in that regard?

Memory is bloated by logicalrep_read_* when palloc happens inside.
I’ve inserted context reset in ensure_transaction() which is only called in beginning
of hande_insert/update/delete/commit when data from previous call of that
function isn’t needed. So probably it is safe to clear memory there. At least
i don’t see any state that can be accessed later in this functions. Do you
have any specific concerns?


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Logical replication ApplyContext bloat

От
Petr Jelinek
Дата:
On 19/04/17 11:55, Stas Kelvich wrote:
> 
>> On 19 Apr 2017, at 12:37, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>
>> On 18/04/17 13:45, Stas Kelvich wrote:
>>> Hi,
>>>
>>> currently logical replication worker uses ApplyContext to decode received data
>>> and that context is never freed during transaction processing. Hence if publication
>>> side is performing something like 10M row inserts in single transaction, then
>>> subscription worker will occupy more than 10G of ram (and can be killed by OOM).
>>>
>>> Attached patch resets ApplyContext after each insert/update/delete/commit.
>>> I’ve tried to reset context only on each 100/1000/10000 value of CommandCounter,
>>> but didn’t spotted any measurable difference in speed.
>>>
>>> Problem spotted by Mikhail Shurutov.
>>>
>>
>> Hmm we also do replication protocol handling inside the ApplyContext
>> (LogicalRepApplyLoop), are you sure this change is safe in that regard?
> 
> Memory is bloated by logicalrep_read_* when palloc happens inside.
> I’ve inserted context reset in ensure_transaction() which is only called in beginning
> of hande_insert/update/delete/commit when data from previous call of that
> function isn’t needed. So probably it is safe to clear memory there. At least
> i don’t see any state that can be accessed later in this functions. Do you
> have any specific concerns?
> 

I wanted to make sure you checked things that are happening outside of
the apply_handle_* stuff. I checked myself now, the allocations that
happen outside don't use postgres memory contexts (libpqrcv_receive) so
that should not be problem. Other allocations that I see in neighboring
code switch to permanent contexts anyway. So looks safe on first look
indeed. Eventually we'll want to cache some of the executor stuff so
this will be problem but not in v10.

I'd still like you to add comment to the ApplyContext saying that it's
cleaned after handling each message so nothing that needs to survive for
example whole transaction can be allocated in it as that's not very well
visible IMHO (and I know I will forget about it when writing next patch
in that area).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Simon Riggs
Дата:
On 19 April 2017 at 11:24, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

> I'd still like you to add comment to the ApplyContext saying that it's
> cleaned after handling each message so nothing that needs to survive for
> example whole transaction can be allocated in it as that's not very well
> visible IMHO (and I know I will forget about it when writing next patch
> in that area).

It would be better to invent the contexts we want now, so we get the
architecture right for future work. Otherwise we have problems with
"can't backpatch this fix because that infrastructure doesn't exist in
PG10" in a couple of years.

So I suggest we have

ApplyMessageContext - cleaned after every message
ApplyTransactionContext - cleaned at EOXact
ApplyContext - persistent

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Stas Kelvich
Дата:
> On 19 Apr 2017, at 13:34, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 19 April 2017 at 11:24, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
>> I'd still like you to add comment to the ApplyContext saying that it's
>> cleaned after handling each message so nothing that needs to survive for
>> example whole transaction can be allocated in it as that's not very well
>> visible IMHO (and I know I will forget about it when writing next patch
>> in that area).
>
> It would be better to invent the contexts we want now, so we get the
> architecture right for future work. Otherwise we have problems with
> "can't backpatch this fix because that infrastructure doesn't exist in
> PG10" in a couple of years.
>
> So I suggest we have
>
> ApplyMessageContext - cleaned after every message
> ApplyTransactionContext - cleaned at EOXact
> ApplyContext - persistent

Right now ApplyContext cleaned after each transaction and by this patch I basically
suggest to clean it after each command counter increment.

So in your definitions that is ApplyMessageContext, most short-lived one. We can
rename now ApplyContext -> ApplyMessageContext to make that clear and avoid
possible name conflict when somebody will decide to keep something during whole
transaction or worker lifespan.

P.S. It also seems to me now that resetting context in ensure_transaction() isn’t that
good idea, probably better just explicitly call reset at the end of each function involved.

>
> --
> Simon Riggs                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Logical replication ApplyContext bloat

От
Petr Jelinek
Дата:
On 19/04/17 12:46, Stas Kelvich wrote:
> 
>> On 19 Apr 2017, at 13:34, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> On 19 April 2017 at 11:24, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>
>>> I'd still like you to add comment to the ApplyContext saying that it's
>>> cleaned after handling each message so nothing that needs to survive for
>>> example whole transaction can be allocated in it as that's not very well
>>> visible IMHO (and I know I will forget about it when writing next patch
>>> in that area).
>>
>> It would be better to invent the contexts we want now, so we get the
>> architecture right for future work. Otherwise we have problems with
>> "can't backpatch this fix because that infrastructure doesn't exist in
>> PG10" in a couple of years.
>>
>> So I suggest we have
>>
>> ApplyMessageContext - cleaned after every message
>> ApplyTransactionContext - cleaned at EOXact
>> ApplyContext - persistent
> 
> Right now ApplyContext cleaned after each transaction and by this patch I basically 
> suggest to clean it after each command counter increment. 
> 
> So in your definitions that is ApplyMessageContext, most short-lived one. We can
> rename now ApplyContext -> ApplyMessageContext to make that clear and avoid
> possible name conflict when somebody will decide to keep something during whole
> transaction or worker lifespan.

Yeah we can rename, and then rename the ApplyCacheContext to
ApplyContext. That would leave the ApplyTransactionContext from Simon's
proposal which is not really need it anywhere right now and I am not
sure it will be needed given there is still TopTransactionContext
present. If we really want ApplyTransactionContext we can probably just
assign it to TopTransactionContext in ensure_transaction.

> 
> P.S. It also seems to me now that resetting context in ensure_transaction() isn’t that
> good idea, probably better just explicitly call reset at the end of each function involved.
> 

Well it would definitely improve clarity.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Stas Kelvich
Дата:
> On 19 Apr 2017, at 14:30, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
> On 19/04/17 12:46, Stas Kelvich wrote:
>>
>> Right now ApplyContext cleaned after each transaction and by this patch I basically
>> suggest to clean it after each command counter increment.
>>
>> So in your definitions that is ApplyMessageContext, most short-lived one. We can
>> rename now ApplyContext -> ApplyMessageContext to make that clear and avoid
>> possible name conflict when somebody will decide to keep something during whole
>> transaction or worker lifespan.
>
> Yeah we can rename, and then rename the ApplyCacheContext to
> ApplyContext. That would leave the ApplyTransactionContext from Simon's
> proposal which is not really need it anywhere right now and I am not
> sure it will be needed given there is still TopTransactionContext
> present. If we really want ApplyTransactionContext we can probably just
> assign it to TopTransactionContext in ensure_transaction.
>
>>
>> P.S. It also seems to me now that resetting context in ensure_transaction() isn’t that
>> good idea, probably better just explicitly call reset at the end of each function involved.
>>
>
> Well it would definitely improve clarity.
>

Okay, done.

With patch MemoryContextStats() shows following hierarchy during slot operations in
apply worker:

TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used
  ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used
    ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 used
      ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used



Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Logical replication ApplyContext bloat

От
Alvaro Herrera
Дата:
Stas Kelvich wrote:

> With patch MemoryContextStats() shows following hierarchy during slot operations in 
> apply worker:
> 
> TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used
>   ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used
>     ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 used
>       ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
>         ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

Please update src/backend/utils/mmgr/README to list the new contexts.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Stas Kelvich
Дата:
> On 19 Apr 2017, at 16:07, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Stas Kelvich wrote:
>
>> With patch MemoryContextStats() shows following hierarchy during slot operations in
>> apply worker:
>>
>> TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used
>>  ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used
>>    ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 used
>>      ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
>>        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
>
> Please update src/backend/utils/mmgr/README to list the new contexts.


Thanks for noting.

Added short description of ApplyContext and ApplyMessageContext to README.





Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Logical replication ApplyContext bloat

От
Dilip Kumar
Дата:
On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
> Thanks for noting.
>
> Added short description of ApplyContext and ApplyMessageContext to README.


+ApplyContext --- permanent during whole lifetime of apply worker. It is
+possible to use TopMemoryContext here as well, but for simplicity of
memory usage
+analysys we spin up different context.


Typo

/analysys/analysis


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Logical replication ApplyContext bloat

От
Stas Kelvich
Дата:
> On 20 Apr 2017, at 17:01, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
>> Thanks for noting.
>>
>> Added short description of ApplyContext and ApplyMessageContext to README.
>
> Typo
>
> /analysys/analysis
>

Fixed, thanks.

Added this to open items list.



Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Logical replication ApplyContext bloat

От
Noah Misch
Дата:
On Wed, May 03, 2017 at 12:02:41PM +0300, Stas Kelvich wrote:
> > On 20 Apr 2017, at 17:01, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
> >> Thanks for noting.
> >> 
> >> Added short description of ApplyContext and ApplyMessageContext to README.
> > 
> > Typo
> > 
> > /analysys/analysis
> > 
> 
> Fixed, thanks.
> 
> Added this to open items list.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] Logical replication ApplyContext bloat

От
Peter Eisentraut
Дата:
I'm not sure about some of the details.

I think it would make more sense to reset ApplyMessageContext in
apply_dispatch(), so that it is done consistently after every message
(as the name implies), not only some messages.

Also, perhaps ApplyMessageContext should be a child of
TopTransactionContext.  (You have it as a child of ApplyContext, which
is under TopMemoryContext.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Peter Eisentraut
Дата:
On 5/5/17 03:09, Noah Misch wrote:
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.

The discussion is still ongoing.  I will focus on it this week and
report back on Wednesday.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Peter Eisentraut
Дата:
On 5/8/17 11:26, Peter Eisentraut wrote:
> I think it would make more sense to reset ApplyMessageContext in
> apply_dispatch(), so that it is done consistently after every message
> (as the name implies), not only some messages.

Committed that.

> Also, perhaps ApplyMessageContext should be a child of
> TopTransactionContext.  (You have it as a child of ApplyContext, which
> is under TopMemoryContext.)

Left that as is.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical replication ApplyContext bloat

От
Stas Kelvich
Дата:
> On 9 May 2017, at 21:53, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 5/8/17 11:26, Peter Eisentraut wrote:
>> I think it would make more sense to reset ApplyMessageContext in
>> apply_dispatch(), so that it is done consistently after every message
>> (as the name implies), not only some messages.
>
> Committed that.
>
>> Also, perhaps ApplyMessageContext should be a child of
>> TopTransactionContext.  (You have it as a child of ApplyContext, which
>> is under TopMemoryContext.)
>
> Left that as is.

Thanks!

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company