Thread: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document


[BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document

From: Lucas Fairchild-Madar
Date:
This might be related to or a duplicate of: BUG #14843: CREATE TABLE churns through all memory

PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit
Ubuntu 16.04.3 LTS

Given the following json structure in a jsonb column called "data":
{"id":"foo","items":[{"id":1},{"id":2}, ... {"id":100000}]}

SELECT jsonb_array_elements(data->'items') from table;
This query returns very quickly and behaves as expected.
 Planning time: 0.055 ms
 Execution time: 4.746 ms

SELECT data->>'id', jsonb_array_elements(data->'items') from table;
This query consumes over 8 GB of memory, causing the OOM killer to terminate the database.

Reducing the number of items in the array to 10,000, we get:
 Planning time: 0.048 ms
 Execution time: 3706.880 ms
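
For reference, the same shape reproduces without any application data; a
minimal sketch (the kaboom table name matches the statistics below and
the script later in this thread):

create table kaboom (data jsonb);
insert into kaboom
select jsonb_build_object('id', 'foo',
                          'items', jsonb_agg(jsonb_build_object('id', i)))
from generate_series(1, 100000) as i;

-- behaves as expected
select jsonb_array_elements(data->'items') from kaboom;

-- blows up memory as described above
select data->>'id', jsonb_array_elements(data->'items') from kaboom;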

Here's the output of pg_stat_statements, if this is helpful.

query               | SELECT jsonb_array_elements(data->?) from kaboom;
rows                | 10000
shared_blks_hit     | 9

query               | SELECT data->>? as id, jsonb_array_elements(data->?) from kaboom;
rows                | 10000
shared_blks_hit     | 80017

This also happens with the json_ variant functions, and occurs in PostgreSQL 9.5 as well.

Please let me know if I can provide any additional information. 

Thanks,
Lucas
Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> This might be related to or a duplicate of: BUG #14843: CREATE TABLE churns
> through all memory

Yeah, sounds like it.  Can you check if the just-committed patch
fixes it?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0c25e9652461c08b5caef259a6af27a38707e07a
        regards, tom lane



Re: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document

From: Lucas Fairchild-Madar
Date:
On Fri, Oct 6, 2017 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> This might be related to or a duplicate of: BUG #14843: CREATE TABLE churns
> through all memory

Yeah, sounds like it.  Can you check if the just-committed patch
fixes it?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0c25e9652461c08b5caef259a6af27a38707e07a

 
I'm experiencing this on 9.6. I can't figure out where this patch applies there.

In the meantime, I'll try to reproduce on 10 and see what happens.
Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> I'm experiencing this on 9.6.

Oh, hmm, it's something else in that case.  Can you put together a
more self-contained test case?  This sort of thing isn't usually
very data-dependent, so you could probably make a test case with
dummy data.
        regards, tom lane



Re: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document

From: Lucas Fairchild-Madar
Date:
On Fri, Oct 6, 2017 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> I'm experiencing this on 9.6.

Oh, hmm, it's something else in that case.  Can you put together a
more self-contained test case?  This sort of thing isn't usually
very data-dependent, so you could probably make a test case with
dummy data.

                        regards, tom lane

Sure, here's a bash script. Assumes you have PGDATABASE, etc set.
Verified this also blows up on pg 10. I haven't verified the new patch.

#!/bin/bash

ITEMS=100000
FILE=kaboom.sql

# Build the whole jsonb document on a single line so \copy loads it as
# one row.
echo -n "{\"id\":\"foo\",\"items\":[" > $FILE
for i in $(seq 1 $ITEMS); do
   echo -n "{\"item\":$i,\"data\":\"The quick brown cow jumps over the lazy sloth\"}," >> $FILE
done
echo "{\"dummy\":true}]}" >> $FILE

psql -e -c "drop table if exists kaboom; create table kaboom(data jsonb);"
psql -e -c "\\copy kaboom from kaboom.sql"
# Referencing only the array is fine...
echo "this works"
psql -qAtc "select jsonb_array_elements(data->'items') from kaboom;" | wc
# ...but also referencing the whole document blows up memory.
echo "prepare to swap"
psql -qAtc "select data->'id', jsonb_array_elements(data->'items') from kaboom;" | wc
Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> Sure, here's a bash script. Assumes you have PGDATABASE, etc set.
> Verified this also blows up on pg 10. I haven't verified the new patch.

Oh, I see the problem.  This is a different animal, because it's actually
an intra-row leak, as it were.  What you've got is

select data->'id', jsonb_array_elements(data->'items') from kaboom;

where the SRF jsonb_array_elements() emits a lot of values.  For each
of those values, data->'id' gets evaluated over again, and we can't
reclaim memory in the per-tuple context until we've finished the whole
cycle for the current row of "kaboom".  So a leak would occur in any
case ... but it's particularly awful in this case, because data->'id'
involves detoasting the rather wide value of "data", which is then
promptly leaked.  So the total memory consumption is more or less
proportional to O(N^2) in the length of "data".
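
To put rough numbers on that (a back-of-the-envelope sketch, where the
array has N elements of about b bytes each): the document is about N*b
bytes, one detoasted copy of it is retained per emitted row, and the SRF
emits N rows, so peak memory is on the order of N * (N*b) = N^2 * b.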

This has been like this since forever, and it's probably impractical
to do anything about it pre-v10, given the unstructured way that
targetlist SRFs are handled.  You could dodge the problem by moving
the SRF to a lateral FROM item:

select data->'id', ja
from kaboom, lateral jsonb_array_elements(data->'items') as ja;

(The LATERAL keyword is optional here, but I like it because it
makes it clearer what's happening.)

As of v10, it might be possible to fix this for the tlist case
as well, by doing something like using a separate short-lived
context for the non-SRF tlist items.
        regards, tom lane



Re: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document

From: Lucas Fairchild-Madar
Date:
On Fri, Oct 6, 2017 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> Sure, here's a bash script. Assumes you have PGDATABASE, etc set.
> Verified this also blows up on pg 10. I haven't verified the new patch.

Oh, I see the problem.  This is a different animal, because it's actually
an intra-row leak, as it were.  What you've got is

select data->'id', jsonb_array_elements(data->'items') from kaboom;

where the SRF jsonb_array_elements() emits a lot of values.  For each
of those values, data->'id' gets evaluated over again, and we can't
reclaim memory in the per-tuple context until we've finished the whole
cycle for the current row of "kaboom".  So a leak would occur in any
case ... but it's particularly awful in this case, because data->'id'
involves detoasting the rather wide value of "data", which is then
promptly leaked.  So the total memory consumption is more or less
proportional to O(N^2) in the length of "data".

This has been like this since forever, and it's probably impractical
to do anything about it pre-v10, given the unstructured way that
targetlist SRFs are handled.  You could dodge the problem by moving
the SRF to a lateral FROM item:

select data->'id', ja
from kaboom, lateral jsonb_array_elements(data->'items') as ja;

(The LATERAL keyword is optional here, but I like it because it
makes it clearer what's happening.)

As of v10, it might be possible to fix this for the tlist case
as well, by doing something like using a separate short-lived
context for the non-SRF tlist items.

 
Is there any sort of setting right now that can defend against this? A way to prevent a query from using 20+GB of memory? I'd prefer the query fail before the database system is kill -9'd.
Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> Is there any sort of setting right now that can defend against this? A way
> to prevent a query from using 20+GB of memory? I'd prefer the query fail
> before the database system is kill -9'd.

You could experiment with running the postmaster under a ulimit setting,
but the more traditional recommendation is to disable OOM kill, cf

https://www.postgresql.org/docs/current/static/kernel-resources.html#linux-memory-overcommit
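
A sketch of both options (the paths and the 4 GB figure are purely
illustrative, not tuning advice):

# Option 1: start the postmaster under a virtual-memory ulimit; backends
# inherit it, so a runaway query fails with "out of memory" instead of
# growing until the kernel steps in.
ulimit -v 4194304          # address-space cap in kB (here: 4 GB)
pg_ctl -D "$PGDATA" start

# Option 2 (what the page above recommends): disable memory overcommit
# so allocations fail up front rather than waking the OOM killer later.
sysctl -w vm.overcommit_memory=2
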
        regards, tom lane



On 2017-10-06 15:37:03 -0400, Tom Lane wrote:
> So a leak would occur in any case ... but it's particularly awful in
> this case, because data->'id' involves detoasting the rather wide
> value of "data", which is then promptly leaked.

While not tackling the problem in a systematic manner, it seems like
it'd be a good idea to free the detoasted column in this and related
functions?

> As of v10, it might be possible to fix this for the tlist case
> as well, by doing something like using a separate short-lived
> context for the non-SRF tlist items.

Yea, that should be quite doable, slightly annoying to need more than
one expr context, but that seems unavoidable.  Should even be doable for
SFRM_ValuePerCall?  SFRM_Materialize isn't a "central" problem, given it
properly manages memory for the tuplestore and filling the tuplestore is
an internal matter.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> On 2017-10-06 15:37:03 -0400, Tom Lane wrote:
>> So a leak would occur in any case ... but it's particularly awful in
>> this case, because data->'id' involves detoasting the rather wide
>> value of "data", which is then promptly leaked.

> While not tackling the problem in a systematic manner, it seems like
> it'd be a good idea to free the detoasted column in this and related
> functions?

I think that's basically a dead-end solution.  It amounts to saying
that no functions can leak memory, which is not a place we're ever
likely to get to, especially given that there's been no policy
favoring that in the past (with the narrow exception of btree
comparison functions).

>> As of v10, it might be possible to fix this for the tlist case
>> as well, by doing something like using a separate short-lived
>> context for the non-SRF tlist items.

> Yea, that should be quite doable, slightly annoying to need more than
> one expr context, but that seems unavoidable.  Should even be doable for
> SFRM_ValuePerCall?  SFRM_Materialize isn't a "central" problem, given it
> properly manages memory for the tuplestore and filling the tuplestore is
> an internal matter.

I'm not really convinced whether there's an issue for ValuePerCall SRFs.
We've not had complaints about them, and I would think we would if that
feature alone would create an issue.  But I've not thought about it hard,
nor tested.
        regards, tom lane



Andres Freund <andres@anarazel.de> writes:
> On 2017-10-06 15:37:03 -0400, Tom Lane wrote:
>> As of v10, it might be possible to fix this for the tlist case
>> as well, by doing something like using a separate short-lived
>> context for the non-SRF tlist items.

> Yea, that should be quite doable, slightly annoying to need more than
> one expr context, but that seems unavoidable.

BTW, another idea that occurred to me is to change things so we only
evaluate scalar tlist items once per SRF cycle, and just re-use their
old values for additional SRF output rows.  However, to preserve current
semantics we'd have to distinguish volatile vs. non-volatile tlist items,
and be sure to do the former over again for each SRF output row.  So
I'm not really excited about that, because of the amount of complexity
it would add.  This is messy enough without having two code paths for
the scalar items ... and we'd still have the memory leak issue in the
volatile case.
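
For instance (a sketch against the reporter's table), any caching scheme
would still have to re-evaluate a volatile item such as random():

-- today random() is drawn afresh for each row the SRF emits; caching
-- scalar tlist items per SRF cycle would have to special-case this
select random(), jsonb_array_elements(data->'items') from kaboom;
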
        regards, tom lane



On 2017-10-06 20:28:21 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-10-06 15:37:03 -0400, Tom Lane wrote:
> >> So a leak would occur in any case ... but it's particularly awful in
> >> this case, because data->'id' involves detoasting the rather wide
> >> value of "data", which is then promptly leaked.
> 
> > While not tackling the problem in a systematic manner, it seems like
> > it'd be a good idea to free the detoasted column in this and related
> > functions?
> 
> I think that's basically a dead-end solution.  It amounts to saying
> that no functions can leak memory, which is not a place we're ever
> likely to get to, especially given that there's been no policy
> favoring that in the past (with the narrow exception of btree
> comparison functions).

Hm. We've a bunch of places where we free detoasted data, just out of
efficiency concerns. And since the jsonb functions are quite likely to
detoast a lot, it doesn't seem unreasonable to do so for the most likely
offenders. I mean if you've a bit more complex expression involving a
few fields accessed, freeing in the accesses will reduce maximum memory
usage by quite a bit. I'm not suggesting to work towards leak free, just
towards reducing the lifetime of a few potentially large allocations.
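
As a concrete sketch of that idea in v10-era C (the function name and
shape are hypothetical, not a proposed patch): copy the result out of
the detoasted document first, then release the copy immediately instead
of waiting for the per-tuple context reset.

#include "postgres.h"
#include "fmgr.h"
#include "utils/jsonb.h"

PG_FUNCTION_INFO_V1(jsonb_field_eager_free);

Datum
jsonb_field_eager_free(PG_FUNCTION_ARGS)
{
    Jsonb      *jb = PG_GETARG_JSONB(0);    /* may detoast a large value */
    text       *key = PG_GETARG_TEXT_PP(1);
    JsonbValue  kval;
    JsonbValue *v;
    Jsonb      *result = NULL;

    kval.type = jbvString;
    kval.val.string.val = VARDATA_ANY(key);
    kval.val.string.len = VARSIZE_ANY_EXHDR(key);

    v = findJsonbValueFromContainer(&jb->root, JB_FOBJECT, &kval);

    /* copy the result out of the detoasted document first ... */
    if (v != NULL)
        result = JsonbValueToJsonb(v);

    /* ... then free the possibly multi-megabyte detoasted copy now */
    PG_FREE_IF_COPY(jb, 0);

    if (result == NULL)
        PG_RETURN_NULL();
    PG_RETURN_JSONB(result);
}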

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> Hm. We've a bunch of places where we free detoasted data, just out of
> efficiency concerns. And since the jsonb functions are quite likely to
> detoast a lot, it doesn't seem unreasonable to do so for the most likely
> offenders. I mean if you've a bit more complex expression involving a
> few fields accessed, freeing in the accesses will reduce maximum memory
> usage by quite a bit. I'm not suggesting to work towards leak free, just
> towards reducing the lifetime of a few potentially large allocations.

Dunno, for the common case of not-so-large values, this would just be
a net loss.  pfree'ing a value we can afford to ignore till the next
per-tuple context reset is not a win.

Maybe we need some kind of PG_FREE_IF_LARGE_COPY macro?  Where it
would kick in for values over 8K or thereabouts?

(You might argue that any value that got detoasted at all would be
large enough to be worth worrying about; but I think that falls down
because we folded short-header unshorting into the detoast mechanism.)
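
Such a macro could be modeled directly on fmgr.h's PG_FREE_IF_COPY; a
sketch (the name and the 8K cutoff come from the suggestion above, not
from any committed code):

/* free a detoasted argument copy only when the pfree is likely to
 * beat simply waiting for the per-tuple context reset */
#define PG_FREE_IF_LARGE_COPY(ptr, n) \
    do { \
        if ((Pointer) (ptr) != PG_GETARG_POINTER(n) && \
            VARSIZE_ANY(ptr) > 8192) \
            pfree(ptr); \
    } while (0)
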
        regards, tom lane



On 2017-10-06 20:44:20 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Hm. We've a bunch of places where we free detoasted data, just out of
> > efficiency concerns. And since the jsonb functions are quite likely to
> > detoast a lot, it doesn't seem unreasonable to do so for the most likely
> > offenders. I mean if you've a bit more complex expression involving a
> > few fields accessed, freeing in the accesses will reduce maximum memory
> > usage by quite a bit. I'm not suggesting to work towards leak free, just
> > towards reducing the lifetime of a few potentially large allocations.
> 
> Dunno, for the common case of not-so-large values, this would just be
> a net loss.  pfree'ing a value we can afford to ignore till the next
> per-tuple context reset is not a win.
> 
> Maybe we need some kind of PG_FREE_IF_LARGE_COPY macro?  Where it
> would kick in for values over 8K or thereabouts?

Yea, that might be worthwhile.


> (You might argue that any value that got detoasted at all would be
> large enough to be worth worrying about; but I think that falls down
> because we folded short-header unshorting into the detoast mechanism.)

Hm. Unrelated to the topic at hand, but I wonder how much it'd take to
make some of the hot-path jsonb functionality tolerant of unaligned
datums.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> Hm. Unrelated to the topic at hand, but I wonder how much it'd take to
> make some of the hot-path jsonb functionality tolerant of unaligned
> datums.

Probably pretty hard given the list-of-pointers data structure inside.
        regards, tom lane



On 2017-10-06 20:51:45 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Hm. Unrelated to the topic at hand, but I wonder how much it'd take to
> > make some of the hot-path jsonb functionality tolerant of unaligned
> > datums.
> 
> Probably pretty hard given the list-of-pointers data structure inside.

Hm, it'd be a bunch of memcpy's added, that's true. But even just doing
something like that to a variant of findJsonbValueFromContainer() might
be quite rewarding.

I'm kinda wondering about treating short datums differently on platforms
that don't care much about alignment, like modern x86. It'd not take
many code changes to only do alignment copying of short datums on
platforms that care...

Greetings,

Andres Freund



On 2017-10-06 20:28:21 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Yea, that should be quite doable, slightly annoying to need more than
> > one expr context, but that seems unavoidable.  Should even be doable for
> > SFRM_ValuePerCall?  SFRM_Materialize isn't a "central" problem, given it
> > properly manages memory for the tuplestore and filling the tuplestore is
> > an internal matter.
> 
> I'm not really convinced whether there's an issue for ValuePerCall SRFs.
> We've not had complaints about them, and I would think we would if that
> feature alone would create an issue.  But I've not thought about it hard,
> nor tested.

I've just played around with this. ValuePerCall SRFs are fine with
being called in a short-lived context (they're required to cope with
that, as documented in xfunc.sgml), and so is SFRM_Materialize. The
only thing to be careful about is the *arguments* to the function;
those need to live long enough in the ValuePerCall case.

I'm not entirely sure what the best way to code the memory allocation
is. I've just prototyped this by switching to
econtext->ecxt_per_query_memory around ExecEvalFuncArgs, and resetting
econtext->ecxt_per_tuple_memory before every ExecProjectSRF call.  But
obviously we don't want to leak the arguments this way.  Easiest seems
to be to just pass in a separate econtext for the arguments, and reset
that explicitly.  Unless you've a better suggestion, I'll clean up my
hack and propose it.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> I've just played around with this. ValuePerCall SRFs are fine with
> being called in a short-lived context (they're required to cope with
> that, as documented in xfunc.sgml), and so is SFRM_Materialize. The
> only thing to be careful about is the *arguments* to the function;
> those need to live long enough in the ValuePerCall case.

Isn't there already code to deal with that?  See around line 175
in execSRF.c.
        regards, tom lane



On 2017-10-06 22:35:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I've just played around with this. ValuePerCall SRFs are fine with
> > being called in a short-lived context (they're required to cope with
> > that, as documented in xfunc.sgml), and so is SFRM_Materialize. The
> > only thing to be careful about is the *arguments* to the function;
> > those need to live long enough in the ValuePerCall case.
> 
> Isn't there already code to deal with that?  See around line 175
> in execSRF.c.

Well, that's for nodeFunctionscan.c, not nodeProjectSet.c. But it seems
quite sensible to model this very similarly.

I'd still like to unify those two functions, but given that
ExecMakeTableFunctionResult materializes ValuePerCall SRFs, that doesn't
seem likely.

Greetings,

Andres Freund



On 2017-10-06 19:42:05 -0700, Andres Freund wrote:
> On 2017-10-06 22:35:37 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I've just played around with this. ValuePerCall SRFs are fine with
> > > being called in a short-lived context (they're required to cope with
> > > that, as documented in xfunc.sgml), and so is SFRM_Materialize. The
> > > only thing to be careful about is the *arguments* to the function;
> > > those need to live long enough in the ValuePerCall case.
> >
> > Isn't there already code to deal with that?  See around line 175
> > in execSRF.c.
>
> Well, that's for nodeFunctionscan.c, not nodeProjectSet.c. But it seems
> quite sensible to model this very similarly.

Patch attached. The business with having to switch the memory context
around tuplestore_gettupleslot() ain't pretty, but imo is tolerable.

I'm kinda tempted to put this, after a bit more testing, into v10.


After this, Lucas's test case doesn't leak memory anymore - it's still slow
in execution, decompressing the same datum over and over. But that feels
like something that should be fixed separately.

- Andres


Attachments
On 2017-10-06 21:08:43 -0700, Andres Freund wrote:
> I'm kinda tempted to put this, after a bit more testing, into v10.

Any opinions on that?

- Andres



Andres Freund <andres@anarazel.de> writes:
> On 2017-10-06 21:08:43 -0700, Andres Freund wrote:
>> I'm kinda tempted to put this, after a bit more testing, into v10.

> Any opinions on that?

I think it's fine to put it in HEAD.  I would not risk back-patching
into v10.  Now that we fixed the new leak, v10 is on par with the last
umpteen years worth of tSRF behavior, and we've not had very many
complaints about that.
        regards, tom lane



On 2017-10-08 16:27:51 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-10-06 21:08:43 -0700, Andres Freund wrote:
> >> I'm kinda tempted to put this, after a bit more testing, into v10.
> 
> > Any opinions on that?
> 
> I think it's fine to put it in HEAD.  I would not risk back-patching
> into v10.  Now that we fixed the new leak, v10 is on par with the last
> umpteen years worth of tSRF behavior, and we've not had very many
> complaints about that.

WFM. I was only thinking about it because it'd give the OP a chance to
upgrade to a supported release and get rid of the behaviour. But given
he's essentially hitting O(n^2) runtime even after that, just rewriting
the query is the saner option anyway...
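
(That is the lateral form Tom suggested upthread, applied to the test
case:)

select data->'id', ja
from kaboom, lateral jsonb_array_elements(data->'items') as ja;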

Will make it so.

Greetings,

Andres Freund

