Thread: Bug? Query plans / EXPLAIN using gigabytes of memory


Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
Hi,
I'm hitting some peculiar behaviour. I'm currently on Pg 9.1.3 on a 64bit Debian system.

I have a database which is moderately large - 20 GByte or so - with the
data split up over dozens of tables, which are themselves partitioned.
Queries are usually only run against fairly small, partitioned, sets of data.

These queries generally run fairly fast. Performance is not a problem.

However Postgres is chewing up huge amounts of memory just to create the query plan!

For example, even if I just run
EXPLAIN SELECT a_column FROM a_view
WHERE partition_id = 1;

Then the postgres backend process takes several seconds to return, and in
the worst example here, is hogging more than 3 GByte once it comes back.
(It doesn't free that up until you close the connection.)
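
(One rough way to observe the per-backend usage, assuming Linux - note
that RSS also counts whatever shared buffers the backend has touched, so
it overstates private memory somewhat:)

   -- in psql, on the connection in question:
   SELECT pg_backend_pid();

   # then from a shell, with the PID returned above:
   $ ps -o pid,vsz,rss -p <pid>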

The query plan that comes back does seem quite convoluted, but then, the
view is a query run over about eight other views, each of which is pulling
data from a few other views. The actual underlying data being touched is
only *a few dozen* small rows.

As I said, the query runs fast enough; however, it only takes a handful of
these queries running in separate connections to exhaust the server's
memory - especially since the memory isn't returned until the end of the
connection, yet these connections typically stay up for a while.

I wondered if there's anything I can do to reduce this memory usage? And, is this a bug?

I've posted the output of the query plan here: https://gist.github.com/2487097


Thanks in advance,
Toby

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Willy-Bas Loos
Date:
Stacking views is a bad practice. It usually means that you are making the db do a lot of unnecessary work, scanning tables more than once when you don't even need them.
According to your description, you have 3 layers of views on partitioned tables.
I can imagine that that leaves the planner with a lot of possible query plans, a lot of interaction and a lot of statistics to read.

Do you have any special settings for the statistics on these tables?
And could you please post the non-default settings in your postgresql.conf file?
$ grep ^[^#] /etc/postgresql/9.1/main/postgresql.conf | grep -e ^[^[:space:]]

It would be helpful to see whether you have any statistics or planner settings altered.
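
(Alternatively, roughly the same list can be pulled from within a live
session - this should work on 9.1:)

SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override');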

Cheers,

WBL


On Wed, Apr 25, 2012 at 8:18 AM, Toby Corkindale <toby.corkindale@strategicdata.com.au> wrote:
[quoted message snipped]



--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
Hi Willy-Bas,
Thanks for your reply.

I realise that stacking the views up like this complicates matters, but
the actual views are fairly simple queries, and each one individually is
only looking at a few dozen rows (e.g. selecting the min, max or average
value from a small set, grouped by one column).
From the point of view of creating reporting queries, it's a nice and
logical way to build up a query, and we didn't think it would present any
problems - and even on a well-populated database, the query runs very
fast. It's just the astounding amount of memory used that presents
difficulties.

Looking at the postgresql.conf for non-default settings, the notable ones are:

max_connections = 200
ssl = false
shared_buffers = 256MB
max_prepared_transactions = 16
# although they aren't used for the group of queries in question
maintenance_work_mem = 128MB
# work_mem is left at default of 1MB
effective_io_concurrency = 2
random_page_cost = 3.0
effective_cache_size = 512MB
geqo = on
geqo_threshold = 12
geqo_effort = 7

Some other things are non-default, like checkpoints, streaming-replication stuff, but those shouldn't have any effect.

The memory settings (shared buffers, effective cache) might seem to be set
quite conservatively at the moment, given the memory available in the
machine - but since we can exhaust that memory with just a few
connections, it seems fair.

Cheers,
Toby

----- Original Message -----
From: "Willy-Bas Loos" <willybas@gmail.com>
To: "Toby Corkindale" <toby.corkindale@strategicdata.com.au>
Cc: "pgsql-general" <pgsql-general@postgresql.org>
Sent: Wednesday, 25 April, 2012 6:05:37 PM
Subject: Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory


Stacking views is a bad practice. It usually means that you are making the db do a lot of unnecessary work, scanning
tablesmore than once when you don't even need them.  
According to your description, you have 3 layers of views on partitioned tables.
I can imagine that that leaves the planner with a lot of possible query plans, a lot of interaction and a lot of
statisticsto read.  

do you have any special settings for the statistics on these tables?
and could you please post the non-default settings in your postgresql.conf file?
$ grep ^[^#] /etc/ postgresql /9.1/main/ postgresql . conf | grep -e ^[^[:space:]]

Would be helpful to see if you have any statistics or planner stuff altered.

Cheers,

WBL



On Wed, Apr 25, 2012 at 8:18 AM, Toby Corkindale < toby.corkindale@strategicdata.com.au > wrote:


Hi,
I'm hitting some peculiar behaviour. I'm currently on Pg 9.1.3 on a 64bit Debian system.

I have a database which is moderately large - 20 GByte or so - and contains that data split up over dozens of tables,
whichare themselves partitioned.  
Queries are usually only run against fairly small, partitioned, sets of data.

These queries generally run fairly fast. Performance is not a problem.

However Postgres is chewing up huge amounts of memory just to create the query plan!

For example, even if I just run
EXPLAIN SELECT a_column FROM a_view
WHERE partition_id = 1;

Then the postgres backend process takes several seconds to return, and in the worst example here, is hogging more than
3Gbyteonce it comes back. (It doesn't free that up until you close the connection)  

The query plan that comes back does seem quite convoluted, but then, the view is a query run over about eight other
views,each of which is pulling data from a few other views. The actual underlying data being touched is only *a few
dozen*small rows.  

As I said, the query runs fast enough.. however we only need a handful of these queries to get run in separate
connections,and the database server will be exhausted of memory. Especially since the memory isn't returned until the
endof the connection, yet these connections typically stay up for a while.  

I wondered if there's anything I can do to reduce this memory usage? And, is this a bug?

I've posted the output of the query plan here: https://gist.github.com/2487097


Thanks in advance,
Toby

--
Sent via pgsql-general mailing list ( pgsql-general@postgresql.org )
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth


Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Willy-Bas Loos
Date:
Would it be possible to reproduce the same query without using any views?
Then you could see the difference in memory usage.

If that doesn't explain it, try also without inheritance, by using the ONLY keyword (and UNION ALL) - along the lines of the sketch below.
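
(A sketch only - these names are hypothetical stand-ins for the parent
table and its children:)

SELECT a_column FROM ONLY a_table WHERE partition_id = 1
UNION ALL
SELECT a_column FROM ONLY a_table_part1 WHERE partition_id = 1
UNION ALL
SELECT a_column FROM ONLY a_table_part2 WHERE partition_id = 1;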

If it's really only a couple of rows, you might as well post a dump somewhere? Then I could reproduce it.

WBL


On Wed, Apr 25, 2012 at 10:51 AM, Toby Corkindale <toby.corkindale@strategicdata.com.au> wrote:
[quoted message snipped]




--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
----- Original Message -----
> From: "Willy-Bas Loos" <willybas@gmail.com>
> To: "Toby Corkindale" <toby.corkindale@strategicdata.com.au>
> Cc: "pgsql-general" <pgsql-general@postgresql.org>
> Sent: Wednesday, 25 April, 2012 7:16:50 PM
> Subject: Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory
>
> Would it be possible to reproduce the same query without using any
> views? Then you could see the difference in memory usage.

It is possible to do it in stages instead, and done that way, memory usage (and the explain plans) is quite normal.
However, changing everything over would be very time-consuming to re-code, so I would rather avoid that.

> If that doesn't explain it, try also without inheritance, by using
> the ONLY keyword (and UNION ALL).

I have tried something similar to that (accessing partition slices directly). It didn't change anything.


> If it's really only a couple of rows, you might as well post a dump
> somewhere? Then I could reproduce it.

Well, the data touched by the query is only a handful of rows, but it's a
handful of rows amid 20 GB of *other* rows - although the query plan
correctly estimates that it's only going to touch a few.
I'm pretty sure we didn't see this crazy memory usage in earlier testing,
before the database started to get populated.

Toby

> [remainder of quoted message snipped]

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
Hi,
Just wondering if anyone else has thoughts on this?

I'm still suspicious that this is a bug.

If I run EXPLAIN (or the query itself) on a database that has all the
schemas and tables created, but only the relevant data touched by the
query loaded, then everything is fine.

The query plan is still hundreds of lines long, but running it doesn't
use much RAM. So I think that eliminates work_mem-related issues.

It really does seem like it's purely the query plan itself that is
consuming all the memory.

Has anyone else seen this?

Thanks,
Toby


On 25/04/12 16:18, Toby Corkindale wrote:
> [original message quoted in full - snipped]


--
.signature

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Tom Lane
Date:
Toby Corkindale <toby.corkindale@strategicdata.com.au> writes:
> Just wondering if anyone else has thoughts on this?

> I'm still suspicious that this is a bug.

Well, if you were to provide a reproducible test case, somebody might be
motivated to look into it.  There could be a memory leak in the planner
somewhere, but without a test case it's not very practical to go look
for it.

            regards, tom lane

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
On 26/04/12 13:11, Tom Lane wrote:
> Toby Corkindale<toby.corkindale@strategicdata.com.au>  writes:
>> Just wondering if anyone else has thoughts on this?
>
>> I'm still suspicious that this is a bug.
>
> Well, if you were to provide a reproducible test case, somebody might be
> motivated to look into it.  There could be a memory leak in the planner
> somewhere, but without a test case it's not very practical to go look
> for it.

Hi Tom,
Thanks for responding. I'm trying to work on a test case, but it's
quite tricky.
It'll need to be something like a script that generates a tonne of
partitions at the very least. I don't know if the actual amount of data
in the partitions is part of the problem or not.
Would a Perl-based script that built up a database like that be a useful
test case for you?


For what it's worth, I discovered something quite interesting. The
memory usage only blows out when I do an update based on the results of
the query. But not if I just select the results on their own, nor if I
do the update using those values on its own.

ie.

Method #1, uses all the memory and doesn't return it:
   explain update line set status = 'foo'
   where file_id=725 and line.lineno in (
     select line from complex_view
     where file_id=725
   );


Method #2, also uses all the memory:
   explain update line set status = 'foo'
   from complex_view v
   where line.lineno = v.line
   and line.file_id=725
   and v.file_id=725;


Method #3, which uses next to no memory:
   explain select line from complex_view
   where file_id=725;


Method #4, which also uses next to no memory:
   explain create temp table foo as
   select line from complex_view
   where file_id=725;

   update line set status = 'foo'
   from foo
   where line.lineno = foo.line
     and line.file_id = 725;


-Toby

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Tom Lane
Date:
Toby Corkindale <toby.corkindale@strategicdata.com.au> writes:
> On 26/04/12 13:11, Tom Lane wrote:
>> Well, if you were to provide a reproducible test case, somebody might be
>> motivated to look into it.  There could be a memory leak in the planner
>> somewhere, but without a test case it's not very practical to go look
>> for it.

> Would a Perl-based script that built up a database like that be a useful
> test case for you?

Yeah, sure, just something that somebody else can run to duplicate the
problem.

> For what it's worth, I discovered something quite interesting. The
> memory usage only blows out when I do an update based on the results of
> the query.

Hm, is the update target an inheritance tree?

            regards, tom lane

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
On 26/04/12 15:30, Tom Lane wrote:
> Toby Corkindale<toby.corkindale@strategicdata.com.au>  writes:
>> On 26/04/12 13:11, Tom Lane wrote:
>>> Well, if you were to provide a reproducible test case, somebody might be
>>> motivated to look into it.  There could be a memory leak in the planner
>>> somewhere, but without a test case it's not very practical to go look
>>> for it.
>
>> Would a Perl-based script that built up a database like that be a useful
>> test case for you?
>
> Yeah, sure, just something that somebody else can run to duplicate the
> problem.
>
>> For what it's worth, I discovered something quite interesting. The
>> memory usage only blows out when I do an update based on the results of
>> the query.
>
> Hm, is the update target an inheritance tree?

The target is the parent table of a bunch of partitions.
The actual rows being updated live in those child tables.

Toby

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Tom Lane
Date:
Toby Corkindale <toby.corkindale@strategicdata.com.au> writes:
> On 26/04/12 15:30, Tom Lane wrote:
>> Hm, is the update target an inheritance tree?

> The target is the parent table of a bunch of partitions.

How many would "a bunch" be, exactly?  I'm fairly sure that the complex
view would get re-planned for each target table ...

            regards, tom lane

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
On 26/04/12 16:58, Tom Lane wrote:
> Toby Corkindale<toby.corkindale@strategicdata.com.au>  writes:
>> On 26/04/12 15:30, Tom Lane wrote:
>>> Hm, is the update target an inheritance tree?
>
>> The target is the parent table of a bunch of partitions.
>
> How many would "a bunch" be, exactly?  I'm fairly sure that the complex
> view would get re-planned for each target table ...

The table being updated (line) has 57 child tables.

-Toby

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
On 26/04/12 17:16, Toby Corkindale wrote:
> On 26/04/12 16:58, Tom Lane wrote:
>> Toby Corkindale<toby.corkindale@strategicdata.com.au> writes:
>>> On 26/04/12 15:30, Tom Lane wrote:
>>>> Hm, is the update target an inheritance tree?
>>
>>> The target is the parent table of a bunch of partitions.
>>
>> How many would "a bunch" be, exactly? I'm fairly sure that the complex
>> view would get re-planned for each target table ...
>
> The table being updated (line) has 57 child tables.

Although we are specifying a value for the column on which they are
partitioned, i.e.:

-- (column types were elided in my example; int/text assumed here)
CREATE TABLE line (file_id int, lineno int, status text,
                   PRIMARY KEY (file_id, lineno));
CREATE TABLE line_1 ( CHECK (file_id=1) ) INHERITS (line);
CREATE TABLE line_2 ( CHECK (file_id=2) ) INHERITS (line);

UPDATE line SET status = 'something'
FROM complex_view cv
WHERE cv.file_id = 2 AND line.file_id = 2;
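
(Constraint exclusion should be pruning the non-matching children here -
its default is 'partition' on 9.1 - which is easy to confirm:)

SHOW constraint_exclusion;
EXPLAIN SELECT * FROM line WHERE file_id = 2;
-- expect only the parent table and line_2 to appear in the plan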

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
On 26/04/12 13:11, Tom Lane wrote:
> Toby Corkindale<toby.corkindale@strategicdata.com.au>  writes:
>> Just wondering if anyone else has thoughts on this?
>
>> I'm still suspicious that this is a bug.
>
> Well, if you were to provide a reproducible test case, somebody might be
> motivated to look into it.  There could be a memory leak in the planner
> somewhere, but without a test case it's not very practical to go look
> for it.

Hi,
I've created a bit of a test case now.
There's a Perl script here:
http://dryft.net/postgres/

Running it will create a test database that's populated with quite a lot
of schemas and partitioned tables, and a few views.

Running EXPLAIN on the query at the end of the script added ~700 MB to the
server-side postgres process.

It's not the 3.4 GB I've seen on our bigger data warehouse, but maybe it's
enough to help?

Let me know if I can help elaborate further,

Toby
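
(For a rough idea of the shape of the generated schema without running the
script - hypothetical names and counts, sketched in plain SQL rather than
Perl:)

CREATE TABLE foo (file_id int, payload text);

DO $$
BEGIN
  FOR i IN 1..60 LOOP
    EXECUTE format(
      'CREATE TABLE foo_%s (CHECK (file_id = %s)) INHERITS (foo)', i, i);
  END LOOP;
END $$;

-- plus a stack of views over the partitioned table, e.g.:
CREATE VIEW v1 AS SELECT file_id, count(*) AS n FROM foo GROUP BY file_id;
CREATE VIEW v2 AS SELECT a.file_id FROM v1 a JOIN v1 b USING (file_id);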

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Tom Lane
Date:
Toby Corkindale <toby.corkindale@strategicdata.com.au> writes:
> I've created a bit of a test case now.
> There's a Perl script here:
> http://dryft.net/postgres/

AFAICT, what is happening is that we're repeating the planning of that
messy nest of views for each child table of foo.  For most of the
children the planner eventually decides that the join degenerates to
nothing because of constraint exclusion, but not until it's expended a
fair amount of time and memory space per child.

I looked at whether we could improve that by having inheritance_planner
use a temporary memory context per child, but that doesn't look very
practical: it would add a good deal of extra data-copying overhead,
and some of the data structures involved are not easily copiable.

The general scheme of replanning per child might be questioned as well,
but IMO it's fairly important given the looseness of inheritance
restrictions --- it's not unlikely that you *need* different plans for
different children.  We might be able to reconsider that approach
whenever we invent an explicit concept of partitioned tables, since
presumably the partitions would all be essentially alike.

In the meantime, the best advice I can come up with is to reconsider
whether you need so many partitions.  That mechanism is really designed
for only a dozen or two partitions at most.

            regards, tom lane

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
On 27/04/12 09:33, Tom Lane wrote:
> [quoted analysis trimmed]
>
> In the meantime, the best advice I can come up with is to reconsider
> whether you need so many partitions.  That mechanism is really designed
> for only a dozen or two partitions at most.


Hi Tom,
Thanks for looking into this, I appreciate you spending the time.

The system I've come up with for partitioning this data requires quite a
lot of partitions - say thirty to seventy - but I didn't realise it
would cause trouble down the line, so I'll see if it can be reworked to
reduce the number.

For what it's worth, the actual query that was blowing out to gigabytes
was only hitting a couple of dozen partitions per table it was touching
- but it was hitting three such tables, about sixteen times (!) each.

I'm still curious about why I can do a SELECT * FROM complexview without
using much memory, but an UPDATE foo FROM complexview causes all the
memory to get exhausted?

Thanks,
Toby

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Rob Sargentg
Date:
On 04/29/2012 07:19 PM, Toby Corkindale wrote:
> [quoted text trimmed]
>
> I'm still curious about why I can do a SELECT * FROM complexview
> without using much memory, but an UPDATE foo FROM complexview causes
> all the memory to get exhausted?
>
> Thanks,
> Toby
>
Does

UPDATE foo set <values>
where foo.id in (select id from complexview...)

also swallow the memory?


Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Tom Lane
Date:
Toby Corkindale <toby.corkindale@strategicdata.com.au> writes:
> I'm still curious about why I can do a SELECT * FROM complexview without
> using much memory, but an UPDATE foo FROM complexview causes all the
> memory to get exhausted?

The former only requires the complexview to get planned once.

            regards, tom lane

Re: Bug? Query plans / EXPLAIN using gigabytes of memory

From: Toby Corkindale
Date:
On 30/04/12 11:26, Rob Sargentg wrote:
> [quoted text trimmed]
>
> Does
>
> UPDATE foo set <values>
> where foo.id in (select id from complexview...)
>
> also swallow the memory?

Yes, definitely. (See an earlier post of mine for several variations on
the query.)

However a two-stage process doesn't, ie.
create temp table tmp as select id from complexview;
update foo set <values> where foo.id in (select id from tmp);
(or the same thing with FROM)
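
(Spelled out as a sketch, with complexview assumed to expose an id column
matching foo.id:)

create temp table matched as
select id from complexview;

update foo set <values>
where foo.id in (select id from matched);

-- the view is planned once, for the temp-table fill, rather than once
-- per child table of foo - so the planner memory stays flat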


--
.signature