Обсуждение: BUG #4418: Memory leak in query planner

Поиск
Список
Период
Сортировка

BUG #4418: Memory leak in query planner

От
"michael McMaster"
Дата:
The following bug has been logged online:

Bug reference:      4418
Logged by:          michael McMaster
Email address:      Michael.McMaster@gmail.com
PostgreSQL version: 8.3.3
Operating system:   Ubuntu Hardy 8.04.1
Description:        Memory leak in query planner
Details:

I intend on creating a view over a large number of tables (possibly > 1000).
 Each table in the view has a CHECK constraint that partitions the data by a
time range.

I've created some test data with 2 views, with each view a UNION ALL over
1000 tables, with each table containing 100 rows.  An INT8 type is used as a
primary key in both tables, with one table additionally containing the INT8
key of the other as a foreign key.

eg. Table A is created with
CREATE TABLE A (
key INT8 primary key,
dtstart INT8, -- top 32 bits seconds, lower 32 bits microseconds
CHECK(dtstart >= <some value> AND dtstart < <some other value>)
); -- Create 1000 of these tables
CREATE VIEW A_View as SELECT * from A_1 UNION ALL select * from A_2 ...
UNION ALL select * from A_N;

(Table B is created in a similar manner)

I have set constraint_exclusion on to limit the number of tables scanned in
each query.  I have confirmed that constraint_exclusion has the effect of
limiting a query to a single table in the view based on the dtstart column.

The problem is the query planner seems to use a large amount of memory, and
this memory is not released until the client connection goes away.  I've
tested this with both a perl dbi client and psql - in both cases the
postgres process can consume > 1Gb of ram.

I can trigger the bug by starting up a psql session, and executing an
"explain" command.  The postgres process will then use large amounts of
memory until psql is terminated.

eg.
explain SELECT e.*, a.annotation FROM History.EventView e,
History.AnnotationView a WHERE e.dtstart between 429496729600000 and
444958611865600 AND a.dtstart between 429496729600000 and 444958611865600
AND a.dtstart = e.dtstart AND a.entryId = e.eventId;
                                                              QUERY PLAN

----------------------------------------------------------------------------
----------------------------------------------------------
 Hash Join  (cost=5.00..10.26 rows=1 width=253)
   Hash Cond: ((event_100000.dtstart = event_annotation_100000.dtstart) AND
(event_100000.eventid = event_annotation_100000.entryid))
   ->  Append  (cost=0.00..4.50 rows=100 width=201)
         ->  Seq Scan on event_100000  (cost=0.00..4.50 rows=100 width=201)
               Filter: ((dtstart >= 429496729600000::bigint) AND (dtstart <=
444958611865600::bigint))
   ->  Hash  (cost=3.50..3.50 rows=100 width=68)
         ->  Append  (cost=0.00..3.50 rows=100 width=68)
               ->  Seq Scan on event_annotation_100000  (cost=0.00..3.50
rows=100 width=68)
                     Filter: ((dtstart >= 429496729600000::bigint) AND
(dtstart <= 444958611865600::bigint))
(9 rows)

It's not necessary to actually perform the query in order to waste
resources.  Actually, when I perform the query in a psql session, the memory
usage is still high, but released as soon as the query results are
available.  If I then perform the explain (ie. explain after the select in
the same psql session), the memory is also released correctly (but I still
think the 1Gb of memory to return 100 small rows with a single join is
excessive).


Executing the explain query in the same psql session twice doesn't use any
more memory than executing it once.

Re: BUG #4418: Memory leak in query planner

От
Tom Lane
Дата:
"michael McMaster" <Michael.McMaster@gmail.com> writes:
> I intend on creating a view over a large number of tables (possibly > 1000).

Better rethink that plan ... PG's current partitioning support doesn't
scale that high, and filing bugs claiming it's a memory leak isn't
going to make it do so.

            regards, tom lane

Re: BUG #4418: Memory leak in query planner

От
Jeff Davis
Дата:
On Tue, 2008-09-16 at 00:37 +0000, michael McMaster wrote:
> I intend on creating a view over a large number of tables (possibly > 1000).
>  Each table in the view has a CHECK constraint that partitions the data by a
> time range.

You may try using the inheritance mechanism rather than a UNION ALL
view.

Regards,
    Jeff Davis

Re: BUG #4418: Memory leak in query planner

От
Michael McMaster
Дата:
Tom Lane wrote:
> "michael McMaster" <Michael.McMaster@gmail.com> writes:
>> I intend on creating a view over a large number of tables (possibly > 1000).
>
> Better rethink that plan ... PG's current partitioning support doesn't
> scale that high, and filing bugs claiming it's a memory leak isn't
> going to make it do so.
>
>             regards, tom lane
>

I modified the views to mimic the CHECK constraints on the partitioned
tables.
ie CREATE VIEW a_view AS select * from a_table where dtstart between <x>
and <y>
UNION ALL ...

The query plans generated weren't quite as good, but use an
insignificant amount of memory, and are still reasonably fast.  This
solution is an acceptable trade off at the moment.

I still think the memory usage issue is a bug, as there are alternative
plans that show Postgres -does- scale that high.  I had a quick look
through the code and I suspect that table inheritance wouldn't have this
issue.  I'll test inheritance and report my findings.