Re: BUG #7571: Query high memory usage

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: BUG #7571: Query high memory usage
Дата
Msg-id CAFj8pRB1A=w4OcbcZ_7_HejeTYhZ2vvCjjMgP7Bq1tHejStD5g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #7571: Query high memory usage  (Radovan Jablonovsky <radovan.jablonovsky@replicon.com>)
Список pgsql-bugs
2012/9/27 Radovan Jablonovsky <radovan.jablonovsky@replicon.com>:
> Hi Pavel,
>
> Here are the test data with set enable_hashagg to off. It does not looks
> like improvement. Query was running for 30min without returning result set.

so maybe it is PostgreSQL bug - probably window function doesn't reset
some memory context and then execution is memory expensive

Regards

Pavel

>
> db=> set enable_hashagg=off;
> SET
> db=> explain
> db-> SELECT
> db->   schema_name,
> db->   sum(table_size)
> db-> FROM
> db->   (SELECT
> db(>     pg_catalog.pg_namespace.nspname as schema_name,
> db(>     pg_relation_size(pg_catalog.pg_class.oid) as table_size,
> db(>     sum(pg_relation_size(pg_catalog.pg_class.oid)) over () as
> database_size
> db(>    FROM pg_catalog.pg_class
> db(>    JOIN pg_catalog.pg_namespace
> db(>     ON relnamespace = pg_catalog.pg_namespace.oid
> db(>   ) t
> db-> GROUP BY schema_name, database_size;
>                                                     QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=725540.59..756658.18 rows=40000 width=104)
>    ->  Sort  (cost=725540.59..733219.99 rows=3071759 width=104)
>          Sort Key: pg_namespace.nspname,
> (sum(pg_relation_size((pg_class.oid)::regclass, 'main'::text)) OVER (?))
>          ->  WindowAgg  (cost=120.98..243838.73 rows=3071759 width=68)
>                ->  Hash Join  (cost=120.98..190082.95 rows=3071759 width=68)
>                      Hash Cond: (pg_class.relnamespace = pg_namespace.oid)
>                      ->  Seq Scan on pg_class  (cost=0.00..143885.59
> rows=3071759 width=8)
>                      ->  Hash  (cost=90.99..90.99 rows=2399 width=68)
>                            ->  Seq Scan on pg_namespace  (cost=0.00..90.99
> rows=2399 width=68)
> (9 rows)
>
>
> Data from top after 30 min of query run with hashagg set off:
>   PID  USER     PR  NI  VIRT   RES  SHR S %CPU %MEM    TIME+    COMMAND
>  2235 postgres  25   0   27.5g  23g    4.6g R  95.1    75.2        31:39.81
> postgres: aspuser aspdata 10.0.2.67(52716) SELECT
>
>
> Radovan
>
> On Wed, Sep 26, 2012 at 10:15 PM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>>
>> Hello
>>
>> you should to run this query on real data - and if it works now, then
>> send EXPLAIN ANALYZE result, please
>>
>> Pavel
>>
>> 2012/9/27 Melese Tesfaye <mtesfaye@gmail.com>:
>> > Thanks Pavel,
>> > Setting enable_hashagg to off didn't resolve the issue.
>> > Please find the explain as well as query results after "set
>> > enable_hashagg=off;"
>> >
>> > mtesfaye@[local](test_db)=# EXPLAIN SELECT DISTINCT(A.*)
>> > test_db-# FROM table1_t A LEFT JOIN table2_v B
>> > test_db-# ON A.pnr_id=B.pnr_id
>> > test_db-# WHERE  A.pnr_id IN(1801,2056) AND
>> > B.departure_date_time>=DATE('2012-09-26')
>> > test_db-# ORDER BY pnr_id ASC,nam_id ASC;
>> >
>> > +-----------------------------------------------------------------------------------------------------------+
>> > |                                                QUERY PLAN
>> > |
>> >
>> > +-----------------------------------------------------------------------------------------------------------+
>> > | Unique  (cost=1354.62..1354.66 rows=4 width=13)
>> > |
>> > |   ->  Sort  (cost=1354.62..1354.63 rows=4 width=13)
>> > |
>> > |         Sort Key: a.pnr_id, a.nam_id, a.pty_num
>> > |
>> > |         ->  Merge Join  (cost=1084.06..1354.58 rows=4 width=13)
>> > |
>> > |               Merge Cond: (table2_t.pnr_id = a.pnr_id)
>> > |
>> > |               ->  Unique  (cost=1084.06..1198.67 rows=11461 width=16)
>> > |
>> > |                     ->  Sort  (cost=1084.06..1112.72 rows=11461
>> > width=16)
>> > |
>> > |                           Sort Key: table2_t.pnr_id, table2_t.itn_id,
>> > table2_t.departure_date_time        |
>> > |                           ->  Seq Scan on table2_t  (cost=0.00..311.34
>> > rows=11461 width=16)               |
>> > |                                 Filter: (departure_date_time >=
>> > '2012-09-26'::date)                       |
>> > |               ->  Index Scan using table1_t_pnr_id_idx1 on table1_t a
>> > (cost=0.00..12.60 rows=4 width=13) |
>> > |                     Index Cond: (pnr_id = ANY
>> > ('{1801,2056}'::integer[]))
>> > |
>> >
>> > +-----------------------------------------------------------------------------------------------------------+
>> > (12 rows)
>> >
>> > Time: 5.889 ms
>> >
>> > mtesfaye@[local](test_db)=# show enable_hashagg;
>> > +----------------+
>> > | enable_hashagg |
>> > +----------------+
>> > | on             |
>> > +----------------+
>> > (1 row)
>> >
>> > Time: 0.136 ms
>> >
>> > mtesfaye@[local](test_db)=# set enable_hashagg=off;
>> > SET
>> > Time: 0.203 ms
>> > mtesfaye@[local](test_db)=# show enable_hashagg;
>> > +----------------+
>> > | enable_hashagg |
>> > +----------------+
>> > | off            |
>> > +----------------+
>> > (1 row)
>> >
>> > Time: 0.131 ms
>> >
>> >
>> > mtesfaye@[local](test_db)=# SELECT DISTINCT(A.*)
>> > test_db-# FROM table1_t A LEFT JOIN table2_v B
>> > test_db-# ON A.pnr_id=B.pnr_id
>> > test_db-# WHERE  A.pnr_id IN(1801,2056) AND
>> > B.departure_date_time>=DATE('2012-09-26')
>> > test_db-# ORDER BY pnr_id ASC,nam_id ASC;
>> > +--------+--------+---------+
>> > | pnr_id | nam_id | pty_num |
>> > +--------+--------+---------+
>> > |   1801 |   3359 |       1 |
>> > |   1801 |   3360 |       1 |
>> > |   1801 |   3361 |       1 |
>> > |   1801 |   3362 |       1 |
>> > +--------+--------+---------+
>> > (4 rows)
>> >
>> > Time: 8.452 ms
>> >
>> >
>> > On Thu, Sep 27, 2012 at 3:54 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> > wrote:
>> >>
>> >> Hello
>> >>
>> >> this situation is possible, when optimizer use HashAgg where should not
>> >> use it.
>> >>
>> >> Please, try to disable HashAgg - set enable_hashagg to off;
>> >>
>> >> please, send EXPLAIN result
>> >>
>> >> Regards
>> >>
>> >> Pavel Stehule
>> >>
>> >> 2012/9/26  <radovan.jablonovsky@replicon.com>:
>> >> > The following bug has been logged on the website:
>> >> >
>> >> > Bug reference:      7571
>> >> > Logged by:          Radovan Jablonovsky
>> >> > Email address:      radovan.jablonovsky@replicon.com
>> >> > PostgreSQL version: 9.1.5
>> >> > Operating system:   CentOs 5.8 Linux 2.6.18-308.el5 x86_64
>> >> > Description:
>> >> >
>> >> > During checking our company database size we used query, which was
>> >> > not
>> >> > the
>> >> > best to find out the tables/db size but should do the job. The query
>> >> > was
>> >> > tested on server with 32GB of RAM, 2 CPU with 4 cores and it was
>> >> > running
>> >> > alone without other activity. It consumed almost all RAM forced
>> >> > server
>> >> > to
>> >> > use swap and after 1hour it was still running. The simplified version
>> >> > of
>> >> > query used 20% of memory and finished after 1hour 8min.
>> >> >
>> >> > The size of pg_class is 3mil rows/objects and pg_namespace has 3000
>> >> > rows/schemata.
>> >> >
>> >> > query:
>> >> > SELECT
>> >> >   schema_name,
>> >> >   sum(table_size)
>> >> > FROM
>> >> >   (SELECT
>> >> >     pg_catalog.pg_namespace.nspname as schema_name,
>> >> >     pg_relation_size(pg_catalog.pg_class.oid) as table_size,
>> >> >     sum(pg_relation_size(pg_catalog.pg_class.oid)) over () as
>> >> > database_size
>> >> >    FROM pg_catalog.pg_class
>> >> >    JOIN pg_catalog.pg_namespace
>> >> >     ON relnamespace = pg_catalog.pg_namespace.oid
>> >> >   ) t
>> >> > GROUP BY schema_name, database_size;
>> >> >
>> >> >
>> >> > top - 10:50:44 up 20 days, 19:00,  1 user,  load average: 1.15, 1.10,
>> >> > 0.84
>> >> > Tasks: 239 total,   3 running, 236 sleeping,   0 stopped,   0 zombie
>> >> > Cpu(s): 15.1%us,  1.5%sy,  0.0%ni, 83.0%id,  0.5%wa,  0.0%hi,
>> >> > 0.0%si,
>> >> > 0.0%st
>> >> > Mem:  32946260k total, 32599908k used,   346352k free,   141924k
>> >> > buffers
>> >> > Swap: 55043952k total,    85216k used, 54958736k free, 14036516k
>> >> > cached
>> >> >
>> >> > Info from top:
>> >> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> >> >  2016 postgres  25   0 22.8g  17g 3.2g R 96.1 56.0  19:17.01
>> >> > postgres:
>> >> > postgres db 10.0.1.10(49928) SELECT
>> >> >
>> >> > Simplified version of query uses pg_tables. It has 0.5mil
>> >> > rows/tables.
>> >> > Simplified version of query:
>> >> > SELECT
>> >> >   schemaname,
>> >> >   sum(pg_relation_size(schemaname || '.' || tablename))::bigint
>> >> > FROM pg_tables
>> >> > GROUP BY schemaname;
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
>> >> > To make changes to your subscription:
>> >> > http://www.postgresql.org/mailpref/pgsql-bugs
>> >>
>> >>
>> >> --
>> >> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
>> >> To make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-bugs
>> >
>> >

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Radovan Jablonovsky
Дата:
Сообщение: Re: BUG #7571: Query high memory usage
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #7570: WHERE .. IN bug from 9.2.0 not fixed in 9.2.1