Discussion: Out Of Memory 8.1

Out Of Memory 8.1

From:
"French, Martin"
Date:
Hi All,

I am having problems with a query on 8.1 running on

RHEL 5.4
16GB RAM
Linux pgsql3 2.6.18-164.el5PAE #1 SMP Tue Aug 18 15:59:11 EDT 2009 i686
i686 i386 GNU/Linux
2 x Xeon X5650 (2.67GHz 6 Cores)
Disks are on PERC 6 controller in RAID 10

postgresql.conf:
# - Memory -

shared_buffers = 327680
work_mem = 98394
maintenance_work_mem = 983040
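
(For reference, 8.1 takes shared_buffers in 8kB pages and the work_mem settings
in kB, so the above should work out roughly as below - the conversions are my
own arithmetic, not server output:)

-- shared_buffers       = 327680 * 8kB = 2560MB of shared memory
-- work_mem             = 98394 kB    ~= 96MB per sort/hash step
-- maintenance_work_mem = 983040 kB    = 960MB for VACUUM etc.
SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;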

The query:

2011-04-06 23:23:03 BST <app.charisma><26083>LOG:  statement: CREATE
TABLE medusa.cmi_group_free AS
SELECT
    stkl_stockno,
    (SUM(stkphys) - (SUM(stkalloc) + SUM(stkbacko))) as group_free
FROM
    charisma.sk_stklfl
WHERE
    stkl_stockno NOT LIKE 'ZZ%' AND
    stkl_stockno ~
'^([A-z&]{2,3})([0-9][0-9][0-9])([0-9][0-9][0-9][0-9][A-z0-9]{1,3})$'
GROUP BY
    stkl_stockno;

The EXPLAIN output (I cannot run EXPLAIN ANALYZE, or Postgres runs out of
memory again):
'HashAggregate  (cost=2731947.55..2731947.57 rows=1 width=38)'
'  ->  Seq Scan on stkl_rec  (cost=0.00..2731947.54 rows=1 width=38)'
'        Filter: (((stkl_comp)::text = 'A'::text) AND ((stkl_stockno)::text ~ '^(TK[A-Za-z0-9][0-9]{3}(?:[0-9]{5}(?:[0-9]{3,4})?)?|NSP[0-9]{3}([0-9]{4})?|Z[MZ][0-9]{8,9}|LSP[0-9]{7}[A-Za-z0-9]|[A-Z][A-Z&][A-Z][0-9]{7}[A-Z0-9])$'::text) AND ((stkl_stockno)::text !~~ 'ZZ%'::text) AND ((stkl_stockno)::text ~ '^([A-z&]{2,3})([0-9][0-9][0-9])([0-9][0-9][0-9][0-9][A-z0-9]{1,3})$'::text))'

The table has approximately 6.9 million rows.

It's not the OOM killer doing this, as vm.overcommit_memory = 2 and
vm.overcommit_ratio = 90; nor is Postgres swapping to disk.

Can anyone help me here? The same query on exactly the same data runs fine on
an older machine with the same setup, except that it runs RHEL 5.0!

Thanks.

Martin



Re: Out Of Memory 8.1

From:
Tom Lane
Date:
"French, Martin" <frenchm@cromwell.co.uk> writes:
> I am having problems with a query on 8.1 running on
> RHEL 5.4
> work_mem = 98394

> The explain (cannot explain analyze, or Postgres runs out of memory
> again)
> 'HashAggregate  (cost=2731947.55..2731947.57 rows=1 width=38)'
> '  ->  Seq Scan on stkl_rec  (cost=0.00..2731947.54 rows=1 width=38)'
> '        Filter: (((stkl_comp)::text = 'A'::text) AND ((stkl_stockno)::text ~ '^(TK[A-Za-z0-9][0-9]{3}(?:[0-9]{5}(?:[0-9]{3,4})?)?|NSP[0-9]{3}([0-9]{4})?|Z[MZ][0-9]{8,9}|LSP[0-9]{7}[A-Za-z0-9]|[A-Z][A-Z&][A-Z][0-9]{7}[A-Z0-9])$'::text) AND ((stkl_stockno)::text !~~ 'ZZ%'::text) AND ((stkl_stockno)::text ~ '^([A-z&]{2,3})([0-9][0-9][0-9])([0-9][0-9][0-9][0-9][A-z0-9]{1,3})$'::text))'

Apparently the number of groups is way more than the planner expects,
and so the hash table grows to exceed available memory.
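
(As a quick check of that theory - a sketch using the filter from your query -
you can compare the real number of groups against the rows=1 estimate in the
plan:)

SELECT count(DISTINCT stkl_stockno) AS real_groups
FROM charisma.sk_stklfl
WHERE stkl_stockno NOT LIKE 'ZZ%'
  AND stkl_stockno ~ '^([A-z&]{2,3})([0-9][0-9][0-9])([0-9][0-9][0-9][0-9][A-z0-9]{1,3})$';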

Kluge fixes: try reducing work_mem to discourage it from using
HashAggregate.  Or you could temporarily turn off enable_hashagg.

A non-kluge fix would involve getting the planner to realize there are a
lot of groups needed.  Have you analyzed the table lately?  Maybe you
need to increase the statistics target for it.
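
(Rough sketches of both, using the table and column names from your query; the
statistics target of 100 is just an example:)

-- Kluge: disable the hash aggregate (or shrink work_mem) for this session only
SET enable_hashagg = off;
-- or: SET work_mem = 1024;    -- values are in kB on 8.1

-- Non-kluge: raise the per-column statistics target and re-analyze
ALTER TABLE charisma.sk_stklfl ALTER COLUMN stkl_stockno SET STATISTICS 100;
ANALYZE charisma.sk_stklfl (stkl_stockno);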

            regards, tom lane

Re: Out Of Memory 8.1

From:
"French, Martin"
Date:
Thanks for the info Tom.

The table has been analyzed (somewhat repeatedly...), with the stats
target set at various limits.

At the moment default_statistics_target = 50.

I've had work_mem as low as 1MB and as high as 128MB, to little or no avail.

Setting enable_hashagg = off does allow the query to complete, but it mashes
the next query in the batch, which has to be killed after 8 hours. Setting it
off for this query alone causes the next query to do exactly the same.
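
(To be clear, "for this query alone" means scoping it in the batch roughly like
this - a sketch rather than the actual batch script:)

SET enable_hashagg = off;

CREATE TABLE medusa.cmi_group_free AS
SELECT
    stkl_stockno,
    (SUM(stkphys) - (SUM(stkalloc) + SUM(stkbacko))) as group_free
FROM
    charisma.sk_stklfl
WHERE
    stkl_stockno NOT LIKE 'ZZ%' AND
    stkl_stockno ~ '^([A-z&]{2,3})([0-9][0-9][0-9])([0-9][0-9][0-9][0-9][A-z0-9]{1,3})$'
GROUP BY
    stkl_stockno;

RESET enable_hashagg;    -- so the next query in the batch plans normally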

The thing that perplexes me is that on RHEL 5.0, with all the same settings and
data, the query works fine... so it's confusing what can have changed so much
to cause this. I know that we had to move to RHEL 5.4 because of the PERC RAID
drivers, but I wouldn't have thought that would've made too much of a
difference.

It might also be of interest that "VACUUM ANALYZE VERBOSE" immediately falls
over with out of memory on a request of 960MB (what it's set to use...) but
that "VACUUM FULL ANALYZE VERBOSE" doesn't...

I have a sneaking suspicion that this is to do with the width of the table
(271 columns of mainly text), and the fact that its designer (not me...)
didn't take into consideration how difficult it would be to process this
amount of data row by row.

cheers

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 07 April 2011 15:26
To: French, Martin
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out Of Memory 8.1

"French, Martin" <frenchm@cromwell.co.uk> writes:
> I am having problems with a query on 8.1 running on
> RHEL 5.4
> work_mem = 98394

> The explain (cannot explain analyze, or Postgres runs out of memory
> again)
> 'HashAggregate  (cost=2731947.55..2731947.57 rows=1 width=38)'
> '  ->  Seq Scan on stkl_rec  (cost=0.00..2731947.54 rows=1 width=38)'
> '        Filter: (((stkl_comp)::text = 'A'::text) AND ((stkl_stockno)::text ~ '^(TK[A-Za-z0-9][0-9]{3}(?:[0-9]{5}(?:[0-9]{3,4})?)?|NSP[0-9]{3}([0-9]{4})?|Z[MZ][0-9]{8,9}|LSP[0-9]{7}[A-Za-z0-9]|[A-Z][A-Z&][A-Z][0-9]{7}[A-Z0-9])$'::text) AND ((stkl_stockno)::text !~~ 'ZZ%'::text) AND ((stkl_stockno)::text ~ '^([A-z&]{2,3})([0-9][0-9][0-9])([0-9][0-9][0-9][0-9][A-z0-9]{1,3})$'::text))'

Apparently the number of groups is way more than the planner expects,
and so the hash table grows to exceed available memory.

Kluge fixes: try reducing work_mem to discourage it from using
HashAggregate.  Or you could temporarily turn off enable_hashagg.

A non-kluge fix would involve getting the planner to realize there are a
lot of groups needed.  Have you analyzed the table lately?  Maybe you
need to increase the statistics target for it.

            regards, tom lane



Re: Out Of Memory 8.1

From:
Scott Marlowe
Date:
On Thu, Apr 7, 2011 at 8:56 AM, French, Martin <frenchm@cromwell.co.uk> wrote:
> Thanks for the info Tom.
>
> The table has been analyzed (somewhat repeatedly...), with the stats
> target set at various limits.
>
> At the moment default_statistics_target = 50.
>
> I've had work_mem as low as 1MB and as high as 128MB, with little to no
> avail.
>
> Setting enable_hashagg = off, does allow the query to complete, but
> mashes the next query in the batch with it having to be killed at 8
> hours. Setting it off for this query alone causes the next query to do
> exactly the same.
>
> The thing that perplexes me is; that on RHEL 5.0 with all the same
> settings and data, the query works fine... so it's confusing as to what
> can have changed so much to cause this. I know that we had to move to
> RHEL 5.4 because of PERC Raid drivers, but I wouldn't have thought that
> would've made too much of a difference.

So what's the difference between the query plans on the two machines?
Any difference in non-default settings in postgresql.conf between the
two machines?

Re: Out Of Memory 8.1

From:
"French, Martin"
Date:
OK - more developments here.

If I bring shared_buffers down to the equivalent of 1.7GB, it'll vacuum
without a problem.

Does anyone know why this might be?

Thanks in advance.

Cheers

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: 08 April 2011 08:47
To: French, Martin
Cc: Tom Lane; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out Of Memory 8.1

On Thu, Apr 7, 2011 at 8:56 AM, French, Martin <frenchm@cromwell.co.uk>
wrote:
> Thanks for the info Tom.
>
> The table has been analyzed (somewhat repeatedly...), with the stats
> target set at various limits.
>
> At the moment default_statistics_target = 50.
>
> I've had work_mem as low as 1MB and as high as 128MB, with little to no
> avail.
>
> Setting enable_hashagg = off, does allow the query to complete, but
> mashes the next query in the batch with it having to be killed at 8
> hours. Setting it off for this query alone causes the next query to do
> exactly the same.
>
> The thing that perplexes me is; that on RHEL 5.0 with all the same
> settings and data, the query works fine... so it's confusing as to what
> can have changed so much to cause this. I know that we had to move to
> RHEL 5.4 because of PERC Raid drivers, but I wouldn't have thought that
> would've made too much of a difference.

So what's the difference between the query plans on the two machines?
Any difference in non-default settings in postgresql.conf between the
two machines?



Re: Out Of Memory 8.1

From:
"French, Martin"
Date:
OK, so this has moved on a little from the query now...

As it stands now, with no data in the DB at all - just a schema-only pg_dump
taken from the old machine and restored onto the new - running vacuum gives:

[root@pgsql3 build]# psql -U postgres -d hermes
Welcome to psql 8.1.9, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

hermes=# vacuum;
ERROR:  out of memory
DETAIL:  Failed on request of size 1006632960.
hermes=#

2011-04-08 11:33:36 BST <postgres><10937>STATEMENT:  VACUUM VERBOSE
TopMemoryContext: 55064 total in 5 blocks; 6560 free (10 chunks); 48504
used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256
used
TopTransactionContext: 8192 total in 1 blocks; 7712 free (9 chunks); 480
used
MessageContext: 8192 total in 1 blocks; 7368 free (1 chunks); 824 used
smgr relation table: 8192 total in 1 blocks; 3904 free (0 chunks); 4288
used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 968 free (3 chunks); 56 used
Vacuum: 57344 total in 3 blocks; 28184 free (0 chunks); 29160 used
Relcache by OID: 8192 total in 1 blocks; 3896 free (0 chunks); 4296 used
CacheMemoryContext: 253952 total in 5 blocks; 13520 free (2 chunks);
240432 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_opclass_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_database_datname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_cast_source_target_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks);
768 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 328 free (0 chunks); 696
used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks);
768 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
MdSmgr: 8192 total in 1 blocks; 8080 free (0 chunks); 112 used
LockTable (locallock hash): 8192 total in 1 blocks; 3912 free (0
chunks); 4280 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
2011-04-08 11:39:48 BST <postgres><11119>ERROR:  out of memory
2011-04-08 11:39:48 BST <postgres><11119>DETAIL:  Failed on request of
size 1006632960.
2011-04-08 11:39:48 BST <postgres><11119>STATEMENT:  vacuum;

I'm totally lost with this now, as I assumed it was a data corruption issue
within the catalogs (hence rebuilding the DB), but obviously that's had no
effect at all.

Even an 8.1.23 build on this machine does exactly the same thing...
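
(One thing that may be relevant: the failed request of 1006632960 bytes is
exactly 983040 kB, i.e. the maintenance_work_mem from the first message, so a
quick check - just a sketch - would be whether vacuum survives with that knob
lowered for the session:)

SET maintenance_work_mem = 131072;   -- 128MB, in kB (8.1 has no unit suffixes)
VACUUM VERBOSE;
RESET maintenance_work_mem;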

Cheers

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: 08 April 2011 08:47
To: French, Martin
Cc: Tom Lane; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Out Of Memory 8.1

On Thu, Apr 7, 2011 at 8:56 AM, French, Martin <frenchm@cromwell.co.uk>
wrote:
> Thanks for the info Tom.
>
> The table has been analyzed (somewhat repeatedly...), with the stats
> target set at various limits.
>
> At the moment default_statistics_target = 50.
>
> I've had work_mem as low as 1MB and as high as 128MB, with little to no
> avail.
>
> Setting enable_hashagg = off, does allow the query to complete, but
> mashes the next query in the batch with it having to be killed at 8
> hours. Setting it off for this query alone causes the next query to do
> exactly the same.
>
> The thing that perplexes me is; that on RHEL 5.0 with all the same
> settings and data, the query works fine... so it's confusing as to what
> can have changed so much to cause this. I know that we had to move to
> RHEL 5.4 because of PERC Raid drivers, but I wouldn't have thought that
> would've made too much of a difference.

So what's the difference between the query plans on the two machines?
Any difference in non-default settings in postgresql.conf between the
two machines?
