Обсуждение: ERROR: invalid memory alloc in Pg 9.6.6
Started seeing "invalid memory alloc" errors in a non-production DB (9.6.6) all of a sudden this afternoon. It's running on CentOS 7.4 on VMWare.
I ran a pg_dump on that DB and got this error almost immediately:
pg_dump: [archiver (db)] query failed: ERROR: invalid memory alloc request size 8830452760576
pg_dump: [archiver (db)] query was: SELECT at.attname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS attacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rattacl, NULL AS initattacl, NULL AS initrattacl FROM pg_catalog.pg_attribute at JOIN pg_catalog.pg_class c ON (at.attrelid = c.oid) LEFT JOIN pg_catalog.pg_init_privs pip ON (at.attrelid = pip.objoid AND pip.classoid = 'pg_catalog.pg_class'::pg_catalog.regclass AND at.attnum = pip.objsubid) WHERE at.attrelid = '16445700'::pg_catalog.oid AND NOT at.attisdropped AND ((SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) IS NOT NULL OR (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL)ORDER BY at.attnum
I ran pg_catcheck and it came back just fine:
progress: done (0 inconsistencies, 0 warnings, 0 errors)
The web hits I found on this seem to expect the error to hit on the COPY command that dumps the data, but we seem to be hitting this error way ahead of that. I ran a pg_dump on the other DBs in the cluster and it finished without errors.
While this isn't production, I'd like to salvage things if possible (as well as try to determine why this happened). I can take a statement that hit the error from the log and run it myself in psql and it will return fine. I can only seem to hit this error when I run the pg_dump, but it's happening very frequently in this pre-prod DB.
Any hope here?
Don Seiler
www.seiler.us
www.seiler.us
On Wed, Apr 11, 2018 at 2:44 PM, Don Seiler <don@seiler.us> wrote:
I can take a statement that hit the error from the log and run it myself in psql and it will return fine.
This includes the statement that hits the error on pg_dump. It errors every time on pg_dump but not when I run it manually from psql as the postgres user:
# SELECT at.attname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS attacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rattacl, NULL AS initattacl, NULL AS initrattacl FROM pg_catalog.pg_attribute at JOIN pg_catalog.pg_class c ON (at.attrelid = c.oid) LEFT JOIN pg_catalog.pg_init_privs pip ON (at.attrelid = pip.objoid AND pip.classoid = 'pg_catalog.pg_class'::pg_catalog.regclass AND at.attnum = pip.objsubid) WHERE at.attrelid = '16445700'::pg_catalog.oid AND NOT at.attisdropped AND ((SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) IS NOT NULL OR (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('c',c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault('c',c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL)ORDER BY at.attnum;
attname | attacl | rattacl | initattacl | initrattacl
---------+--------+---------+------------+-------------
(0 rows)
Time: 3.099 ms
Don Seiler
www.seiler.us
www.seiler.us
I think you are getting the error while pg_dump is collecting acl's of table '16445700'::pg_catalog.oid https://github.com/postgres/postgres/blob/REL9_6_6/src/bin/pg_dump/pg_dump.c#L15182-L15222 but I am not sure why you are running out of memory
2018-04-11 21:47 GMT+02:00 Don Seiler <don@seiler.us>:
On Wed, Apr 11, 2018 at 2:44 PM, Don Seiler <don@seiler.us> wrote:I can take a statement that hit the error from the log and run it myself in psql and it will return fine.This includes the statement that hits the error on pg_dump. It errors every time on pg_dump but not when I run it manually from psql as the postgres user:# SELECT at.attname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at.attacl,pg_catalog.acldefault(' c',c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce( pip.initprivs,pg_catalog. acldefault('c',c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS attacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce( pip.initprivs,pg_catalog. acldefault('c',c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at. attacl,pg_catalog.acldefault(' c',c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rattacl, NULL AS initattacl, NULL AS initrattacl FROM pg_catalog.pg_attribute at JOIN pg_catalog.pg_class c ON (at.attrelid = c.oid) LEFT JOIN pg_catalog.pg_init_privs pip ON (at.attrelid = pip.objoid AND pip.classoid = 'pg_catalog.pg_class'::pg_ catalog.regclass AND at.attnum = pip.objsubid) WHERE at.attrelid = '16445700'::pg_catalog.oid AND NOT at.attisdropped AND ((SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(at. attacl,pg_catalog.acldefault(' c',c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce( pip.initprivs,pg_catalog. acldefault('c',c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) IS NOT NULL OR (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce( pip.initprivs,pg_catalog. acldefault('c',c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(at. attacl,pg_catalog.acldefault(' c',c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL)ORDER BY at.attnum; attname | attacl | rattacl | initattacl | initrattacl---------+--------+---------+------------+------------- (0 rows)Time: 3.099 ms--Don Seiler
www.seiler.us
Greetings, * Don Seiler (don@seiler.us) wrote: > Started seeing "invalid memory alloc" errors in a non-production DB (9.6.6) > all of a sudden this afternoon. It's running on CentOS 7.4 on VMWare. > > I ran a pg_dump on that DB and got this error almost immediately: > > pg_dump: [archiver (db)] query failed: ERROR: invalid memory alloc request > size 8830452760576 That's interesting. An invalid memory alloc request size complaint such as that tends to be an indicator of corruption or a bug in the backend possibly. Certainly curious that you were able to run the query and not hit that.. Are you sure you were running it exactly the way pg_dump does? There's a bunch of setup that pg_dump does early on, maybe log all queries when you run the pg_dump and then in a psql session try running them..? If you can reliably reproduce this with either pg_dump or psql, then the next step would perhaps be jumping in with gdb to see where that error is being hit.. Thanks! Stephen
Вложения
On Fri, Apr 13, 2018 at 11:09 AM, Stephen Frost <sfrost@snowman.net> wrote:
If you can reliably reproduce this with either pg_dump or psql, then the
next step would perhaps be jumping in with gdb to see where that error
is being hit..
I forgot to follow-up yesterday. It seems like my issue was caused by a bug with the pg_qualstats extension that I was using in conjuction with the POWA extension. There was already a similar issue reported on their tracker at https://github.com/powa-team/powa/issues/104
It seems that high TPS stress testing on Wednesday afternoon triggered the bug. When the stress testing tailed off, the error messages stopped happening and I was able to run pg_dump without errors. I removed those extensions shortly after. The authors have put out a bug fix but I can't use this system to test their fix and my sandbox isn't available for the same HA server stress test that we were doing on Wednesday.
Don.
Don Seiler
www.seiler.us
www.seiler.us
On 04/13/2018 06:13 PM, Don Seiler wrote: > On Fri, Apr 13, 2018 at 11:09 AM, Stephen Frost <sfrost@snowman.net > <mailto:sfrost@snowman.net>> wrote: > > If you can reliably reproduce this with either pg_dump or psql, then the > next step would perhaps be jumping in with gdb to see where that error > is being hit.. > > > I forgot to follow-up yesterday. It seems like my issue was caused by a bug with > the pg_qualstats extension that I was using in conjuction with the POWA > extension. There was already a similar issue reported on their tracker > at https://github.com/powa-team/powa/issues/104 > > It seems that high TPS stress testing on Wednesday afternoon triggered the bug. > When the stress testing tailed off, the error messages stopped happening and I > was able to run pg_dump without errors. I removed those extensions shortly > after. The authors have put out a bug fix but I can't use this system to test > their fix and my sandbox isn't available for the same HA server stress test that > we were doing on Wednesday. > > Don. > Hello, FIY, a new version including bug fix, has been released two days ago : https://github.com/powa-team/pg_qualstats/releases/tag/1.0.4 Regards,