Обсуждение: 9.5.4: Segfault (signal 11) while running ALTER TABLE

Поиск
Список
Период
Сортировка

9.5.4: Segfault (signal 11) while running ALTER TABLE

От
Devrim Gündüz
Дата:
Hi,

I received an email offlist about a crash that a non-customer experienced
recently. I asked them to send a few details about the crash, so here it is.
Please note that I don't have access to their systems, but they will be fast
enough to provide data if needed. (They asked me to mask some of the table/func
names)

They have a domain called primaryuuid:

CREATE DOMAIN primaryuuid AS uuid NOT NULL DEFAULT uuid_generate_v4();

and a table called foo1, which used to have this domain as a data type:

CREATE TABLE foo1 (
    id public.primaryuuid NOT NULL,
 <some more columns>
    tempuuid uuid
);

They wanted to change id column from uuid to int, so created this func first:

CREATE FUNCTION foofunc_id_uuidtoint(chartoconvert uuid) RETURNS integer
    LANGUAGE sql IMMUTABLE STRICT
    AS $_$
SELECT newid FROM foo1 WHERE tempuuid = $1 LIMIT 1;
$_$;


and ran this:

ALTER TABLE foo1 ALTER COLUMN id TYPE INTEGER USING
foofunc_id_uuidtoint(tempuuid);

This command crashed postmaster. They connected to related process once again
with gdb, and got this:

===========
Program received signal SIGSEGV, Segmentation fault.
slot_deform_tuple (slot=0x1b80440, natts=13) at heaptuple.c:1157
1157            values[attnum] = fetchatt(thisatt, tp + off);
Continuing.
===========

Below are some lines in postgresql.log:

==============================================================
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "foo1"
process 10884 acquired AccessExclusiveLock on relation 16961 of database 16401
after 1000.411 ms
STATEMENT:  ALTER TABLE foo1 ALTER COLUMN id TYPE INTEGER USING
foofunc_id_uuidtoint(tempuuid);
LOG:  server process (PID 10884) was terminated by signal 11: Segmentation
fault
==============================================================

Please note that this crash is not specific to this table. Various attempts to change this data type on various tables
alsocrashed postmaster. 

This happened a few times, and after disabling autovacuum and lowering
maintenance_work_mem from 4GB to 2GB, they could complete all ALTER TABLE
commands.

This is PostgreSQL 9.5.4 on CentOS 6.8, fully up2date, and they are using the
community RPMs. 

Any ideas what is causing this crash?

Please let me know if you need more data.

Regards,
--
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR

Re: 9.5.4: Segfault (signal 11) while running ALTER TABLE

От
Tom Lane
Дата:
Devrim Gündüz <devrim@gunduz.org> writes:
> They wanted to change id column from uuid to int, so created this func first:

> CREATE FUNCTION foofunc_id_uuidtoint(chartoconvert uuid) RETURNS integer
>     LANGUAGE sql IMMUTABLE STRICT
>     AS $_$
> SELECT newid FROM foo1 WHERE tempuuid = $1 LIMIT 1;
> $_$;

> and ran this:

> ALTER TABLE foo1 ALTER COLUMN id TYPE INTEGER USING
> foofunc_id_uuidtoint(tempuuid);

> This command crashed postmaster.

The above isn't ever likely to work for any large value of "work",
because the function would be confused about what the table rowtype
is.  I thought we had adequate defenses in there to throw an error
if you try to access a table that's in the middle of being altered,
but apparently this case isn't covered.

Why didn't they just doALTER TABLE foo1 ALTER COLUMN id TYPE INTEGER USING newid;
?  The intermediate function sure seems like the hard way.
        regards, tom lane



Re: 9.5.4: Segfault (signal 11) while running ALTER TABLE

От
Devrim Gündüz
Дата:
Hi Tom,

On Tue, 2016-08-30 at 08:18 -0400, Tom Lane wrote:
> The above isn't ever likely to work for any large value of "work",
> because the function would be confused about what the table rowtype
> is.  I thought we had adequate defenses in there to throw an error
> if you try to access a table that's in the middle of being altered,
> but apparently this case isn't covered.
>
> Why didn't they just do
>         ALTER TABLE foo1 ALTER COLUMN id TYPE INTEGER USING newid;
> ?  The intermediate function sure seems like the hard way.

Just got a reply from them. They had some historical reasons for this, but
apparently it was not needed, so they'll change their procedures based on your
suggestion.

However, they also reported that they did not have this problem in their
staging environment. I assume that staging has less resources than the prod
machine. What could cause this problem that has more maintenance_work_mem and
shared_buffers than staging one?

Thanks!

Regards,
-- 
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR