I was setting up an implicit type cast for an application that inserts a boolean into a numeric field, but I declared the wrong return type on the cast function, and this caused the Postmaster to core dump:
[postgres@efm1 ~]$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
[postgres@efm1 ~]$ pg_ctl -c start
[postgres@efm1 ~]$ < 2017-04-21 11:10:35.287 BST > LOG: redirecting log output to logging collector process
< 2017-04-21 11:10:35.287 BST > HINT: Future log output will appear in directory "pg_log".
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)
postgres=# CREATE FUNCTION bool_to_num (boolean) RETURNS integer
postgres-# AS 'SELECT CASE WHEN $1 = true THEN 1 ELSE 0 END;'
postgres-# LANGUAGE SQL
postgres-# RETURNS NULL ON NULL INPUT;
postgres=# select oid from pg_proc where proname = 'bool_to_num';
postgres=# SELECT oid, typname FROM pg_type WHERE typname IN ('bool', 'numeric');
postgres=# INSERT INTO pg_cast (castsource, casttarget, castfunc, castcontext, castmethod) VALUES (16,1700,16384,'a','f');
postgres=# create table bool_test(test_column numeric(22,0));
postgres=# insert into bool_test(test_column) values(true);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
[postgres@efm1 ~]$ ls -al $PGDATA/core.27553
-rw-------. 1 postgres postgres 152203264 Apr 21 11:14 /var/lib/pgsql/9.6/data/core.27553
[postgres@efm1 ~]$ file $PGDATA/core.27553
/var/lib/pgsql/9.6/data/core.27553: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'postgres: postgres postgres', real uid: 1002, effective uid: 1002, real gid: 1002, effective gid: 1002, execfn: '/usr/pgsql-9.6/bin/postgres', platform: 'x86_64'
I realise that my types were wrong (the function returned "integer" when it should have returned "numeric"). I've fixed that and it now works fine, but how can a type mismatch cause the entire cluster to crash?
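
For reference, a minimal sketch of what a corrected setup could look like. It keeps the function name and the assignment context ('a') from the session above. Going through CREATE CAST instead of a manual INSERT into pg_cast is an assumption here, not what was run above; the point of doing so is that CREATE CAST checks the function's return type against the cast's target type, which a direct catalog insert skips. The final query is a hypothetical sanity check that lists function-based casts whose function does not return the declared target type.

```sql
-- Remove the hand-inserted catalog row (needs superuser).
DELETE FROM pg_cast
 WHERE castsource = 'boolean'::regtype
   AND casttarget = 'numeric'::regtype;

-- Recreate the function so its return type matches the cast target.
DROP FUNCTION bool_to_num(boolean);
CREATE FUNCTION bool_to_num (boolean) RETURNS numeric
AS 'SELECT CASE WHEN $1 = true THEN 1::numeric ELSE 0::numeric END;'
LANGUAGE SQL
RETURNS NULL ON NULL INPUT;

-- Register the cast through DDL so the types are validated.
CREATE CAST (boolean AS numeric)
    WITH FUNCTION bool_to_num(boolean)
    AS ASSIGNMENT;

-- Sanity check: function-based casts whose function return type
-- differs from the cast target. Any row is worth a second look.
SELECT c.castsource::regtype AS source_type,
       c.casttarget::regtype AS target_type,
       p.proname,
       p.prorettype::regtype AS function_returns
  FROM pg_cast c
  JOIN pg_proc p ON p.oid = c.castfunc
 WHERE c.castmethod = 'f'
   AND p.prorettype <> c.casttarget;
```

With the original integer-returning function, the CREATE CAST statement should be rejected with an error instead of producing a cast that fails at run time.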