Обсуждение: BUG #5718: Cannot start postgres (FATAL: invalid cache id: 19)
The following bug has been logged online: Bug reference: 5718 Logged by: Aleksandr Dushein Email address: adushein@rutube.ru PostgreSQL version: 9.0.1 Operating system: FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 Description: Cannot start postgres (FATAL: invalid cache id: 19) Details: Hello. I was upgraded my database from 8.4.4 Ñо 9.0.1 one week ago by pg_dump/pg_restore. Yesterday database stop responding with message ERROR: database is not accepting commands to avoid wraparound data loss I trying to restart in single mode and fail with the same error. %postgres --single -D /data/pgdata -d 5 DEBUG: TZ "Africa/Algiers" scores 0: at 1263427200 2010-01-14 01:00:00 std versus 2010-01-14 03:00:00 std ...skipped... DEBUG: TZ "UCT" scores 0: at 1263427200 2010-01-14 00:00:00 std versus 2010-01-14 03:00:00 std DEBUG: TZ "UTC" scores 0: at 1263427200 2010-01-14 00:00:00 std versus 2010-01-14 03:00:00 std DEBUG: TZ "Universal" scores 0: at 1263427200 2010-01-14 00:00:00 std versus 2010-01-14 03:00:00 std DEBUG: TZ "W-SU" gets max score 5200 DEBUG: TZ "Zulu" scores 0: at 1263427200 2010-01-14 00:00:00 std versus 2010-01-14 03:00:00 std DEBUG: Reject TZ "Mideast/Riyadh87": uses leap seconds DEBUG: Reject TZ "Mideast/Riyadh88": uses leap seconds DEBUG: Reject TZ "Mideast/Riyadh89": uses leap seconds DEBUG: TZ "posixrules" scores 0: at 1263427200 2010-01-13 19:00:00 std versus 2010-01-14 03:00:00 std DEBUG: invoking IpcMemoryCreate(size=1103020032) DEBUG: removing file "pg_notify/0000" DEBUG: InitPostgres DEBUG: my backend id is 1 LOG: database system was shut down at 2010-10-19 11:25:38 MSD DEBUG: checkpoint record is at 433/C474D3C8 DEBUG: redo record is at 433/C474D3C8; shutdown TRUE DEBUG: next transaction ID: 0/2146484301; next OID: 106178 DEBUG: next MultiXactId: 1; next MultiXactOffset: 0 DEBUG: oldest unfrozen transaction ID: 654, in database 16385 DEBUG: transaction ID wrap limit is 2147484301, limited by database with OID 16385 FATAL: invalid cache id: 19 DEBUG: shmem_exit(1): 8 callbacks to make DEBUG: proc_exit(1): 3 callbacks to make DEBUG: exit(1) DEBUG: shmem_exit(-1): 0 callbacks to make DEBUG: proc_exit(-1): 0 callbacks to make ---------config (only uncommented lines)------------- %cat /data/pgdata/postgresql.conf | grep -v '^[ ]*#' | grep -v '^$' listen_addresses='*' max_connections = 100 # (change requires restart) shared_buffers = 1024MB # min 128kB temp_buffers = 256MB work_mem = 128MB maintenance_work_mem = 512MB fsync = off wal_buffers = 512kB log_destination = 'syslog' silent_mode = on update_process_title = off datestyle = 'iso, mdy' lc_messages = 'C' # locale for system error message lc_monetary = 'en_US.UTF-8' # locale for monetary formatting lc_numeric = 'en_US.UTF-8' # locale for number formatting lc_time = 'en_US.UTF-8' # locale for time formatting default_text_search_config = 'pg_catalog.english' ---------sysctl (only uncommented lines)------------- analyzer-db-1# cat /etc/sysctl.conf | grep -v '^[ ]*#' | grep -v '^$' kern.ipc.shmall=1048576 kern.ipc.shmmax=4294967296 kern.ipc.semmap=256 kern.ipc.shm_use_phys=1 kern.ipc.somaxconn=4096 kern.ipc.maxsockbuf=16777216 kern.maxfiles=65536 net.inet.tcp.blackhole=0 net.inet.udp.blackhole=0 net.inet.tcp.delayed_ack=0 net.inet.tcp.sendspace=32768 net.inet.tcp.recvspace=32768 net.local.stream.recvspace=65535 net.local.stream.sendspace=65535 net.inet.tcp.keepidle=300000 net.inet.tcp.keepintvl=30000 net.inet.tcp.msl=10000 kern.ipc.nmbclusters=65536 analyzer-db-1# cat /boot/loader.conf kern.ipc.semmni=256 kern.ipc.semmns=512 kern.ipc.semmnu=256 ---------disk utilization ------------- analyzer-db-1# df -h Filesystem Size Used Avail Capacity Mounted on /dev/mfid0s1a 989M 268M 642M 29% / devfs 1.0K 1.0K 0B 100% /dev /dev/mfid0s1d 1.9G 14K 1.8G 0% /tmp /dev/mfid0s1f 8.7G 1.2G 6.8G 15% /usr /dev/mfid0s1e 15G 101M 14G 1% /var /dev/mfid0s2d 483G 54G 391G 12% /data /dev/mfid1s1d 541G 228G 269G 46% /data2 /dev/mfid2s1d 1.1T 639G 356G 64% /data3 10.1.13.31:/vol/vol5 1.0T 721G 303G 70% /data/backup analyzer-db-1# du -sch /data*/pgdata/ 54G /data/pgdata/ 201G /data2/pgdata/ 523G /data3/pgdata/ 778G total
Excerpts from Aleksandr Dushein's message of mié oct 20 06:44:09 -0300 2010: > FATAL: invalid cache id: 19 Hmm, try removing pg_internal.init from directories under "base/" and restart single mode. (You then need to run vacuum on the specified database to get out of the other problem). -- Ãlvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
"Aleksandr Dushein" <adushein@rutube.ru> writes: > Description: Cannot start postgres (FATAL: invalid cache id: 19) > Hello. I was upgraded my database from 8.4.4 Ñо 9.0.1 one week ago by > pg_dump/pg_restore. Yesterday database stop responding with message ERROR: > database is not accepting commands to avoid wraparound data loss > I trying to restart in single mode and fail with the same error. Ugh. The "invalid cache id" failure is my fault. You can fix that by applying this patch to 9.0: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=c98cd9bdb618794643b670c8240c0aad213340c7 after which you'll probably need to do a manual vacuum with a standalone backend before the DB will restart. However, the bigger question is how you got into this state in the first place. Even if you'd had autovacuum turned off, it should have protected you against a wraparound. I suspect that pg_upgrade is failing to preserve datfrozenxid and related values properly. Do you by any chance still have the 8.4 database available for examination? Do you have an idea how many transactions you executed since converting? Bruce: exactly what is pg_upgrade's scheme for getting the right frozenxid values into the new catalogs, anyway? regards, tom lane
Excerpts from Tom Lane's message of mié oct 20 13:57:16 -0300 2010: > "Aleksandr Dushein" <adushein@rutube.ru> writes: > > Description: Cannot start postgres (FATAL: invalid cache id: 19) > > > Hello. I was upgraded my database from 8.4.4 Ñо 9.0.1 one week ago by > > pg_dump/pg_restore. Yesterday database stop responding with message ERROR: > > database is not accepting commands to avoid wraparound data loss > > I trying to restart in single mode and fail with the same error. > > Ugh. The "invalid cache id" failure is my fault. You can fix that by > applying this patch to 9.0: > http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=c98cd9bdb618794643b670c8240c0aad213340c7 > after which you'll probably need to do a manual vacuum with a standalone > backend before the DB will restart. > > However, the bigger question is how you got into this state in the first > place. Even if you'd had autovacuum turned off, it should have > protected you against a wraparound. I suspect that pg_upgrade is > failing to preserve datfrozenxid and related values properly. This exact symptom was just reported twice in pgsql-es-ayuda. Upgrading to 9.0.4 fixed the problem. One of them used pg_upgrade from a previous version, the other one did not. I'm not sure that it's worthwhile to expore the underlying cause, but it seems clear that pg_upgrade is not the only possibly guilty party. -- Ãlvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > This exact symptom was just reported twice in pgsql-es-ayuda. Upgrading > to 9.0.4 fixed the problem. One of them used pg_upgrade from a previous > version, the other one did not. I'm not sure that it's worthwhile to > expore the underlying cause, but it seems clear that pg_upgrade is not > the only possibly guilty party. Yeah, it seems like we are getting into XID-wraparound-approaching states a lot more easily, or a lot quicker, than I would expect. The original theory about that, which turned out to be true, was that pg_upgrade was bollixing relfrozenxid values. But that makes at least two reports from people who (claimed they) didn't use pg_upgrade. So maybe there is something else going on. I see no way to investigate on the strength of the available evidence though ... regards, tom lane