DSM robustness failure (was Re: Peripatus/failures)
| From | Tom Lane |
|---|---|
| Subject | DSM robustness failure (was Re: Peripatus/failures) |
| Date | |
| Msg-id | 6153.1539806400@sss.pgh.pa.us |
| In reply to | Re: Peripatus/failures (Larry Rosenman <ler@lerctr.org>) |
| Replies | Re: DSM robustness failure (was Re: Peripatus/failures) |
| List | pgsql-hackers |
Larry Rosenman <ler@lerctr.org> writes:
> That got it further, but still fails at PLCheck-C (at least on 9.3).
> It's still running the other branches.

Hmm.  I'm not sure why plpython is crashing for you, but this is exposing a robustness problem in the DSM logic:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2018-10-17%2018%3A22%3A50

The postmaster is suffering an Assert failure while trying to clean up after the crash:

2018-10-17 13:43:23.203 CDT [51974:8] pg_regress LOG:  statement: SELECT import_succeed();
2018-10-17 13:43:24.228 CDT [46467:2] LOG:  server process (PID 51974) was terminated by signal 11: Segmentation fault
2018-10-17 13:43:24.228 CDT [46467:3] DETAIL:  Failed process was running: SELECT import_succeed();
2018-10-17 13:43:24.229 CDT [46467:4] LOG:  terminating any other active server processes
2018-10-17 13:43:24.229 CDT [46778:2] WARNING:  terminating connection because of crash of another server process
2018-10-17 13:43:24.229 CDT [46778:3] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-10-17 13:43:24.229 CDT [46778:4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-10-17 13:43:24.235 CDT [46467:5] LOG:  all server processes terminated; reinitializing
2018-10-17 13:43:24.235 CDT [46467:6] LOG:  dynamic shared memory control segment is corrupt
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 181)

It looks to me like what's happening is

(1) crashing process corrupts the DSM control segment somehow.

(2) dsm_postmaster_shutdown notices that, bleats to the log, and figures its job is done.

(3) dsm_postmaster_startup crashes on Assert because dsm_control_mapped_size isn't 0, because the old seg is still mapped.
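The three-step sequence above can be modeled in a few lines of C. This is a hypothetical sketch, not code from dsm.c. Heap memory stands in for the mapped control segment. The names `control_header`, `CONTROL_MAGIC`, `startup_sketch()` and `shutdown_early_return()` are invented for illustration. Where the real startup code hits the Assert, the startup function here returns false instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: not PostgreSQL's real dsm.c types or names. */
#define CONTROL_MAGIC 0xC0FFEE42u /* arbitrary value chosen for this sketch */

typedef struct
{
    uint32_t magic; /* a crashing backend may scribble over this */
} control_header;

/* Heap memory stands in for the mapped control segment. */
static control_header *control_address = NULL;
static size_t control_mapped_size = 0;

/* Models dsm_postmaster_startup: map and initialize a new control segment. */
bool startup_sketch(void)
{
    /* dsm.c asserts dsm_control_mapped_size == 0 here; the sketch reports failure instead of trapping. */
    if (control_mapped_size != 0)
        return false;
    control_address = malloc(sizeof(control_header));
    if (control_address == NULL)
        return false;
    control_address->magic = CONTROL_MAGIC;
    control_mapped_size = sizeof(control_header);
    return true;
}

/* Models step (2): on corrupt contents, log and return without unmapping. */
void shutdown_early_return(void)
{
    if (control_address != NULL && control_address->magic != CONTROL_MAGIC)
    {
        fprintf(stderr, "LOG:  dynamic shared memory control segment is corrupt\n");
        return; /* the modeled bug: the old segment stays mapped */
    }
    free(control_address);
    control_address = NULL;
    control_mapped_size = 0;
}
```

The three steps reproduce in order. First, call `startup_sketch()` successfully and then overwrite `control_address->magic` (step 1). Next, call `shutdown_early_return()` (step 2). Finally, call `startup_sketch()` again (step 3). That second startup fails because `control_mapped_size` is still nonzero.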
I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup are broken here; the former because it makes no attempt to unmap the old control segment (which it oughta be able to do no matter how badly broken the contents are), and the latter because it should not let garbage old state prevent it from establishing a valid new segment.

BTW, the header comment on dsm_postmaster_startup is a lie, which is probably not unrelated to its failure to consider this situation.

regards, tom lane
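The two fixes argued for above can be sketched in C. This is a hypothetical sketch with invented names (`control_header`, `CONTROL_MAGIC`, `unmap_control_segment()`, `shutdown_robust()`, `startup_robust()`), not the change actually made to dsm.c. Shutdown always releases the mapping, because that needs only the address and size, never the possibly-corrupt contents. Startup discards any leftover mapping instead of tripping over it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: not PostgreSQL's real dsm.c types or names. */
#define CONTROL_MAGIC 0xC0FFEE42u /* arbitrary value chosen for this sketch */

typedef struct
{
    uint32_t magic;
} control_header;

/* Heap memory stands in for the mapped control segment. */
static control_header *control_address = NULL;
static size_t control_mapped_size = 0;

/* Release the mapping itself; this never reads the segment's contents. */
static void unmap_control_segment(void)
{
    free(control_address); /* free(NULL) is a no-op */
    control_address = NULL;
    control_mapped_size = 0;
}

/* Shutdown: skip content-driven cleanup if corrupt, but always unmap. */
void shutdown_robust(void)
{
    if (control_address != NULL && control_address->magic != CONTROL_MAGIC)
        fprintf(stderr, "LOG:  dynamic shared memory control segment is corrupt\n");
    else
    {
        /* Contents look sane: cleanup of the segments they list would go here (omitted). */
    }
    unmap_control_segment();
}

/* Startup: discard any stale mapping rather than refusing to continue. */
bool startup_robust(void)
{
    if (control_mapped_size != 0)
        unmap_control_segment();
    control_address = malloc(sizeof(control_header));
    if (control_address == NULL)
        return false;
    control_address->magic = CONTROL_MAGIC;
    control_mapped_size = sizeof(control_header);
    return true;
}
```

In this sketch, the content-driven cleanup is skipped when the header is corrupt because nothing read from a garbage control segment can be trusted. Releasing the mapping itself is still safe, so a subsequent startup always begins from a clean state.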