Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c
От | Michael Paquier |
---|---|
Тема | Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c |
Дата | |
Msg-id | aLlAym4DHW4PM8Gg@paquier.xyz обсуждение исходный текст |
Ответ на | Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c (Mikhail Kot <mikhail.kot@databricks.com>) |
Список | pgsql-hackers |
On Wed, Sep 03, 2025 at 10:39:04PM +0100, Mikhail Kot wrote: > The issue is not only in pgstat_init_entry(). Currently it errors on OOM but > this doesn't prevent us from calling pgstat_lock_entry() through > pgstat_get_entry_ref() which accesses a non-initialized lock. Spent more time on that, finally. So your argument is that this leads to an inconsistent state in the hash table because the dshash API is designed to force a new entry creation if it cannot be found. - shheader = pgstat_init_entry(kind, shhashent); + PG_TRY(); + { + shheader = pgstat_init_entry(kind, shhashent); + } + PG_CATCH(); + { + dshash_delete_entry(pgStatLocal.shared_hash, shhashent); + PG_RE_THROW(); + } + PG_END_TRY(); Hmm. There are a couple of extra errors that can be reached, through get_segment_by_index() or dsa_get_address() for example, but these point to scenarios that should never happen or programming errors when using DSAs. > Here's the second version of the patch. Now we remove inserted hash entry > on OOM which would prevent accessing the entry There are only two callers of pgstat_init_entry(), so I am wondering if a better long-term thing would be to track this behavior in pgstat_init_entry(), and let the callers deal with the cleanup consequences, rather than have the callers of pgstat_init_entry() guess that they may need to do something with a TRY/CATCH block. I doubt that the number of callers of pgstat_init_entry() will raise, but people like doing fancy things. In pgstat_read_statsfile(), an interesting side effect is the possibility to issue a soft error. I have never seen anybody complain about an OOM from the startup process when reading the stats file, TBH, still prioritizing availability is an interesting take when reading the stats file when facing a DSA allocation failure. In order to reproduce one failure pattern, you can use the attached 0002, then use this sequence to emulate the OOM path and the dshash table inconsistency (install src/test/modules/injection_points first): create extension injection_points; select injection_points_attach('pgstat-init-entry-oom', 'notice'); -- SQL query able to create fresh pgstats entry -- ERROR with patch 0001, crash on HEAD. Note that none of that seems worth a backpatch, we have an history of treating unlikely-going-to-happen errors like OOMs as HEAD-only improvements. This one is of the same class. -- Michael
Вложения
В списке pgsql-hackers по дате отправления: