Re: [HACKERS] Orphaned files in base/[oid]

From: Andres Freund
Subject: Re: [HACKERS] Orphaned files in base/[oid]
Date:
Msg-id 20170814185632.zodm5qykgss7ud32@alap3.anarazel.de
In reply to: Re: [HACKERS] Orphaned files in base/[oid] (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] Orphaned files in base/[oid] (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] Orphaned files in base/[oid]  (Robert Haas <robertmhaas@gmail.com>)
Re: Orphaned files in base/[oid]  (Alexey Gordeev <goa@arenadata.io>)
List: pgsql-hackers
On 2017-08-14 14:40:46 -0400, Tom Lane wrote:
> The core problem with zapping non-temp table files is that you can't
> do that unless you're sure you have consistent, up-to-date pg_class
> data that nobody else is busy adding to.  It's hard to see an external
> application being able to do that safely.  You certainly can't do it
> at the point in the postmaster startup cycle where we currently do
> the other things --- for those, we rely only on filesystem naming
> conventions to identify what to zap.

I think there are some possibilities to close the gap here. We could
e.g. have <relfilenode>.delete_on_crash marker files that get installed
when creating a new persistent relfilenode. If we set up things so they
get deleted post commit, but inside the critical section, we could rely
on them being present in case of crash, but consistently removed during
WAL replay. At the end of recovery, iterate over the whole datadir and
nuke all relations with marker files present.
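
To make the marker-file idea concrete, here is a rough sketch in Python
(not PostgreSQL code). The function names, the "marker first" creation
order, and the rule for matching segment and fork files (e.g. 16385.1,
16385_fsm) are assumptions made for illustration only; the
.delete_on_crash suffix is the one suggested above.

```python
import os
import tempfile

MARKER_SUFFIX = ".delete_on_crash"  # suffix from the proposal above


def create_relation(dbdir, relfilenode):
    """Create a new relation file together with its crash marker."""
    # Assumption: the marker is created first, so a crash between the two
    # steps can never leave an unmarked, uncommitted file behind.
    open(os.path.join(dbdir, relfilenode + MARKER_SUFFIX), "w").close()
    open(os.path.join(dbdir, relfilenode), "w").close()


def commit_relation(dbdir, relfilenode):
    """At commit (or when replaying the commit record), drop the marker."""
    marker = os.path.join(dbdir, relfilenode + MARKER_SUFFIX)
    if os.path.exists(marker):
        os.remove(marker)


def cleanup_after_recovery(dbdir):
    """At end of recovery, remove every relation that still has a marker."""
    names = os.listdir(dbdir)
    doomed = [n[:-len(MARKER_SUFFIX)] for n in names if n.endswith(MARKER_SUFFIX)]
    removed = []
    for rel in doomed:
        marker = rel + MARKER_SUFFIX
        for name in names:
            # Match "rel", "rel.N" (segments) and "rel_fork", but not "rel0".
            base = name.split(".")[0].split("_")[0]
            if base == rel and name != marker:
                os.remove(os.path.join(dbdir, name))
                removed.append(name)
        # Remove the marker last, so a crash during cleanup is retried.
        os.remove(os.path.join(dbdir, marker))
        removed.append(marker)
    return sorted(removed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dbdir:
        create_relation(dbdir, "16384")
        commit_relation(dbdir, "16384")   # committed: marker is gone
        create_relation(dbdir, "16385")   # "crash" before commit
        print(cleanup_after_recovery(dbdir))
        print(sorted(os.listdir(dbdir)))
```

The sketch ignores fsync ordering and WAL entirely; it only shows that
the cleanup pass needs nothing but filesystem naming conventions.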

I first thought that'd cost an additional fsync per relation
created. But I think we actually can delay that to a pre-commit phase,
if we have XLOG_SMGR_CREATE create those markers via a flag, and fsync
them just before checkpoint (via the usual delayed fsync mechanism).
That'd still require an XLogFlush(), but that seems hard to avoid unless
we just don't create relations at the filesystem level until buffers are
evicted and/or BufferSync() runs.


Alternatively we could do something without marker files, with some
added complexity: Keep track of all "uncommitted new files" in memory,
and log them every checkpoint. Commit/abort records clear elements of
that list. Since we always start replay at the beginning of a
checkpoint, we'd always reach a moment with such an up-to-date list of
pending-action files before reaching end-of-recovery. At end-of-recovery
we can delete all unconfirmed files.  To avoid out-of-memory due to too
many tracked relations, we'd possibly still have to have marker files...
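
A toy model of that bookkeeping, again in Python and purely
illustrative: the record shapes ("create", "commit", "abort",
"checkpoint") and the replay() function are made up for this sketch
and do not correspond to real WAL record formats. It assumes an abort
record's files are unlinked by normal abort processing, so replay only
needs to forget them.

```python
def replay(wal, start):
    """Replay records from index `start`; return files to delete at end of recovery."""
    pending = {}  # relfilenode -> xid that created it, not yet committed
    for rec in wal[start:]:
        kind = rec[0]
        if kind == "checkpoint":
            # The checkpoint record carries the full list of uncommitted
            # new files, so it replaces whatever we have accumulated.
            pending = dict(rec[1])
        elif kind == "create":
            _, xid, relfilenode = rec
            pending[relfilenode] = xid
        elif kind in ("commit", "abort"):
            # Commit confirms the files; abort handling removes them itself.
            xid = rec[1]
            pending = {f: x for f, x in pending.items() if x != xid}
    # Whatever is still pending belongs to transactions that never finished.
    return sorted(pending)


if __name__ == "__main__":
    wal = [
        ("create", 100, "16384"),
        ("checkpoint", {"16384": 100}),  # replay starts here
        ("create", 101, "16385"),
        ("commit", 100),
        ("create", 102, "16386"),
        ("abort", 102),
        # crash: xid 101 never committed or aborted
    ]
    print(replay(wal, 1))
```

Here the dict lives entirely in memory, which is exactly where the
out-of-memory concern comes from.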

Regards,

Andres


