Re: including PID or backend ID in relpath of temp rels

From: Robert Haas
Subject: Re: including PID or backend ID in relpath of temp rels
Date:
Msg-id: AANLkTilwKmdpj3Ou1t6V5-cvRS3Ww1FKvn0kSahq3xjD@mail.gmail.com
In reply to: Re: including PID or backend ID in relpath of temp rels  (Jim Nasby <decibel@decibel.org>)
List: pgsql-hackers
On Mon, May 17, 2010 at 2:10 PM, Jim Nasby <decibel@decibel.org> wrote:
>> It seems pretty clear that it isn't desirable to simply add backend ID
>> to RelFileNode, because there are too many places using RelFileNode
>> already for purposes where the backend ID can be inferred from
>> context, such as buffer headers and most of xlog.  Instead, I
>> introduced BackendRelFileNode, which consists of an ordinary
>> RelFileNode augmented with a backend ID, and use that only where
>> needed.  In particular, the smgr layer must use BackendRelFileNode
>> throughout, since it operates on both permanent and temporary
>> relations. smgr invalidations must also include the backend ID.  xlog
>> generally happens only for non-temporary relations and can thus
>> continue to use an ordinary RelFileNode; however, commit/abort records
>> must use BackendRelFileNode as there may be physical storage
>> associated with temporary relations that must be unlinked.
>> Communication with the bgwriter must use BackendRelFileNode for
>> similar reasons. The relcache now stores rd_backend rather than
>> rd_islocaltemp so that it remains straightforward to call smgropen()
>> based on a relcache entry. Some smgr functions no longer require an
>> isTemp argument, because they can infer the necessary information from
>> their BackendRelFileNode.  smgrwrite() and smgrextend() now take a
>> skipFsync argument rather than an isTemp argument.
>>
>> I'm not totally sure whether it makes sense to do what we were talking
>> about above, viz, transfer unlink responsibility for temp rels from
>> the bgwriter to the owning backend.  I haven't done that here.  Nor
>> have I implemented any kind of improved temporary file cleanup
>> strategy, though I hope such a thing is possible.
>
> Any particular reason not to use directories to help organize things? IE:
>
> base/database_oid/pg_temp_rels/backend_pid/relfilenode
>
> Perhaps relfilenode should be something else.
>
> This seems to have several advantages:
>
> 1: It's more organized. If you want to see all the files for a single backend you have just one place to look.
> Finding everything is still easy via filesystem find.
> 2: Cleanup becomes easier. When a backend exits, its entire directory goes away. On server start, everything under
> pg_temp_rels goes away. Unfortunately we still have a race condition with cleaning up if a backend dies and can't run
> its own cleanup, though I think that anytime that happens we're going to restart everything anyway.
> 3: It separates all the temporary stuff away from real files.
>
> The only downside I see is some extra code to create the backend_pid directory.
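
(For reference, a minimal sketch of the BackendRelFileNode layout described
in the quoted text; the stand-in typedefs and the field names are assumptions
for illustration, not taken from the patch itself.)

/* Stand-in typedefs so the sketch is self-contained; the real definitions
 * come from the PostgreSQL headers. */
typedef unsigned int Oid;
typedef int BackendId;

typedef struct RelFileNode
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation */
} RelFileNode;

/* An ordinary RelFileNode augmented with a backend ID, as described above.
 * A permanent relation would presumably carry an invalid backend ID, so
 * that callers such as smgropen() can tell the two cases apart from the
 * node alone, without a separate isTemp flag. */
typedef struct BackendRelFileNode
{
    RelFileNode node;
    BackendId   backend;
} BackendRelFileNode;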

I like the idea of using directories to organize things better and I
completely agree with points #1 and #3.  Point #2 is a little more
complicated, I think, and something I've been struggling with.  We
need to make sure that we clean up not only the temporary files but
also the catalog entries that point to them, if any.  The current code
only blows away temporary tables that are "old" in terms of
transaction IDs, is driven off the catalog entries, and essentially
does DROP TABLE <whatever>.  So it can't clean up orphaned temporary
files that don't have any catalog entries associated with them (which
is what we want to fix) but on the other hand whatever it does clean
up is cleaned up completely.
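
To make that shape concrete, here is a compact stand-alone sketch of the
catalog-driven cleanup; every name in it is a hypothetical stand-in, not a
PostgreSQL API, and the XID test is simplified (real code would need a
wraparound-aware comparison):

typedef struct TempRelEntry
{
    unsigned int relfilenode;   /* physical file name component */
    unsigned int frozenxid;     /* oldest XID the rel might contain */
} TempRelEntry;

/* Placeholder stubs: the real versions would scan pg_class and run the
 * equivalent of DROP TABLE via the dependency machinery. */
static int
scan_pg_class_for_temp_rels(TempRelEntry *out, int max)
{
    (void) out;
    (void) max;
    return 0;
}

static void
drop_table(unsigned int relfilenode)
{
    (void) relfilenode;
}

static void
drop_old_temp_tables(unsigned int xid_cutoff)
{
    TempRelEntry rels[256];
    int          n = scan_pg_class_for_temp_rels(rels, 256);

    for (int i = 0; i < n; i++)
    {
        /* Only rels that look "old" in XID terms get dropped, and a file
         * with no pg_class row is never visited at all -- the gap noted
         * above. */
        if (rels[i].frozenxid < xid_cutoff)
            drop_table(rels[i].relfilenode);
    }
}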

We might be able to do something like this:

1. Scan pg_temp_rels.  For each subdirectory found (whose name looks
like a backend ID), if the corresponding backend is not currently
running, add the backend ID to a list of backend IDs needing cleanup.
2. For each backend ID derived in step 1:
2A. Scan the subdirectory and add all the files you find (whose names
are in the right format) to a list of files to be deleted.
2B. Check again whether the backend in question is running.  If it is,
don't do anything further for this backend and go on to the next
backend ID (i.e. continue with step 2).
2C. For each file found in step 2A, look for a pg_class entry in the
temp tablespace for that backend ID with a matching relfilenode
number.  If one is found, drop the rel.  If not, unlink the file if it
still exists.
2D. Attempt to remove the directory, ignoring failures.
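
A rough, compilable sketch of that scan, assuming the pg_temp_rels directory
layout proposed upthread; backend_is_running() and drop_or_unlink() are
hypothetical stand-ins for the real checks against shared memory and
pg_class:

#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-ins, not PostgreSQL APIs. */
static bool backend_is_running(int backend_id) { (void) backend_id; return false; }
static void drop_or_unlink(int backend_id, const char *path) { (void) backend_id; unlink(path); }

static void
cleanup_temp_rels(const char *temp_rels_dir)
{
    DIR        *top = opendir(temp_rels_dir);
    struct dirent *de;

    if (top == NULL)
        return;

    while ((de = readdir(top)) != NULL)
    {
        int         backend_id;
        char        subdir[1024];
        char       *files[1000];
        int         nfiles = 0;
        DIR        *sub;
        struct dirent *f;

        /* Step 1: consider only names that (loosely) look like a backend ID
         * whose backend is not currently running. */
        if (sscanf(de->d_name, "%d", &backend_id) != 1 ||
            backend_is_running(backend_id))
            continue;

        snprintf(subdir, sizeof(subdir), "%s/%s", temp_rels_dir, de->d_name);

        /* Step 2A: collect the candidate file names first. */
        if ((sub = opendir(subdir)) == NULL)
            continue;
        while ((f = readdir(sub)) != NULL && nfiles < 1000)
        {
            if (f->d_name[0] == '.')
                continue;
            files[nfiles++] = strdup(f->d_name);
        }
        closedir(sub);

        /* Step 2B: re-check; if the backend started up in the meantime, the
         * files just collected may be brand new, not orphans. */
        if (!backend_is_running(backend_id))
        {
            /* Step 2C: drop the rel if pg_class knows it, else unlink. */
            for (int i = 0; i < nfiles; i++)
            {
                char        path[2048];

                snprintf(path, sizeof(path), "%s/%s", subdir, files[i]);
                drop_or_unlink(backend_id, path);
            }
            /* Step 2D: remove the directory, ignoring failures. */
            rmdir(subdir);
        }

        for (int i = 0; i < nfiles; i++)
            free(files[i]);
    }
    closedir(top);
}

Note that the sketch collects the names before the re-check, mirroring the
ordering of steps 2A and 2B above.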

I think step 2B is sufficient to prevent a race condition where we end
up mistaking a newly created file for an orphaned one.  Am I right?

One possible problem with this is that it would need to be repeated
for every database/tablespace combination, but maybe that wouldn't be
too expensive.  autovacuum already needs to process every database,
but I don't know how you'd decide how often to check for stray temp
files.  Certainly you'd want to check after a restart... after that I
get fuzzy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

