Re: RelationCreateStorage can orphan files

From Robert Haas
Subject Re: RelationCreateStorage can orphan files
Date
Msg-id AANLkTimHVpYtugbu=1UhxyiEHnqRQ5Dg8fGgEwGRu17f@mail.gmail.com
In response to Re: RelationCreateStorage can orphan files  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: RelationCreateStorage can orphan files  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Sep 15, 2010 at 9:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I notice that RelationCreateStorage() creates the main fork on disk
>> before writing (let alone flushing) WAL.  So if PG gets killed at that
>> point, we end up with an orphaned file on disk.  I think that we could
>> even extend the relation a few times before WAL gets written, so I
>> don't even think it's necessarily a zero-size file.  We could perhaps
>> avoid this by writing and flushing a WAL record that includes the
>> creating XID before touching the disk; when we replay the record, we
>> create the file but then delete it if the XID fails to commit before
>> recovery ends.  But I guess maybe our feeling is that it's just not
>> worth taking a performance hit for this?
>
> That design is intentional.  If the file create fails, and you've
> already written a WAL record that says you created it, you are flat
> out screwed.  You can't even PANIC --- if you do, then the replay of
> the WAL record will likely fail and PANIC again, leaving the database
> dead in the water.
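
To make the ordering argument concrete, here is a minimal toy model in Python. It is not PostgreSQL code: `ToyDisk`, `create_then_log`, `log_then_create`, and `replay` are hypothetical names invented for illustration, and the "WAL" is just a list. It only shows that creating first and logging second leaves, at worst, an orphaned file, while logging first can leave a record that recovery cannot replay.

```python
class CreateFailed(Exception):
    """Raised by the toy disk when a file cannot be created."""


class ToyDisk:
    """Hypothetical stand-in for the filesystem; not a PostgreSQL API."""

    def __init__(self, fail_on_create=False):
        self.files = set()
        self.fail_on_create = fail_on_create

    def create(self, name):
        if self.fail_on_create:
            raise CreateFailed(name)
        self.files.add(name)


def create_then_log(disk, wal, name, crash_before_log=False):
    """Ordering described in the thread: touch the disk first, log afterwards."""
    disk.create(name)  # may raise; nothing has been logged yet
    if crash_before_log:
        return  # simulated kill: the file exists, the WAL knows nothing
    wal.append(("create", name))


def log_then_create(disk, wal, name):
    """Hypothetical alternative: log first, then touch the disk."""
    wal.append(("create", name))
    disk.create(name)  # may raise after the record is already in the log


def replay(disk, wal):
    """Toy recovery: redo every logged create; an exception aborts recovery."""
    for op, name in wal:
        if op == "create":
            disk.create(name)
```

With `create_then_log` and a simulated crash, the file is left behind but `replay` has nothing to redo and succeeds. With `log_then_create` on a disk that refuses the create, the record is already logged, so `replay` raises the same error again.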

This is perhaps of no more than academic interest, but could you get
around the problem by having replay of the XLOG record defer creation
of the file until the file is actually written to or the creating XID
commits?  And, if the XID does not commit, by going back and trying to
remove the file on a best-effort basis?
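
A rough sketch of what that deferred replay might look like, again as a toy Python model rather than anything resembling the real recovery code. The record shapes (`"create"`, `"write"`, `"commit"`) and the function name `replay_deferred` are assumptions made up for this illustration.

```python
def replay_deferred(wal, disk_files):
    """Toy replay that defers file creation, per the idea above.

    wal: list of hypothetical records:
         ("create", xid, name), ("write", name), ("commit", xid)
    disk_files: set of file names currently "on disk" (mutated and returned)
    """
    pending = {}      # name -> creating xid; logged but not yet materialised
    created_by = {}   # name -> creating xid; materialised during this replay
    committed = set()

    for rec in wal:
        kind = rec[0]
        if kind == "create":
            _, xid, name = rec
            pending[name] = xid  # remember it, but do not touch the disk
        elif kind == "write":
            _, name = rec
            if name in pending:  # first write forces the file into existence
                created_by[name] = pending.pop(name)
                disk_files.add(name)
        elif kind == "commit":
            _, xid = rec
            committed.add(xid)
            # The creating XID committed: materialise its pending files.
            for name in [n for n, x in pending.items() if x == xid]:
                created_by[name] = pending.pop(name)
                disk_files.add(name)

    # End of recovery: best-effort removal of files whose creator never committed.
    for name, xid in created_by.items():
        if xid not in committed:
            disk_files.discard(name)
    return disk_files
```

For example, given a create by XID 100 followed by a write, a create by XID 101 followed by its commit, and a create by XID 102 with nothing after it, only the file created by XID 101 remains at the end of replay. The sketch says nothing about what should happen if the deferred create itself fails.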

> Orphaned files, in contrast, are completely non-dangerous --- the worst
> they can do is waste a little bit of disk space.  That's a cheap price
> to pay for not having an unrecoverable database after a create failure.
>
> This is essentially the same reason why CREATE DATABASE and related
> commands xlog directory copy operations only after completing them.
> That potentially wastes much more than a few blocks; but it's still
> non-dangerous, and far safer than the alternative.

Thanks for the explanation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

