Re: POC: Cleaning up orphaned files using undo logs

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: POC: Cleaning up orphaned files using undo logs
Дата
Msg-id CAFiTN-veZUKmd=-8Lbt6zWkPbbZaAh3K-AYZGN76qau2wyfjeQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, Jun 19, 2019 at 2:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jun 18, 2019 at 2:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > [ new patches ]
> >
> > I tried writing some code that throws an error from an undo log
> > handler and the results were not good.
>
> I discovered another bothersome thing here: if you have a single
> transaction that generates a bunch of undo records, the first one has
> uur_dbid set correctly and the remaining records have uur_dbid set to
> 0.  That means if you try to write a sanity check like if
> (record->uur_dbid != MyDatabaseId) elog(ERROR, "the undo system messed
> up") it fails.
>
> The original idea of UnpackedUndoRecord was this: you would put a
> bunch of data into an UnpackedUndoRecord that you wanted written to
> undo, and the undo system would find ways to compress stuff out of the
> on-disk representation by e.g. omitting the fork number if it's
> MAIN_FORKNUM. Then, when you read an undo record, it would decompress
> so that you ended up with the same UnpackedUndoRecord that you had at
> the beginning. However, the inclusion of transaction headers has made
> this a bit confusing: that stuff isn't being added by the user but by
> the undo system itself. It's not very clear from the comments what the
> contract is around these things: do you need to set uur_dbid to
> MyDatabaseId when preparing to insert an undo record? Or can you just
> leave it unset and then it'll possibly be set at decoding time?  The
> comments for the UnpackedUndoRecord structure don't explain this.
>
> I'd really like to see this draw a cleaner distinction between the
> stuff that the user is expected to set and the other stuff we deal
> with internally to the undo subsystem.  For example, suppose that
> UnpackedUndoRecord didn't include any of the fields that are only
> present in the transaction header.  Maybe there's another structure,
> like UndoTransactionHeader, that includes those fields.  The client of
> the undo subsystem creates a bunch of UnpackedUndoRecords and inserts
> them.  At undo time, the callback gets back an identical set of
> UnpackedUndoRecords.  And maybe it also gets a pointer to the
> UndoTransactionHeader which contains all of the system-generated
> stuff. Under this scheme, uur_xid, uur_xidepoch (which still need to
> be combined into uur_fxid), uur_progress, uur_dbid, uur_next,
> uur_prevlogstart, and uur_prevurp would all move out of the
> UnpackedUndoRecord and into the UndoTransactionHeader. The user would
> supply none of those things when inserting undo records, but the
> rm_undo callback could examine those values if it wished.
>
> A weaker approach would be to at least clean up the structure
> definition so that the transaction-header fields set by the system are
> clearly segregated from the per-record fields set by the
> undo-inserter, with comments explaining that those fields don't need
> to be set but will (or may?) be set at undo time. That would be better
> than what we have right now - because it would hopefully make it much
> more clear which fields need to be set on insert and which fields can
> be expected to be set when decoding - but I think it's probably not
> going far enough.

I think it's a fair point.   We can keep pointer to
UndoRecordTransaction(urec_progress, dbid, uur_next) and
UndoRecordLogSwitch(urec_prevurp, urec_prevlogstart) in
UnpackedUndoRecord and include them whenever undo record contain these
headers.  Transaction header in the first record of the transaction
and log-switch header in the first record after undo-log switch during
a transaction.  IMHO uur_fxid, we can keep as part of the main
UnpackedUndoRecord, because as part of the other work  "Compression
for undo records to consider rmgrid, xid,cid,reloid for each record",
the FullTransactionId, will be present in every UnpackedUndoRecord
(although it will not be stored in every undo record).


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: Typo in tableamapi.c
Следующее
От: Binguo Bao
Дата:
Сообщение: [proposal] de-TOAST'ing using a iterator