Re: Idea: recycle WAL segments, don't delete/recreate 'em

From: Patrick Macdonald
Msg-id: 3B54821F.E2B9331C@redhat.com
In reply to: Idea: recycle WAL segments, don't delete/recreate 'em (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom,

What you are describing is a pseudo-circular log.  Other database
systems (such as DB2) support the concept of both circular and
recoverable logs.  Recoverable logs are so named because they can be
used for point-in-time recovery.  Both methods support crash recovery.

In general, a user defines the number of log extents to be used in
the log cycle.  He/she also defines the number of secondary logs to
use if by chance the circular log becomes full.  If a secondary log
extent is created, it is added to the cycle list.  At a consistent
shutdown, the secondary log extents are deleted.  Since logs
are deleted, any hope of point-in-time recovery is deleted with them.
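A minimal Python sketch of the circular scheme described above may help. It is a toy model under stated assumptions: the class, its fields, and the integer extent numbering are all hypothetical, and it is not DB2's actual implementation. A fixed cycle of primary extents is reused. A secondary extent joins the cycle only when the next extent still holds records needed for crash recovery. A consistent shutdown drops the secondaries again.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CircularLog:
    """Toy model of a circular log: a fixed cycle of primary extents, plus
    secondary extents that join the cycle only when it fills up.
    All names here are hypothetical."""

    primary: int           # extents in the normal cycle (user-defined)
    max_secondary: int     # extra extents allowed if the cycle becomes full
    secondary: int = 0     # secondary extents currently in the cycle
    # Extents whose records are still needed for crash recovery.
    active: Set[int] = field(default_factory=set)

    def cycle_size(self) -> int:
        return self.primary + self.secondary

    def next_extent(self, current: int) -> int:
        """Pick the extent to write after `current`, growing the cycle if needed."""
        candidate = (current + 1) % self.cycle_size()
        if candidate not in self.active:
            return candidate  # its old records are no longer needed: overwrite it
        if self.secondary >= self.max_secondary:
            raise RuntimeError("log full: no free extent and no secondary extents left")
        self.secondary += 1           # create a secondary extent...
        return self.cycle_size() - 1  # ...and add it to the cycle list

    def consistent_shutdown(self) -> None:
        # Secondary extents are deleted at a consistent shutdown, and with them
        # any history that point-in-time recovery would have needed.
        self.secondary = 0
        self.active.clear()
```

The overwrite in `next_extent` and the deletion in `consistent_shutdown` are the two places where old log history disappears. That loss is why this mode supports crash recovery but not point-in-time recovery.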

I understand your solution is for the existing architecture, which does
not support point-in-time recovery.  If point-in-time recovery is picked
up, your solution will become a stumbling block due to the above-mentioned
log extent deletions.  The other issues you list are of concern but are
manageable with some coding.

So, my question is, should PostgreSQL support both types of logging?
There will be databases where you require the ability to perform 
point-in-time recovery.  Conversely, there will be databases where
an overwritten log extent (as you describe) is acceptable.  I think
it would be useful to be able to define which logging method you
require for a database.  This way, you incur the I/O hit only when
forward recovery is a requirement.
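One way to picture a per-database choice of logging method is the hedged Python sketch below. The `LogMode` enum, the `retire_segment` helper, and the archive-directory approach are hypothetical illustrations, not an existing PostgreSQL setting. When crash recovery no longer needs a segment, a circular database just leaves the segment for reuse. A recoverable database first copies it somewhere safe.

```python
import enum
import os
import shutil


class LogMode(enum.Enum):
    CIRCULAR = "circular"        # crash recovery only; old segments may be overwritten
    RECOVERABLE = "recoverable"  # crash + point-in-time recovery; old segments are kept


def retire_segment(path: str, mode: LogMode, archive_dir: str) -> str:
    """Handle a WAL segment that crash recovery no longer needs (hypothetical helper).

    Returns the path holding the segment's contents afterwards. In CIRCULAR
    mode the file is simply left for reuse. In RECOVERABLE mode a copy is
    archived first so the history survives for forward recovery.
    """
    if mode is LogMode.CIRCULAR:
        return path  # nothing to preserve; the caller may overwrite or rename it
    # This copy is the extra I/O that only point-in-time-recoverable
    # databases would pay for.
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.join(archive_dir, os.path.basename(path))
    shutil.copy2(path, dest)
    return dest
```

Either way, the original file can then be recycled. Only the recoverable mode pays for the copy.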

Thoughts/comments?

Cheers,
Patrick 
    

Tom Lane wrote:
> 
> I have noticed that a large fraction of the I/O done by 7.1 is
> associated with initializing new segments of the WAL log for use.
> (We have to physically fill each segment with zeroes to ensure that
> the system has actually allocated a whole 16MB to it; otherwise we
> fall victim to the "hole-saving" allocation technique of most Unix
> filesystems.)  I just had an idea about how to avoid this cost:
> why not recycle old log segments?  At the point where the code
> currently deletes a no-longer-needed segment, just rename it to
> become the next created-in-advance segment.
> 
> With this approach, shortly after installation the system would converge
> to a steady state with a constant number of WAL segments (basically
> CHECKPOINT_SEGMENTS + WAL_FILES + 1, maybe one or two more if load is
> really high).  So, in addition to eliminating initialization writes,
> we would also reduce the metadata traffic (inode and indirect blocks)
> to a very low level.  That has to be good both for performance and for
> improving the odds that the WAL files will survive a system crash.
> 
> The sole disadvantage I can see to this approach is that a recycled
> segment would not contain zeroes, but valid WAL records.  We'd need
> to take care that in a recovery situation, we not mistake old records
> beyond the last one we actually wrote for new records we should redo.
> While checking the xl_prev back-pointers in each record should be
> sufficient to detect this, I'd feel more comfortable if we extended
> the XLogPageHeader record to contain the file/segment number that it
> belongs to.  This'd cost an extra 8 bytes per 8K XLOG page, which seems
> worth it to me.
> 
> Another issue is whether the recycling logic should be "always recycle"
> (hence number of extant WAL segments will never decrease), or should
> it be more like "recycle if there are fewer than WAL_FILES advance
> segments, else delete".  If we were supporting WAL-based UNDO then I
> think it'd have to be the latter, so that we could reduce the WAL usage
> from a peak created by a long-running transaction.  But with the present
> logic that the WAL log is truncated after each checkpoint, I think it'd
> be better just to never delete.  Otherwise, the behavior is likely to
> be that the system varies between N and N+1 extant segments due to
> roundoff effects (ie, depending on just where you are in the current
> segment when a checkpoint happens).  That's exactly what we do not want.
> 
> A possible answer is "recycle if there are fewer than WAL_FILES + SLOP
> advance files, else delete", where SLOP is (say) about three or four
> segments.  That would avoid unwanted oscillations in the number of
> extant files, while still allowing decrease from a peak for UNDO.
> 
> Comments, better ideas?
> 
>                         regards, tom lane
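The two file operations in Tom's proposal quoted above can be sketched in Python. The first zero-fills a new segment so the filesystem really allocates it. The second renames a retired segment into the next advance slot instead of deleting it. The function names and the `seg_NNNN` file names used below are hypothetical, and this is not PostgreSQL's actual code.

```python
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB, the WAL segment size mentioned above
CHUNK = 8192                      # write zeroes one 8K page at a time


def preallocate_segment(path: str, size: int = SEGMENT_SIZE) -> None:
    """Create a new segment by physically writing zeroes, so the filesystem
    allocates real blocks instead of leaving a sparse file with holes."""
    zeros = bytes(CHUNK)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the zero-fill actually reaches disk


def recycle_segment(old_path: str, new_path: str) -> None:
    """Instead of unlinking a no-longer-needed segment, rename it so it becomes
    the next created-in-advance segment. No data is written: the file keeps its
    allocated blocks and also its stale WAL records until they are overwritten."""
    os.rename(old_path, new_path)  # assumes new_path does not exist yet
```

Recycling costs one metadata operation instead of 16MB of writes. The stale records it leaves behind are the recovery hazard Tom raises. As an illustration, suppose CHECKPOINT_SEGMENTS = 3 and WAL_FILES = 0. His steady-state estimate of CHECKPOINT_SEGMENTS + WAL_FILES + 1 then comes to about 4 segments, roughly 64MB of WAL kept on disk and reused.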

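The recovery-side safeguards Tom mentions can also be sketched, under clearly stated assumptions. He suggests stamping each page header with its file/segment number and checking the xl_prev back-pointers. The header layout, the magic value, and all function names below are invented for illustration and are not the real 7.1 XLogPageHeader. A page stamped for a different segment is leftover content from the file's previous life. A record whose xl_prev does not name the record before it marks the end of the WAL that was actually written.

```python
import struct
from typing import Iterable, Tuple

# Hypothetical page-header layout, for illustration only (NOT the real 7.1
# XLogPageHeader): 2-byte magic, 2-byte flags, then the proposed 8 extra bytes
# as a 4-byte log file number and a 4-byte segment number.
PAGE_HEADER = struct.Struct("<HHII")
MAGIC = 0xD05A  # made-up magic value


def make_page_header(log_file: int, segment: int, flags: int = 0) -> bytes:
    return PAGE_HEADER.pack(MAGIC, flags, log_file, segment)


def page_belongs_here(page: bytes, log_file: int, segment: int) -> bool:
    """True if the page was written for this file/segment; False if it is
    leftover content from the segment's previous life, or not a page at all."""
    if len(page) < PAGE_HEADER.size:
        return False
    magic, _flags, page_file, page_seg = PAGE_HEADER.unpack_from(page)
    return magic == MAGIC and (page_file, page_seg) == (log_file, segment)


def valid_chain_length(records: Iterable[Tuple[int, int]], start_prev: int) -> int:
    """Count records, given as (position, xl_prev) pairs in log order, whose
    back-pointer names the record just before them. The first break marks the
    end of the WAL actually written; nothing after it should be redone."""
    expected_prev = start_prev
    count = 0
    for position, xl_prev in records:
        if xl_prev != expected_prev:
            break
        count += 1
        expected_prev = position
    return count
```

The two checks complement each other. The header check rejects a whole page from an older cycle cheaply. The back-pointer chain catches the boundary inside a page.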

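The recycling policies Tom compares can be written as a single decision function, sketched here in Python. The function and parameter names are hypothetical. Passing `slop=None` gives "always recycle". Any other value gives "recycle if there are fewer than WAL_FILES + SLOP advance segments, else delete".

```python
from typing import Optional


def retire_action(advance_segments: int, wal_files: int, slop: Optional[int]) -> str:
    """Decide the fate of a segment the latest checkpoint no longer needs.

    advance_segments: segments already created in advance of the insert point
    wal_files:        the WAL_FILES setting
    slop:             None means "always recycle"; otherwise recycle only while
                      fewer than wal_files + slop advance segments exist
    """
    if slop is None or advance_segments < wal_files + slop:
        return "recycle"  # rename it into the next advance segment
    return "delete"       # surplus left over from a peak, e.g. a long-running transaction
```

With `slop=0`, this reduces to the plain "fewer than WAL_FILES" rule. Tom expects that rule to flip between N and N+1 segments depending on where a checkpoint falls. A slop of three or four absorbs that roundoff while still letting the count shrink after a peak.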