Thread: Idea: recycle WAL segments, don't delete/recreate 'em

Idea: recycle WAL segments, don't delete/recreate 'em

From
Tom Lane
Date:
I have noticed that a large fraction of the I/O done by 7.1 is
associated with initializing new segments of the WAL log for use.
(We have to physically fill each segment with zeroes to ensure that
the system has actually allocated a whole 16MB to it; otherwise we
fall victim to the "hole-saving" allocation technique of most Unix
filesystems.)  I just had an idea about how to avoid this cost:
why not recycle old log segments?  At the point where the code
currently deletes a no-longer-needed segment, just rename it to
become the next created-in-advance segment.
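To make the contrast concrete, here is a minimal Python sketch of the two ways of obtaining a ready-to-use segment. It is an illustration only, not the backend's actual code: the function names and the small chunk size are assumptions, and the real implementation is in C.

```python
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB per segment, as stated in the message


def create_zero_filled(path, size=SEGMENT_SIZE, chunk=8192):
    """Create a segment by physically writing zeroes so every block is allocated."""
    zeros = bytes(chunk)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the allocation really reached the disk


def recycle(old_path, new_path):
    """Reuse an obsolete segment: the blocks stay allocated, only the name changes."""
    os.rename(old_path, new_path)
```

The first function costs a full segment's worth of writes; the second is a single metadata operation, which is the saving the proposal is after.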

With this approach, shortly after installation the system would converge
to a steady state with a constant number of WAL segments (basically
CHECKPOINT_SEGMENTS + WAL_FILES + 1, maybe one or two more if load is
really high).  So, in addition to eliminating initialization writes,
we would also reduce the metadata traffic (inode and indirect blocks)
to a very low level.  That has to be good both for performance and for
improving the odds that the WAL files will survive a system crash.

The sole disadvantage I can see to this approach is that a recycled
segment would not contain zeroes, but valid WAL records.  We'd need
to take care that in a recovery situation, we not mistake old records
beyond the last one we actually wrote for new records we should redo.
While checking the xl_prev back-pointers in each record should be
sufficient to detect this, I'd feel more comfortable if we extended
the XLogPageHeader record to contain the file/segment number that it
belongs to.  This'd cost an extra 8 bytes per 8K XLOG page, which seems
worth it to me.
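A rough Python sketch of the page-header check suggested above. The header layout, the magic value, and the function names are all hypothetical; the only details taken from the message are the 8K page size and the 8 extra bytes (here, two 4-byte fields for the log file and segment number).

```python
import struct

PAGE_SIZE = 8192  # 8K XLOG page, per the message
# Hypothetical layout: magic (2 bytes), info (2 bytes), then the proposed
# 8 extra bytes: log file id and segment number, 4 bytes each.
HEADER = struct.Struct("<HHII")
MAGIC = 0xD058  # illustrative value only


def make_page(logid, segno, payload=b""):
    """Build one page whose header records which segment it was written for."""
    if len(payload) > PAGE_SIZE - HEADER.size:
        raise ValueError("payload does not fit in one page")
    header = HEADER.pack(MAGIC, 0, logid, segno)
    return header + payload.ljust(PAGE_SIZE - HEADER.size, b"\0")


def page_is_current(page, logid, segno):
    """True if the header says the page belongs to the segment being replayed."""
    magic, _info, page_logid, page_segno = HEADER.unpack_from(page)
    return magic == MAGIC and (page_logid, page_segno) == (logid, segno)
```

During recovery, a page left over from the segment's previous life would carry the old segment number and fail the check, so replay would stop there instead of redoing stale records.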

Another issue is whether the recycling logic should be "always recycle"
(hence number of extant WAL segments will never decrease), or should
it be more like "recycle if there are fewer than WAL_FILES advance
segments, else delete".  If we were supporting WAL-based UNDO then I
think it'd have to be the latter, so that we could reduce the WAL usage
from a peak created by a long-running transaction.  But with the present
logic that the WAL log is truncated after each checkpoint, I think it'd
be better just to never delete.  Otherwise, the behavior is likely to
be that the system varies between N and N+1 extant segments due to
roundoff effects (ie, depending on just where you are in the current
segment when a checkpoint happens).  That's exactly what we do not want.

A possible answer is "recycle if there are fewer than WAL_FILES + SLOP
advance files, else delete", where SLOP is (say) about three or four
segments.  That would avoid unwanted oscillations in the number of
extant files, while still allowing decrease from a peak for UNDO.
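The "recycle if fewer than WAL_FILES + SLOP advance files, else delete" rule can be sketched as a small decision function. This is only an illustration in Python; the function name and the default slop of 3 are assumptions drawn from the "three or four segments" suggestion.

```python
def dispose(advance_segments, wal_files, slop=3):
    """Decide what to do with a segment that is no longer needed.

    advance_segments: number of created-in-advance segments that already exist
    wal_files:        the configured WAL_FILES value
    slop:             extra headroom that damps N versus N+1 oscillation
    """
    if advance_segments < wal_files + slop:
        return "recycle"
    return "delete"
```

With slop set very high this degenerates into "always recycle"; with slop at zero it is the plain "recycle if fewer than WAL_FILES" variant.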

Comments, better ideas?
        regards, tom lane


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Bruce Momjian
Date:
> I have noticed that a large fraction of the I/O done by 7.1 is
> associated with initializing new segments of the WAL log for use.
> (We have to physically fill each segment with zeroes to ensure that
> the system has actually allocated a whole 16MB to it; otherwise we
> fall victim to the "hole-saving" allocation technique of most Unix
> filesystems.)  I just had an idea about how to avoid this cost:
> why not recycle old log segments?  At the point where the code
> currently deletes a no-longer-needed segment, just rename it to
> become the next created-in-advance segment.

This sounds good and with UNDO far off, would be a big win.  The
segment number seems like a good idea.  I can't see any disadvantages.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
fche@redhat.com (Frank Ch. Eigler)
Date:
tgl wrote:

: [...]  (We have to physically fill each segment with zeroes to
: ensure that the system has actually allocated a whole 16MB to it;
: otherwise we fall victim to the "hole-saving" allocation technique
: of most Unix filesystems.)  [...]

Could you explain how postgresql can "fall victim" to the filesystem hole
mechanism?  Just hoping to force actual storage allocation, or hoping
to discourage fragmentation?

- FChE


Re: Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Bruce Momjian
Date:
> Could you explain how postgresql can "fall victim" to the filesystem hole
> mechanism?  Just hoping to force actual storage allocation, or hoping
> to discourage fragmentation?

Most Unix filesystems will not allocate disk blocks until you write in
them.  If you just seek out past end-of-file, the file pointer is moved
but the blocks are unallocated.  This is how 'ls' can show a 1gb file
that only uses 4k of disk space.
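The effect is easy to reproduce. The following Python sketch (function names are made up for illustration) creates a file by seeking past end-of-file and then compares its apparent size with the space actually allocated. It assumes a POSIX system, where `st_blocks` is reported in 512-byte units; whether the hole is really left unallocated depends on the filesystem.

```python
import os


def make_sparse(path, size):
    """Seek past end-of-file and write one byte; earlier blocks may stay unallocated."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


def sizes(path):
    """Return (apparent size, bytes actually allocated) for a file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems; it is absent on Windows.
    return st.st_size, st.st_blocks * 512
```

On a filesystem that supports holes, `sizes()` typically reports the full apparent size but only a few kilobytes allocated, which is the discrepancy between `ls -l` and `du` described above.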

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Tom Lane
Date:
Patrick Macdonald <patrickm@redhat.com> writes:
> I understand your solution is for the existing architecture which does
> not support point-in-time recovery.  If this item is picked up, your
> solution will become a stumbling block due to the above-mentioned log
> extent deletions.

Hmm, I don't see why it's a stumbling block.  There is a notion in the
present code that log segments might be moved someplace else for
archiving (rather than just be deleted), and I wasn't planning on
eliminating that option.  I think however that a realistic archival
mechanism would not simply keep the log segments verbatim.  It could
drop the page images, for a huge space savings, and perhaps also
eliminate records from aborted transactions.  So in reality one could
still expect to recycle the log segments, just with a somewhat longer
cycle time --- ie, after the archiver is done copying a segment, then
you rename it into place as a forward file.
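The "archive first, then rename into place" cycle could look roughly like the Python sketch below. It is an assumption-laden illustration: the function name and paths are hypothetical, and it copies the segment verbatim, whereas the message suggests a realistic archiver would strip page images and aborted-transaction records.

```python
import os
import shutil


def archive_then_recycle(segment_path, archive_dir, next_segment_path):
    """Copy a finished segment to the archive, then rename it as a future segment."""
    os.makedirs(archive_dir, exist_ok=True)
    archived = os.path.join(archive_dir, os.path.basename(segment_path))
    shutil.copy2(segment_path, archived)
    # Only after the archived copy exists is the local file given its new name.
    os.rename(segment_path, next_segment_path)
    return archived
```

The ordering is the whole point: the local file is never renamed while it is still the only copy of that segment.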

In any case, a two-or-three-line change is hardly likely to create much
of an obstacle to PIT recovery, compared to some of the more fundamental
aspects of the existing WAL design (like its need to start from a
complete physical copy of the database files).  So I'm not sure why
you're objecting on these grounds.
        regards, tom lane


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Tom Lane
Date:
Patrick Macdonald <patrickm@redhat.com> writes:
> Well, notion and actual practice can be mutually exclusive.  Your
> initial message stated that you would like to rename the log segment.
> This insinuated that the log segment was not moved.  Therefore, a
> straight rename would cause problems with the future point-in-time
> recovery item (ie. the only existing version of log segment N has
> been renamed to N+5).  A backup of the database could not roll forward
> through this name change as stated.  That was my objection. 

I think you are missing the point completely.  The rename will occur
only at the time when we would otherwise DELETE the old log segment.
If, for PIT or any other purpose, we do not wish to delete a log
segment, then it's not going to get recycled either.  My proposal is
that when, and only when, we are prepared to discard an old log segment
forever, we instead rename it to be a created-in-advance future log
segment.

What you may really be saying is that the existing scheme for management
of log segments is inappropriate for PIT usage; if so feel free to
propose a better one.  But I don't see how recycling of no-longer-wanted
segments can break anything.
        regards, tom lane


Re: Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Tom Lane
Date:
fche@redhat.com (Frank Ch. Eigler) writes:
> Could you explain how postgresql can "fall victim" to the filesystem hole
> mechanism?  Just hoping to force actual storage allocation, or hoping
> to discourage fragmentation?

The former.  We'd prefer not to get an unexpected "disk full" failure
while writing to a log file we thought was good.

To the extent that prewriting the WAL segment discourages fragmentation,
that's good too, but it's just a side benefit.
        regards, tom lane


Re: Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Gunnar Rønning
Date:
* Bruce Momjian <pgman@candle.pha.pa.us> wrote:

| Most Unix filesystems will not allocate disk blocks until you write in
| them.  If you just seek out past end-of-file, the file pointer is moved
| but the blocks are unallocated.  This is how 'ls' can show a 1gb file
| that only uses 4k of disk space.

Does this imply that we could get a performance gain by preallocating space
for indexes and data itself as well ? I've seen that other database products
have a setup step where you have to specify the size of the database. 

Or does PostgreSQL do any other tricks to prevent fragmentation of data ?


-- 
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Patrick Macdonald
Date:
Hmmm... my prior appends to this newsgroup are stalled.  Hopefully,
they'll be available soon.

Tom Lane wrote:
> 
> What you may really be saying is that the existing scheme for management
> of log segments is inappropriate for PIT usage; if so feel free to
> propose a better one.  But I don't see how recycling of no-longer-wanted
> segments can break anything.

Yes, but in a very roundabout way (or so it seems).  The main point
that I was trying to illustrate was that if a database supports 
point-in-time recovery, recycling of the only available log segments 
is a bad thing.  And, yes, in practice if you have point-in-time
recovery enabled you better archive your logs with your backup to
ensure that you can roll forward as expected.

A possible solution (as I mentioned before) is to have 2 methods
of logging available: circular and forward-recoverable.  When a
database is created, the creator selects which type of logging to
perform.  The log segments are exactly the same, only the recycling
method is different.

Hmmm... the more I look at this, the more interested I become.

Cheers,
Patrick


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Tom Lane
Date:
Patrick Macdonald <patrickm@redhat.com> writes:
> Yes, but in a very roundabout way (or so it seems).  The main point
> that I was trying to illustrate was that if a database supports 
> point-in-time recovery, recycling of the only available log segments 
> is a bad thing.

Certainly, but deleting them is just as bad ;-).

What would need to be changed to use the WAL log for archival purposes
is the control logic that decides when an old log segment is no longer
needed.  Rather than zapping them as soon as they're not needed for
crash recovery (our current approach), they'd have to stick around until
archived offline, or perhaps for some DBA-specified length of time
representing how far back you want to allow for PIT recovery.

Nonetheless, at some point an old WAL segment will become deletable
(unless you have infinite space on your WAL disk).  ISTM that at that
point, it makes sense to consider recycling the file rather than
deleting it.
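One way to picture the changed control logic is as a predicate over each old segment. The Python sketch below is hypothetical: the function name, the parameters, and the choice to apply both the "archived" test and the time-based retention test together are assumptions, since the message offers them as alternatives.

```python
def segment_action(needed_for_crash_recovery, archived, age_seconds,
                   retention_seconds=0):
    """Return 'keep' or 'recycle' for an old WAL segment.

    retention_seconds models a DBA-specified window for PIT recovery;
    the default of 0 means no time-based retention.
    """
    if needed_for_crash_recovery:
        return "keep"
    if not archived:
        return "keep"
    if age_seconds < retention_seconds:
        return "keep"
    # The segment is deletable, so it can be renamed for reuse instead.
    return "recycle"
```

Recycling only ever replaces the final "delete" outcome; everything that decides whether a segment is still wanted happens before that point.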
        regards, tom lane


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Bruce Momjian
Date:
> Hmmm... my prior appends to this newsgroup are stalled.  Hopefully,
> they'll be available soon.
> 
> Tom Lane wrote:
> > 
> > What you may really be saying is that the existing scheme for management
> > of log segments is inappropriate for PIT usage; if so feel free to
> > propose a better one.  But I don't see how recycling of no-longer-wanted
> > segments can break anything.
> 
> Yes, but in a very roundabout way (or so it seems).  The main point
> that I was trying to illustrate was that if a database supports 
> point-in-time recovery, recycling of the only available log segments 
> is a bad thing.  And, yes, in practice if you have point-in-time
> recovery enabled you better archive your logs with your backup to
> ensure that you can roll forward as expected.

I assume you are not going to do point-in-time recovery by keeping all
the WAL segments around on the same disk.  You have to copy them off
somewhere, right, and once you have copied them, why not reuse them?

> A possible solution (as I mentioned before) is to have 2 methods
> of logging available: circular and forward-recoverable.  When a
> database is created, the creator selects which type of logging to
> perform.  The log segments are exactly the same, only the recycling
> method is different.

Will not fly.  We need a solution that is flexible.

> Hmmm... the more I look at this, the more interested I become.

My assumption is that once a log is full the point-in-time recovery
daemon will copy that off somewhere, either to a different disk, tape,
or over the network to another machine.  Once it is done making a copy,
the WAL log can be recycled, right?  Am I missing something here?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Patrick Macdonald
Date:
Bruce Momjian wrote:
> 
> > Hmmm... my prior appends to this newsgroup are stalled.  Hopefully,
> > they'll be available soon.
> >
> > Tom Lane wrote:
> > >
> > > What you may really be saying is that the existing scheme for management
> > > of log segments is inappropriate for PIT usage; if so feel free to
> > > propose a better one.  But I don't see how recycling of no-longer-wanted
> > > segments can break anything.
> >
> > Yes, but in a very roundabout way (or so it seems).  The main point
> > that I was trying to illustrate was that if a database supports
> > point-in-time recovery, recycling of the only available log segments
> > is a bad thing.  And, yes, in practice if you have point-in-time
> > recovery enabled you better archive your logs with your backup to
> > ensure that you can roll forward as expected.
> 
> I assume you are not going to do point-in-time recovery by keeping all
> the WAL segments around on the same disk.

Of course not.  As mentioned, you'd probably archive them with your
backup(s).

> You have to copy them off
> somewhere, right, and once you have copied them, why not reuse them?

I'm not arguing that point.  I stated "recycling of the only available
log segments".  Once the log segment is archived (copied) elsewhere
you have two available images of the same segment.  You can rename
the local copy.

> > A possible solution (as I mentioned before) is to have 2 methods
> > of logging available: circular and forward-recoverable.  When a
> > database is created, the creator selects which type of logging to
> > perform.  The log segments are exactly the same, only the recycling
> > method is different.
> 
> Will not fly.  We need a solution that is flexible.

Could you expand on that a little (ie. flexible in which way).
Offering the user a choice of two is more flexible than offering no 
choice.

> > Hmmm... the more I look at this, the more interested I become.
> 
> My assumption is that once a log is full the point-in-time recovery
> daemon will copy that off somewhere, either to a different disk, tape,
> or over the network to another machine.  Once it is done making a copy,
> the WAL log can be recycled, right?  Am I missing something here?

Ok... I wasn't thinking of having a point-in-time daemon.  Some other
databases provide, for lack of a better term, user exits to allow
user defined scripts or programs to be called to perform log segment
archiving.  This archiving is somewhat orthogonal to point-in-time
recovery proper.

Yep, once the archiving is complete, you can do whatever you want
with the local log segment.

Cheers,
Patrick


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Patrick Macdonald
Date:
Bruce Momjian wrote:
> 
> > > > Yes, but in a very roundabout way (or so it seems).  The main point
> > > > that I was trying to illustrate was that if a database supports
> > > > point-in-time recovery, recycling of the only available log segments
> > > > is a bad thing.  And, yes, in practice if you have point-in-time
> > > > recovery enabled you better archive your logs with your backup to
> > > > ensure that you can roll forward as expected.
> > >
> > > I assume you are not going to do point-in-time recovery by keeping all
> > > the WAL segments around on the same disk.
> >
> > Of course not.  As mentioned, you'd probably archive them with your
> > backup(s).
> 
> You mean the nightly backup?  Why not do a pg_dump and be done with it.

But the purpose of point-in-time recovery is to restore your backup 
and then use the WAL to bring the backed up image up to a more current
version.   

> > > > A possible solution (as I mentioned before) is to have 2 methods
> > > > of logging available: circular and forward-recoverable.  When a
> > > > database is created, the creator selects which type of logging to
> > > > perform.  The log segments are exactly the same, only the recycling
> > > > method is different.
> > >
> > > Will not fly.  We need a solution that is flexible.
> >
> > Could you expand on that a little (ie. flexible in which way).
> > Offering the user a choice of two is more flexible than offering no
> > choice.
> 
> We normally don't give users choices unless we can't come up with a
> win-win solution to the problem.  In this case, we could just query to
> see if the WAL PIT archiver is running and tune the reuse of log
> segments on the fly.  In fact, my guess is that the PIT archiver will
> have to tell the system when it is done with WAL logs anyway.

But this could be a win-win situation.  If a user does not care
about point-in-time recovery, circular logs can be used.  When a
database is created, a configurable number of log segments are
allocated.  The database uses those logs in a cyclic manner.  No
new log segments need to be created under normal use.  Automatic
reuse.

A database requiring point-in-time functionality will log in much
the same way as the method in place today.  New log segments will be
created when needed.  

> > > > Hmmm... the more I look at this, the more interested I become.
> > >
> > > My assumption is that once a log is full the point-in-time recovery
> > > daemon will copy that off somewhere, either to a different disk, tape,
> > > or over the network to another machine.  Once it is done making a copy,
> > > the WAL log can be recycled, right?  Am I missing something here?
> >
> > Ok... I wasn't thinking of having a point-in-time daemon.  Some other
> > databases provide, for lack of a better term, user exits to allow
> > user defined scripts or programs to be called to perform log segment
> > archiving.  This archiving is somewhat orthogonal to point-in-time
> > recovery proper.
> >
> > Yep, once the archiving is complete, you can do whatever you want
> > with the local log segment.
> 
> We will clearly need something to transfer these WAL logs somewhere
> else, and it would be nice if it could be easily configured.  I think a
> PIT logger daemon is the only solution, especially since tape/network
> transfer could take a long time.  It would be forked by the postmaster
> so would cover all users and databases.

Actually, it would be better if the entire logger was split out into
its own process like the large commercial databases.  Archiving the
log segments would just be one of the many functions of the logger
process.  Just a thought.

Cheers,
Patrick


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Bruce Momjian
Date:
> Nonetheless, at some point an old WAL segment will become deletable
> (unless you have infinite space on your WAL disk).  ISTM that at that
> point, it makes sense to consider recycling the file rather than
> deleting it.

Of course, if you plan to keep your WAL files on the same drive, you
don't really need point-in-time recovery anyway because you have the
physical data files.  The only case I can see for keeping WAL files around for
point-in-time is if your WAL files are on a separate drive from the data
files, but even then, the page images should be stripped out and the WAL
archived somewhere else, hopefully in a configurable way to another
disk, tape, or networked computer.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Bruce Momjian
Date:
> * Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> 
> | Most Unix filesystems will not allocate disk blocks until you write in
> | them.  If you just seek out past end-of-file, the file pointer is moved
> | but the blocks are unallocated.  This is how 'ls' can show a 1gb file
> | that only uses 4k of disk space.
> 
> Does this imply that we could get a performance gain by preallocating space
> for indexes and data itself as well ? I've seen that other database products
> have a setup step where you have to specify the size of the database. 
> 
> Or does PostgreSQL do any other tricks to prevent fragmentation of data ?

If we stored all our tables in one file that would be needed. Since we
use the OS to do the defragmenting, I don't think it is an issue.  We do
allocate in 8k chunks to allow the OS to allocate full filesystem blocks
already.  Not sure if preallocating even more would help.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Bruce Momjian
Date:
> > > Yes, but in a very roundabout way (or so it seems).  The main point
> > > that I was trying to illustrate was that if a database supports
> > > point-in-time recovery, recycling of the only available log segments
> > > is a bad thing.  And, yes, in practice if you have point-in-time
> > > recovery enabled you better archive your logs with your backup to
> > > ensure that you can roll forward as expected.
> > 
> > I assume you are not going to do point-in-time recovery by keeping all
> > the WAL segments around on the same disk.
> 
> Of course not.  As mentioned, you'd probably archive them with your
> backup(s).

You mean the nightly backup?  Why not do a pg_dump and be done with it.

> > You have to copy them off
> > somewhere, right, and once you have copied them, why not reuse them?
> 
> I'm not arguing that point.  I stated "recycling of the only available
> log segments".  Once the log segment is archived (copied) elsewhere
> you have two available images of the same segment.  You can rename
> the local copy. 

Yes, OK, I see now.  As Tom mentioned, there would have to be some delay
where we allow the WAL log to be archived before reusing it.

> > > A possible solution (as I mentioned before) is to have 2 methods
> > > of logging available: circular and forward-recoverable.  When a
> > > database is created, the creator selects which type of logging to
> > > perform.  The log segments are exactly the same, only the recycling
> > > method is different.
> > 
> > Will not fly.  We need a solution that is flexible.
> 
> Could you expand on that a little (ie. flexible in which way).
> Offering the user a choice of two is more flexible than offering no 
> choice.

We normally don't give users choices unless we can't come up with a
win-win solution to the problem.  In this case, we could just query to
see if the WAL PIT archiver is running and tune the reuse of log
segments on the fly.  In fact, my guess is that the PIT archiver will
have to tell the system when it is done with WAL logs anyway.

> > > Hmmm... the more I look at this, the more interested I become.
> > 
> > My assumption is that once a log is full the point-in-time recovery
> > daemon will copy that off somewhere, either to a different disk, tape,
> > or over the network to another machine.  Once it is done making a copy,
> > the WAL log can be recycled, right?  Am I missing something here?
> 
> Ok... I wasn't thinking of having a point-in-time daemon.  Some other
> databases provide, for lack of a better term, user exits to allow
> user defined scripts or programs to be called to perform log segment
> archiving.  This archiving is somewhat orthogonal to point-in-time
> recovery proper.
> 
> Yep, once the archiving is complete, you can do whatever you want
> with the local log segment.

We will clearly need something to transfer these WAL logs somewhere
else, and it would be nice if it could be easily configured.  I think a
PIT logger daemon is the only solution, especially since tape/network
transfer could take a long time.  It would be forked by the postmaster
so would cover all users and databases.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Bruce Momjian
Date:
> > > Of course not.  As mentioned, you'd probably archive them with your
> > > backup(s).
> > 
> > You mean the nightly backup?  Why not do a pg_dump and be done with it.
> 
> But the purpose of point-in-time recovery is to restore your backup 
> and then use the WAL to bring the backed up image up to a more current
> version.   

My point was that the WAL logs are going to be archived after the backup
occurs, right?  From the text below, I see you are addressing that.

> > > > > A possible solution (as I mentioned before) is to have 2 methods
> > > > > of logging available: circular and forward-recoverable.  When a
> > > > > database is created, the creator selects which type of logging to
> > > > > perform.  The log segments are exactly the same, only the recycling
> > > > > method is different.
> > > >
> > > > Will not fly.  We need a solution that is flexible.
> > >
> > > Could you expand on that a little (ie. flexible in which way).
> > > Offering the user a choice of two is more flexible than offering no
> > > choice.
> > 
> > We normally don't give users choices unless we can't come up with a
> > win-win solution to the problem.  In this case, we could just query to
> > see if the WAL PIT archiver is running and tune the reuse of log
> > segments on the fly.  In fact, my guess is that the PIT archiver will
> > have to tell the system when it is done with WAL logs anyway.
> 
> But this could be a win-win situation.  If a user does not care
> about point-in-time recovery, circular logs can be used.  When a
> database is created, a configurable number of log segments are
> allocated.  The database uses those logs in a cyclic manner.  No
> new log segments need to be created under normal use.  Automatic
> reuse.
> 
> A database requiring point-in-time functionality will log in much
> the same way as the method in place today.  New log segments will be
> created when needed.  

Basically, when the user asks for point-in-time, we can then control how
we recycle the logs, right? 

> > > > > Hmmm... the more I look at this, the more interested I become.
> > > >
> > > > My assumption is that once a log is full the point-in-time recovery
> > > > daemon will copy that off somewhere, either to a different disk, tape,
> > > > or over the network to another machine.  Once it is done making a copy,
> > > > the WAL log can be recycled, right?  Am I missing something here?
> > >
> > > Ok... I wasn't thinking of having a point-in-time daemon.  Some other
> > > databases provide, for lack of a better term, user exits to allow
> > > user defined scripts or programs to be called to perform log segment
> > > archiving.  This archiving is somewhat orthogonal to point-in-time
> > > recovery proper.
> > >
> > > Yep, once the archiving is complete, you can do whatever you want
> > > with the local log segment.
> > 
> > We will clearly need something to transfer these WAL logs somewhere
> > else, and it would be nice if it could be easily configured.  I think a
> > PIT logger daemon is the only solution, especially since tape/network
> > transfer could take a long time.  It would be forked by the postmaster
> > so would cover all users and databases.
> 
> Actually, it would be better if the entire logger was split out into
> its own process like the large commercial databases.  Archiving the
> log segments would just be one of the many functions of the logger
> process.  Just a thought.

I think we already have a daemon that does checkpoints.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Larry Rosenman
Date:
Err....  PG_DUMP nightly on a 38,000,000+row table that takes forever to 
dump/unload, and gets updated every 5 minutes with 256KChar worth of 
updates? 

Give me a FAST pg_dump, and I'll think about it, until then, no....

LER
(PS: this is also a reason for making a pg_upgrade work IN PLACE on a 
table). 

LER
>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 7/18/01, 11:35:04 AM, Bruce Momjian <pgman@candle.pha.pa.us> wrote 
regarding Re: [HACKERS] Idea: recycle WAL segments, don't delete/recreate 
'em:


> > > > Yes, but in a very roundabout way (or so it seems).  The main point
> > > > that I was trying to illustrate was that if a database supports
> > > > point-in-time recovery, recycling of the only available log segments
> > > > is a bad thing.  And, yes, in practice if you have point-in-time
> > > > recovery enabled you better archive your logs with your backup to
> > > > ensure that you can roll forward as expected.
> > >
> > > I assume you are not going to do point-in-time recovery by keeping all
> > > the WAL segments around on the same disk.
> >
> > Of course not.  As mentioned, you'd probably archive them with your
> > backup(s).

> You mean the nightly backup?  Why not do a pg_dump and be done with it.

> > > You have to copy them off
> > > somewhere, right, and once you have copied them, why not reuse them?
> >
> > I'm not arguing that point.  I stated "recycling of the only available
> > log segments".  Once the log segment is archived (copied) elsewhere
> > you have two available images of the same segment.  You can rename
> > the local copy.

> Yes, OK, I see now.  As Tom mentioned, there would have to be some delay
> where we allow the WAL log to be archived before reusing it.

> > > > A possible solution (as I mentioned before) is to have 2 methods
> > > > of logging available: circular and forward-recoverable.  When a
> > > > database is created, the creator selects which type of logging to
> > > > perform.  The log segments are exactly the same, only the recycling
> > > > method is different.
> > >
> > > Will not fly.  We need a solution that is flexible.
> >
> > Could you expand on that a little (ie. flexible in which way).
> > Offering the user a choice of two is more flexible than offering no
> > choice.

> We normally don't give users choices unless we can't come up with a
> win-win solution to the problem.  In this case, we could just query to
> see if the WAL PIT archiver is running and tune reuse of log
> segments on the fly.  In fact, my guess is that the PIT archiver will
> have to tell the system when it is done with WAL logs anyway.

> > > > Hmmm... the more I look at this, the more interested I become.
> > >
> > > My assumption is that once a log is full the point-in-time recovery
> > > daemon will copy that off somewhere, either to a different disk, tape,
> > > or over the network to another machine.  Once it is done making a copy,
> > > the WAL log can be recycled, right?  Am I missing something here?
> >
> > Ok... I wasn't thinking of having a point-in-time daemon.  Some other
> > databases provide, for lack of a better term, user exits to allow
> > user defined scripts or programs to be called to perform log segment
> > archiving.  This archiving is somewhat orthogonal to point-in-time
> > recovery proper.
> >
> > Yep, once the archiving is complete, you can do whatever you want
> > with the local log segment.

> We will clearly need something to transfer these WAL logs somewhere
> else, and it would be nice if it could be easily configured.  I think a
> PIT logger daemon is the only solution, especially since tape/network
> transfer could take a long time.  It would be forked by the postmaster
> so would cover all users and databases.

> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
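
The rule the two sides converge on above (a segment may be renamed for reuse only once it is no longer needed for crash recovery and, if an archiver is running, once the archiver has reported it copied) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SegmentRecycler`, `mark_archived`, `can_recycle`, and `oldest_needed` are hypothetical names, not part of PostgreSQL.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentRecycler:
    """Hypothetical bookkeeping for deciding when a WAL segment may be reused."""

    archiver_enabled: bool
    # Segment numbers the (hypothetical) archiver has reported as safely copied.
    archived: set = field(default_factory=set)

    def mark_archived(self, segno: int) -> None:
        # Called by the archiver once its copy of the segment is stored elsewhere.
        self.archived.add(segno)

    def can_recycle(self, segno: int, oldest_needed: int) -> bool:
        # Segments at or after oldest_needed are still required for crash recovery.
        if segno >= oldest_needed:
            return False
        # No archiver running: anything older than that point may be reused at once.
        if not self.archiver_enabled:
            return True
        # Archiver running: wait until it has reported this segment as done.
        return segno in self.archived
```

With an archiver enabled, `can_recycle(3, oldest_needed=5)` stays false until `mark_archived(3)` has been called, which is the delay Tom and Bruce describe.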



Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Patrick Macdonald
Date:
Tom Lane wrote:
> 
> Patrick Macdonald <patrickm@redhat.com> writes:
> > I understand your solution is for the existing architecture which does
> > not support point-in-time recovery.  If this item is picked up, your
> > solution will become a stumbling block due to the above-mentioned log
> > extent deletions.
> 
> Hmm, I don't see why it's a stumbling block.  There is a notion in the
> present code that log segments might be moved someplace else for
> archiving (rather than just be deleted), and I wasn't planning on
> eliminating that option.  I think however that a realistic archival
> mechanism would not simply keep the log segments verbatim.  It could
> drop the page images, for a huge space savings, and perhaps also
> eliminate records from aborted transactions.  So in reality one could
> still expect to recycle the log segments, just with a somewhat longer
> cycle time --- ie, after the archiver is done copying a segment, then
> you rename it into place as a forward file.

Well, notion and actual practice can be mutually exclusive.  Your
initial message stated that you would like to rename the log segment.
This insinuated that the log segment was not moved.  Therefore, a
straight rename would cause problems with the future point-in-time
recovery item (i.e., the only existing version of log segment N has
been renamed to N+5).  A backup of the database could not roll forward
through this name change as stated.  That was my objection. 

> In any case, a two-or-three-line change is hardly likely to create much
> of an obstacle to PIT recovery, compared to some of the more fundamental
> aspects of the existing WAL design (like its need to start from a
> complete physical copy of the database files).  So I'm not sure why
> you're objecting on these grounds.

Hmmm, stating that it is less of a problem than others doesn't make
it the right thing to do. If the two or three lines you mention rename
a segment I want to roll forward through, that's a problem.  Yeah, I
know it's not a problem now but it'll have to be changed when PIT comes
into play. 

You didn't comment on the idea of two logging methods... circular and
recoverable.  Any thoughts?

Cheers,
Patrick
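
The objection can be made concrete with a small check: rolling forward from a backup needs every segment from the backup's starting segment through the target, so a rename that destroys the only copy of segment N leaves a gap. The sketch below is hypothetical (the function name and the segment numbers are illustrative, not taken from the PostgreSQL source).

```python
def first_missing_segment(available, backup_start, target):
    """Return the first segment in [backup_start, target] that is not available.

    Returns None when every segment needed for roll-forward is present.
    """
    have = set(available)
    for segno in range(backup_start, target + 1):
        if segno not in have:
            return segno
    return None


# Segment 10 was renamed to 15 without being archived: recovery stops at once.
print(first_missing_segment([11, 12, 13, 14, 15], backup_start=10, target=14))

# Segment 10 was copied elsewhere before the rename: no gap.
print(first_missing_segment([10, 11, 12, 13, 14, 15], backup_start=10, target=14))
```

The first call reports segment 10 as missing; the second finds no gap, which matches the point that a rename is harmless once a second image of the segment exists.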


Re: Idea: recycle WAL segments, don't delete/recreate 'em

From
Patrick Macdonald
Date:
Tom,

What you are describing is a pseudo circular log.  Other database
systems (such as DB2) support the concept of both circular and
recoverable logs.  Recoverable is named this way because 
recoverable logs can be used in point-in-time recovery.  Both 
methods support crash recovery.

In general, a user defines the number of log extents to be used in
the log cycle.  He/she also defines the number of secondary logs to
use if by chance the circular log becomes full.  If a secondary log
extent is created, it is added to the cycle list.  At a consistent
shutdown, the secondary log extents are deleted.  Since logs
are deleted, any hope of point-in-time recovery is deleted with them.

I understand your solution is for the existing architecture which does
not support point-in-time recovery.  If this item is picked up, your
solution will become a stumbling block due to the above-mentioned log
extent deletions.  The other issues you list are of concern but are
manageable with some coding. 

So, my question is, should PostgreSQL support both types of logging?
There will be databases where you require the ability to perform 
point-in-time recovery.  Conversely, there will be databases where
an overwritten log extent (as you describe) is acceptable.  I think
it would be useful to be able to define which logging method you
require for a database.  This way, you incur the I/O hit only when
forward recovery is a requirement.

Thoughts/comments?

Cheers,
Patrick 
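
A rough sketch of the circular scheme described above, with a fixed number of primary extents plus secondary extents created on demand and dropped at a consistent shutdown. It is a toy model of the description in this message only; the class and method names are hypothetical and it makes no claim about how DB2 implements this.

```python
class CircularLog:
    """Toy model of a circular log with primary and secondary extents."""

    def __init__(self, primary, max_secondary):
        self.primary = primary              # extents always kept in the cycle
        self.max_secondary = max_secondary  # limit on extra extents
        self.secondary = 0                  # secondary extents currently in the cycle
        self.in_use = 0                     # extents still holding needed records

    def allocate_extent(self):
        # Reuse a free extent already in the cycle if there is one.
        if self.in_use < self.primary + self.secondary:
            self.in_use += 1
            return "reuse"
        # The cycle is full: create a secondary extent and add it to the cycle.
        if self.secondary < self.max_secondary:
            self.secondary += 1
            self.in_use += 1
            return "secondary"
        raise RuntimeError("log full")

    def release_extent(self):
        # The oldest extent's records are no longer needed for crash recovery.
        self.in_use -= 1

    def shutdown(self):
        # Consistent shutdown: nothing is in use, secondary extents are deleted.
        self.in_use = 0
        self.secondary = 0
```

A recoverable log would differ only in what happens when an extent is released: it would have to be archived before being reused, rather than simply overwritten.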
    

Tom Lane wrote:
> 
> I have noticed that a large fraction of the I/O done by 7.1 is
> associated with initializing new segments of the WAL log for use.
> (We have to physically fill each segment with zeroes to ensure that
> the system has actually allocated a whole 16MB to it; otherwise we
> fall victim to the "hole-saving" allocation technique of most Unix
> filesystems.)  I just had an idea about how to avoid this cost:
> why not recycle old log segments?  At the point where the code
> currently deletes a no-longer-needed segment, just rename it to
> become the next created-in-advance segment.
> 
> With this approach, shortly after installation the system would converge
> to a steady state with a constant number of WAL segments (basically
> CHECKPOINT_SEGMENTS + WAL_FILES + 1, maybe one or two more if load is
> really high).  So, in addition to eliminating initialization writes,
> we would also reduce the metadata traffic (inode and indirect blocks)
> to a very low level.  That has to be good both for performance and for
> improving the odds that the WAL files will survive a system crash.
> 
> The sole disadvantage I can see to this approach is that a recycled
> segment would not contain zeroes, but valid WAL records.  We'd need
> to take care that in a recovery situation, we not mistake old records
> beyond the last one we actually wrote for new records we should redo.
> While checking the xl_prev back-pointers in each record should be
> sufficient to detect this, I'd feel more comfortable if we extended
> the XLogPageHeader record to contain the file/segment number that it
> belongs to.  This'd cost an extra 8 bytes per 8K XLOG page, which seems
> worth it to me.
> 
> Another issue is whether the recycling logic should be "always recycle"
> (hence number of extant WAL segments will never decrease), or should
> it be more like "recycle if there are fewer than WAL_FILES advance
> segments, else delete".  If we were supporting WAL-based UNDO then I
> think it'd have to be the latter, so that we could reduce the WAL usage
> from a peak created by a long-running transaction.  But with the present
> logic that the WAL log is truncated after each checkpoint, I think it'd
> be better just to never delete.  Otherwise, the behavior is likely to
> be that the system varies between N and N+1 extant segments due to
> roundoff effects (ie, depending on just where you are in the current
> segment when a checkpoint happens).  That's exactly what we do not want.
> 
> A possible answer is "recycle if there are fewer than WAL_FILES + SLOP
> advance files, else delete", where SLOP is (say) about three or four
> segments.  That would avoid unwanted oscillations in the number of
> extant files, while still allowing decrease from a peak for UNDO.
> 
> Comments, better ideas?
> 
>                         regards, tom lane
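
Two pieces of the proposal quoted above lend themselves to a short sketch: the "recycle if there are fewer than WAL_FILES + SLOP advance files, else delete" rule, and the use of a segment number stored in each page header to spot leftover pages in a recycled segment. The function names, the default `slop=3`, and the example numbers are assumptions made for illustration; nothing here is the actual PostgreSQL code.

```python
def retire_action(advance_segments, wal_files, slop=3):
    """Decide what to do with a segment that is no longer needed.

    Recycle while fewer than wal_files + slop advance segments exist,
    otherwise delete so that usage can shrink back from a peak.
    """
    if advance_segments < wal_files + slop:
        return "recycle"
    return "delete"


def count_valid_pages(page_segnos, expected_segno):
    """Count leading pages whose header names the segment being read.

    A recycled segment still holds pages stamped with its old segment
    number; the first such page marks the end of the newly written data.
    """
    count = 0
    for segno in page_segnos:
        if segno != expected_segno:
            break
        count += 1
    return count


print(retire_action(advance_segments=2, wal_files=0))  # below the limit
print(retire_action(advance_segments=3, wal_files=0))  # at the limit
print(count_valid_pages([7, 7, 2, 2], expected_segno=7))
```

The slack term is what keeps the file count from flipping between N and N+1 at each checkpoint, while still letting the count fall after a peak.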