Re: WAL file utilization question

Поиск
Список
Период
Сортировка
От Keaton Adams
Тема Re: WAL file utilization question
Дата
Msg-id 1179334144.22514.14.camel@MXLRMT-208.corp.mxlogic.com
обсуждение исходный текст
Ответ на Re: WAL file utilization question  ("Jim C. Nasby" <decibel@decibel.org>)
Ответы Re: WAL file utilization question  ("Jim C. Nasby" <decibel@decibel.org>)
Re: WAL file utilization question  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-admin
Well, no, I'm not that paranoid where I expect checkpoints to be taking place so often.  I do find it interesting that controlling checkpoint frequency is a factor of the number of WAL files available.  In order to get up to 60 second checkpoints I had to set checkpoint_segments to 25, which resulted in the creation of 52 log files in data/pg_xlog.  So for 120 second checkpoints I will need to have up to 104 log files in data/pg_xlog available?  That's nearly 2 GB of space for WAL logs, and it only nets on average a 2 minute checkpoint.  That would appear to be the pattern (2 * checkpoint_segments at the busiest time):

# - Checkpoints -

#3 - 8 seconds - 3 log files
#6 - 16 seconds - 13 log files
#9 - 29 seconds  - 19 log files
#20 - 57 seconds - 41 log files
#25 - > 60 sec - 52 log files

checkpoint_segments = 25        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300        # range 30-3600, in seconds
#checkpoint_warning = 60        # in seconds, 0 is off

So, if I do set the checkpoint_segments parameter to 50, 75, or even 100 I will have a considerable number of WAL files in data/pg_xlog, but they will be used more efficiently, so I will be archiving less?  That's what I understand from the comments received so far.

On PITR it makes sense, given the nature of the WAL file, that only full pages can be used.  So I won't be able to set full_page_writes to false in this case.

-Keaton



On Wed, 2007-05-16 at 11:35 -0500, Jim C. Nasby wrote:
On Wed, May 16, 2007 at 10:28:29AM -0600, Keaton Adams wrote:
> On most database systems I am used to, there is a physical log and
> separate logical log. The physical log holds any ???before images??? of data
> pages that have been modified since the last checkpoint. The logical log
> holds the individual rows of data that have been inserted, updated,
> deleted as well as checkpoint records and the like. If the server
> crashes, fast recovery is initiated where (1) any pages in the physical
> log are put back in the data files (tablespaces/dbspaces) on disk to get
> back to a state of physical consistency and (2) individual transactions
> in the logical log since the last checkpoint are rolled forward / rolled
> back to get to a point of logical consistency.
 
PostgreSQL combines the two, though there has been recent discussion
about changing that. There may be some improvements in this regard in
8.3 (I don't remember if the patches were accepted or not).

> Even with full_page_writes set to false and checkpoints taking place
> every 60 seconds or so, the amount of WAL log data generated per minute
> seems to be significant.
 
Wow, do you really want to be checkpointing every 60 seconds? That's
going to greatly increase your WAL volume, as well as the size of WAL.

> So my question is this: If I kick off a transaction that loads records
> with a size of 100 bytes, does the insert for that record take 100 bytes
> in the WAL file, or is the data archived in the WAL log in page size
> (8k) portions? So with piggyback commits if I can only stack up 2K worth
> of data before the next LogFlush, will 2K be written to the WAL file, or
> will 8K be written each time regardless of the amount of actual
> transaction data that is available to flush?
 
(Generally) WAL only records differences. The exception is that the
first time a page is modified after a checkpoint, the entire page is
written out to WAL.

> Since there is no separate physical log to keep track of dirty/modified
> pages since the last checkpoint I would assume that the WAL log is
> serving a dual purpose of being able to get back to the point of
> physical and logical database consistency, but I need to know for
> certain that there is not a way to reduce the amount of WAL data being
> written for the amount of transaction information we are actually
> writing to the database at any given point in time.

The only way to do it right now is to reduce the frequency of your
checkpoints. IIRC you can't actually disable full page writes if you're
using PITR.

В списке pgsql-admin по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: WAL file utilization question
Следующее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: WAL file utilization question