WAL file utilization question

From: Keaton Adams
Subject: WAL file utilization question
Date:
Msg-id: 1179332909.22514.3.camel@MXLRMT-208.corp.mxlogic.com
Replies: Re: WAL file utilization question ("Jim C. Nasby" <decibel@decibel.org>)
         Re: WAL file utilization question (Alvaro Herrera <alvherre@commandprompt.com>)
List: pgsql-admin
I have a question related to the WAL log. Our organization is looking at using log shipping to build a high-availability standby server. In our tests the amount of WAL data generated per minute is significant: on our busiest database during a WAL archive test, eight 16 MB segments were being filled and archived per minute, which translates into having to move roughly 7.6 GB of archived WAL data per hour across a WAN link from one data center to another.
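
To spell out the arithmetic behind that figure:

    8 segments/min x 16 MB/segment = 128 MB/min
    128 MB/min x 60 min/hour = 7,680 MB/hour, i.e. roughly 7.6 GB/hour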

I read this statement from the PostgreSQL 8.1 manual and have a question related to it:

"If full_page_writes is set (as is the default), there is another factor to consider. To ensure data page consistency, the first modification of a data page after each checkpoint results in logging the entire page content. In that case, a smaller checkpoint interval increases the volume of output to the WAL log, partially negating the goal of using a smaller interval, and in any case causing more disk I/O."

On most database systems I am used to, there is a physical log and a separate logical log. The physical log holds the "before images" of data pages that have been modified since the last checkpoint. The logical log holds the individual rows of data that have been inserted, updated, or deleted, as well as checkpoint records and the like. If the server crashes, fast recovery is initiated, in which (1) any pages in the physical log are put back into the data files (tablespaces/dbspaces) on disk to return to a state of physical consistency, and (2) individual transactions in the logical log since the last checkpoint are rolled forward or rolled back to reach a point of logical consistency.
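
In rough pseudocode, the fast-recovery sequence I have in mind is the following (a sketch of how I understand those other systems, not of PostgreSQL):

    -- phase 1: physical consistency
    for each before-image page in the physical log:
        write the page back into its data file
    -- phase 2: logical consistency
    for each logical-log record written since the last checkpoint:
        roll forward committed transactions
        roll back uncommitted transactions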

Even with full_page_writes set to false and checkpoints taking place every 60 seconds or so, the amount of WAL data generated per minute remains significant.
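
For reference, the relevant postgresql.conf settings in that test looked roughly like the following (values illustrative of the test setup, not recommendations; in 8.1 checkpoint_timeout is an integer number of seconds):

    full_page_writes = off      # no full-page images after checkpoints
    checkpoint_timeout = 60     # seconds between automatic checkpoints
    checkpoint_segments = 8     # hypothetical value; a checkpoint is also
                                # forced every N filled WAL segments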

So my question is this: if I kick off a transaction that loads 100-byte records, does the insert for each record take roughly 100 bytes in the WAL file, or is the data written to the WAL in page-size (8K) portions? In other words, with piggyback (group) commits, if only 2K worth of data has accumulated before the next log flush, will 2K be written to the WAL file, or will 8K be written each time regardless of the amount of transaction data actually available to flush?
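
One hedged way to measure this directly, assuming a release with pg_current_xlog_location() (added in 8.2; on 8.1 the closest equivalent is counting archived segments), using a made-up test table:

    CREATE TABLE wal_test (id int);                        -- hypothetical throwaway table
    SELECT pg_current_xlog_location();                     -- note the starting WAL position
    INSERT INTO wal_test SELECT generate_series(1, 1000);  -- sample load
    SELECT pg_current_xlog_location();                     -- the difference between the two
                                                           -- positions is the WAL bytes generated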

Since there is no separate physical log to keep track of dirty/modified pages since the last checkpoint, I assume the WAL serves the dual purpose of restoring both physical and logical database consistency. But I need to know for certain whether there is any way to reduce the amount of WAL data being written relative to the amount of transaction data we are actually writing to the database at any given point in time.

Thank you,

Keaton
