Обсуждение: WAL files containing old and unmodified data?

Поиск
Список
Период
Сортировка

WAL files containing old and unmodified data?

От
Brian Fehrle
Дата:
Hi all,

I have a very active database that generates between 200 and 400 WAL
files that are archived for backups. Yesterday, the average number of
WAL files being generated skyrocketed.

We took a look at some WAL files using the strings command, and notice
quite a bit of data from a specific table with timestamps of last year.
This table does some basic tracking, where a row is inserted, then it's
updated, and after that it's never modified again. So we're having data
appear in our WAL files from rows that haven't (or should not have been)
modified at all in over a year.

We figured that the rows were in fact being updated, so we turned
log_min_duration_statement to 0 for a period of time and monitored all
queries going through, however we never found a single query that could
possibly be updating these pieces of data.

Is there any other reason why a WAL file could contain data that (we
believe) hasn't changed? Or is the fact that it's in the WAL file a
guarantee that the data is either changed or new? Can server processes
such as autovacuum processes cause this behavior? Anywhere else I should
try to look?

Thanks in advance,
- Brian F

Re: WAL files containing old and unmodified data?

От
"Kevin Grittner"
Дата:
Brian Fehrle <brianf@consistentstate.com> wrote:

> I have a very active database that generates between 200 and 400
> WAL files that are archived for backups. Yesterday, the average
> number of WAL files being generated skyrocketed.
>
> We took a look at some WAL files using the strings command, and
> notice quite a bit of data from a specific table with timestamps
> of last year.  This table does some basic tracking, where a row is
> inserted, then it's updated, and after that it's never modified
> again. So we're having data appear in our WAL files from rows that
> haven't (or should not have been) modified at all in over a year.
>
> We figured that the rows were in fact being updated, so we turned
> log_min_duration_statement to 0 for a period of time and monitored
> all queries going through, however we never found a single query
> that could possibly be updating these pieces of data.

> Can server processes such as autovacuum processes cause this
> behavior?

Yes, to prevent transaction ID wrap-around it must FREEZE each tuple
at some point, which is a WAL-logged operation.  If many rows were
loaded at about the same time, and they were not subsequently
updated or deleted, you can see quite a burst of activity when
things hit this point.  If you're using rsync for your base backups,
expect a lot more data to need to be written, too, because tuples
which have been stable for the last year have now changed.

-Kevin

Re: WAL files containing old and unmodified data?

От
Brian Fehrle
Дата:
Thanks Kevin, this makes sense and could very well be the cause of the
issue. The oddity is that this activity seemed to occur right when there
was some code upgraded on the application, however since we can't
confirm it's the code causing this, it could just be a coincidence.

- Brian F

On 02/22/2012 01:53 PM, Kevin Grittner wrote:
> Brian Fehrle<brianf@consistentstate.com>  wrote:
>
>> I have a very active database that generates between 200 and 400
>> WAL files that are archived for backups. Yesterday, the average
>> number of WAL files being generated skyrocketed.
>>
>> We took a look at some WAL files using the strings command, and
>> notice quite a bit of data from a specific table with timestamps
>> of last year.  This table does some basic tracking, where a row is
>> inserted, then it's updated, and after that it's never modified
>> again. So we're having data appear in our WAL files from rows that
>> haven't (or should not have been) modified at all in over a year.
>>
>> We figured that the rows were in fact being updated, so we turned
>> log_min_duration_statement to 0 for a period of time and monitored
>> all queries going through, however we never found a single query
>> that could possibly be updating these pieces of data.
>
>> Can server processes such as autovacuum processes cause this
>> behavior?
>
> Yes, to prevent transaction ID wrap-around it must FREEZE each tuple
> at some point, which is a WAL-logged operation.  If many rows were
> loaded at about the same time, and they were not subsequently
> updated or deleted, you can see quite a burst of activity when
> things hit this point.  If you're using rsync for your base backups,
> expect a lot more data to need to be written, too, because tuples
> which have been stable for the last year have now changed.
>
> -Kevin