AW: Re: Backup and Recovery

From: Zeugswetter Andreas SB
Subject: AW: Re: Backup and Recovery
Date:
Msg-id: 11C1E6749A55D411A9670001FA687963368369@sdexcsrv1.f000.d0188.sd.spardat.at
List: pgsql-hackers
 
> > > Also, isn't the WAL format rather bulky to archive hours and hours of?
> > 
> > If it were actually too bulky, then it needs to be made less so, since
> > that directly affects overall performance :-) 
> 
> ISTM that WAL record size trades off against lots of things, including 
> (at least) complexity of recovery code, complexity of WAL generation 
> code, usefulness in fixing corrupt table images, and processing time
> it would take to produce smaller log entries.  
> 
> Complexity is always expensive, and CPU time spent "pre-sync" is a lot
> more expensive than time spent in background.  That is, time spent
> generating the raw log entries affects latency and peak capacity, 
> where time in background mainly affects average system load.
> 
> For a WAL, the balance seems to be far to the side of simple-and-bulky.
> For other uses, the balance is sure to be different.

I do not agree with the conclusions you make above.
The limiting factor on the WAL is almost always the IO bottleneck.
How long startup rollforward takes after a crash is mainly influenced
by the checkpoint interval and IO. Thus you can afford to spend additional
CPU on reducing the WAL size, provided it yields a substantial reduction.
Keep in mind, though, that thanks to Toast, long column values that do not
change already do not need to be written to the WAL, so the potential
saving is not as large as it might seem.
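
To make the rollforward arithmetic concrete, here is a rough back-of-envelope
sketch in Python; the write rate, checkpoint interval and replay rate are
invented illustrative numbers, not measurements of any real system:

# Illustrative arithmetic only: a back-of-envelope estimate of crash-recovery
# rollforward time as a function of checkpoint interval and IO throughput.

def rollforward_estimate(wal_mb_per_s, checkpoint_interval_s, replay_mb_per_s):
    """Worst case: the crash happens just before the next checkpoint, so all
    WAL generated during one checkpoint interval must be replayed."""
    wal_to_replay_mb = wal_mb_per_s * checkpoint_interval_s
    return wal_to_replay_mb / replay_mb_per_s

if __name__ == "__main__":
    # Halving the WAL volume (e.g. by spending CPU on more compact records)
    # roughly halves the replay time, because replay is IO-bound here.
    for wal_rate in (10.0, 5.0):                          # MB/s written to WAL
        t = rollforward_estimate(wal_rate,
                                 checkpoint_interval_s=300.0,  # 5 minutes
                                 replay_mb_per_s=40.0)         # replay IO rate
        print("WAL rate %4.1f MB/s -> ~%.0f s of rollforward" % (wal_rate, t))

In this toy model, halving the WAL volume roughly halves the rollforward time,
which is why spending CPU on more compact records can pay off, but only as
long as replay stays IO-bound.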

> > > > I would expect high-level transaction redo records to be much more
> > > > compact; mixed into the WAL, such records shouldn't make the WAL
> > > > grow much faster.
> > 
> > All redo records have to be at the tuple level, so what higher-level
> > are you talking about? (statement level redo records would not be
> > able to reproduce the same resulting table data (keyword: transaction
> > isolation level)) 
> 
> Statement-level redo records would be nice, but as you note they are 
> rarely practical if done by the database.

The point is that the database cannot do it unless it allows only
serializable access and permits no user-defined functions with external
or runtime dependencies.
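
As a minimal sketch of the problem (an invented in-memory "table" and redo
format, nothing that PostgreSQL actually does): a statement whose effect
depends on runtime state cannot be replayed from its text alone, whereas
tuple-level redo reinstates the concrete values that were produced the
first time.

# A toy model: re-executing the statement text gives different data because
# the statement depends on runtime state (random()), while tuple-level redo
# replays the concrete values that were originally produced.

import random

def run_statement(table):
    """Models 'UPDATE t SET val = val + random()'.  Returns tuple-level redo
    records (row key, new value) describing what actually happened."""
    redo = []
    for key in table:
        table[key] = table[key] + random.random()
        redo.append((key, table[key]))     # log the concrete new tuple value
    return redo

def replay_tuple_redo(table, redo):
    """Tuple-level replay: reinstate exactly the recorded values."""
    for key, value in redo:
        table[key] = value

if __name__ == "__main__":
    original  = {1: 10.0, 2: 20.0}
    copy_stmt = dict(original)             # "recovered" by re-running the SQL
    copy_redo = dict(original)             # recovered from tuple-level redo

    redo = run_statement(original)         # the run whose effects we must redo
    run_statement(copy_stmt)               # statement-level replay: new randoms
    replay_tuple_redo(copy_redo, redo)     # tuple-level replay: recorded values

    print("after original run    :", original)
    print("statement-level replay:", copy_stmt, "(diverges)")
    print("tuple-level replay    :", copy_redo, "(matches)")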

> 
> Redo records that contain whole blocks may be much bulkier
> than records of whole tuples.

What is written in whole pages is the physical log, and yes, those pages can
be stripped before the log is copied to the backup location.
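
As a rough illustration of that stripping step, here is a sketch over an
invented record layout (real WAL records are binary structures, not Python
dicts):

# Invented record layout, purely for illustration.

def strip_physical_log(records):
    """Keep only the logical, tuple-level records for the archive; drop the
    whole-page images that exist purely for crash (torn-page) protection."""
    for rec in records:
        if rec["kind"] == "full_page_image":
            continue                       # 8 kB page copy: not archived
        yield rec

if __name__ == "__main__":
    wal_stream = [
        {"kind": "full_page_image", "page": b"\x00" * 8192},
        {"kind": "heap_insert", "rel": "accounts", "tuple": (1, "alice", 100)},
        {"kind": "heap_update", "rel": "accounts", "tuple": (1, "alice", 90)},
    ]
    archived = list(strip_physical_log(wal_stream))
    print("kept %d of %d records for the archive"
          % (len(archived), len(wal_stream)))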

> Redo records of whole tuples may be much bulkier than those that just 
> identify changed fields.

Yes, that might help in some cases, but as I said above, if it actually
makes a substantial difference, the reduction is best done before the WAL
is written in the first place.
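
For what it is worth, a toy comparison of the two record shapes; the
dict-based tuple representation is invented purely for illustration:

def field_level_delta(old_tuple, new_tuple):
    """What a field-level redo record would carry: only the changed columns,
    instead of the whole new tuple."""
    return {col: val for col, val in new_tuple.items()
            if old_tuple.get(col) != val}

if __name__ == "__main__":
    old = {"id": 1, "name": "alice", "balance": 100, "note": "x" * 200}
    new = dict(old, balance=90)
    # repr() length is only a crude size proxy for the two record shapes.
    print("whole-tuple record :", len(repr(new)), "chars")
    print("field-level record :", len(repr(field_level_delta(old, new))), "chars")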

> Bulky logs mean more-frequent snapshot backups, and bulky log formats 
> are less suitable for network transmission, and therefore less useful 
> for replication.

Any reasonably flexible replication that is based on the WAL will need to 
preprocess the WAL files (or buffers) before transmission anyway.
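
A hedged sketch of what such preprocessing could look like; the input record
layout and the wire format are both invented for illustration and do not
correspond to any real WAL or replication protocol:

import json

def preprocess_for_replication(records):
    """Drop physical-only records and reduce the rest to compact,
    self-describing change messages that are cheap to ship over the network."""
    for rec in records:
        if rec["kind"] == "full_page_image":
            continue                        # physical log: never shipped
        yield json.dumps({"rel": rec["rel"],
                          "op": rec["kind"],
                          "tuple": rec["tuple"]}).encode("utf-8")

if __name__ == "__main__":
    wal_buffer = [
        {"kind": "full_page_image", "page": b"\x00" * 8192},
        {"kind": "heap_insert", "rel": "accounts", "tuple": [1, "alice", 100]},
    ]
    for msg in preprocess_for_replication(wal_buffer):
        print(len(msg), "bytes on the wire:", msg)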

Andreas

