Re: double-buffering page writes

From Heikki Linnakangas
Subject Re: double-buffering page writes
Date
Msg-id 4900A948.1000606@enterprisedb.com
In response to Re: double-buffering page writes  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: double-buffering page writes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
> 
>> I have some comments about the double-buffering:
> 
> Since posting this patch I have realized that this implementation is
> bogus.  I'm now playing with WAL-logging hint bits though.  

Yeah, the torn page + hint bit updates problem is the tough question.

>> - Is it ok to allocate dblbuf[BLCKSZ] as a local variable?
>>   It might be unaligned. AFAICS we avoid such usages in other places.
> 
> I thought about that too.  I admit I am not sure if this really works
> portably; however I don't want to add a palloc() to that routine.

It should work, AFAIK, but unaligned memcpy()s and write()s can be 
significantly slower. A backend can have only one write() in progress at 
any time, so you could just palloc() a single 8k buffer in 
TopMemoryContext at backend startup, and always use that.
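
A rough sketch of what I mean, in plain C so it stands alone. It uses
posix_memalign() as a stand-in for palloc() in TopMemoryContext, and it
allocates lazily on first use rather than at backend startup. The names
get_write_buffer(), write_page_copy() and WRITE_BUF_ALIGN are made up for
illustration, BLCKSZ is assumed to be the default 8192, and the 4096-byte
alignment is an assumption as well:

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192          /* assumed default block size */
#define WRITE_BUF_ALIGN 4096 /* assumed alignment; hypothetical constant */

/* One scratch buffer per process, allocated once and never freed. */
static char *write_buf = NULL;

/* Hypothetical helper: return the process-wide aligned scratch buffer. */
static char *
get_write_buffer(void)
{
    if (write_buf == NULL)
    {
        void *p = NULL;

        if (posix_memalign(&p, WRITE_BUF_ALIGN, BLCKSZ) != 0)
            abort();
        assert(((uintptr_t) p % WRITE_BUF_ALIGN) == 0);
        write_buf = p;
    }
    return write_buf;
}

/*
 * Hypothetical helper: copy the shared page into the private buffer and
 * write the copy, so concurrent changes to the shared page cannot affect
 * the bytes handed to write().
 */
static ssize_t
write_page_copy(int fd, const char *shared_page)
{
    char *buf = get_write_buffer();

    memcpy(buf, shared_page, BLCKSZ);
    return write(fd, buf, BLCKSZ);
}
```

Since only one write() is in flight per backend, a single buffer is
enough and nothing has to be allocated in the write path itself.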

>> - Are there any other modules that can share in the benefits of
>>   double-buffering? For example, we could avoid waiting for
>>   LockBufferForCleanup(). It is cool if the double-buffering can
>>   be used for multiple purposes.
> 
> Not sure on this.

You'd need to keep both versions of the buffer simultaneously in the 
buffer cache for that. When we talked about the various designs for HOT, 
that was one of the ideas I had to enable more aggressive pruning: if 
you can't immediately get a vacuum lock, allocate a new buffer in the 
buffer cache for the same block, copy the page to the new buffer, and do 
the pruning, including moving tuples around, there. Any new ReadBuffer 
calls would return the new page version, but old readers would keep 
referencing the old one. The intrusive part of that approach, in 
addition to the obvious changes required in the buffer manager to keep 
around multiple copies of the same block, is that all modifications must 
be done on the new version, so anyone who needs to lock the page for 
modification would need to switch to the new page version at the 
LockBuffer call.
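
To make the idea concrete, here is a toy, single-threaded model of it.
Nothing here is real buffer manager code: PageVersion and the toy_*
functions are invented for the illustration, there is no locking at all,
and the pruning itself is left as a comment. It only shows the three
rules: new readers get the new version, old readers keep the old one, and
anyone locking for modification is switched to the newest version.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192             /* assumed default block size */

/* Toy stand-in for a buffer; not PostgreSQL's buffer descriptor. */
typedef struct PageVersion
{
    char                page[BLCKSZ];
    int                 pins;   /* readers still referencing this copy */
    struct PageVersion *newer;  /* next version, or NULL if current */
} PageVersion;

/* Follow the chain to the current version of the block. */
static PageVersion *
latest_version(PageVersion *v)
{
    while (v->newer != NULL)
        v = v->newer;
    return v;
}

/* Toy ReadBuffer: new callers always pin the current version. */
static PageVersion *
toy_read_buffer(PageVersion *any)
{
    PageVersion *v = latest_version(any);

    v->pins++;
    return v;
}

/* Toy LockBuffer for modification: move the caller's pin if it is stale. */
static PageVersion *
toy_lock_for_update(PageVersion *held)
{
    PageVersion *cur = latest_version(held);

    if (cur != held)
    {
        held->pins--;
        cur->pins++;
    }
    return cur;
}

/* No vacuum lock available: copy the page and prune the copy instead. */
static PageVersion *
toy_prune_in_copy(PageVersion *any)
{
    PageVersion *old = latest_version(any);
    PageVersion *fresh = calloc(1, sizeof(PageVersion));

    if (fresh == NULL)
        abort();
    memcpy(fresh->page, old->page, BLCKSZ);
    /* pruning and tuple movement would happen on fresh->page here */
    old->newer = fresh;
    return fresh;
}
```

An old reader that pinned the page before toy_prune_in_copy() still sees
the untouched copy; only when it calls toy_lock_for_update() does its pin
move to the pruned version.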

As discussed in the other thread with Simon, we also use vacuum locks in 
b-tree for waiting out index scans, so avoiding the waiting there would 
be just wrong.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

