RE: [Patch] Optimize dropping of relation buffers using dlist

From: k.jamison@fujitsu.com
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date:
Msg-id: OSBPR01MB234113FD8DDAEBCC2317DADBEFDF0@OSBPR01MB2341.jpnprd01.prod.outlook.com
In reply to: Re: [Patch] Optimize dropping of relation buffers using dlist (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Monday, December 21, 2020 10:25 PM, Amit Kapila wrote:
> I have started doing minor edits to the patch, especially planning to write a
> theory of why this optimization is safe, and here is what I came up with:
>
> "To remove all the pages of the specified relation forks from the buffer pool,
> we need to scan the entire buffer pool, but we can optimize this by finding the
> buffers in the BufMapping table, provided we know the exact size of each fork
> of the relation. The exact size is required to ensure that we don't leave any
> buffer for the relation being dropped, as otherwise the background writer or
> checkpointer could hit a PANIC while flushing buffers corresponding to files
> that don't exist.
> 
> To know the exact size, we rely on the size we cached for each fork during
> recovery, which limits the optimization to recovery and to standbys, but we
> can easily extend it once we have a shared cache for relation sizes.
> 
> In recovery, we cache the value returned by the first lseek(SEEK_END), and
> subsequent writes keep the cached value up to date. See smgrextend. It is
> possible that the value of the first lseek is smaller than the actual number
> of existing blocks in the file due to buggy Linux kernels that might not have
> accounted for a recent write. But that should be fine, because there must
> not be any buffers beyond that file size.
> 
> XXX We would make the extra lseek call even for the unoptimized paths, but
> that is okay because we do it just for the first fork, and we anyway have to
> scan the entire buffer pool, the cost of which is so high that the extra lseek
> call won't make any visible difference. We could use the InRecovery flag to
> avoid the additional cost, but that doesn't seem worth it."
> 
> Thoughts?

+1
Thank you very much for expanding the comments to carefully explain why the
optimization is safe. I was struggling to explain it completely myself, and your
description also covers the possibility of extending the optimization in the future
once we have a shared cache for relation sizes, so I like this addition.
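
For anyone skimming the thread, here is a minimal sketch of what the optimized
path boils down to for one fork, once the exact (cached) size is known. This is
only an illustration, not the patch itself: the function name is a placeholder,
and it assumes the existing bufmgr/buf_table helpers (INIT_BUFFERTAG,
BufTableHashCode, BufMappingPartitionLock, BufTableLookup, LockBufHdr,
InvalidateBuffer).

/*
 * Sketch only: drop buffers of one fork of a relation by looking them up
 * in the BufMapping table instead of scanning all of shared_buffers.
 * Assumes nForkBlock is the exact (cached) size of the fork.
 */
static void
FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum,
                              BlockNumber nForkBlock,
                              BlockNumber firstDelBlock)
{
    BlockNumber curBlock;

    for (curBlock = firstDelBlock; curBlock < nForkBlock; curBlock++)
    {
        uint32      bufHash;
        BufferTag   bufTag;
        LWLock     *bufPartitionLock;
        int         buf_id;
        BufferDesc *bufHdr;
        uint32      buf_state;

        /* create a tag so we can look the buffer up */
        INIT_BUFFERTAG(bufTag, rnode, forkNum, curBlock);

        /* determine its hash code and partition lock ID */
        bufHash = BufTableHashCode(&bufTag);
        bufPartitionLock = BufMappingPartitionLock(bufHash);

        /* Check whether the block is in the buffer pool at all. */
        LWLockAcquire(bufPartitionLock, LW_SHARED);
        buf_id = BufTableLookup(&bufTag, bufHash);
        LWLockRelease(bufPartitionLock);

        if (buf_id < 0)
            continue;

        bufHdr = GetBufferDescriptor(buf_id);

        /*
         * Recheck under the buffer header spinlock: the buffer could have
         * been evicted and reused for an unrelated page after we released
         * the partition lock above.
         */
        buf_state = LockBufHdr(bufHdr);

        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
            bufHdr->tag.forkNum == forkNum &&
            bufHdr->tag.blockNum >= firstDelBlock)
            InvalidateBuffer(bufHdr);   /* releases the spinlock */
        else
            UnlockBufHdr(bufHdr, buf_state);
    }
}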

(Also, it seems we have concluded, based on the recent email exchanges, to retain
the locking mechanism of the existing patch. Both the traditional path and the
optimized path do the rechecking, so there seems to be no problem; I'm definitely
fine with it.)
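
And on the size caching that the comment relies on, my understanding is that the
bookkeeping in smgrextend would look roughly like the sketch below. Again a hedged
illustration only: the smgr_cached_nblocks field name is a placeholder for whatever
the patch actually uses.

/*
 * Sketch only: keep the cached fork size current when the fork is
 * extended, so that during recovery smgrnblocks() can return the cached
 * value instead of issuing another lseek(SEEK_END).
 */
void
smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
           char *buffer, bool skipFsync)
{
    smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
                                         buffer, skipFsync);

    /*
     * Normally an extension grows the fork by exactly one block.  If the
     * cached value doesn't match that expectation, invalidate it rather
     * than guess; the next smgrnblocks() call will ask the kernel again.
     */
    if (reln->smgr_cached_nblocks[forknum] == blocknum)
        reln->smgr_cached_nblocks[forknum] = blocknum + 1;
    else
        reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
}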

Regards,
Kirk Jamison
