Re: Heap truncation without AccessExclusiveLock (9.4)

From: Heikki Linnakangas
Subject: Re: Heap truncation without AccessExclusiveLock (9.4)
Msg-id: 519A7CC9.30409@vmware.com
In reply to: Re: Heap truncation without AccessExclusiveLock (9.4)  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On 17.05.2013 12:35, Andres Freund wrote:
> On 2013-05-17 10:45:26 +0300, Heikki Linnakangas wrote:
>> On 16.05.2013 04:15, Andres Freund wrote:
>>> Couldn't we "just" take the extension lock and then walk backwards from
>>> the rechecked end of relation ConditionalLockBufferForCleanup() the
>>> buffers?
>>> For every such locked page we check whether it's still empty. If we find
>>> a page that we couldn't lock, isn't empty or we already locked a
>>> sufficient number of pages we truncate.
>>
>> You need an AccessExclusiveLock on the relation to make sure that after you
>> have checked that pages 10-15 are empty, and truncated them away, a backend
>> doesn't come along a few seconds later and try to read page 10 again. There
>> might be an old sequential scan in progress, for example, that thinks that
>> the pages are still there.
>
> But that seems easily enough handled: We know the current page in its
> scan cannot be removed since it's pinned. So make
> heapgettup()/heapgetpage() pass something like RBM_IFEXISTS to
> ReadBuffer and if the read fails recheck the length of the relation
> before throwing an error.
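
The reader-side idea quoted above could look roughly like the toy model
below. It is only a sketch under stated assumptions: ToyRelation,
toy_read_if_exists() and toy_scan_fetch() are hypothetical names made up
for illustration, not PostgreSQL functions, and a plain array stands in
for the relation file. The point it shows is the order of checks: try the
read in "if exists" mode, and only if that fails, recheck the relation
length before deciding between "end of scan" and a real error.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16

/* Toy relation: blocks [0, nblocks) currently exist. */
typedef struct ToyRelation
{
    int nblocks;            /* current length in blocks */
    int data[MAX_BLOCKS];   /* one int stands in for a whole page */
} ToyRelation;

typedef enum ScanResult { SCAN_OK, SCAN_END, SCAN_ERROR } ScanResult;

/*
 * Hypothetical "if exists" read: returns NULL for a block that is not
 * there instead of raising an error.
 */
static const int *
toy_read_if_exists(const ToyRelation *rel, int blkno)
{
    if (blkno < 0 || blkno >= rel->nblocks)
        return NULL;
    return &rel->data[blkno];
}

/*
 * Fetch one block for a scan that saw nblocks_at_start blocks when it
 * began. The relation may have been truncated since then.
 */
static ScanResult
toy_scan_fetch(const ToyRelation *rel, int nblocks_at_start,
               int blkno, int *out)
{
    const int *page;

    assert(out != NULL);

    if (blkno >= nblocks_at_start)
        return SCAN_END;        /* normal end of scan */

    page = toy_read_if_exists(rel, blkno);
    if (page != NULL)
    {
        *out = *page;
        return SCAN_OK;
    }

    /*
     * The read failed. Recheck the length before complaining: if the
     * block now lies past the end, it was truncated away, and only
     * empty pages get truncated, so the scan has nothing left to see.
     */
    if (blkno >= rel->nblocks)
        return SCAN_END;

    return SCAN_ERROR;          /* missing for some other reason */
}
```

In this model, a scan that started with 16 blocks and asks for block 12
after the relation was cut to 10 blocks gets SCAN_END rather than an
error.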

Hmm. For the above to work, you'd need to atomically check that the
pages you're truncating away are not pinned, and truncate them. If those
steps are not atomic, a backend might pin a page after you've checked
that it's not pinned, but before you've truncated the underlying file. I
guess that would be doable; it needs some new infrastructure in the
buffer manager, however.
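
To illustrate the atomicity requirement, here is a minimal toy model,
not a proposal for the actual buffer manager code. Everything in it is
an assumption for illustration: ToyBufMgr and the toy_* functions are
hypothetical, and a single pthread mutex stands in for whatever
interlock the real infrastructure would use. The check for pins and the
change of the relation length happen under the same lock that pinning
takes, so no backend can pin a page between the check and the
truncation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Toy buffer manager: one lock protects the length and all pin counts. */
typedef struct ToyBufMgr
{
    pthread_mutex_t lock;
    int nblocks;             /* current relation length in blocks */
    int pins[MAX_BLOCKS];    /* pin count per block */
} ToyBufMgr;

static void
toy_init(ToyBufMgr *m, int nblocks)
{
    assert(nblocks >= 0 && nblocks <= MAX_BLOCKS);
    pthread_mutex_init(&m->lock, NULL);
    m->nblocks = nblocks;
    for (int i = 0; i < MAX_BLOCKS; i++)
        m->pins[i] = 0;
}

/* Pin a block; fails if the block does not exist (any more). */
static bool
toy_pin(ToyBufMgr *m, int blkno)
{
    bool ok = false;

    pthread_mutex_lock(&m->lock);
    if (blkno >= 0 && blkno < m->nblocks)
    {
        m->pins[blkno]++;
        ok = true;
    }
    pthread_mutex_unlock(&m->lock);
    return ok;
}

static void
toy_unpin(ToyBufMgr *m, int blkno)
{
    pthread_mutex_lock(&m->lock);
    assert(blkno >= 0 && blkno < MAX_BLOCKS && m->pins[blkno] > 0);
    m->pins[blkno]--;
    pthread_mutex_unlock(&m->lock);
}

/*
 * Truncate to new_nblocks only if no block being removed is pinned.
 * The check and the truncation are one critical section.
 */
static bool
toy_truncate_if_unpinned(ToyBufMgr *m, int new_nblocks)
{
    bool ok;

    pthread_mutex_lock(&m->lock);
    ok = (new_nblocks >= 0 && new_nblocks <= m->nblocks);
    for (int b = new_nblocks; ok && b < m->nblocks; b++)
    {
        if (m->pins[b] > 0)
            ok = false;      /* somebody still uses this page */
    }
    if (ok)
        m->nblocks = new_nblocks;
    pthread_mutex_unlock(&m->lock);
    return ok;
}
```

With 16 blocks and a pin held on block 12, truncating to 10 blocks is
refused; once the pin is released the truncation succeeds, and a later
attempt to pin block 12 fails instead of touching a page that is gone.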

> There isn't much besides seqscans that can have that behaviour afaics:
> - (bitmap)indexscans et al. won't point to completely empty pages
> - there cannot be a concurrent vacuum since we have the appropriate
>    locks
> - if a trigger or something else has a tid referencing a page there need
>    to be unremovable tuples on it.
>
> The only thing that I immediately see are tidscans which should be
> handleable in a similar manner to seqscans.
>
> Sure, there are some callsites that need to be adapted but it still
> seems noticeably easier than what you proposed upthread.

Yeah. I'll think some more about how the required buffer manager
changes could be done.

- Heikki