Re: drop/truncate table sucks for large values of shared buffers

From: Amit Kapila
Subject: Re: drop/truncate table sucks for large values of shared buffers
Date:
Msg-id: CAA4eK1LGmXe7ORg5K7E52=pLbSVJup-B5=ya_DWy2rRG3TdY5Q@mail.gmail.com
In response to: Re: drop/truncate table sucks for large values of shared buffers  (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: drop/truncate table sucks for large values of shared buffers  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On Tue, Jun 30, 2015 at 12:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 30 June 2015 at 07:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Jun 30, 2015 at 11:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >
>> > On 30 June 2015 at 05:02, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >> On Mon, Jun 29, 2015 at 7:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >> >
>> >> > On 28 June 2015 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> >>
>> >> > If lseek fails badly then SeqScans would give *silent* data loss, which in my view is worse. Just-added pages aren't the only thing we might miss if lseek is badly wrong.
>> >> >
>> >>
>> >> So for the purpose of this patch, do we need to assume that
>> >> lseek can give us the wrong file size, and that we should add
>> >> preventive checks and other handling for that?
>> >> I am okay with changing it that way if we are going to have that as
>> >> an assumption in our code wherever we use it or will use it in future;
>> >> otherwise we will end up with preventive checks that are not actually required.
>> >
>> >
>> > They're preventative checks. You always hope it is wasted effort.
>> >
>>
>> I am not sure preventative checks (without a real need) are okay if they
>> are not cheap, which could happen in this case.  I think validating the
>> buffer tag would require a relcache or syscache lookup.
>
>
> True, so don't do that.
>
> Keep a list of dropped relations and have the checkpoint process scan the buffer pool every 64 tables, kinda like AbsorbFsyncRequests().
>

Okay. I think we can maintain the list in a similar way to what we do for
UNLINK_RELATION_REQUEST in RememberFsyncRequest(), but
why wait until 64 tables?  We already scan the whole buffer list in each
checkpoint cycle, so during that scan we can consult this dropped-relation
list and avoid syncing the contents of such buffers.  Also, for ENOENT
error handling in FileWrite, we can use this list to identify the relations
for which we need to ignore the error.  I think we already do something
similar in mdsync() to avoid the problem of dropped tables, so it seems
okay to have it in mdwrite() as well.
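
To make this concrete, here is a rough sketch in C of what I have in
mind (pendingDrops, RememberDroppedRelation() and
BufferBelongsToDroppedRel() are invented names for illustration; only
RememberFsyncRequest() and UNLINK_RELATION_REQUEST exist today):

    #include "postgres.h"
    #include "nodes/pg_list.h"
    #include "storage/buf_internals.h"
    #include "storage/relfilenode.h"

    /* Checkpointer-local list of dropped relations, analogous to
     * pendingUnlinks; filled from the same request queue that
     * RememberFsyncRequest() handles for UNLINK_RELATION_REQUEST. */
    static List *pendingDrops = NIL;

    /* Record a dropped relation when absorbing drop requests. */
    static void
    RememberDroppedRelation(RelFileNode rnode)
    {
        RelFileNode *entry = (RelFileNode *) palloc(sizeof(RelFileNode));

        *entry = rnode;
        pendingDrops = lappend(pendingDrops, entry);
    }

    /*
     * Called from the checkpoint's existing buffer-pool scan: if a
     * buffer's tag matches a dropped relation, skip syncing it.  The
     * same test would let mdwrite() ignore ENOENT for such relations,
     * the way mdsync() already tolerates dropped tables.
     */
    static bool
    BufferBelongsToDroppedRel(BufferDesc *buf)
    {
        ListCell   *lc;

        foreach(lc, pendingDrops)
        {
            RelFileNode *rnode = (RelFileNode *) lfirst(lc);

            if (RelFileNodeEquals(buf->tag.rnode, *rnode))
                return true;    /* relation is gone; don't sync */
        }
        return false;
    }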

The crucial thing to think about in this idea is avoiding reassignment of
a relfilenode (due to wrapped OIDs) before we have ensured that none of
the buffers contains a tag for that relfilenode.  Currently we avoid this in
the fsync case by retaining the first segment of the relation (which
prevents reassignment of the relfilenode) until the checkpoint ends.  I
think if we just postpone the unlink until we have validated shared_buffers,
we can avoid this problem in the new scheme as well; it should delay the
unlinking of such a file by at most one checkpoint cycle, assuming we
consult the dropped-relation list during the buffer scan of each
checkpoint cycle.
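
Roughly, the per-checkpoint ordering I am imagining is the following
(again just a sketch; UnlinkRetainedFirstSegment() is a hypothetical
helper, and pendingDrops is the list from the earlier sketch):

    /*
     * Run after BufferSync() has finished its scan for this checkpoint,
     * having skipped/invalidated every buffer whose tag matched
     * pendingDrops via BufferBelongsToDroppedRel().  Only then is it
     * safe to remove the retained first segments, so that
     * GetNewRelFileNode() cannot hand out the same relfilenode while
     * stale buffers might still reference it.
     */
    static void
    ProcessDroppedRelations(void)
    {
        ListCell   *lc;

        foreach(lc, pendingDrops)
        {
            RelFileNode *rnode = (RelFileNode *) lfirst(lc);

            UnlinkRetainedFirstSegment(*rnode);    /* hypothetical */
        }

        list_free_deep(pendingDrops);
        pendingDrops = NIL;
    }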

Does that make sense?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
