Freeze avoidance of very large table.

Поиск
Список
Период
Сортировка
От Sawada Masahiko
Тема Freeze avoidance of very large table.
Дата
Msg-id CAD21AoDLNoN+QCF8G8rEN8uR=gtyxofwkccPr0z+hsv7CqtT8w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Freeze avoidance of very large table.  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Ответы Re: Freeze avoidance of very large table.  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers


On Tue, Apr 21, 2015 at 7:00 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 4/20/15 2:45 AM, Sawada Masahiko wrote:
>>
>> Current patch adds new source file src/backend/access/heap/frozenmap.c
>> which is quite similar to visibilitymap.c. They have similar code but
>> are separated for now. I do refactoring these source code like adding
>> bitmap.c, if needed.
>

Thank you for having a look this patch.

>
> My feeling is we'd definitely want this refactored; it looks to be a whole
> lot of duplicated code. But before working on that we should get consensus
> that a FrozenMap is a good idea.

Yes, we need to get consensus about FrozenMap before starting work on.
In addition to comment you pointed out, I noticed that one problems I should address, that a bit of FrozenMap need to be cleared on deletion and (i.g. xmax is set).
The page as frozen could have the dead tuple for now, but I think to change to that the frozen page guarantees that page is all frozen *and* all visible.

> Are there any meaningful differences between the two, besides the obvious
> name changes?

No, there aren't.

> I think there's also a bunch of XLOG stuff that could be refactored too...

I agree with you.

>> Also, when skipping vacuum by visibility map, we can skip at least
>> SKIP_PAGE_THESHOLD consecutive page, but such mechanism is not in
>> frozen map.
>
>
> That's probably something else that can be factored out, since it's
> basically the same logic. I suspect we just need to && some of the checks so
> we're looking at both FM and VM at the same time.

FrozenMap is used to skip scan only when anti-wrapping vacuum or freezing all tuples (i.g scan_all is true).
The normal vacuum uses only VM, doesn't use FM for now.

> Other comments...
>
> It would be nice if we didn't need another page bit for FM; do you see any
> reasonable way that could happen?

We may be able to remove page bit for FM from page header, but I'm not sure we could do that.

> +        * If we didn't pin the visibility(and frozen) map page and the page
> has
> +        * become all visible(and frozen) while we were busy locking the
> buffer,
> +        * or during some subsequent window during which we had it unlocked,
> +        * we'll have to unlock and re-lock, to avoid holding the buffer
> lock
> +        * across an I/O.  That's a bit unfortunate, especially since we'll
> now
> +        * have to recheck whether the tuple has been locked or updated
> under us,
> +        * but hopefully it won't happen very often.
>          */
>
> s/(and frozen)/ or frozen/
>
>
> + * Reply XLOG_HEAP3_FROZENMAP record.
> s/Reply/Replay/

Understood.

>
> +               /*
> +                * XLogReplayBufferExtended locked the buffer. But
> frozenmap_set
> +                * will handle locking itself.
> +                */
> +               LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);
>
> Doesn't this create a race condition?
>
>
> Are you sure the bit in finish_heap_swap() is safe? If so, we should add the
> same the same for the visibility map too (it certainly better be all visible
> if it's frozen...)

We can not ensure page is all visible even if we execute VACUUM FULL, because of dead tuple could be remained. e.g. the case when other process does insert and update to same tuple in same transaction before VACUUM FULL.
I was thinking that the FrozenMap is free of the influence of delete operation. But as I said at top of this mail, a bit of FrozenMap needs to be cleared on deletion.
So I will remove these related code as you mentioned.

>
>
>
> +                       /*
> +                        * Current block is all-visible.
> +                        * If frozen map represents that it's all frozen and
> this
> +                        * function is called for freezing tuples, we can
> skip to
> +                        * vacuum block.
> +                        */
>
> I would state this as "Even if scan_all is true, we can skip blocks that are
> marked as frozen."
>
> +                       if (frozenmap_test(onerel, blkno, &fmbuffer) &&
> scan_all)
>
> I suspect it's faster to reverse those tests (scan_all &&
> frozenmap_test())... but why do we even need to look at scan_all? AFAICT if
> a block as frozen we can skip it unconditionally.

The tuple which is frozen and dead, could be remained in page is marked all frozen, in currently patch.
i.g., There is possible to exist the page is not all visible but marked frozen.
But I'm thinking to change that.

>
>
> +                       /*
> +                        * If the un-frozen tuple is remaining in current
> page and
> +                        * current page is marked as ALL_FROZEN, we should
> clear it.
> +                        */
>
> That needs to NEVER happen. If it does then we're going to consider tuples
> as visible/frozen that shouldn't be. We should probably throw an error here,
> because it means the heap is now corrupted. At the minimum it needs to be an
> assert().

I understood. I'll fix it.

> Note that I haven't reviewed all the logic in detail at this point. If this
> ends up being refactored it'll be a lot easier to spot logic problems, so
> I'll hold off on that for now.

Understood, we need to get consen at first.

Regards,

-------
Sawada Masahiko

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Freeze avoidance of very large table.