Re: Freeze avoidance of very large table.

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: Freeze avoidance of very large table.
Дата
Msg-id 553576F1.1010809@BlueTreble.com
обсуждение исходный текст
Ответ на Re: Freeze avoidance of very large table.  (Sawada Masahiko <sawada.mshk@gmail.com>)
Ответы Freeze avoidance of very large table.  (Sawada Masahiko <sawada.mshk@gmail.com>)
Список pgsql-hackers
On 4/20/15 2:45 AM, Sawada Masahiko wrote:
> Current patch adds new source file src/backend/access/heap/frozenmap.c
> which is quite similar to visibilitymap.c. They have similar code but
> are separated for now. I do refactoring these source code like adding
> bitmap.c, if needed.

My feeling is we'd definitely want this refactored; it looks to be a 
whole lot of duplicated code. But before working on that we should get 
consensus that a FrozenMap is a good idea.

Are there any meaningful differences between the two, besides the 
obvious name changes?

I think there's also a bunch of XLOG stuff that could be refactored too...

> Also, when skipping vacuum by visibility map, we can skip at least
> SKIP_PAGE_THESHOLD consecutive page, but such mechanism is not in
> frozen map.

That's probably something else that can be factored out, since it's 
basically the same logic. I suspect we just need to && some of the 
checks so we're looking at both FM and VM at the same time.

Other comments...

It would be nice if we didn't need another page bit for FM; do you see 
any reasonable way that could happen?

+     * If we didn't pin the visibility(and frozen) map page and the page has
+     * become all visible(and frozen) while we were busy locking the buffer,
+     * or during some subsequent window during which we had it unlocked,
+     * we'll have to unlock and re-lock, to avoid holding the buffer lock
+     * across an I/O.  That's a bit unfortunate, especially since we'll now
+     * have to recheck whether the tuple has been locked or updated under us,
+     * but hopefully it won't happen very often.      */

s/(and frozen)/ or frozen/


+ * Reply XLOG_HEAP3_FROZENMAP record.
s/Reply/Replay/


+        /*
+         * XLogReplayBufferExtended locked the buffer. But frozenmap_set
+         * will handle locking itself.
+         */
+        LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);

Doesn't this create a race condition?


Are you sure the bit in finish_heap_swap() is safe? If so, we should add 
the same the same for the visibility map too (it certainly better be all 
visible if it's frozen...)



+            /*
+             * Current block is all-visible.
+             * If frozen map represents that it's all frozen and this
+             * function is called for freezing tuples, we can skip to
+             * vacuum block.
+             */

I would state this as "Even if scan_all is true, we can skip blocks that 
are marked as frozen."

+            if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all)

I suspect it's faster to reverse those tests (scan_all && 
frozenmap_test())... but why do we even need to look at scan_all? AFAICT 
if a block as frozen we can skip it unconditionally.


+            /*
+             * If the un-frozen tuple is remaining in current page and
+             * current page is marked as ALL_FROZEN, we should clear it.
+             */

That needs to NEVER happen. If it does then we're going to consider 
tuples as visible/frozen that shouldn't be. We should probably throw an 
error here, because it means the heap is now corrupted. At the minimum 
it needs to be an assert().



Note that I haven't reviewed all the logic in detail at this point. If 
this ends up being refactored it'll be a lot easier to spot logic 
problems, so I'll hold off on that for now.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Turning off HOT/Cleanup sometimes
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Turning off HOT/Cleanup sometimes