Discussion: Write visibility map during CLUSTER/VACUUM FULL
Hi!

I found it weird that CLUSTER/VACUUM FULL don't write the visibility map.
The attached patch implements writing the visibility map in
heapam_relation_copy_for_cluster().

I've studied a previous attempt to implement this [1]. The main problem
of that attempt was the use of the existing heap_page_is_all_visible() and
visibilitymap_set() functions. These functions work through the buffer
manager, while heap rewriting bypasses the buffer manager.

In my patch, visibility map pages are handled in the same way as heap
pages: RewriteStateData holds the contents of a single VM page. Once a
heap page is finished, the corresponding bits are set in the VM page, and
so on. VM pages are flushed one-by-one to smgr.

This patch has to implement its own check of whether a tuple is
all-visible. But it appears to be possible to simplify this check
assuming that all tuples have already passed HeapTupleSatisfiesVacuum(),
which sets hint bits.

Links
1. https://www.postgresql.org/message-id/flat/20131203162556.GC27105%40momjian.us#90e4a6e77d92076650dcf1d96b32ba38

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
On Sun, Sep 1, 2019 at 11:07 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> This patch has to implement its own check of whether a tuple is
> all-visible. But it appears to be possible to simplify this check
> assuming that all tuples have already passed HeapTupleSatisfiesVacuum(),
> which sets hint bits.

Forgot to check tuple xmin against the oldest xmin. Fixed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
On Sun, Sep 1, 2019 at 1:37 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>
> Hi!
>
> I found it weird that CLUSTER/VACUUM FULL don't write the visibility map.
> The attached patch implements writing the visibility map in
> heapam_relation_copy_for_cluster().
>
> I've studied a previous attempt to implement this [1]. The main problem
> of that attempt was the use of the existing heap_page_is_all_visible() and
> visibilitymap_set() functions. These functions work through the buffer
> manager, while heap rewriting bypasses the buffer manager.
>
> In my patch, visibility map pages are handled in the same way as heap
> pages.
>

I haven't studied this patch in detail, but while glancing I observed
that this doesn't try to sync the vm pages as we do for heap pages in
the end (during end_heap_rewrite). Am I missing something?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 11, 2019 at 3:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Sep 1, 2019 at 1:37 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > I found it weird that CLUSTER/VACUUM FULL don't write the visibility map.
> > The attached patch implements writing the visibility map in
> > heapam_relation_copy_for_cluster().
> >
> > I've studied a previous attempt to implement this [1]. The main problem
> > of that attempt was the use of the existing heap_page_is_all_visible() and
> > visibilitymap_set() functions. These functions work through the buffer
> > manager, while heap rewriting bypasses the buffer manager.
> >
> > In my patch, visibility map pages are handled in the same way as heap
> > pages.
> >
>
> I haven't studied this patch in detail, but while glancing I observed
> that this doesn't try to sync the vm pages as we do for heap pages in
> the end (during end_heap_rewrite). Am I missing something?

You haven't missed anything. Yes, the VM needs syncing. Will fix this.
And I just noticed I need a closer look at what is going on with TOAST.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Sep 12, 2019 at 4:55 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Wed, Sep 11, 2019 at 3:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sun, Sep 1, 2019 at 1:37 PM Alexander Korotkov
> > <a.korotkov@postgrespro.ru> wrote:
> > > I found it weird that CLUSTER/VACUUM FULL don't write the visibility map.
> > > The attached patch implements writing the visibility map in
> > > heapam_relation_copy_for_cluster().
> > >
> > > I've studied a previous attempt to implement this [1]. The main problem
> > > of that attempt was the use of the existing heap_page_is_all_visible() and
> > > visibilitymap_set() functions. These functions work through the buffer
> > > manager, while heap rewriting bypasses the buffer manager.
> > >
> > > In my patch, visibility map pages are handled in the same way as heap
> > > pages.
> > >
> >
> > I haven't studied this patch in detail, but while glancing I observed
> > that this doesn't try to sync the vm pages as we do for heap pages in
> > the end (during end_heap_rewrite). Am I missing something?
>
> You haven't missed anything. Yes, the VM needs syncing. Will fix this.
> And I just noticed I need a closer look at what is going on with TOAST.

The attached patch syncs the VM during end_heap_rewrite().

However, the VM for TOAST still isn't written. It appears to be much more
difficult to write the VM for TOAST, because TOAST is written by inserting
tuples one-by-one. Although that seems to fill TOAST heap pages
sequentially (assuming no FSM exists yet), it's quite hard to handle the
page-switching event at a reasonable level of abstraction.

Nevertheless, I find this patch useful in its current shape. Even if we
don't write the VM for TOAST, it's still useful to do so for the regular
heap. Additionally, one of the key advantages of having a VM is
index-only scans, which don't work for TOAST anyway.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
Hi,

On 2019-09-13 22:22:50 +0300, Alexander Korotkov wrote:
> On Thu, Sep 12, 2019 at 4:55 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Wed, Sep 11, 2019 at 3:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Sun, Sep 1, 2019 at 1:37 PM Alexander Korotkov
> > > <a.korotkov@postgrespro.ru> wrote:
> > > > I found it weird that CLUSTER/VACUUM FULL don't write the visibility map.
> > > > The attached patch implements writing the visibility map in
> > > > heapam_relation_copy_for_cluster().
> > > >
> > > > I've studied a previous attempt to implement this [1]. The main problem
> > > > of that attempt was the use of the existing heap_page_is_all_visible() and
> > > > visibilitymap_set() functions. These functions work through the buffer
> > > > manager, while heap rewriting bypasses the buffer manager.
> > > >
> > > > In my patch, visibility map pages are handled in the same way as heap
> > > > pages.

Why do you want to do that? To me that doesn't immediately seem like a
good idea, and I've not seen justification for it in this thread. Did I
miss something?

> > > I haven't studied this patch in detail, but while glancing I observed
> > > that this doesn't try to sync the vm pages as we do for heap pages in
> > > the end (during end_heap_rewrite). Am I missing something?
> >
> > You haven't missed anything. Yes, the VM needs syncing. Will fix this.
> > And I just noticed I need a closer look at what is going on with TOAST.
>
> The attached patch syncs the VM during end_heap_rewrite().
>
> However, the VM for TOAST still isn't written. It appears to be much more
> difficult to write the VM for TOAST, because TOAST is written by inserting
> tuples one-by-one. Although that seems to fill TOAST heap pages
> sequentially (assuming no FSM exists yet), it's quite hard to handle the
> page-switching event at a reasonable level of abstraction.
>
> Nevertheless, I find this patch useful in its current shape. Even if we
> don't write the VM for TOAST, it's still useful to do so for the regular
> heap.

> Additionally, one of the key advantages of having a VM is
> index-only scans, which don't work for TOAST anyway.

Have you looked at the discussion around
https://www.postgresql.org/message-id/20190408010427.4l63qr7h2fjcyp77%40alap3.anarazel.de
? That's not quite the same thing as you're tackling here, but I wonder
if some of the logic could be the same. Especially for toast.

> +/* Write contents of VM page */
> +static void
> +rewrite_flush_vm_page(RewriteState state)
> +{
> +	Assert(state->rs_vm_buffer_valid);
> +
> +	if (state->rs_use_wal)
> +		log_newpage(&state->rs_new_rel->rd_node,
> +					VISIBILITYMAP_FORKNUM,
> +					state->rs_vm_blockno,
> +					state->rs_vm_buffer,
> +					true);
> +	RelationOpenSmgr(state->rs_new_rel);
> +
> +	PageSetChecksumInplace(state->rs_vm_buffer, state->rs_vm_blockno);
> +
> +	smgrextend(state->rs_new_rel->rd_smgr, VISIBILITYMAP_FORKNUM,
> +			   state->rs_vm_blockno, (char *) state->rs_vm_buffer, true);
> +
> +	state->rs_vm_buffer_valid = false;
> +}
> +
> +/* Set VM flags to the VM page */
> +static void
> +rewrite_set_vm_flags(RewriteState state)
> +{
> +	BlockNumber	mapBlock = HEAPBLK_TO_MAPBLOCK(state->rs_blockno);
> +	uint32		mapByte = HEAPBLK_TO_MAPBYTE(state->rs_blockno);
> +	uint8		mapOffset = HEAPBLK_TO_OFFSET(state->rs_blockno);
> +	char	   *map;
> +	uint8		flags;
> +
> +	if (mapBlock != state->rs_vm_blockno && state->rs_vm_buffer_valid)
> +		rewrite_flush_vm_page(state);
> +
> +	if (!state->rs_vm_buffer_valid)
> +	{
> +		PageInit(state->rs_vm_buffer, BLCKSZ, 0);
> +		state->rs_vm_blockno = mapBlock;
> +		state->rs_vm_buffer_valid = true;
> +	}
> +
> +	flags = (state->rs_all_visible ? VISIBILITYMAP_ALL_VISIBLE : 0) |
> +			(state->rs_all_frozen ? VISIBILITYMAP_ALL_FROZEN : 0);
> +
> +	map = PageGetContents(state->rs_vm_buffer);
> +	map[mapByte] |= (flags << mapOffset);
> +}

I think it's a bad idea to copy this many VM implementation details into
rewriteheap.c.

> +/*
> + * Update rs_all_visible and rs_all_frozen flags according to the tuple. We
> + * use simplified check assuming that HeapTupleSatisfiesVacuum() should already
> + * set tuple hint bits.
> + */
> +static void
> +rewrite_update_vm_flags(RewriteState state, HeapTuple tuple)
> +{
> +	TransactionId	xmin;
> +
> +	if (!state->rs_all_visible)
> +		return;
> +
> +	if (!HeapTupleHeaderXminCommitted(tuple->t_data))
> +	{
> +		state->rs_all_visible = false;
> +		state->rs_all_frozen = false;
> +		return;
> +	}
> +
> +	xmin = HeapTupleHeaderGetXmin(tuple->t_data);
> +	if (!TransactionIdPrecedes(xmin, state->rs_oldest_xmin))
> +	{
> +		state->rs_all_visible = false;
> +		state->rs_all_frozen = false;
> +		return;
> +	}
> +
> +	if (!(tuple->t_data->t_infomask & HEAP_XMAX_INVALID) &&
> +		!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_data->t_infomask))
> +	{
> +		state->rs_all_visible = false;
> +		state->rs_all_frozen = false;
> +		return;
> +	}
> +
> +	if (!state->rs_all_frozen)
> +		return;
> +
> +	if (heap_tuple_needs_eventual_freeze(tuple->t_data))
> +		state->rs_all_frozen = false;
> +}

I don't think we should have yet another copy of logic determining
visibility. It has repeatedly proven hard to get right in the past, and
it'll not get better by having yet another copy. Especially not because
we've basically already done a HTSV (via
heapam_relation_copy_for_cluster), and we're now redoing a poor man's
version of it.

Greetings,

Andres Freund
On Fri, Sep 20, 2019 at 01:05:06PM -0700, Andres Freund wrote:
> I don't think we should have yet another copy of logic determining
> visibility. It has repeatedly proven hard to get right in the past, and
> it'll not get better by having yet another copy. Especially not because
> we've basically already done a HTSV (via
> heapam_relation_copy_for_cluster), and we're now redoing a poor man's
> version of it.

These comments are unanswered for more than 2 months, so I am marking
this entry as returned with feedback. Please feel free to change that
if you think it's not appropriate.

--
Michael
Attachments
On 2019-11-29 05:32, Michael Paquier wrote:
> These comments are unanswered for more than 2 months, so I am marking
> this entry as returned with feedback.

I'd like to revive this patch. To make further work easier, attaching a
rebased version of the latest patch by Alexander.

As for having yet another copy of logic determining visibility: the
visibility check was mainly taken from heap_page_is_all_visible(), so I
refactored the code to make sure that logic is not duplicated. The
updated patch is attached too.

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
On Mon, Jun 28, 2021 at 1:52 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote:
>
> On 2019-11-29 05:32, Michael Paquier wrote:
> > These comments are unanswered for more than 2 months, so I am marking
> > this entry as returned with feedback.
>
> I'd like to revive this patch. To make further work easier, attaching a
> rebased version of the latest patch by Alexander.
>
> As for having yet another copy of logic determining visibility: the
> visibility check was mainly taken from heap_page_is_all_visible(), so I
> refactored the code to make sure that logic is not duplicated. The
> updated patch is attached too.

The patch no longer applies on HEAD; could you rebase and post an updated
patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh
On Mon, Jun 28, 2021 at 11:22:01AM +0300, Anna Akenteva wrote:
> On 2019-11-29 05:32, Michael Paquier wrote:
> > These comments are unanswered for more than 2 months, so I am marking
> > this entry as returned with feedback.
>
> I'd like to revive this patch. To make further work easier, attaching a
> rebased version of the latest patch by Alexander.
>
> As for having yet another copy of logic determining visibility: the
> visibility check was mainly taken from heap_page_is_all_visible(), so I
> refactored the code to make sure that logic is not duplicated. The
> updated patch is attached too.

I agree that it's strange that VACUUM(FREEZE) freezes tuples but not VM
bits, nor page-level PD_ALL_VISIBLE (which is implied by all-frozen).
After FULL runs (taking an exclusive lock on the table), it's necessary to
then vacuum the table again to get what's intended.

Rebased on f10f0ae420ee62400876ab34dca2c09c20dcd030. And rephrased Anna's
two independent/alternate patches as a 2nd patch on top of the 1st, as
that helps me to read it and reduces its total size.

I noticed in testing the patch that autovacuum is still hitting the
relation shortly after vacuum full. This is due to n_ins_since_autovacuum,
which is new in pg13. I don't know how to address that (or even if it
should be addressed at all).

Also, pg_class.relallvisible is not set unless vacuum/analyze is run again
(which is mitigated by the n_ins behavior above). It seems like this might
be important: a plan which uses an index-only scan might be missed in
favour of something else until autovacuum runs (it will use cost-based
delay, and might not be scheduled immediately, could be interrupted, or
even disabled).
I'm testing like this:

CREATE EXTENSION IF NOT EXISTS pg_visibility;
DROP TABLE t;
CREATE TABLE t AS SELECT generate_series(1,99999);
VACUUM FULL t;
ANALYZE t;
SELECT n_ins_since_vacuum, n_tup_upd, n_tup_del, n_tup_hot_upd FROM pg_stat_all_tables WHERE relname='t';
SELECT relpages, relallvisible FROM pg_class WHERE oid='t'::regclass;
SELECT COUNT(1),
       COUNT(1) FILTER (WHERE all_visible='t') allvis,
       COUNT(1) FILTER (WHERE all_frozen='t') allfrz,
       COUNT(1) FILTER (WHERE pd_all_visible='t') allvispd
FROM pg_visibility('t');

--
Justin
Attachments
> On 30 Aug 2021, at 02:26, Justin Pryzby <pryzby@telsasoft.com> wrote:
> Rebased on f10f0ae420ee62400876ab34dca2c09c20dcd030.

This patch no longer applies, please submit a rebased version.

--
Daniel Gustafsson		https://vmware.com/
On Sun, Aug 29, 2021 at 07:26:42PM -0500, Justin Pryzby wrote:
> On Mon, Jun 28, 2021 at 11:22:01AM +0300, Anna Akenteva wrote:
> > On 2019-11-29 05:32, Michael Paquier wrote:
> > > These comments are unanswered for more than 2 months, so I am marking
> > > this entry as returned with feedback.
> >
> > I'd like to revive this patch. To make further work easier, attaching a
> > rebased version of the latest patch by Alexander.
> >
> > As for having yet another copy of logic determining visibility: the
> > visibility check was mainly taken from heap_page_is_all_visible(), so I
> > refactored the code to make sure that logic is not duplicated. The
> > updated patch is attached too.
>
> I agree that it's strange that VACUUM(FREEZE) freezes tuples but not VM
> bits, nor page-level PD_ALL_VISIBLE (which is implied by all-frozen).
> After FULL runs (taking an exclusive lock on the table), it's necessary to
> then vacuum the table again to get what's intended.
>
> Rebased on f10f0ae420ee62400876ab34dca2c09c20dcd030.

Rebased this patch again.

Alexander, are you planning on working on this patch? Otherwise, Anna,
would you want to "own" the patch?

> And rephrased Anna's two independent/alternate patches as a 2nd patch on
> top of the 1st, as that helps me to read it and reduces its total size.
>
> I noticed in testing the patch that autovacuum is still hitting the
> relation shortly after vacuum full. This is due to n_ins_since_autovacuum,
> which is new in pg13. I don't know how to address that (or even if it
> should be addressed at all).
>
> Also, pg_class.relallvisible is not set unless vacuum/analyze is run again
> (which is mitigated by the n_ins behavior above). It seems like this might
> be important: a plan which uses an index-only scan might be missed in
> favour of something else until autovacuum runs (it will use cost-based
> delay, and might not be scheduled immediately, could be interrupted, or
> even disabled).
>
> I'm testing like this:
> CREATE EXTENSION IF NOT EXISTS pg_visibility;
> DROP TABLE t;
> CREATE TABLE t AS SELECT generate_series(1,99999);
> VACUUM FULL t;
> ANALYZE t;
> SELECT n_ins_since_vacuum, n_tup_upd, n_tup_del, n_tup_hot_upd FROM pg_stat_all_tables WHERE relname='t';
> SELECT relpages, relallvisible FROM pg_class WHERE oid='t'::regclass;
> SELECT COUNT(1),
>        COUNT(1) FILTER (WHERE all_visible='t') allvis,
>        COUNT(1) FILTER (WHERE all_frozen='t') allfrz,
>        COUNT(1) FILTER (WHERE pd_all_visible='t') allvispd
> FROM pg_visibility('t');
Attachments
On Mon, Nov 15, 2021 at 04:38:56PM -0600, Justin Pryzby wrote:
> On Sun, Aug 29, 2021 at 07:26:42PM -0500, Justin Pryzby wrote:
> > On Mon, Jun 28, 2021 at 11:22:01AM +0300, Anna Akenteva wrote:
> > > On 2019-11-29 05:32, Michael Paquier wrote:
> > > > These comments are unanswered for more than 2 months, so I am marking
> > > > this entry as returned with feedback.
> > >
> > > I'd like to revive this patch. To make further work easier, attaching a
> > > rebased version of the latest patch by Alexander.
> > >
> > > As for having yet another copy of logic determining visibility: the
> > > visibility check was mainly taken from heap_page_is_all_visible(), so I
> > > refactored the code to make sure that logic is not duplicated. The
> > > updated patch is attached too.
> >
> > I agree that it's strange that VACUUM(FREEZE) freezes tuples but not VM
> > bits, nor page-level PD_ALL_VISIBLE (which is implied by all-frozen).
> > After FULL runs (taking an exclusive lock on the table), it's necessary
> > to then vacuum the table again to get what's intended.
> >
> > Rebased on f10f0ae420ee62400876ab34dca2c09c20dcd030.
>
> Rebased this patch again.
>
> Alexander, are you planning on working on this patch?
>
> Otherwise, Anna, would you want to "own" the patch?

Rebased on 8e1fae193864527c931a704bd7908e4fbc983f5c.

Would someone step up to "own" this patch?

If not, its CF entry may need to be closed (there's no status for "needs
author").
Attachments
On Sun, Dec 26, 2021 at 08:59:31PM -0600, Justin Pryzby wrote:
> Rebased on 8e1fae193864527c931a704bd7908e4fbc983f5c.
>
> Would someone step up to "own" this patch?
>
> If not, its CF entry may need to be closed (there's no status for "needs
> author").

I'm planning to close this patch until someone can shepherd it.

--
Justin
Justin Pryzby <pryzby@telsasoft.com> wrote:
> I'm planning to close this patch until someone can shepherd it.

I've marked it RwF for now.

--
Jacob