Re: parallel vacuum comments
From | Masahiko Sawada |
---|---|
Subject | Re: parallel vacuum comments |
Date | |
Msg-id | CAD21AoBxGEMMPDHXbFB2oit2eo_VRhUXXtrZYhUzqozr2aWv8A@mail.gmail.com |
In reply to | parallel vacuum comments (Andres Freund <andres@anarazel.de>) |
Responses | Re: parallel vacuum comments (Masahiko Sawada <sawada.mshk@gmail.com>); Re: parallel vacuum comments (Amit Kapila <amit.kapila16@gmail.com>); Re: parallel vacuum comments (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Due to bug #17245: [1] I spent a considerable amount of time looking at vacuum
> related code. And I found a few things that I think could stand improvement:
>
> - There's pretty much no tests. This is way way too complicated feature for
>   that. If there had been tests for the obvious edge case of some indexes
>   being too small to be handled in parallel, but others needing parallelism,
>   the mistake leading to #17245 would have been caught during development.

Yes. We should have tests at least for such cases.

>
> - There should be error check verifying that all indexes have actually been
>   vacuumed. It's way too easy to have bugs leading to index vacuuming being
>   skipped.

Agreed.

>
> - The state machine is complicated. It's very unobvious that an index needs to
>   be processed serially by the leader if shared_indstats == NULL.

I think we can consolidate the logic that decides who (a worker or the
leader) processes the index in one function.

>
> - I'm very confused by the existence of LVShared->bitmap. Why is it worth
>   saving 7 bits per index for something like this (compared to a simple
>   array of bools)? Nor does the naming explain what it's for.
>
>   The presence of the bitmap requires stuff like SizeOfLVShared(), which
>   accounts for some of the bitmap size, but not all?

Yes, it's better to account for the size of all bitmaps.

>
>   But even though we have this space optimized bitmap thing, we actually need
>   more memory allocated for each index, making this whole thing pointless.

Right. But is it better to switch to booleans?

> - Imo it's pretty confusing to have functions like
>   lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
>   vacuum or index cleanup with parallel workers.", based on
>   lps->lvshared->for_cleanup.

Okay. We need to set lps->lvshared->for_cleanup to tell the worker to do
either index vacuum or index cleanup. So it might be better to pass the
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.

>
> - I don't like some of the new names introduced in 14. E.g.
>   "do_parallel_processing" is way too generic.

I listed the function names that probably need to be renamed from
that perspective:

* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_index

Is there any other function that should be renamed?

> - On a higher level, a lot of this actually doesn't seem to belong into
>   vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
>   code is heap specific. And vacuumlazy.c is large enough without the parallel
>   code.

I haven't come up with an idea to make them more generic. Could you
elaborate on that?

I've started to write a patch for these comments.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/e
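
For illustration only, here is a minimal sketch of three of the ideas discussed in this message: a plain per-index array of bools instead of a bitmap, a single function that decides whether the leader or a worker processes an index, and a final check that no index was skipped. It is not taken from any patch or from vacuumlazy.c. Every name in it (IndexVacState, choose_index_processor, mark_index_processed, count_unprocessed_indexes, min_parallel_blocks) is hypothetical, and the decision rule (too small or not parallel safe means the leader handles it) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in PostgreSQL. */

typedef enum IndexProcessor
{
    PROCESS_BY_LEADER,          /* index is too small or not parallel safe */
    PROCESS_BY_WORKER           /* index may be handed to a parallel worker */
} IndexProcessor;

typedef struct IndexVacState
{
    size_t      nblocks;        /* index size in blocks (assumed unit) */
    bool        parallel_safe;  /* assumed: index supports parallel vacuum */
    bool        processed;      /* plain bool instead of a bit in a bitmap */
} IndexVacState;

/*
 * The one place that decides who processes an index, so callers never
 * have to infer it from a NULL pointer or a bitmap lookup.
 */
IndexProcessor
choose_index_processor(const IndexVacState *idx, size_t min_parallel_blocks)
{
    if (!idx->parallel_safe || idx->nblocks < min_parallel_blocks)
        return PROCESS_BY_LEADER;
    return PROCESS_BY_WORKER;
}

/* Record that index i has been vacuumed; processing it twice is a bug. */
void
mark_index_processed(IndexVacState *states, size_t nindexes, size_t i)
{
    assert(i < nindexes);
    assert(!states[i].processed);
    states[i].processed = true;
}

/*
 * Count indexes that were never processed. A caller would raise an
 * error if the result is non-zero after the vacuum pass completes.
 */
size_t
count_unprocessed_indexes(const IndexVacState *states, size_t nindexes)
{
    size_t      missing = 0;

    for (size_t i = 0; i < nindexes; i++)
    {
        if (!states[i].processed)
            missing++;
    }
    return missing;
}
```

With an array like this the shared-memory size is simply nindexes * sizeof(IndexVacState), so there is no partial bitmap accounting to get wrong, at the cost of a few extra bytes per index.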