Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila
Subject: Re: [HACKERS] Block level parallel vacuum
Date:
Msg-id: CAA4eK1LSUwbrKT8L23akOStUw2Nzgh_Pz-CqXGQW3AH=3fSoaQ@mail.gmail.com
In reply to: Re: [HACKERS] Block level parallel vacuum  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Sun, Jan 19, 2020 at 2:15 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Jan 17, 2020 at 1:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Thanks for doing this test again.  In the attached patch, I have
> > addressed all the comments and modified a few comments.
>
> I am in favor of the general idea of parallel VACUUM that parallelizes
> the processing of each index (I haven't looked at the patch, though).
> I observed something during a recent benchmark of the deduplication
> patch that seems like it might be relevant to parallel VACUUM. This
> happened during a recreation of the original WARM benchmark, which is
> described here:
>
> https://www.postgresql.org/message-id/CABOikdMNy6yowA%2BwTGK9RVd8iw%2BCzqHeQSGpW7Yka_4RSZ_LOQ%40mail.gmail.com
>
> (There is an extra pgbench_accounts index on abalance, plus 4 indexes
> on large text columns with filler MD5 hashes, all of which are
> random.)
>
> On the master branch, I can clearly observe that the "filler" MD5
> indexes are bloated to a degree that is affected by the order of their
> original creation/pg_class OID order. These are all indexes that
> become bloated purely due to "version churn" -- or what I like to call
> "unnecessary" page splits. The keys used in each pgbench_accounts
> logical row never change, except in the case of the extra abalance
> index (the idea is to prevent all HOT updates without ever updating
> most indexed columns). I noticed that pgb_a_filler1 is a bit less
> bloated than pgb_a_filler2, which is a little less bloated than
> pgb_a_filler3, which is a little less bloated than pgb_a_filler4. This
> holds even after 4 hours, and even though the "shape" of each index is
> identical.
> This demonstrates an important general principle about vacuuming
> indexes: timeliness can matter a lot.
>
> In general, a big benefit of the deduplication patch is that it "buys
> time" for VACUUM to run before "unnecessary" page splits can occur --
> that is why the deduplication patch prevents *all* page splits in
> these "filler" indexes, whereas on the master branch the filler
> indexes are about 2x larger (the exact amount varies based on VACUUM
> processing order, at least earlier on).
>
> For tables with several indexes, giving each index its own VACUUM
> worker process will prevent "unnecessary" page splits caused by
> version churn, simply because VACUUM will start to clean each index
> sooner than it would with serial processing (except for the
> "lucky" first index). There is no "lucky" first index that gets
> preferential treatment -- presumably VACUUM will start processing each
> index at the same time with this patch, making each index equally
> "lucky".
>
> I think that there may even be a *complementary* effect with parallel
> VACUUM, though I haven't tested that theory. Deduplication "buys time"
> for VACUUM to run, while at the same time VACUUM takes less time to
> show up and prevent "unnecessary" page splits. My guess is that these
> two seemingly unrelated patches may actually address this "unnecessary
> page split" problem from two completely different angles, with an
> overall effect that is greater than the sum of its parts.
>

Good analysis, and I agree that the parallel vacuum patch can help in
such cases.  However, as of now, it only works via the VACUUM command
(autovacuum does not use it), so some user intervention is required to
realize the benefit.
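
The timing argument quoted above can be illustrated with a toy model.
This is only a sketch under made-up assumptions: every index takes the
same fixed time to vacuum (10 units), the helper names
serial_start_times and parallel_start_times are hypothetical, and the
greedy hand-out of indexes to workers is a simplification, not a
description of how the patch schedules work.  It only shows how long
each index waits before its cleanup begins.

```python
import heapq
from typing import Dict


def serial_start_times(durations: Dict[str, float]) -> Dict[str, float]:
    """Serial processing: each index waits for all earlier ones to finish."""
    starts = {}
    elapsed = 0.0
    for name, cost in durations.items():
        starts[name] = elapsed
        elapsed += cost
    return starts


def parallel_start_times(durations: Dict[str, float], workers: int) -> Dict[str, float]:
    """Toy parallel model: the next index goes to the worker that frees up first."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    starts = {}
    for name, cost in durations.items():
        t = heapq.heappop(free_at)  # earliest available worker
        starts[name] = t
        heapq.heappush(free_at, t + cost)
    return starts


# Assumed, equal per-index costs (arbitrary time units), listed in creation order.
durations = {
    "pgb_a_filler1": 10.0,
    "pgb_a_filler2": 10.0,
    "pgb_a_filler3": 10.0,
    "pgb_a_filler4": 10.0,
}

print("serial:  ", serial_start_times(durations))
print("parallel:", parallel_start_times(durations, workers=4))
```

With these assumed costs, serial processing makes pgb_a_filler4 wait for
the other three indexes, while four workers start all four indexes at
time zero.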

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


