Re: New vacuum option to do only freezing

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: New vacuum option to do only freezing
Дата
Msg-id 20190305.200121.198128874.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: New vacuum option to do only freezing  ("Bossart, Nathan" <bossartn@amazon.com>)
Ответы Re: New vacuum option to do only freezing  ("Bossart, Nathan" <bossartn@amazon.com>)
Re: New vacuum option to do only freezing  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
Hello, I have some other comments.

At Mon, 4 Mar 2019 23:27:10 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
<48410154-E6C5-4C07-8122-8D04E3BCD1F8@amazon.com>
> On 2/28/19, 12:13 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:
> > Attached the updated version patch. I've incorporated all review
> > comments I got and have changed the number of tuples being reported as
> > 'removed tuples'. With this option, tuples completely being removed is
> > only tuples marked as unused during HOT-pruning, other dead tuples are
> > left. So we count those tuples during HOT-pruning and reports it as
> > removed tuples.
> 
> Thanks for the new patches.  Beyond the reloptions discussion, most of
> my feedback is wording suggestions.
> 
> +      <command>VACUUM</command> removes dead tuples and prunes HOT-updated
> +      tuples chain for live tuples on table. If the table has any dead tuple
> +      it removes them from both table and indexes for re-use. With this
> +      option <command>VACUUM</command> doesn't completely remove dead tuples
> +      and disables removing dead tuples from indexes.  This is suitable for
> +      avoiding transaction ID wraparound (see
> +      <xref linkend="vacuum-for-wraparound"/>) but not sufficient for avoiding
> +      index bloat. This option is ignored if the table doesn't have index.
> +      This cannot be used in conjunction with <literal>FULL</literal>
> +      option.
> 
> There are a couple of small changes I would make.  How does something
> like this sound?
> 
>     VACUUM removes dead tuples and prunes HOT-updated tuple chains for
>     live tuples on the table.  If the table has any dead tuples, it
>     removes them from both the table and its indexes and marks the
>     corresponding line pointers as available for re-use.  With this
>     option, VACUUM still removes dead tuples from the table, but it
>     does not process any indexes, and the line pointers are marked as
>     dead instead of available for re-use.  This is suitable for
>     avoiding transaction ID wraparound (see Section 24.1.5) but not
>     sufficient for avoiding index bloat.  This option is ignored if
>     the table does not have any indexes.  This cannot be used in
>     conjunction with the FULL option.
> 
> - * Returns the number of tuples deleted from the page and sets
> - * latestRemovedXid.
> + * Returns the number of tuples deleted from the page and set latestRemoveXid
> + * and increment nunused.
> 
> I would say something like: "Returns the number of tuples deleted from
> the page, sets latestRemovedXid, and updates nunused."
> 
> +    /*
> +     * hasindex = true means two-pass strategy; false means one-pass. But we
> +     * always use the one-pass strategy when index vacuum is disabled.
> +     */
> 
> I think the added sentence should make it more clear that hasindex
> will still be true when DISABLE_INDEX_CLEANUP is used.  Maybe
> something like:
> 
>     /*
>      * hasindex = true means two-pass strategy; false means one-pass
>      *
>      * If DISABLE_INDEX_CLEANUP is used, hasindex may still be true,
>      * but we'll always use the one-pass strategy.
>      */
> 
>                 tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
> -                                                                                &vacrelstats->latestRemovedXid);
> +                                                                                &vacrelstats->latestRemovedXid,
> +                                                                                &tups_pruned);
> 
> Why do we need a separate tups_pruned argument in heap_page_prune()?
> Could we add the result of heap_page_prune() to tups_pruned instead,
> then report the total number of removed tuples as tups_vacuumed +
> tups_pruned elsewhere?
> 
> +         * If there are no indexes or we skip index vacuum then we can vacuum
> +         * the page right now instead of doing a second scan.
> 
> How about:
> 
>     If there are no indexes or index cleanup is disabled, we can
>     vacuum the page right now instead of doing a second scan.
> 
> +                /*
> +                 * Here, we have indexes but index vacuum is disabled. We don't
> +                 * vacuum dead tuples on heap but forget them as we skip index
> +                 * vacuum. The vacrelstats->dead_tuples could have tuples which
> +                 * became dead after checked at HOT-pruning time but aren't marked
> +                 * as dead yet. We don't process them because it's a very rare
> +                 * condition and the next vacuum will process them.
> +                 */
> 
> I would suggest a few small changes:
> 
>     /*
>      * Here, we have indexes but index vacuum is disabled.  Instead of
>      * vacuuming the dead tuples on the heap, we just forget them.
>      *
>      * Note that vacrelstats->dead_tuples could include tuples which
>      * became dead after HOT-pruning but are not marked dead yet.  We
>      * do not process them because this is a very rare condition, and
>      * the next vacuum will process them anyway.
>      */
> 
> -       /* If no indexes, make log report that lazy_vacuum_heap would've made */
> +       /*
> +        * If no index or disables index vacuum, make log report that lazy_vacuum_heap
> +        * would've made. If index vacuum is disabled, we didn't remove all dead
> +        * tuples but did for tuples removed by HOT-pruning.
> +        */
>         if (vacuumed_pages)
>                 ereport(elevel,
>                                 (errmsg("\"%s\": removed %.0f row versions in %u pages",
>                                                 RelationGetRelationName(onerel),
> -                                               tups_vacuumed, vacuumed_pages)));
> +                                               skip_index_vacuum ? tups_pruned : tups_vacuumed,
> +                                               vacuumed_pages)));
> 
> How about:
> 
>     /*
>      * If no index or index vacuum is disabled, make log report that
>      * lazy_vacuum_heap would've made.  If index vacuum is disabled,
>      * we don't include the tuples that we marked dead, but we do
>      * include tuples removed by HOT-pruning.
>      */
> 
> Another interesting thing I noticed is that this "removed X row
> versions" message is only emitted if vacuumed_pages is greater than 0.
> However, if we only did HOT pruning, tups_vacuumed will be greater
> than 0 while vacuumed_pages will still be 0, so some information will
> be left out.  I think this is already the case, though, so this could
> probably be handled in a separate thread.


+                nleft;            /* item pointers we left */

The name seems to be something other, and the comment doesn't
makes sense at least.. for me.. Looking below,

+                                    "%.0f tuples are left as dead.\n",
+                                    nleft),
+                     nleft);

How about "nleft_dead; /* iterm pointers left as dead */"?



In this block:

-        if (nindexes == 0 &&
+        if ((nindexes == 0 || skip_index_vacuum) &&
             vacrelstats->num_dead_tuples > 0)
         {

Is it right that vacuumed_pages is incremented and FSM is updated
while the page is not actually vacuumed?


         tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
-                                         &vacrelstats->latestRemovedXid);
+                                         &vacrelstats->latestRemovedXid,
+                                         &tups_pruned);

tups_pruned looks as "HOT pruned tuples". It is named "unused" in
the function's parameters. (But I think it is useless. Please see
the details below.)


I tested it with a simple data set.

(autovacuum = off)
drop table if exists t;
create table t with (fillfactor=50) as select a, a % 3 as b from generate_series(1, 9) a;
create index on t(a);
update t set a = -a where b = 0;
update t set b = b + 1 where b = 1;

We now have 9 tuples, 15 versions and 3 out of 6 "old" tuples are
to be "left dead" by DISABLE_INDEX_CLEANUP vacuum. It means,
three tuples ends with "left dead", three tuples are removed and
12 tuples will survive the vacuum below.

vacuum (verbose, freeze ,disable_index_cleanup, disable_page_skipping) t;

> INFO:  "t": removed 0 row versions in 1 pages
> INFO:  "t": found 0 removable, 9 nonremovable row versions in 1 out of 1 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 925
> There were 0 unused item pointers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 0 pages are entirely empty.
> 3 tuples are left as dead.

Three tuple versions have vanished. Actually they were removed
but not shown in the message.

heap_prune_chain() doesn't count a live root entry of a chain as
"unused (line pointer)" since it is marked as "redierected". As
the result the vanished tuples are counted in tups_vacuumed, not
in tups_pruned. Maybe the name tups_vacuumed is confusing.  After
removing tups_pruned code it works correctly.

> INFO:  "t": removed 6 row versions in 1 pages
> INFO:  "t": found 6 removable, 9 nonremovable row versions in 1 out of 1 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 932
> There were 0 unused item pointers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 0 pages are entirely empty.
> 3 tuples are left as dead.

I see two choices of the second line above.

1> "t": found 6 removable, 9 nonremovable row versions in 1 out of 1 pages

  "removable" includes "left dead" tuples.

2> "t": found 3 removable, 12 nonremovable row versions in 1 out of 1 pages

  "removable" excludes "left dead" tuples.

If you prefer the latter, removable and nonremoveable need to be
corrected using nleft.

> INFO:  "t": found 3 removable, 12 nonremovable row versions in 1 out of 1 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 942
> There were 0 unused item pointers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 0 pages are entirely empty.
> 3 tuples are left as dead.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Raúl Marín Rodríguez
Дата:
Сообщение: Re: Re: [PATCH] pgbench tap tests fail if the path contains a perlspecial character
Следующее
От: Nikita Glukhov
Дата:
Сообщение: Re: Fix memleaks and error handling in jsonb_plpython