Re: New vacuum option to do only freezing

Поиск
Список
Период
Сортировка
От Bossart, Nathan
Тема Re: New vacuum option to do only freezing
Дата
Msg-id 9326B6A8-96FE-43F8-87D8-F63998DB3698@amazon.com
обсуждение исходный текст
Ответ на Re: New vacuum option to do only freezing  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
On 3/5/19, 1:22 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:
> On Tue, Mar 5, 2019 at 8:27 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>> +      <command>VACUUM</command> removes dead tuples and prunes HOT-updated
>> +      tuples chain for live tuples on table. If the table has any dead tuple
>> +      it removes them from both table and indexes for re-use. With this
>> +      option <command>VACUUM</command> doesn't completely remove dead tuples
>> +      and disables removing dead tuples from indexes.  This is suitable for
>> +      avoiding transaction ID wraparound (see
>> +      <xref linkend="vacuum-for-wraparound"/>) but not sufficient for avoiding
>> +      index bloat. This option is ignored if the table doesn't have index.
>> +      This cannot be used in conjunction with <literal>FULL</literal>
>> +      option.
>>
>> There are a couple of small changes I would make.  How does something
>> like this sound?
>>
>>     VACUUM removes dead tuples and prunes HOT-updated tuple chains for
>>     live tuples on the table.  If the table has any dead tuples, it
>>     removes them from both the table and its indexes and marks the
>>     corresponding line pointers as available for re-use.  With this
>>     option, VACUUM still removes dead tuples from the table, but it
>>     does not process any indexes, and the line pointers are marked as
>>     dead instead of available for re-use.  This is suitable for
>>     avoiding transaction ID wraparound (see Section 24.1.5) but not
>>     sufficient for avoiding index bloat.  This option is ignored if
>>     the table does not have any indexes.  This cannot be used in
>>     conjunction with the FULL option.
>
> Hmm, that's good idea but I'm not sure that user knows the word 'line
> pointer' because it is used only at pageinspect document. I wonder if
> the word 'item identifier' would rather be appropriate here because
> this is used at at describing page layout(in storage.sgml). Thought?

That seems reasonable to me.  It seems like ItemIdData is referred to
as "item pointer," "line pointer," or "item identifier" depending on
where you're looking, but ItemPointerData is also referred to as "item
pointer."  I think using "item identifier" is appropriate here for
clarity and consistency with storage.sgml.

>>                 tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
>> -                                                                                &vacrelstats->latestRemovedXid);
>> +                                                                                &vacrelstats->latestRemovedXid,
>> +                                                                                &tups_pruned);
>>
>> Why do we need a separate tups_pruned argument in heap_page_prune()?
>> Could we add the result of heap_page_prune() to tups_pruned instead,
>> then report the total number of removed tuples as tups_vacuumed +
>> tups_pruned elsewhere?
>
> Hmm, I thought that we should report only the number of tuples
> completely removed but we already count the tulples marked as
> redirected as tups_vacuumed. Let me summarize the fate of dead tuples.
> I think we can roughly classify dead tuples as follows.
>
> 1. root tuple of HOT chain that became dead
> 2. root tuple of HOT chain that became redirected
> 3. other tupels of HOT chain that became unused
> 4. tuples that became dead after HOT pruning
>
> The tuples of #1 through #3 either have only ItemIDs or have been
> completely removed but tuples of #4 has its tuple storage because they
> are not processed when HOT-pruning.
>
> Currently tups_vacuumed counts all of them, nleft (=
> vacrelstats->num_dead_tuples) counts #1 + #4. I think that the number
> of removed tuples being reported would be #1 + #2 + #3. Or should we
> use  #2 + #3 instead?

I think I'm actually more in favor of what was in v6.  IIRC that
version of the patch didn't modify how we tracked the "removed" tuples
at all, but it just added the "X item identifiers left marked dead"
metric.  Since even the tuples we are leaving marked dead lose
storage, that seems accurate enough to me.

>> Another interesting thing I noticed is that this "removed X row
>> versions" message is only emitted if vacuumed_pages is greater than 0.
>> However, if we only did HOT pruning, tups_vacuumed will be greater
>> than 0 while vacuumed_pages will still be 0, so some information will
>> be left out.  I think this is already the case, though, so this could
>> probably be handled in a separate thread.
>>
>
> Hmm, since this log message is corresponding to the one that
> lazy_vacuum_heap makes and total number of removed tuples are always
> reported, it seems consistent to me. Do you have another point?

Here's an example:

        postgres=# CREATE TABLE test (a INT, b INT);
        CREATE TABLE
        postgres=# CREATE INDEX ON test (a);
        CREATE INDEX
        postgres=# INSERT INTO test VALUES (1, 2);
        INSERT 0 1

After only HOT updates, the "removed X row versions in Y pages"
message is not emitted:

        postgres=# UPDATE test SET b = 3;
        UPDATE 1
        postgres=# UPDATE test SET b = 4;
        UPDATE 1
        postgres=# VACUUM (FREEZE, VERBOSE) test;
        INFO:  aggressively vacuuming "public.test"
        INFO:  index "test_a_idx" now contains 1 row versions in 2 pages
        DETAIL:  0 index row versions were removed.
        0 index pages have been deleted, 0 are currently reusable.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
        INFO:  "test": found 2 removable, 1 nonremovable row versions in 1 out of 1 pages
        DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 494
        There were 1 unused item pointers.
        Skipped 0 pages due to buffer pins, 0 frozen pages.
        0 pages are entirely empty.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
        VACUUM

After non-HOT updates, the "removed" message is emitted:

        postgres=# UPDATE test SET a = 5;
        UPDATE 1
        postgres=# VACUUM (FREEZE, VERBOSE) test;
        INFO:  aggressively vacuuming "public.test"
        INFO:  scanned index "test_a_idx" to remove 1 row versions
        DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
        INFO:  "test": removed 1 row versions in 1 pages
        DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
        INFO:  index "test_a_idx" now contains 1 row versions in 2 pages
        DETAIL:  1 index row versions were removed.
        0 index pages have been deleted, 0 are currently reusable.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
        INFO:  "test": found 1 removable, 1 nonremovable row versions in 1 out of 1 pages
        DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 495
        There were 1 unused item pointers.
        Skipped 0 pages due to buffer pins, 0 frozen pages.
        0 pages are entirely empty.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
        VACUUM

Nathan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Windows 32 bit vs circle test
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Rare SSL failures on eelpout