Re: Freeze avoidance of very large table.

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Freeze avoidance of very large table.
Дата
Msg-id CAD21AoCTJ4n9qm09S5yucvoDDHHcOJporTFXWbj4ax=NxJ5EPQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Freeze avoidance of very large table.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: Freeze avoidance of very large table.
Список pgsql-hackers
On Thu, Nov 5, 2015 at 6:03 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, I had a look on v21 patch.
>
> Though I haven't looked the whole of the patch, I'd like to show
> you some comments only for visibilitymap.c and a part of the
> documentation.
>
>
> 1. Patch application
>
>    patch command complains about offsets for heapam.c at current
>    master.
>
> 2. visitibilymap_test()
>
> -  if (visibilitymap_test(rel, blkno, &vmbuffer))
> +  if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE)
>
>  The old VM was a simple bitmap so the name _test and the
>  function are proper but now the bitmap is quad state so it'd be
>  better chainging the function. Alghough it is not so expensive
>  to call it twice successively, it is a bit uneasy for me doing
>  so. One possible shape would be like the following.
>
>  lazy_vacuum_page()
>  > int vmstate = visibilitymap_get_status(rel, blkno, &vmbuffer);
>  > if (!(vmstate  & VISIBILITYMAP_ALL_VISIBLE))
>  >   ...
>  > if (all_frozen && !(vmstate  & VISIBILITYMAP_ALL_FROZEN))
>  >   ...
>  > if (flags != vmstate)
>  >   visibilitymap_set(...., flags);
>
>  and defining two macros for indivisual tests,
>
>  > #define VM_ALL_VISIBLE(r, b, v) ((vm_get_status((r), (b), (v)) & .._VISIBLE) != 0)
>  > if (VM_ALL_VISIBLE(rel, blkno, &vmbuffer))
>  and
>  > if (VM_ALL_FROZEN(rel, blkno, &vmbuffer))
>
>  How about this?
>
>
> 3. visibilitymap.c
>
>  - HEAPBLK_TO_MAPBIT
>
>   In visibilitymap_clear and other functions, mapBit means
>   mapDualBit in the patch, and mapBit always appears in the form
>   "mapBit * BITS_PER_HEAPBLOCK". So, it'd be better to change the
>   definition of HEAPBLK_TO_MAPBIT so that it calculates really
>   the bit position in a byte.
>
>   - #define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE)
>   + #define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)
>
>
>  - visibilitymap_count()
>
>   The third argument all_frozen is not necessary in some
>   usage. So this interface would be preferable to be as
>   following,
>
>   BlockNumber
>   visibilitymap_count(Relation rel, BlockNumber *all_frozen)
>   {
>      BlockNumber all_visible = 0;
>   ...
>      if (all_frozen)
>            *all_frozen = 0;
>   ... something like ...
>
>   - visibilitymap_set()
>
>    The check for ALL_VISIBLE is duplicate in the following
>    assertion.
>
>    > Assert((flags & VISIBILITYMAP_ALL_VISIBLE) ||
>    >       (flags & (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)));
>
>
>
> 4. documentation
>
>   - 18.11.1 Statement Hehavior
>
>     A typo.
>
>     > VACUUM performs *a* aggressive freezing
>
>     However I am not a fluent English speaker, and such
>     wordsmithing would be done by someone else, I feel that
>     "eager/greedy" is more suitable for this meaning..,
>     nevertheless, the term "whole-table freezing" that you wrote
>     elsewhere in this patch would be usable.
>
>     "VACUUM performs a whole-table freezing"
>
>     All "a table scan/sweep"s and something has the similar
>     meaning would be better be changed to "a whole-table
>     freezing"
>
>     In similar manner, "tuples/rows that are marked as frozen"
>     could be replaced with "unfrozen tuples/rows".
>
>   - 23.1.5 Preventing Transaction ID Wraparound Failures
>
>     "The whole table is scanned only when all pages happen to
>      require vacuuming to remove dead row versions."
>
>     This description looks a bit out-of-point. "the whole table
>     scan" in the original description is what is triggered by
>     relfrozenxid so the correspondent in the revised description
>     is "the whole-table freezing", maybe.
>
>     "The whole-table feezing takes place when
>      <structfield>relfrozenxid</> is more than
>      <varname>vacuum_freeze_table_age</> transactions old or when
>      <command>VACUUM</>'s <literal>FREEZE</> option is used. The
>      whole-table freezing scans all unfreezed pages."
>
>     The last sentence might be unnecessary.
>
>
>  - 63.4 Visibility Map
>
>    "pages contain only tuples that are marked as frozen" would be
>     enough to be "pages contain only frozen tuples"
>
>     and according to the discussion upthread, we might be good to
>     have some desciption that the name is historically omitting
>     the aspect of freezemap.
>
>
> At Sat, 31 Oct 2015 18:07:32 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1+aTdaSwG3u+y8fXxn67Kkj0T1KzRsFDLEi=tQvTYgFrQ@mail.gmail.com>
> amit.kapila16> On Fri, Oct 30, 2015 at 6:03 AM, Masahiko Sawada <sawada.mshk@gmail.com>
>> Couple of more review comments:
>> ------------------------------------------------------
>>
>> 1.
>> @@ -615,6 +617,8 @@ typedef struct PgStat_StatTabEntry
>>   PgStat_Counter n_dead_tuples;
>>   PgStat_Counter
>> changes_since_analyze;
>>
>> + int32 n_frozen_pages;
>> +
>>   PgStat_Counter blocks_fetched;
>>   PgStat_Counter
>> blocks_hit;
>>
>> As you are changing above structure, you need to update
>> PGSTAT_FILE_FORMAT_ID, refer below code:
>> #define PGSTAT_FILE_FORMAT_ID 0x01A5BC9D
>>
>> 2. It seems that n_frozen_page is not initialized/updated properly
>> for toast tables:
>>
>> Try with below steps:
>>
>> postgres=# create table t4(c1 int, c2 text);
>> CREATE TABLE
>> postgres=# select oid, relname from pg_class where relname like '%t4%';
>>   oid  | relname
>> -------+---------
>>  16390 | t4
>> (1 row)
>>
>>
>> postgres=# select oid, relname from pg_class where relname like '%16390%';
>>   oid  |       relname
>> -------+----------------------
>>  16393 | pg_toast_16390
>>  16395 | pg_toast_16390_index
>> (2 rows)
>>
>> postgres=# select relname,seq_scan,n_tup_ins,last_vacuum,n_frozen_page from
>> pg_s
>> tat_all_tables where relname like '%16390%';
>>     relname     | seq_scan | n_tup_ins | last_vacuum | n_frozen_page
>> ----------------+----------+-----------+-------------+---------------
>>  pg_toast_16390 |        1 |         0 |             |    -842150451
>> (1 row)
>>
>> Note that I have tested above scenario on my Windows 7 m/c.
>>
>> 3.
>>  * visibilitymap.c
>>  *  bitmap for tracking visibility of heap tuples
>>
>> I think above needs to be changed to:
>> bitmap for tracking visibility and frozen state of heap tuples
>>
>>
>> 4.
>> a.
>>   /*
>> - * If we froze any tuples, mark the buffer dirty, and write a WAL
>> - * record recording the changes.  We must log the changes to be
>> - * crash-safe against future truncation of CLOG.
>> + * If we froze any tuples then we mark the buffer dirty, and write a WAL
>>
>> b.
>> - * Release any remaining pin on visibility map page.
>> + * Release any remaining pin on visibility map.
>>
>> c.
>>   * We do update relallvisible even in the corner case, since if the table
>> - * is all-visible
>> we'd definitely like to know that.  But clamp the value
>> - * to be not more than what we're setting
>> relpages to.
>> + * is all-visible we'd definitely like to know that.
>> + * But clamp the value to be not more
>> than what we're setting relpages to.
>>
>> I don't think you need to change above comments.
>>
>> 5.
>> + * Even if scan_all is set so far, we could skip to scan some pages
>> + * according by all-frozen
>> bit of visibility amp.
>>
>> /according by/according to
>> /amp/map
>>
>> I suggested to modify comment as below:
>> During full scan, we could skip some pages according to all-frozen
>> bit of visibility map.
>>
>> Also no need to start this in new line, start from where the
>> previous line of comment ends.
>>
>> 6.
>> /*
>>  * lazy_scan_heap() -- scan an open heap relation
>>  *
>>  * This routine prunes each page in the
>> heap, which will among other
>>  * things truncate dead tuples to dead line pointers, defragment the
>>  *
>> page, and set commit status bits (see heap_page_prune).  It also builds
>>  * lists of dead
>> tuples and pages with free space, calculates statistics
>>  * on the number of live tuples in the
>> heap, and marks pages as
>>  * all-visible if appropriate.
>>
>> Modify above function header as:
>>
>> all-visible, all-frozen
>>
>> 7.
>> lazy_scan_heap()
>> {
>> ..
>>
>> if (PageIsEmpty(page))
>> {
>> empty_pages++;
>> freespace =
>> PageGetHeapFreeSpace(page);
>>
>> /* empty pages are always all-visible */
>> if (!PageIsAllVisible(page))
>> ..
>> }
>>
>> Don't we need to ensure that empty pages should get marked as
>> all-frozen?
>>
>> 8.
>> lazy_scan_heap()
>> {
>> ..
>> /*
>> * As of PostgreSQL 9.2, the visibility map bit should never be set if
>> * the page-
>> level bit is clear.  However, it's possible that the bit
>> * got cleared after we checked it
>> and before we took the buffer
>> * content lock, so we must recheck before jumping to the conclusion
>> * that something bad has happened.
>> */
>> else if (all_visible_according_to_vm
>> && !PageIsAllVisible(page)
>> && visibilitymap_test(onerel, blkno, &vmbuffer,
>> VISIBILITYMAP_ALL_VISIBLE))
>> {
>> elog(WARNING, "page is not marked all-visible
>> but visibility map bit is set in relation \"%s\" page %u",
>> relname, blkno);
>> visibilitymap_clear(onerel, blkno, vmbuffer);
>> }
>>
>> /*
>> *
>> It's possible for the value returned by GetOldestXmin() to move
>> * backwards, so it's not wrong for
>> us to see tuples that appear to
>> * not be visible to everyone yet, while PD_ALL_VISIBLE is already
>> * set. The real safe xmin value never moves backwards, but
>> * GetOldestXmin() is
>> conservative and sometimes returns a value
>> * that's unnecessarily small, so if we see that
>> contradiction it just
>> * means that the tuples that we think are not visible to everyone yet
>>  * actually are, and the PD_ALL_VISIBLE flag is correct.
>> *
>> * There should never
>> be dead tuples on a page with PD_ALL_VISIBLE
>> * set, however.
>> */
>> else
>> if (PageIsAllVisible(page) && has_dead_tuples)
>> {
>> elog(WARNING, "page
>> containing dead tuples is marked as all-visible in relation \"%s\" page %u",
>>
>> relname, blkno);
>> PageClearAllVisible(page);
>> MarkBufferDirty(buf);
>> visibilitymap_clear(onerel, blkno, vmbuffer);
>> }
>>
>> ..
>> }
>>
>> I think both the above cases could happen for frozen state
>> as well, unless you think otherwise, we need similar handling
>> for frozen bit.
>

Thank you for reviewing the patch.

I changed the patch so that the visibility map become the page info
map, in source code and documentation.
And fixed review comments I received.
Attached v22 patch.

> I think both the above cases could happen for frozen state
> as well, unless you think otherwise, we need similar handling
> for frozen bit.

It's not happen the situation where is all-frozen and not all-visible,
and the bits of visibility map are cleared at the same time, page
flags are as well.
So I think it's enough to handle only all-visible situation. Am I
missing something?

> 2. visitibilymap_test()
>
> -  if (visibilitymap_test(rel, blkno, &vmbuffer))
> +  if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE)
>
>  The old VM was a simple bitmap so the name _test and the
>  function are proper but now the bitmap is quad state so it'd be
>  better chainging the function. Alghough it is not so expensive
>  to call it twice successively, it is a bit uneasy for me doing
>  so. One possible shape would be like the following.
>
>  lazy_vacuum_page()
>  > int vmstate = visibilitymap_get_status(rel, blkno, &vmbuffer);
>  > if (!(vmstate  & VISIBILITYMAP_ALL_VISIBLE))
>  >   ...
>  > if (all_frozen && !(vmstate  & VISIBILITYMAP_ALL_FROZEN))
>  >   ...
>  > if (flags != vmstate)
>  >   visibilitymap_set(...., flags);
>
>  and defining two macros for indivisual tests,
>
>  > #define VM_ALL_VISIBLE(r, b, v) ((vm_get_status((r), (b), (v)) & .._VISIBLE) != 0)
>  > if (VM_ALL_VISIBLE(rel, blkno, &vmbuffer))
>  and
>  > if (VM_ALL_FROZEN(rel, blkno, &vmbuffer))
>
>  How about this?

I agree.
I've changed so.

Regards,

--
Masahiko Sawada

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: psql: add \pset true/false
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Support for N synchronous standby servers - take 2