Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Дата
Msg-id CABOikdOzzLE8A9CivB+xYVZSVUUHRfkA=FOC48TNHtyWPBBhmA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-hackers


On Wed, Feb 27, 2019 at 7:05 AM Jeff Janes <jeff.janes@gmail.com> wrote:


After doing a truncation and '\copy ... with (freeze)' of a table with long data, I find that the associated toast table has a handful of unfrozen blocks.  I don't know if that is an actual problem, but it does seem a bit odd, and thus suspicious.


Hi Jeff, thanks for looking at it and the test. I can reproduce the problem and quite curiously block number 1 and then every 32672th block is getting skipped.

postgres=# select * from pg_visibility('pg_toast.pg_toast_16384') where all_visible = 'f';
 blkno  | all_visible | all_frozen | pd_all_visible 
--------+-------------+------------+----------------
      1 | f           | f          | f
  32673 | f           | f          | f
  65345 | f           | f          | f
  98017 | f           | f          | f
 130689 | f           | f          | f
 163361 | f           | f          | f
 <snip>

Having investigated this a bit, I see that a relcache invalidation arrives after 1st and then after every 32672th block is filled. That clears the rel->rd_smgr field and we lose the information about the saved target block. The code then moves to extend the relation again and thus skips the previously less-than-half filled block, losing the free space in that block.

postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_16384', 0));
    lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1/15B37748 |        0 |     4 |    40 |    64 |    8192 |     8192 |       4 |         0
(1 row)

postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_16384', 1));
    lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1/15B39A28 |        0 |     4 |    28 |  7640 |    8192 |     8192 |       4 |         0
(1 row)

postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_16384', 2));
    lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1/15B3BE08 |        0 |     4 |    40 |    64 |    8192 |     8192 |       4 |         0
(1 row)

So the block 1 has a large amount of free space (upper - lower), which never gets filled.

I am not yet sure what causes the relcache invalidation at regular intervals. But if I have to guess, it could be because of a new VM (or FSM?) page getting allocated. I am bit puzzled because this issue seems to only occur with toast tables since I tested the patch while writing it on a regular table and did not see any block remaining unfrozen. I tested only upto 450 blocks, but that shouldn't matter because with your test, we see the problem with block 1 as well. So something to look into in more detail.

While we could potentially fix this by what you'd done in the original patch and what Kuntal also suggested, i.e. by setting the PD_ALL_VISIBLE bit during page initialisation itself, I am a bit circumspect about that approach for two reasons:

1. It requires us to then add extra logic to avoid clearing the bit during insertions
2. It requires us to also update the VM bit during page init or risk having divergent views on the page-level bit and the VM bit.

And even if we do that, this newly discovered problem of less-than-half filled intermediate blocks remain. I wonder if we should instead track the last used block in BulkInsertState and if the relcache invalidation flushes smgr, start inserting again from the last saved block. In fact, we already track the last used buffer in BulkInsertState and that's enough to know the last used block.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Early WIP/PoC for inlining CTEs
Следующее
От: Andrew Gierth
Дата:
Сообщение: Re: Early WIP/PoC for inlining CTEs