Omitting relpages for toast table access not expected

Поиск
Список
Период
Сортировка
От Stefan Litsche
Тема Omitting relpages for toast table access not expected
Дата
Msg-id CA+bTu2RaUzKUqEdd6pPchW3tKvNcfbS6d7ADp8NUuuj871sK_A@mail.gmail.com
обсуждение исходный текст
Ответы Re: Omitting relpages for toast table access not expected
Список pgsql-bugs
Hello PostgreSQL team,

first and foremost I'd like to thank you for the great work on this wonderful
database.

I want to share three observations which show not expected behavior.

I'm running  PostgreSQL 15.7 (Debian 15.7-1.pgdg120+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit in a docker container on
Ubuntu 22.04.4 LTS.

# First observation

Trying to learn the effect of toast for jsonb I want to have a table with a
jsonb column.  Since I want not to be distracted by the effect of compression I
set the storage to external.  I populate the test rows with values which a
slightly smaller than the threshold for compression.

CREATE TABLE toasttest (jb jsonb);
ALTER TABLE toasttest ALTER COLUMN jb SET STORAGE EXTERNAL;
INSERT INTO toasttest
SELECT
 jsonb_build_object(
 'id', i,
 'foo', (select jsonb_agg(0) from generate_series(1, 1960/12)) -- [0,0,0, ...]
 ) jb
FROM
 generate_series(1, 10000) i;

Checking the statistics to verify if expectations are met:

SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE oid IN (
        SELECT UNNEST(ARRAY[oid, reltoastrelid])
          FROM pg_class
         WHERE oid = 'toasttest'::regclass);

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |     2500 |     10000
 pg_toast_56808 |        0 |        -1


EXPLAIN (ANALYZE, BUFFERS) SELECT jb->'id' FROM toasttest;


                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on toasttest  (cost=0.00..2625.00 rows=10000 width=32) (actual time=0.015..4.736 rows=10000 loops=1)
   Buffers: shared hit=2500
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.145 ms
 Execution Time: 5.357 ms
(6 rows)

So far so good.  Now I want to observe the effect of updating the jsonb
value. Since the storage has be set to external, I expect the jsonb value to be
moved to the toast table.

UPDATE toasttest SET jb = jb || '{"bar": "baz"}';
VACUUM FULL toasttest;
ANALYZE toasttest;

The output for the statistics and explain plan after updating looks as follows:

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |       64 |     10000
 pg_toast_56808 |     3334 |     20000


                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on toasttest  (cost=0.00..189.00 rows=10000 width=32) (actual time=0.147..28.911 rows=10000 loops=1)
   Buffers: shared hit=30048 read=32
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.115 ms
 Execution Time: 29.339 ms


As you can observe the cost has decreased by an order of magnitude while the
buffers accessed during execution has increased by an order of magnitude.

I would have expected that the cost also should have increased.  This would be
expected because also the number of buffers have increased if we consider the
toast table.

Only the number of pages for the main table have decreased and would explain the
drop in the costs if only the main table would be considered.

Shouldn't the cost of the plan also reflect the cost of accessing pages in the
toast table?

This was my first observation.  Now lets move to the second.

# Second observation

If I now want to reproduce the behavior from above, I start with dropping the
table.

DROP TABLE toasttest;

If I now investigate the relpages before and after the UPDATE, I observe the
following output:

CREATE TABLE
ALTER TABLE
INSERT 0 10000
ANALYZE

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |     2500 |     10000
 pg_toast_66818 |        0 |        -1
(2 rows)

UPDATE 10000
VACUUM
ANALYZE

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |       64 |     10000
 pg_toast_66818 |        0 |        -1
(2 rows)

So after dropping the table, recreating and analyzing it, I do not get updated
values for relpages for the corresponding toast table.  I would expect that the
analyze command also updates the statistics for the toast table.

However, in some situations the values for the toast table appear after some
time, but I was not able yet to find out, when.

# Third observation

I tried to reproduce the behavior on the latest version: PostgreSQL 16.3 (Debian
16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14)
12.2.0, 64-bit.  Again running it as a docker image.

The observation regarding query planner cost could be reproduced with one minor
difference.  Even after fresh installation the first analyze after the update
did NOT show the relpages for the toast table.  The cost decreased after the
update as described above.  As described in the second observation, the values
of relpages for the toast table appeared some time later.
The decrease in cost for the execution plan stays even after the relpages for the
toast table appeared after some time.

If anything is unclear in my description I would be happy to help.

Kind regards
Stefan Litsche

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: BUG #17889: Invalid cursor direction for a foreign scan that reached the fetch_size (MOVE BACKWARD ALL IN cX)
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Potential data loss due to race condition during logical replication slot creation