Обсуждение: Omitting relpages for toast table access not expected

Поиск
Список
Период
Сортировка

Omitting relpages for toast table access not expected

От
Stefan Litsche
Дата:
Hello PostgreSQL team,

first and foremost I'd like to thank you for the great work on this wonderful
database.

I want to share three observations which show not expected behavior.

I'm running  PostgreSQL 15.7 (Debian 15.7-1.pgdg120+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit in a docker container on
Ubuntu 22.04.4 LTS.

# First observation

Trying to learn the effect of toast for jsonb I want to have a table with a
jsonb column.  Since I want not to be distracted by the effect of compression I
set the storage to external.  I populate the test rows with values which a
slightly smaller than the threshold for compression.

CREATE TABLE toasttest (jb jsonb);
ALTER TABLE toasttest ALTER COLUMN jb SET STORAGE EXTERNAL;
INSERT INTO toasttest
SELECT
 jsonb_build_object(
 'id', i,
 'foo', (select jsonb_agg(0) from generate_series(1, 1960/12)) -- [0,0,0, ...]
 ) jb
FROM
 generate_series(1, 10000) i;

Checking the statistics to verify if expectations are met:

SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE oid IN (
        SELECT UNNEST(ARRAY[oid, reltoastrelid])
          FROM pg_class
         WHERE oid = 'toasttest'::regclass);

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |     2500 |     10000
 pg_toast_56808 |        0 |        -1


EXPLAIN (ANALYZE, BUFFERS) SELECT jb->'id' FROM toasttest;


                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on toasttest  (cost=0.00..2625.00 rows=10000 width=32) (actual time=0.015..4.736 rows=10000 loops=1)
   Buffers: shared hit=2500
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.145 ms
 Execution Time: 5.357 ms
(6 rows)

So far so good.  Now I want to observe the effect of updating the jsonb
value. Since the storage has be set to external, I expect the jsonb value to be
moved to the toast table.

UPDATE toasttest SET jb = jb || '{"bar": "baz"}';
VACUUM FULL toasttest;
ANALYZE toasttest;

The output for the statistics and explain plan after updating looks as follows:

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |       64 |     10000
 pg_toast_56808 |     3334 |     20000


                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on toasttest  (cost=0.00..189.00 rows=10000 width=32) (actual time=0.147..28.911 rows=10000 loops=1)
   Buffers: shared hit=30048 read=32
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.115 ms
 Execution Time: 29.339 ms


As you can observe the cost has decreased by an order of magnitude while the
buffers accessed during execution has increased by an order of magnitude.

I would have expected that the cost also should have increased.  This would be
expected because also the number of buffers have increased if we consider the
toast table.

Only the number of pages for the main table have decreased and would explain the
drop in the costs if only the main table would be considered.

Shouldn't the cost of the plan also reflect the cost of accessing pages in the
toast table?

This was my first observation.  Now lets move to the second.

# Second observation

If I now want to reproduce the behavior from above, I start with dropping the
table.

DROP TABLE toasttest;

If I now investigate the relpages before and after the UPDATE, I observe the
following output:

CREATE TABLE
ALTER TABLE
INSERT 0 10000
ANALYZE

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |     2500 |     10000
 pg_toast_66818 |        0 |        -1
(2 rows)

UPDATE 10000
VACUUM
ANALYZE

    relname     | relpages | reltuples
----------------+----------+-----------
 toasttest      |       64 |     10000
 pg_toast_66818 |        0 |        -1
(2 rows)

So after dropping the table, recreating and analyzing it, I do not get updated
values for relpages for the corresponding toast table.  I would expect that the
analyze command also updates the statistics for the toast table.

However, in some situations the values for the toast table appear after some
time, but I was not able yet to find out, when.

# Third observation

I tried to reproduce the behavior on the latest version: PostgreSQL 16.3 (Debian
16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14)
12.2.0, 64-bit.  Again running it as a docker image.

The observation regarding query planner cost could be reproduced with one minor
difference.  Even after fresh installation the first analyze after the update
did NOT show the relpages for the toast table.  The cost decreased after the
update as described above.  As described in the second observation, the values
of relpages for the toast table appeared some time later.
The decrease in cost for the execution plan stays even after the relpages for the
toast table appeared after some time.

If anything is unclear in my description I would be happy to help.

Kind regards
Stefan Litsche

Re: Omitting relpages for toast table access not expected

От
Tom Lane
Дата:
Stefan Litsche <slitsche@biosoft.de> writes:
> I want to share three observations which show not expected behavior.

> [ after forcing data to be toasted ]

> Shouldn't the cost of the plan also reflect the cost of accessing
> pages in the toast table?

This is expected: the planner does not currently try to account for
the cost of fetching toasted values.  We have thought about that,
certainly.  But it would be quite difficult to do so without
introducing a lot of error into the numbers, since in most non-toy
situations it's hard to predict what fraction of the values fetched
by a particular query will be toasted or how big they will be.
Another reason for not expending sweat here is that in most scenarios
those costs would be the same for any possible query plan, so that the
effort involved in making a better estimate wouldn't end up improving
the plan.

> So after dropping the table, recreating and analyzing it, I do not
> get updated values for relpages for the corresponding toast table.
> I would expect that the analyze command also updates the statistics
> for the toast table.

This doesn't surprise me enormously.  We don't worry too much about
updating statistics for toast tables, because the planner wouldn't use
them anyway -- in particular I'm pretty sure ANALYZE doesn't examine
toast tables as such.  relpages might get updated as a by-product of
VACUUM, but only if VACUUM judges that it's worth scanning the toast
table, which it won't unless some toasted values have been deleted
since last time.

            regards, tom lane