Обсуждение: pg_stats.avg_width

Поиск
Список
Период
Сортировка

pg_stats.avg_width

От
Maciek Sakrejda
Дата:
Hello,

The pg_stats.avg_width field is documented [1] as "Average width in
bytes of column's entries" but it's not defined exactly what "entries"
means here with respect to STORAGE (the underlying pg_statistic
documentation doesn't clarify this either). I thought initially this
was the "logical" size of the values, but I ran an experiment that
suggests this interpretation is not right:

maciek=# create table foo(a text);
CREATE TABLE
maciek=# insert into foo(a) select string_agg(floor((random() *
10)::numeric)::text, '') from generate_series(1,1000000) g;
INSERT 0 1
maciek=# analyze foo;
ANALYZE
maciek=# select avg_width from pg_stats where tablename = 'foo' and
attname = 'a';
 avg_width
-----------
        18
(1 row)
maciek=# select length(a) from foo;
 length
---------
 1000000
(1 row)
maciek=# select reltoastrelid::regclass from pg_class where relname = 'foo';
       reltoastrelid
---------------------------
 pg_toast.pg_toast_6454708
(1 row)
maciek=# select sum(length(chunk_data)) from pg_toast.pg_toast_6454708;
  sum
--------
 724257
(1 row)

So the avg_width here appears to correspond to neither the logical
size nor the compressed toasted size. Am I missing something? Postgres
14.7 in case that matters.

Thanks,
Maciek

[1]: https://www.postgresql.org/docs/current/view-pg-stats.html



Re: pg_stats.avg_width

От
Tom Lane
Дата:
Maciek Sakrejda <m.sakrejda@gmail.com> writes:
> The pg_stats.avg_width field is documented [1] as "Average width in
> bytes of column's entries" but it's not defined exactly what "entries"
> means here with respect to STORAGE (the underlying pg_statistic
> documentation doesn't clarify this either). I thought initially this
> was the "logical" size of the values, but I ran an experiment that
> suggests this interpretation is not right:

Nope.  What that's meant to be is the size of value that would be passed
around inside the executor.  In this case:

> maciek=# select avg_width from pg_stats where tablename = 'foo' and
> attname = 'a';
>  avg_width
> -----------
>         18

what you are getting is the size of the TOAST pointer.  That's correct
for the planner's purposes, because it'd be the TOAST pointer not the
detoasted value that would be passed through joins, sorts, hashes, etc.
And we really only care about how much space would be needed for
things like sort temp files.

            regards, tom lane



Re: pg_stats.avg_width

От
Maciek Sakrejda
Дата:
Thanks, that makes sense. It was going to be my third guess, but it
seemed pretty wide for a TOAST pointer. Reviewing what goes in there,
though, it's reasonable.

I assume that this means for unTOASTed but compressed data, this
counts the compressed size.

Would a doc patch clarifying this (and possibly linking to the
relevant TOAST docs [1]) be welcome? The current wording is pretty
vague. Something like

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 7c09ab3000..2814ac8007 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7466,7 +7466,9 @@ SCRAM-SHA-256$<replaceable><iteration
count></replaceable>:<replaceable>&l
        <structfield>stawidth</structfield> <type>int4</type>
       </para>
       <para>
-       The average stored width, in bytes, of nonnull entries
+       The average stored width, in bytes, of nonnull entries. For compressed
+       entries, counts the compressed size; for TOASTed data, the size of the
+       TOAST pointer (see <link linkend="storage-toast">TOAST</link>).
       </para></entry>
      </row>

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index bb1a418450..62184fe32b 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3680,7 +3680,9 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        <structfield>avg_width</structfield> <type>int4</type>
       </para>
       <para>
-       Average width in bytes of column's entries
+       Average width in bytes of column's entries. For compressed entries,
+       counts the compressed size; for TOASTed data, the size of the TOAST
+       pointer (see <link linkend="storage-toast">TOAST</link>).
       </para></entry>
      </row>

(not sure if this should be <link /> or <xref />).

Thanks,
Maciek

[1]: https://www.postgresql.org/docs/current/storage-toast.html