Re: PoC/WIP: Extended statistics on expressions

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: PoC/WIP: Extended statistics on expressions
Дата
Msg-id CAEZATCVnQccV_Z7AuvbvquaOYxUj+cVFk48pPGASYHFjXWgW5g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PoC/WIP: Extended statistics on expressions  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
Looking through extended_stats.c, I found a corner case that can lead
to a seg-fault:

CREATE TABLE foo();
CREATE STATISTICS s ON (1) FROM foo;
ANALYSE foo;

This crashes in lookup_var_attr_stats(), because it isn't expecting
nvacatts to be 0. I can't think of any case where building stats on a
table with no analysable columns is useful, so it should probably just
exit early in that case.


In BuildRelationExtStatistics(), it looks like min_attrs should be
declared assert-only.


In evaluate_expressions():

+   /* set the pointers */
+   result = (ExprInfo *) ptr;
+   ptr += sizeof(ExprInfo);

I think that should probably have a MAXALIGN().


A slightly bigger issue that I don't like is the way it assigns
attribute numbers for expressions starting from
MaxHeapAttributeNumber+1, so the first expression has an attnum of
1601. That leads to pretty inefficient use of Bitmapsets, since most
tables only contain a handful of columns, and there's a large block of
unused space in the middle the Bitmapset.

An alternative approach might be to use regular attnums for columns
and use negative indexes -1, -2, -3, ... for expressions in the stored
stats. Then when storing and retrieving attnums from Bitmapsets, it
could just offset by STATS_MAX_DIMENSIONS (8) to avoid negative values
in the Bitmapsets, since there can't be more than that many
expressions (just like other code stores system attributes using
FirstLowInvalidHeapAttributeNumber).

That would be a somewhat bigger change, but hopefully fairly
mechanical, and then some code like add_expressions_to_attributes()
would go away.


Looking at the new view pg_stats_ext_exprs, I noticed that it fails to
show expressions until the statistics have been built. For example:

CREATE TABLE foo(a int, b int);
CREATE STATISTICS s ON (a+b), (a*b) FROM foo;
SELECT statistics_name, tablename, expr, n_distinct FROM pg_stats_ext_exprs;

 statistics_name | tablename | expr | n_distinct
-----------------+-----------+------+------------
 s               | foo       |      |
(1 row)

but after populating and analysing the table, this becomes:

 statistics_name | tablename |  expr   | n_distinct
-----------------+-----------+---------+------------
 s               | foo       | (a + b) |         11
 s               | foo       | (a * b) |         11
(2 rows)

I think it should show the expressions even before the stats have been built.

Another issue is that it returns rows for non-expression stats as
well. For example:

CREATE TABLE foo(a int, b int);
CREATE STATISTICS s ON a, b FROM foo;
SELECT statistics_name, tablename, expr, n_distinct FROM pg_stats_ext_exprs;

 statistics_name | tablename | expr | n_distinct
-----------------+-----------+------+------------
 s               | foo       |      |
(1 row)

and those values will never be populated, since they're not
expressions, so I would expect them to not be shown in the view.

So basically, instead of

+         LEFT JOIN LATERAL (
+             SELECT
+                 *
+             FROM (
+                 SELECT
+
unnest(pg_get_statisticsobjdef_expressions(s.oid)) AS expr,
+                     unnest(sd.stxdexpr)::pg_statistic AS a
+             ) x
+         ) stat ON sd.stxdexpr IS NOT NULL;

perhaps just

+         JOIN LATERAL (
+             SELECT
+                 *
+             FROM (
+                 SELECT
+
unnest(pg_get_statisticsobjdef_expressions(s.oid)) AS expr,
+                     unnest(sd.stxdexpr)::pg_statistic AS a
+             ) x
+         ) stat ON true;

Regards,
Dean



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Key management with tests
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: Key management with tests