Re: PoC/WIP: Extended statistics on expressions
From | Tomas Vondra |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | b2995773-c9a1-5d3b-fc90-4f3ab189be11@enterprisedb.com |
In response to | Re: PoC/WIP: Extended statistics on expressions (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses | Re: PoC/WIP: Extended statistics on expressions (Dean Rasheed <dean.a.rasheed@gmail.com>); Re: PoC/WIP: Extended statistics on expressions (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |

On 12/7/20 10:56 AM, Dean Rasheed wrote:
> On Thu, 3 Dec 2020 at 15:23, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>
>> Attached is a patch series rebased on top of 25a9e54d2d.
>
> After reading this thread and [1], I think I prefer the name
> "standard" rather than "expressions", because it is meant to describe
> the kind of statistics being built rather than what they apply to, but
> maybe that name doesn't actually need to be exposed to the end user:
>
> Looking at the current behaviour, there are a couple of things that
> seem a little odd, even though they are understandable. For example,
> the fact that
>
>   CREATE STATISTICS s (expressions) ON (expr), col FROM tbl;
>
> fails, but
>
>   CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl;
>
> succeeds and creates both "expressions" and "mcv" statistics. Also, the syntax
>
>   CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl;
>
> tends to suggest that it's going to create statistics on the pair of
> expressions, describing their correlation, when actually it builds 2
> independent statistics. Also, this error text isn't entirely accurate:
>
>   CREATE STATISTICS s ON col FROM tbl;
>   ERROR:  extended statistics require at least 2 columns
>
> because extended statistics don't always require 2 columns, they can
> also just have an expression, or multiple expressions and 0 or 1
> columns.
>
> I think a lot of this stems from treating "expressions" in the same
> way as the other (multi-column) stats kinds, and it might actually be
> neater to have separate documented syntaxes for single- and
> multi-column statistics:
>
>   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
>     ON (expression)
>     FROM table_name
>
>   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
>     [ ( statistics_kind [, ... ] ) ]
>     ON { column_name | (expression) } , { column_name | (expression) } [, ...]
>     FROM table_name
>
> The first syntax would create single-column stats, and wouldn't accept
> a statistics_kind argument, because there is only one kind of
> single-column statistic. Maybe that might change in the future, but if
> so, it's likely that the kinds of single-column stats will be
> different from the kinds of multi-column stats.
>
> In the second syntax, the only accepted kinds would be the current
> multi-column stats kinds (ndistinct, dependencies, and mcv), and it
> would always build stats describing the correlations between the
> columns listed. It would continue to build standard/expression stats
> on any expressions in the list, but that's more of an implementation
> detail.
>
> It would no longer be possible to do "CREATE STATISTICS s
> (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to
> issue 2 separate "CREATE STATISTICS" commands, but that seems more
> logical, because they're independent stats.
>
> The parsing code might not change much, but some of the errors would
> be different. For example, the errors "building only extended
> expression statistics on simple columns not allowed" and "extended
> expression statistics require at least one expression" would go away,
> and the error "extended statistics require at least 2 columns" might
> become more specific, depending on the stats kind.
>

I think it makes sense in general. I see two issues with this approach,
though:

* By adding expression/standard stats for individual statistics, it
makes the list of statistics longer - I wonder if this might have
measurable impact on lookups in this list.

* I'm not sure it's a good idea that the second syntax would always build
the per-expression stats. Firstly, it seems a bit strange that it
behaves differently than the other kinds. Secondly, I wonder if there
are cases where it'd be desirable to explicitly disable building these
per-expression stats. For example, what if we have multiple extended
statistics objects, overlapping on a couple expressions. It seems
pointless to build the stats for all of them.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
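
To make the two proposed forms and the overlap concern concrete, here is a minimal sketch written against the syntax Dean proposes in the quoted text. It is an illustration only: the table `tbl`, its columns `a`, `b`, `c`, the expressions, and all statistics names are hypothetical, and whether each statement is accepted depends on which syntax the server actually implements.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE tbl (a int, b int, c int);

-- First proposed form: one expression, no statistics_kind list.
-- Two independent expression statistics need two separate commands.
CREATE STATISTICS s_expr1 ON (a + b) FROM tbl;
CREATE STATISTICS s_expr2 ON (a * c) FROM tbl;

-- Second proposed form: two or more entries, multi-column kinds only
-- (ndistinct, dependencies, mcv).
CREATE STATISTICS s_mcv (mcv) ON (a + b), c FROM tbl;

-- The overlap case raised in the reply: this object also lists (a + b).
-- If per-expression stats are always built, (a + b) would be covered
-- by s_expr1, s_mcv and s_nd alike.
CREATE STATISTICS s_nd (ndistinct) ON (a + b), b FROM tbl;
```

Under the "always build" behaviour, the last two objects would each carry their own per-expression stats for `(a + b)` with no way to turn that off, which is the redundancy the second bullet point questions.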