Additional improvements to extended statistics
From | Tomas Vondra |
---|---|
Subject | Additional improvements to extended statistics |
Date | |
Msg-id | 20200113230008.g67iyk4cs3xbnjju@development |
Responses |
Re: Additional improvements to extended statistics
Re: Additional improvements to extended statistics |
List | pgsql-hackers |
Hi,

Now that I've committed [1] which allows us to use multiple extended statistics per table, I'd like to start a thread discussing a couple of additional improvements for extended statistics. I've considered starting a separate patch for each, but that would be messy as those changes will touch roughly the same places. So I've organized it into a single patch series, with the simpler parts at the beginning.

There are three main improvements:

1) improve estimates of OR clauses

Until now, OR clauses pretty much ignored extended statistics, based on the experience that they're less vulnerable to misestimates. But it's a bit weird that AND clauses are handled while OR clauses are not, so this extends the logic to OR clauses.

Status: I think this is fairly OK.

2) support estimating clauses (Var op Var)

Currently, we only support clauses with a single Var, i.e. clauses like

  - Var op Const
  - Var IS [NOT] NULL
  - [NOT] Var
  - ...

and AND/OR clauses built from those simple ones. This patch adds support for clauses of the form (Var op Var), of course assuming both Vars come from the same relation.

Status: This works, but it feels a bit hackish. Needs more work.

3) support extended statistics on expressions

Currently we only allow simple references to columns in extended stats, so we can do

    CREATE STATISTICS s ON a, b, c FROM t;

but not

    CREATE STATISTICS s ON (a+b), (c + 1) FROM t;

This patch aims to allow this. At the moment it's a WIP - it does most of the catalog changes and stats building, but with some hacks/bugs. And it does not even try to use those statistics during estimation.

The first question is how to extend the current pg_statistic_ext catalog to support expressions. I've been planning to do it the way we support expressions for indexes, i.e. have two catalog fields - one for keys, one for expressions.

One difference is that for statistics we don't care about the order of the keys, so we don't need to bother with storing 0 keys in place of expressions - we can simply assume keys come first, then expressions. And this is what the patch does now.

I'm however wondering whether to keep this split - why not just treat everything as expressions, and be done with it? A key just represents a Var expression, after all. And it would massively simplify a lot of code that now has to care about both keys and expressions.

Of course, expressions are a bit more expensive, but I wonder how noticeable that would be.

Opinions?

regards

[1] https://commitfest.postgresql.org/26/2320/

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
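
A note on point 1, as a rough illustration only and not the code from the patch series: the remark that OR clauses are "less vulnerable to misestimates" can be checked with a little arithmetic. Assuming the usual independence-based combination, P(A AND B) = P(A) * P(B) and P(A OR B) = P(A) + P(B) - P(A) * P(B), the sketch below compares those estimates with the true fractions on a toy table where column b always equals column a. The table, the helper names `selectivity` and `compare_estimates`, and the predicates are all made up for the example.

```python
def selectivity(rows, predicate):
    """Fraction of rows for which the predicate holds."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


def compare_estimates(rows, pred_a, pred_b):
    """Compare independence-based AND/OR estimates with the true fractions."""
    s_a = selectivity(rows, pred_a)
    s_b = selectivity(rows, pred_b)

    # Estimates under the independence assumption (no extended statistics).
    and_indep = s_a * s_b
    or_indep = s_a + s_b - and_indep

    # True values: count the AND directly, derive the OR by inclusion-exclusion.
    and_true = selectivity(rows, lambda r: pred_a(r) and pred_b(r))
    or_true = s_a + s_b - and_true

    return {
        "and_indep": and_indep,
        "and_true": and_true,
        "or_indep": or_indep,
        "or_true": or_true,
    }


# Toy table (hypothetical): 100,000 rows, 100 distinct values,
# and column b is always equal to column a (fully correlated).
rows = [(i % 100, i % 100) for i in range(100_000)]

result = compare_estimates(
    rows,
    lambda r: r[0] == 1,  # clause A: a = 1
    lambda r: r[1] == 1,  # clause B: b = 1
)

print("AND underestimated by a factor of", result["and_true"] / result["and_indep"])
print("OR overestimated by a factor of", result["or_indep"] / result["or_true"])
```

On this toy data the AND estimate is off by roughly a factor of 100, while the OR estimate is off by a bit under a factor of 2, because the correction term P(A AND B) is small next to P(A) + P(B). That matches the experience described above, although an error of that size can still matter, which is one way to read the motivation for extending the logic to OR clauses.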