Re: [PATCH] Equivalence Class Filters

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: [PATCH] Equivalence Class Filters
Дата
Msg-id CAKJS1f9fPdLKM6=SUZAGwucH3otbsPk6k0YT8-A1HgjFapL-zQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Equivalence Class Filters  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [PATCH] Equivalence Class Filters  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 6 December 2015 at 06:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
> As of today these Equivalence Classes only incorporate expressions which
> use the equality operator, but what if the above query had been written as:

> select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;

> Should we not be able to assume that t2.id is also <= 10?
 
This sort of thing has been suggested multiple times before, and I've
rejected it every time on the grounds that it would likely add far more
planner cycles than it'd be worth, eg, time would be spent on searches for
matching subexpressions whether or not anything was learned (and often
nothing would be learned).

I guess if it keeps coming up then people must be hitting this limitation. Perhaps continually rejecting it based on your original opinion is not the best way to move forward with improving the query planner.
 
  While I've only read your description of the
patch not the patch itself, the search methodology you propose seems
pretty brute-force and unlikely to solve that issue.  It's particularly
important to avoid O(N^2) behaviors when there are N expressions ...



Likely it would be possible to do something to improve on that.
Perhaps if the list of filter clauses grows beyond a certain number then a hash table can be built on the ec_members list with the em_expr as the key and the EquivalenceClass as the data. Although likely we don't currently have anything available for generating hash values for Expr nodes. On checking it seems that 32 is the estimated tipping point for hashing in find_join_rel(), perhaps something similar would suit this.
 
Another issue that would need consideration is how to avoid skewing
planner selectivity estimates with derived clauses that are fully
redundant with other clauses.  The existing EC machinery is mostly
able to dodge that problem by generating just a minimal set of equality
clauses from an EC, but I don't see how we'd make that work here.


Could you give me an example of where this is a problem? I've tried fooling the planner into giving me bad estimates, but I've not managed to.

# explain analyze select * from t1 where t1.id < 10 and t1.id < 100 and t1.id < 1000;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using t1_pkey on t1  (cost=0.29..8.49 rows=9 width=8) (actual time=0.155..0.160 rows=9 loops=1)
   Index Cond: ((id < 10) AND (id < 100) AND (id < 1000))

I'd say the t1.id < 100 AND t1.id < 1000 are fully redundant here. The estimate given by the planner is bang-on.

Perhaps you're talking about another column making the pushed down qual redundant?  if so, is the current eclass code not prone to the exact same problem?

I'm also wondering why you want to limit it to comparisons to constants;
that seems rather arbitrary.


Do you have other useful cases in mind? 
The other one I considered was "expr op expr" clauses, where both those exprs were in eclasses. For example:

select * from t1 inner join t2 on t1.col1=t2.col1 and t1.col2=t2.col2 where t1.col1 < t1.col2;

I'd imagine that would have a narrower use case. I simply stopped with "Expr op Const" because I though that would be more common and less planner overhead to detect. Giving that below you also raise concerns that "expr op const" is likely not that useful in enough cases, then I'm not holding up too much hope of adding more complexity to the detection mechanism for less common wins.
 
Lastly, in most cases knowing that t2.id <= 10 is just not worth all
that much; it's certainly far less useful than an equality condition.
It's not difficult to imagine that this would often be a net drag on
runtime performance (never mind planner performance) by doing nothing
except creating additional filter conditions the executor has to check.
Certainly it would be valuable to know this if it let us exclude some
partition of a table, but that's only going to happen in a small minority
of queries.


I'd have thought that my link to a thread of a reported 1100 to 2200 times performance improvement might have made you think twice about claiming the uselessness of this idea. 

I'm also quite surprised at you mentioning partitions here. Personally I'd be thinking of indexes long before thinking of partitions. I don't think I need to remind you that btree indexes handle >= and <= just fine. It's not all that hard to imagine that if they're using that column for a join condition and are serious about performance that they just might also have an index defined on that column too. I'd imagine the only case we might have to worry about is when the selectivity of the pushed qual is getting close to 1. Perhaps the RestrictInfos could be marked as "optional" somehow, and we could simply remove them when their selectivity estimates are too high.

I'm not necessarily opposed to doing anything in this area, but so far
I've not seen how to do it in a way that is attractive when planner
complexity, performance, and end results are all considered.

Glad to hear it! Otherwise it would be a real shame to miss out on such great wins during execution time.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Chapman Flack
Дата:
Сообщение: a word-choice question
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: proposal: multiple psql option -c