Re: PATCH: use foreign keys to improve join estimates v1

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: PATCH: use foreign keys to improve join estimates v1
Дата
Msg-id CAKJS1f-Ad5Fs9Fsay_=LZT_Ep81r+qU3pVLcecz+a1+BJMVp0g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: use foreign keys to improve join estimates v1  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: PATCH: use foreign keys to improve join estimates v1  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers



On 26 September 2015 at 01:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

On 09/25/2015 03:39 AM, David Rowley wrote:
I looked at this again, and I'm not all that sure it's possible to
assume that 1.0 / <tuples> is valid when there's more than one
relation at either side of the join.
>
My reasoning for this is that the whole basis for the patch is that a
if we find a foreign key match, then we can be sure enough, as far as
row estimations go, that exactly 1 tuple will exist matching that
condition. This assumption of the 1 tuple no longer holds when 2
relations have already been joined, as this can either cause tuple
duplication, or elimination.

I don't see why that would be the case. Of course, you need to take the right <tuples>, i.e. the "target" of the foreign key (the table with UNIQUE constraint) so that the selectivity matches the fact that exactly 1 tuple (on the PK side) matches.

hmm, ok. You're right. It appears I was a bit confused, but thanks for explaining it again. I get it now.

I've been working on this again. I've put back the code that you wrote for the looping over each combination of relations from either side of the join.

I've also added some code to get around the problem with eclass joins and the RestrictInfo having some alternative Vars that don't belong to the foreign key. Basically I'm just checking if the RestrictInfo has a parent_ec, and if it does just loop over the members to try and find the Vars that belong to the foreign key. I've tested it with the following, and it seems to work:

create table a as select i as a_id1, i as a_id2, i as dummy1 from generate_series(0,999) s(i);
alter table a add unique (a_id1, a_id2);
create table b as select i as b_id1, i as b_id2 from generate_series(0,332) s(i);

analyze a;
analyze b;

alter table b add foreign key (b_id1, b_id2) references a (a_id1, a_id2);

explain analyze select * from a inner join b on a.dummy1 = b.b_id1 and a.a_id2 = b.b_id2 where a.a_id1 = a.dummy1;

                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=18.57..26.41 rows=2 width=20) (actual time=0.775..1.046 rows=333 loops=1)
   Hash Cond: ((b.b_id1 = a.dummy1) AND (b.b_id2 = a.a_id2))
   ->  Seq Scan on b  (cost=0.00..5.33 rows=333 width=8) (actual time=0.013..0.046 rows=333 loops=1)
   ->  Hash  (cost=18.50..18.50 rows=5 width=12) (actual time=0.737..0.737 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 51kB
         ->  Seq Scan on a  (cost=0.00..18.50 rows=5 width=12) (actual time=0.014..0.389 rows=1000 loops=1)
               Filter: (dummy1 = a_id1)

The non-patched version estimates 1 row. The patched estimates 2 rows, but that's due to the bad estimate on dummy1 = a_id1.

The 2 comes from ceil(5 * 0.333).

Perhaps you have a better test case to for this?

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Filip Rembiałkowski
Дата:
Сообщение: pg_dump LOCK TABLE ONLY question
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_dump LOCK TABLE ONLY question