Re: BUG #15160: planner overestimates number of rows in join whenthere are more than 200 rows coming from CTE

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: BUG #15160: planner overestimates number of rows in join whenthere are more than 200 rows coming from CTE
Дата
Msg-id CAAKRu_Y+d3SKkofNK7znhwgLauPUCVyZU=H0mvJH1LG7+tMyXg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Thanks for the quick responses. I've put some inline follow-up questions.

On a separate note, I had one additional code clarity feedback. I felt that
eqjoinsel could be reorganized a bit for readability/clarity for the reader.
For example, eqjoinsel_inner uses only the AttStatsSlots up until here and then
suddenly uses the original stats object and the ndvs which we passed in:

    else
    {
        ...
        double        nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
        double        nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;
 
        selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
        if (nd1 > nd2)
            selec /= nd1;
       else
            selec /= nd2;
    }
 
It would make the process of calculating selectivity for an equijoin more clear
to the reader if the nullfraction calculation was pulled out into the main
eqjoinsel function.
 
Having a clear set of steps in eqjoinsel would be helpful. Basically, my
understanding of an overview of the steps is the following:
 
    1) get NDVs
    2) get nullfrac
    3) get MCVs
    4) calculate selectivity
 
Based on this assumption, I've attached a patch with a rough idea for an
alternative structure that I think would be more clear to the reader.
 
> I could not devise an example in which the previous method of calculating
> selectivity would have produced a better estimate. However, one question I have
> after thinking through the optimization is the following:
> ...
> To summarize:
> Selectivity Type                  |  if nd1 <= nd2 |   if nd1 > nd2 |
> ----------------------------------|----------------|-----------------
> inner-join selectivity * ntuples2 | ntuples2 / nd2 | ntuples2 / nd1 |
> semi-join selectivity             |              1 |      nd2 / nd1 |

Um, mumble.  Those functions could be using different values of nd2
thanks to the clamping logic near the head of eqjoinsel_semi, so I'm
not sure that the comparison you're making really holds.

That's a good point. Taking another look at that clamping logic, I realized
that I don't really understand why that clamping would be done for a semi-join
and not for an inner join. It seems like for an inner join it is also true that
the the nd1 cannot be greater than outer rel estimated tuples and nd2 could not
be greater than inner rel estimated tuples.

Also, I don't understand when vardata2->rel->rows and inner_rel->rows would be
different. I thought the point of doing this clamping was that, if you have a
restriction, like the predicate in this subquery select * from foo where a in
(select b from bar where b > 10); your row estimate for bar and your row
estimate for the rows out for that subquery would be different. However, I
looked at the RelOptInfos for vardata2->rel and inner_rel for this query and it
seems like they are referencing the same relation and have the same rows
estimate, so I'm confused when the rows would be different.

> If there is a reason to keep the existing formula, then I have an additional
> question about the proposed selectivity calculation:
>     selec = Min(selec, nd2 * selec_inner);
> When would it be incorrect to instead multiply by inner side NDVs?

I'm confused ... isn't that exactly what this is doing?

Sorry, typo, I was asking why
selec = Min(selec, nd2 * selec_inner);
could not be used instead of what is in the patch
selec = Min(selec, inner_rel->rows * selec_inner);

Thanks,
Melanie
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Constraint documentation
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Problem while updating a foreign table pointing to a partitioned table on foreign server