Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
От | Alena Rybakina |
---|---|
Тема | Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs |
Дата | |
Msg-id | 866a4673-cf0d-c087-96a6-332e4da24ef8@yandex.ru обсуждение исходный текст |
Ответ на | Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
|
Список | pgsql-hackers |
> Well, one option would be to modify all selectivity functions to do > something like the patch does for nulltestsel(). That seems a bit > cumbersome because why should those places care about maybe running on > the outer side of a join, or what? For code in extensions this would be > particularly problematic, I think. Agree. I would say that we can try it if nothing else works out. > So what I was thinking about doing this in a way that'd make this > automatic, without having to modify the selectivity functions. > > Option (3) is very simple - examine_variable would simply adjust the > statistics by tweaking the null_frac field, when looking at variables on > the outer side of the join. But it has issues when estimating multiple > conditions. > > Imagine t1 has 1M rows, and we want to estimate > > SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id) > WHERE ((t2.a=1) AND (t2.b=1)) > > but only 50% of the t1 rows has a match in t2. Assume each of the t2 > conditions matches 100% rows in the table. With the correction, this > means 50% selectivity for each condition. And if we combine them the > usual way, it's 0.5 * 0.5 = 0.25. > > But we know all the rows in the "matching" part match the condition, so > the correct selectivity should be 0.5. > > In a way, this is just another case of estimation issues due to the > assumption of independence. > FWIW, I used "AND" in the example for simplicity, but that'd probably be > pushed to the baserel level. There'd need to be OR to keep it at the > join level, but the overall issue is the same, I think. > > Also, this entirely ignores extended statistics - I have no idea how we > might tweak those in (3). I understood the idea - it is very similar to what is implemented in the current patch. But I don't understand how to do it in the examine_variable function, to be honest. > But (4) was suggesting we could improve this essentially by treating the > join as two distinct sets of rows > > - the inner join result > > - rows without match on the outer side > > For the inner part, we would do estimates as now (using the regular > per-column statistics). If we knew the conditions match 100% rows, we'd > still get 100% when the conditions are combined. > > For the second part of the join we know the outer side is just NULLs in > all columns, and that'd make the estimation much simpler for most > clauses. We'd just need to have "fake" statistics with null_frac=1.0 and > that's it. > > And then we'd just combine these two selectivities. If we know the inner > side is 50% and all rows match the conditions, and no rows in the other > 50% match, the selectivity is 50%. > > inner_part * inner_sel + outer_part * outer_sel = 0.5 * 1.0 + 0.0 = 0.5 > > Now, we still have issues with independence assumption in each of these > parts separately. But that's OK, I think. > > I think (4) could be implemented by doing the current estimation for the > inner part, and by tweaking examine_variable in the "outer" part in a > way similar to (3). Except that it just sets null_frac=1.0 everywhere. > > For (4) we don't need to tweak those at all, > because for inner part we can just apply them as is, and for outer part > it's irrelevant because everything is NULL. I like this idea the most) I'll try to start with this and implement the patch. > I hope this makes more sense. If not, let me know and I'll try to > explain it better. Thank you for your explanation) I will unsubscribe soon based on the results or if I have any questions. -- Regards, Alena Rybakina Postgres Professional
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Kim Johan AnderssonДата:
Сообщение: Re: [PATCH] Add support function for containment operators
Следующее
От: Tomas VondraДата:
Сообщение: Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs