Re: Performance improvement for joins where outer side is unique

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Performance improvement for joins where outer side is unique
Дата
Msg-id CAApHDvoh-EKF51QQyNoJUe0eHYMZw6OzJjjgYP63Cmw7QfebjA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Performance improvement for joins where outer side is unique  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Performance improvement for joins where outer side is unique  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
On 14 March 2015 at 14:51, David Rowley <dgrowleyml@gmail.com> wrote:
On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Unfortunately I can't decide this to be 'ready for commiter' for
now. I think we should have this on smaller footprint, in a
method without separate exhauxtive searching. I also had very
similar problem in the past but I haven't find such a way for my
problem..


I don't think it's ready yet either. I've just been going over a few things and looking at Tom's recent commit b557226 in pathnode.c I've got a feeling that this patch would need to re-factor some code that's been modified around the usage of relation_has_unique_index_for() as when this code is called, the semi joins have already been analysed to see if they're unique, so it could be just a case of ripping all of that out of create_unique_path() and just putting a check to say rel->is_unique_join in there. But if I do that then I'm not quite sure if SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would still be needed at all. These were only added in b557226. Changing this would help reduce the extra planning time when the query contains semi-joins. To be quite honest, this type of analysis belongs in analyzejoin.c anyway. I'm tempted to hack at this part some more, but I'd rather Tom had a quick glance at what I'm trying to do here first.
 

I decided to hack away any change the code Tom added in b557226. I've changed it so that create_unique_path() now simply just uses if (rel->is_unique_join), instead off all the calls to relation_has_unique_index_for() and query_is_distinct_for().  This vastly simplifies that code. One small change is that Tom's checks for uniqueness on semi joins included checks for volatile functions, this check didn't exist in the original join removal code, so I've left it out. We'll never match a expression with a volatile function to a unique index as indexes don't allow volatile function expressions anyway. So as I understand it this only serves as a fast path out if the join condition has a volatile function... But I'd assume that check is not all that cheap.

I ended up making query_supports_distinctness() and query_is_distinct_for() static in analyzejoins.c as they're not used in any other files. relation_has_unique_index_for() is also now only used in analyzejoins.c, but I've not moved it into that file yet as I don't want to bloat the patch. I just added a comment to say it needs moved. 

I've also added a small amount of code to query_is_distinct_for() which allows subqueries such as (select 1 a offset 0) to be marked as unique. I thought it was a little silly that these were not being detected as unique, so I fixed it. This has the side effect of allowing left join removals for queries such as: select t1.* from t1 left join (select 1 a offset 0) a on t1.id=a.a;

Updated patch attached.

Regards

David Rowley
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Performance improvement for joins where outer side is unique
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: pg_rewind in contrib