Re: Removing unneeded self joins

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: Removing unneeded self joins
Дата
Msg-id CANP8+jLFEQBKRz9q1c2yWK9so4K44Z24sQacoZrCew9NZhcf1A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Removing unneeded self joins  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Removing unneeded self joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 16 May 2018 at 11:26, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 16, 2018 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
>>> There is a join optimization we don't do -- removing inner join of a
>>> table with itself on a unique column. Such joins are generated by
>>> various ORMs, so from time to time our customers ask us to look into
>>> this. Most recently, it was discussed on the list in relation to an
>>> article comparing the optimizations that some DBMS make [1].
>>
>> This is the sort of thing that I always wonder why the customers don't
>> ask the ORM to stop generating such damfool queries.  Its *expensive*
>> for us to clean up after their stupidity; almost certainly, it would
>> take far fewer cycles, net, for them to be a bit smarter in the first
>> place.
>
> The trouble, of course, is that the customer didn't write the ORM,
> likely has no idea how it works, and doesn't want to run a modified
> version of it even if they do.  If the queries run faster on other
> systems than they do on PostgreSQL, we get dinged -- not unjustly.
>
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries.  I mean, this case is trivial so it's easy to
> say, well, just rewrite the query.  But suppose that I have a fact
> table over which I've created two views, each of which performs
> various joins between the fact table and various lookup tables.  My
> queries are such that I normally need the joins in just one of these
> two views and not the other to fetch the information I care about.
> But every once in a while I need to run a report that involves pulling
> every column possible.  The obvious solution is to join the views on
> the underlying table's primary key, but then you get this problem.  Of
> course there's a workaround: define a third view that does both sets
> of joins-to-lookup-tables.  But that starts to feel like you're
> handholding the database; surely it's the database's job to optimize
> queries, not the user's.
>
> It's been about 10 years since I worked as a web developer, but I do
> remember hitting this kind of problem from time to time and I'd really
> like to see us do something about it.  I wish we could optimize away
> inner joins, too, for similar reasons.

I agree with everything you say.

What I would add is that I've seen cases where the extra joins do NOT
hurt performance, so the extra CPU used to remove the join hurts more
than the benefit of removing it. Yes, we tried it.

More advanced optimizations should only be applied when we've assessed
that the likely run time is high enough to make it worth investing in
further optimization.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Memory unit GUC range checks
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Removing unneeded self joins