Subselect an order of magnitude faster than nested joins

Поиск
Список
Период
Сортировка
От Eli Naeher
Тема Subselect an order of magnitude faster than nested joins
Дата
Msg-id CAJVWyAz5KJL1Zj00LAyrcC0B5Od0fD6Ar1cVVKEw3Z+s9g6-9A@mail.gmail.com
обсуждение исходный текст
Список pgsql-performance
Hello,

I have two versions of essentially the same query; one using nested joins, the other using subselects. The version using the subselect is roughly an order of magnitude faster (~70ms on my box and data vs ~900ms for the nested joins). Of course the obvious answer here is just to use the faster version, but I'd like to understand why the other version is so slow. These queries are automatically generated by our code and I'd like to feel more informed when deciding what style of query it should be generating (and to know whether there is a way to write the nested-join queries that will more closely approach the performance of the subselect).

(The table aliasing is an artifact of the code that is generating this query--I assume there is no big performance impact there, but perhaps that assumption is mistaken.)

The join version:

(SELECT DISTINCT resource_type_1.* 
 FROM   resource_type AS resource_type_1 
        LEFT JOIN group_authorization AS group_authorization_2 
                  INNER JOIN group_member AS group_member_4 
                          ON ( ( group_authorization_2.person_oid = 
                                 group_member_4.person_oid ) 
                               AND ( group_authorization_2.group_oid =
                             group_member_4.group_oid ) ) 
                  INNER JOIN wco_group AS group_5 
                          ON ( group_authorization_2.group_oid = 
                             group_5.obj_oid ) 
               ON ( resource_type_1.obj_oid = group_authorization_2.rtype_oid )
 WHERE  ( ( ( ( ( group_5.end_date IS NULL ) 
                 OR ( group_5.end_date >= '2014-03-03T18:08:23.543001Z' ) ) 
              AND ( ( group_member_4.expire IS NULL ) 
                     OR ( group_member_4.expire >= '2014-03-03T18:08:23.543001Z' 
                        ) ) 
              AND ( ( group_authorization_2.expire IS NULL ) 
                     OR ( group_authorization_2.expire >= 
                          '2014-03-03T18:08:23.543001Z' 
                        ) 
                  ) 
            ) 
            AND ( group_authorization_2.person_oid = 1 ) ) 
           OR ( resource_type_1.authorized = false ) ))

(explain (analyze, buffers) output is at http://explain.depesz.com/s/wPZL)

The subselect version:

(SELECT DISTINCT resource_type_1.* 
 FROM   resource_type AS resource_type_1 
 WHERE  ( ( resource_type_1.authorized = false ) 
           OR ( resource_type_1.obj_oid IN (SELECT rtype_oid 
                                            FROM   group_authorization
                                                   INNER JOIN group_member
                                                           ON ( ( 
                                                   group_member.group_oid
                                                   = 
                group_authorization.group_oid ) 
                AND ( group_member.person_oid = 
                    group_authorization.person_oid ) ) 
                INNER JOIN wco_group 
                ON ( group_member.group_oid = wco_group.obj_oid ) 
                WHERE  ( ( group_member.person_oid = 1 ) 
                AND ( ( group_authorization.expire > 
                        '2014-03-03T18:11:20.553844Z' ) 
                OR ( group_authorization.expire IS NULL ) ) 
                AND ( ( group_member.expire > '2014-03-03T18:11:20.553844Z' ) 
                OR ( group_member.expire IS NULL ) ) 
                AND ( ( wco_group.end_date > '2014-03-03T18:11:20.553844Z' ) 
                OR ( wco_group.end_date IS NULL ) ) )) ) )) 

(explain (analyze, buffers) output is at http://explain.depesz.com/s/70dd)

This is using Postgres 9.3.3. The table wco_group has ~5000 rows, group_member has ~15000 rows, and group_authorization is the big one with ~385000 rows. Relevant DDL information is here: http://paste.lisp.org/display/141466.

I noticed that the nested join version was doing a lot of seq scans and not using the indexes. I tried setting enable_seqscan to off to force index use, and it was a bit slower that way, so the query optimizer is definitely doing the right thing.

Any thoughts would be much appreciated.

Thank you,
-Eli

В списке pgsql-performance по дате отправления:

Предыдущее
От: Damon Snyder
Дата:
Сообщение: Re: Help with optimizing a query over hierarchical data
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: Help with optimizing a query over hierarchical data