Обсуждение: Help needed on optimizing query

Поиск
Список
Период
Сортировка

Help needed on optimizing query

От
Mark Steben
Дата:
Good morning,
We run postgres 9.4.21.  Last week we moved our database from an expedient datacenter to an azure vm.  
I submit this query for your consideration:

 SELECT DISTINCT ON (permissions.id)
        permissions.section,
        permissions.subject,
        permissions.action
       FROM
         permissions
         LEFT JOIN permissions_roles ON permissions.id = permissions_roles.permission_id
         LEFT JOIN roles_users ON permissions_roles.role_id = roles_users.role_id AND roles_users.user_id = 13949
         LEFT JOIN accounts_users_permissions ON permissions.id = accounts_users_permissions.permission_id
         LEFT JOIN accounts_users ON accounts_users_permissions.accounts_user_id = accounts_users.id
             AND accounts_users.user_id = 13949
             AND accounts_users.account_id = 0
         LEFT JOIN access_keys_permissions ON permissions.id = access_keys_permissions.permission_id
         LEFT JOIN access_keys_users ON access_keys_permissions.access_key_id = access_keys_users.access_key_id AND access_keys_users.user_id = 13949
       WHERE
        accounts_users.deleted_at IS NULL AND
         13949 IS NOT NULL AND
         (
           roles_users.id IS NOT NULL OR
           access_keys_users.id IS NOT NULL OR
           accounts_users.id IS NOT NULL
          )

Here is the explain analyze:

 Unique  (cost=1.84..3153.19 rows=322 width=29) (actual time=1.289..996.395 rows=76 loops=1)
   ->  Merge Left Join  (cost=1.84..2856.84 rows=118541 width=29) (actual time=1.288..985.494 rows=183907 loops=1)
         Merge Cond: (permissions.id = permissions_roles.permission_id)
         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT NULL))
         Rows Removed by Filter: 5613165
         ->  Merge Left Join  (cost=1.27..418.98 rows=7353 width=37) (actual time=0.044..45.830 rows=185714 loops=1)
               Merge Cond: (permissions.id = access_keys_permissions.permission_id)
               ->  Nested Loop Left Join  (cost=0.72..214.69 rows=1422 width=33) (actual time=0.020..2.614 rows=3223 loops=1)
                     Join Filter: (accounts_users_permissions.accounts_user_id = accounts_users.id)
                     Filter: (accounts_users.deleted_at IS NULL)
                     ->  Merge Left Join  (cost=0.43..162.80 rows=3172 width=33) (actual time=0.013..1.844 rows=3223 loops=1)
                           Merge Cond: (permissions.id = accounts_users_permissions.permission_id)
                           ->  Index Scan using permissions_pkey on permissions  (cost=0.15..14.01 rows=322 width=29) (actual time=0.007..0.099 rows=322 loops=1)
                           ->  Index Scan using accounts_users_permissions_permission_id on accounts_users_permissions  (cost=0.28..108.33 rows=3172 width=8) (actual time=0.004..1.140 rows=3172 loops=1)
                     ->  Materialize  (cost=0.29..4.31 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=3223)
                           ->  Index Scan using index_stores_users_on_store_id_and_user_id on accounts_users  (cost=0.29..4.31 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1)
                                 Index Cond: ((account_id = 0) AND (user_id = 13949))
               ->  Materialize  (cost=0.55..94.61 rows=1665 width=8) (actual time=0.022..9.459 rows=182891 loops=1)
                     ->  Nested Loop Left Join  (cost=0.55..90.44 rows=1665 width=8) (actual time=0.020..0.818 rows=1665 loops=1)
                           Join Filter: (access_keys_permissions.access_key_id = access_keys_users.access_key_id)
                           ->  Index Scan using index_access_keys_permissions_on_permission_id on access_keys_permissions  (cost=0.28..61.18 rows=1665 width=8) (actual time=0.011..0.455 rows=1665 loops=1)
                           ->  Materialize  (cost=0.28..4.29 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1665)
                                 ->  Index Scan using index_access_keys_users_on_user_id on access_keys_users  (cost=0.28..4.29 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                                       Index Cond: (user_id = 13949)
         ->  Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual time=0.738..269.930 rows=5797068 loops=1)
               ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190 width=8) (actual time=0.736..6.183 rows=5190 loops=1)
                     Join Filter: (permissions_roles.role_id = roles_users.role_id)
                     Rows Removed by Join Filter: 20636
                     ->  Index Scan using permissions_roles_permission_id on permissions_roles  (cost=0.28..150.89 rows=5190 width=8) (actual time=0.005..1.669 rows=5190 loops=1)
                     ->  Materialize  (cost=0.29..335.15 rows=2 width=8) (actual time=0.000..0.000 rows=4 loops=5190)
                           ->  Index Scan using index_roles_users_on_role_id_and_user_id on roles_users  (cost=0.29..335.14 rows=2 width=8) (actual time=0.179..0.724 rows=4 loops=1)
                                 Index Cond: (user_id = 13949)
 Planning time: 1.782 ms
 Execution time: 996.545 ms
(34 rows)

(I hope the boldened lines show up as boldened in your email reception to make this easier.)

This query runs thousands of times a day, specifically many hundreds of times between 3 am and 6 am causing hi stress load and consequently dropped connections.

My specific question to you is: I notice in the boldened lines the discrepancy between the 'estimated row count' and 'actual row count' Most blatant is one Materialize where estimated rowcount as calculated in the explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so that the estimated is closer to the actual.  I run a vacuum and an analyze on all tables nightly.

I have attached a file containing table/index descriptions and our postgresql file for your reference.
Thank you for your help!  This is quite urgent.
(One note - this morning to try to fix this we rebooted and increased the shared_buffers from 8 GB to 14 GB.
 Our hope is that we find the real solution for this optimization and
 revert back to 8 GB.)





--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com





Вложения

Re: Help needed on optimizing query

От
Ron
Дата:
Does the Azure VM have the same IOPS rating as the Expedient DC? (That would immediately explain the slowdown, as would the fact that Azure has been heavily strained by new services spun up during the COVID-19 crisis.)

On 4/13/20 9:07 AM, Mark Steben wrote:
Good morning,
We run postgres 9.4.21.  Last week we moved our database from an expedient datacenter to an azure vm.  
I submit this query for your consideration:

 SELECT DISTINCT ON (permissions.id)
        permissions.section,
        permissions.subject,
        permissions.action
       FROM
         permissions
         LEFT JOIN permissions_roles ON permissions.id = permissions_roles.permission_id
         LEFT JOIN roles_users ON permissions_roles.role_id = roles_users.role_id AND roles_users.user_id = 13949
         LEFT JOIN accounts_users_permissions ON permissions.id = accounts_users_permissions.permission_id
         LEFT JOIN accounts_users ON accounts_users_permissions.accounts_user_id = accounts_users.id
             AND accounts_users.user_id = 13949
             AND accounts_users.account_id = 0
         LEFT JOIN access_keys_permissions ON permissions.id = access_keys_permissions.permission_id
         LEFT JOIN access_keys_users ON access_keys_permissions.access_key_id = access_keys_users.access_key_id AND access_keys_users.user_id = 13949
       WHERE
        accounts_users.deleted_at IS NULL AND
         13949 IS NOT NULL AND
         (
           roles_users.id IS NOT NULL OR
           access_keys_users.id IS NOT NULL OR
           accounts_users.id IS NOT NULL
          )

Here is the explain analyze:

 Unique  (cost=1.84..3153.19 rows=322 width=29) (actual time=1.289..996.395 rows=76 loops=1)
   ->  Merge Left Join  (cost=1.84..2856.84 rows=118541 width=29) (actual time=1.288..985.494 rows=183907 loops=1)
         Merge Cond: (permissions.id = permissions_roles.permission_id)
         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT NULL))
         Rows Removed by Filter: 5613165
         ->  Merge Left Join  (cost=1.27..418.98 rows=7353 width=37) (actual time=0.044..45.830 rows=185714 loops=1)
               Merge Cond: (permissions.id = access_keys_permissions.permission_id)
               ->  Nested Loop Left Join  (cost=0.72..214.69 rows=1422 width=33) (actual time=0.020..2.614 rows=3223 loops=1)
                     Join Filter: (accounts_users_permissions.accounts_user_id = accounts_users.id)
                     Filter: (accounts_users.deleted_at IS NULL)
                     ->  Merge Left Join  (cost=0.43..162.80 rows=3172 width=33) (actual time=0.013..1.844 rows=3223 loops=1)
                           Merge Cond: (permissions.id = accounts_users_permissions.permission_id)
                           ->  Index Scan using permissions_pkey on permissions  (cost=0.15..14.01 rows=322 width=29) (actual time=0.007..0.099 rows=322 loops=1)
                           ->  Index Scan using accounts_users_permissions_permission_id on accounts_users_permissions  (cost=0.28..108.33 rows=3172 width=8) (actual time=0.004..1.140 rows=3172 loops=1)
                     ->  Materialize  (cost=0.29..4.31 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=3223)
                           ->  Index Scan using index_stores_users_on_store_id_and_user_id on accounts_users  (cost=0.29..4.31 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1)
                                 Index Cond: ((account_id = 0) AND (user_id = 13949))
               ->  Materialize  (cost=0.55..94.61 rows=1665 width=8) (actual time=0.022..9.459 rows=182891 loops=1)
                     ->  Nested Loop Left Join  (cost=0.55..90.44 rows=1665 width=8) (actual time=0.020..0.818 rows=1665 loops=1)
                           Join Filter: (access_keys_permissions.access_key_id = access_keys_users.access_key_id)
                           ->  Index Scan using index_access_keys_permissions_on_permission_id on access_keys_permissions  (cost=0.28..61.18 rows=1665 width=8) (actual time=0.011..0.455 rows=1665 loops=1)
                           ->  Materialize  (cost=0.28..4.29 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1665)
                                 ->  Index Scan using index_access_keys_users_on_user_id on access_keys_users  (cost=0.28..4.29 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                                       Index Cond: (user_id = 13949)
         ->  Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual time=0.738..269.930 rows=5797068 loops=1)
               ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190 width=8) (actual time=0.736..6.183 rows=5190 loops=1)
                     Join Filter: (permissions_roles.role_id = roles_users.role_id)
                     Rows Removed by Join Filter: 20636
                     ->  Index Scan using permissions_roles_permission_id on permissions_roles  (cost=0.28..150.89 rows=5190 width=8) (actual time=0.005..1.669 rows=5190 loops=1)
                     ->  Materialize  (cost=0.29..335.15 rows=2 width=8) (actual time=0.000..0.000 rows=4 loops=5190)
                           ->  Index Scan using index_roles_users_on_role_id_and_user_id on roles_users  (cost=0.29..335.14 rows=2 width=8) (actual time=0.179..0.724 rows=4 loops=1)
                                 Index Cond: (user_id = 13949)
 Planning time: 1.782 ms
 Execution time: 996.545 ms
(34 rows)

(I hope the boldened lines show up as boldened in your email reception to make this easier.)

This query runs thousands of times a day, specifically many hundreds of times between 3 am and 6 am causing hi stress load and consequently dropped connections.

My specific question to you is: I notice in the boldened lines the discrepancy between the 'estimated row count' and 'actual row count' Most blatant is one Materialize where estimated rowcount as calculated in the explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so that the estimated is closer to the actual.  I run a vacuum and an analyze on all tables nightly.

I have attached a file containing table/index descriptions and our postgresql file for your reference.
Thank you for your help!  This is quite urgent.
(One note - this morning to try to fix this we rebooted and increased the shared_buffers from 8 GB to 14 GB.
 Our hope is that we find the real solution for this optimization and
 revert back to 8 GB.)





--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com






--
Angular momentum makes the world go 'round.

Re: Help needed on optimizing query

От
Mark Steben
Дата:
Ron, the IOPS is very similar now with the size increase to 64GB memory. We don't wish to incur the cost of the extra memory if possible.
That is why I would like to tune the individual query that is the primary cause of this problem first before we decide to incur the cost permanently.

On Mon, Apr 13, 2020 at 10:24 AM Ron <ronljohnsonjr@gmail.com> wrote:
Does the Azure VM have the same IOPS rating as the Expedient DC? (That would immediately explain the slowdown, as would the fact that Azure has been heavily strained by new services spun up during the COVID-19 crisis.)

On 4/13/20 9:07 AM, Mark Steben wrote:
Good morning,
We run postgres 9.4.21.  Last week we moved our database from an expedient datacenter to an azure vm.  
I submit this query for your consideration:

 SELECT DISTINCT ON (permissions.id)
        permissions.section,
        permissions.subject,
        permissions.action
       FROM
         permissions
         LEFT JOIN permissions_roles ON permissions.id = permissions_roles.permission_id
         LEFT JOIN roles_users ON permissions_roles.role_id = roles_users.role_id AND roles_users.user_id = 13949
         LEFT JOIN accounts_users_permissions ON permissions.id = accounts_users_permissions.permission_id
         LEFT JOIN accounts_users ON accounts_users_permissions.accounts_user_id = accounts_users.id
             AND accounts_users.user_id = 13949
             AND accounts_users.account_id = 0
         LEFT JOIN access_keys_permissions ON permissions.id = access_keys_permissions.permission_id
         LEFT JOIN access_keys_users ON access_keys_permissions.access_key_id = access_keys_users.access_key_id AND access_keys_users.user_id = 13949
       WHERE
        accounts_users.deleted_at IS NULL AND
         13949 IS NOT NULL AND
         (
           roles_users.id IS NOT NULL OR
           access_keys_users.id IS NOT NULL OR
           accounts_users.id IS NOT NULL
          )

Here is the explain analyze:

 Unique  (cost=1.84..3153.19 rows=322 width=29) (actual time=1.289..996.395 rows=76 loops=1)
   ->  Merge Left Join  (cost=1.84..2856.84 rows=118541 width=29) (actual time=1.288..985.494 rows=183907 loops=1)
         Merge Cond: (permissions.id = permissions_roles.permission_id)
         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT NULL))
         Rows Removed by Filter: 5613165
         ->  Merge Left Join  (cost=1.27..418.98 rows=7353 width=37) (actual time=0.044..45.830 rows=185714 loops=1)
               Merge Cond: (permissions.id = access_keys_permissions.permission_id)
               ->  Nested Loop Left Join  (cost=0.72..214.69 rows=1422 width=33) (actual time=0.020..2.614 rows=3223 loops=1)
                     Join Filter: (accounts_users_permissions.accounts_user_id = accounts_users.id)
                     Filter: (accounts_users.deleted_at IS NULL)
                     ->  Merge Left Join  (cost=0.43..162.80 rows=3172 width=33) (actual time=0.013..1.844 rows=3223 loops=1)
                           Merge Cond: (permissions.id = accounts_users_permissions.permission_id)
                           ->  Index Scan using permissions_pkey on permissions  (cost=0.15..14.01 rows=322 width=29) (actual time=0.007..0.099 rows=322 loops=1)
                           ->  Index Scan using accounts_users_permissions_permission_id on accounts_users_permissions  (cost=0.28..108.33 rows=3172 width=8) (actual time=0.004..1.140 rows=3172 loops=1)
                     ->  Materialize  (cost=0.29..4.31 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=3223)
                           ->  Index Scan using index_stores_users_on_store_id_and_user_id on accounts_users  (cost=0.29..4.31 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1)
                                 Index Cond: ((account_id = 0) AND (user_id = 13949))
               ->  Materialize  (cost=0.55..94.61 rows=1665 width=8) (actual time=0.022..9.459 rows=182891 loops=1)
                     ->  Nested Loop Left Join  (cost=0.55..90.44 rows=1665 width=8) (actual time=0.020..0.818 rows=1665 loops=1)
                           Join Filter: (access_keys_permissions.access_key_id = access_keys_users.access_key_id)
                           ->  Index Scan using index_access_keys_permissions_on_permission_id on access_keys_permissions  (cost=0.28..61.18 rows=1665 width=8) (actual time=0.011..0.455 rows=1665 loops=1)
                           ->  Materialize  (cost=0.28..4.29 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1665)
                                 ->  Index Scan using index_access_keys_users_on_user_id on access_keys_users  (cost=0.28..4.29 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                                       Index Cond: (user_id = 13949)
         ->  Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual time=0.738..269.930 rows=5797068 loops=1)
               ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190 width=8) (actual time=0.736..6.183 rows=5190 loops=1)
                     Join Filter: (permissions_roles.role_id = roles_users.role_id)
                     Rows Removed by Join Filter: 20636
                     ->  Index Scan using permissions_roles_permission_id on permissions_roles  (cost=0.28..150.89 rows=5190 width=8) (actual time=0.005..1.669 rows=5190 loops=1)
                     ->  Materialize  (cost=0.29..335.15 rows=2 width=8) (actual time=0.000..0.000 rows=4 loops=5190)
                           ->  Index Scan using index_roles_users_on_role_id_and_user_id on roles_users  (cost=0.29..335.14 rows=2 width=8) (actual time=0.179..0.724 rows=4 loops=1)
                                 Index Cond: (user_id = 13949)
 Planning time: 1.782 ms
 Execution time: 996.545 ms
(34 rows)

(I hope the boldened lines show up as boldened in your email reception to make this easier.)

This query runs thousands of times a day, specifically many hundreds of times between 3 am and 6 am causing hi stress load and consequently dropped connections.

My specific question to you is: I notice in the boldened lines the discrepancy between the 'estimated row count' and 'actual row count' Most blatant is one Materialize where estimated rowcount as calculated in the explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so that the estimated is closer to the actual.  I run a vacuum and an analyze on all tables nightly.

I have attached a file containing table/index descriptions and our postgresql file for your reference.
Thank you for your help!  This is quite urgent.
(One note - this morning to try to fix this we rebooted and increased the shared_buffers from 8 GB to 14 GB.
 Our hope is that we find the real solution for this optimization and
 revert back to 8 GB.)





--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com






--
Angular momentum makes the world go 'round.


--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com





Re: Help needed on optimizing query

От
Ron
Дата:
Given that There Is No Free Lunch, and therefore fewer IOPS means -- by definition -- slower queries in cases like this, are you saying that the query was not tuned properly when running in the Expedient DC?

(Also, since the query is has "13949 IS NOT NULL" in it, can we presume that the query is really parameterized, and you just put in some numbers to make it work from psql?)

On 4/13/20 9:48 AM, Mark Steben wrote:
Ron, the IOPS is very similar now with the size increase to 64GB memory. We don't wish to incur the cost of the extra memory if possible.
That is why I would like to tune the individual query that is the primary cause of this problem first before we decide to incur the cost permanently.

On Mon, Apr 13, 2020 at 10:24 AM Ron <ronljohnsonjr@gmail.com> wrote:
Does the Azure VM have the same IOPS rating as the Expedient DC? (That would immediately explain the slowdown, as would the fact that Azure has been heavily strained by new services spun up during the COVID-19 crisis.)

On 4/13/20 9:07 AM, Mark Steben wrote:
Good morning,
We run postgres 9.4.21.  Last week we moved our database from an expedient datacenter to an azure vm.  
I submit this query for your consideration:

 SELECT DISTINCT ON (permissions.id)
        permissions.section,
        permissions.subject,
        permissions.action
       FROM
         permissions
         LEFT JOIN permissions_roles ON permissions.id = permissions_roles.permission_id
         LEFT JOIN roles_users ON permissions_roles.role_id = roles_users.role_id AND roles_users.user_id = 13949
         LEFT JOIN accounts_users_permissions ON permissions.id = accounts_users_permissions.permission_id
         LEFT JOIN accounts_users ON accounts_users_permissions.accounts_user_id = accounts_users.id
             AND accounts_users.user_id = 13949
             AND accounts_users.account_id = 0
         LEFT JOIN access_keys_permissions ON permissions.id = access_keys_permissions.permission_id
         LEFT JOIN access_keys_users ON access_keys_permissions.access_key_id = access_keys_users.access_key_id AND access_keys_users.user_id = 13949
       WHERE
        accounts_users.deleted_at IS NULL AND
         13949 IS NOT NULL AND
         (
           roles_users.id IS NOT NULL OR
           access_keys_users.id IS NOT NULL OR
           accounts_users.id IS NOT NULL
          )

Here is the explain analyze:

 Unique  (cost=1.84..3153.19 rows=322 width=29) (actual time=1.289..996.395 rows=76 loops=1)
   ->  Merge Left Join  (cost=1.84..2856.84 rows=118541 width=29) (actual time=1.288..985.494 rows=183907 loops=1)
         Merge Cond: (permissions.id = permissions_roles.permission_id)
         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT NULL))
         Rows Removed by Filter: 5613165
         ->  Merge Left Join  (cost=1.27..418.98 rows=7353 width=37) (actual time=0.044..45.830 rows=185714 loops=1)
               Merge Cond: (permissions.id = access_keys_permissions.permission_id)
               ->  Nested Loop Left Join  (cost=0.72..214.69 rows=1422 width=33) (actual time=0.020..2.614 rows=3223 loops=1)
                     Join Filter: (accounts_users_permissions.accounts_user_id = accounts_users.id)
                     Filter: (accounts_users.deleted_at IS NULL)
                     ->  Merge Left Join  (cost=0.43..162.80 rows=3172 width=33) (actual time=0.013..1.844 rows=3223 loops=1)
                           Merge Cond: (permissions.id = accounts_users_permissions.permission_id)
                           ->  Index Scan using permissions_pkey on permissions  (cost=0.15..14.01 rows=322 width=29) (actual time=0.007..0.099 rows=322 loops=1)
                           ->  Index Scan using accounts_users_permissions_permission_id on accounts_users_permissions  (cost=0.28..108.33 rows=3172 width=8) (actual time=0.004..1.140 rows=3172 loops=1)
                     ->  Materialize  (cost=0.29..4.31 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=3223)
                           ->  Index Scan using index_stores_users_on_store_id_and_user_id on accounts_users  (cost=0.29..4.31 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1)
                                 Index Cond: ((account_id = 0) AND (user_id = 13949))
               ->  Materialize  (cost=0.55..94.61 rows=1665 width=8) (actual time=0.022..9.459 rows=182891 loops=1)
                     ->  Nested Loop Left Join  (cost=0.55..90.44 rows=1665 width=8) (actual time=0.020..0.818 rows=1665 loops=1)
                           Join Filter: (access_keys_permissions.access_key_id = access_keys_users.access_key_id)
                           ->  Index Scan using index_access_keys_permissions_on_permission_id on access_keys_permissions  (cost=0.28..61.18 rows=1665 width=8) (actual time=0.011..0.455 rows=1665 loops=1)
                           ->  Materialize  (cost=0.28..4.29 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1665)
                                 ->  Index Scan using index_access_keys_users_on_user_id on access_keys_users  (cost=0.28..4.29 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                                       Index Cond: (user_id = 13949)
         ->  Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual time=0.738..269.930 rows=5797068 loops=1)
               ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190 width=8) (actual time=0.736..6.183 rows=5190 loops=1)
                     Join Filter: (permissions_roles.role_id = roles_users.role_id)
                     Rows Removed by Join Filter: 20636
                     ->  Index Scan using permissions_roles_permission_id on permissions_roles  (cost=0.28..150.89 rows=5190 width=8) (actual time=0.005..1.669 rows=5190 loops=1)
                     ->  Materialize  (cost=0.29..335.15 rows=2 width=8) (actual time=0.000..0.000 rows=4 loops=5190)
                           ->  Index Scan using index_roles_users_on_role_id_and_user_id on roles_users  (cost=0.29..335.14 rows=2 width=8) (actual time=0.179..0.724 rows=4 loops=1)
                                 Index Cond: (user_id = 13949)
 Planning time: 1.782 ms
 Execution time: 996.545 ms
(34 rows)

(I hope the boldened lines show up as boldened in your email reception to make this easier.)

This query runs thousands of times a day, specifically many hundreds of times between 3 am and 6 am causing hi stress load and consequently dropped connections.

My specific question to you is: I notice in the boldened lines the discrepancy between the 'estimated row count' and 'actual row count' Most blatant is one Materialize where estimated rowcount as calculated in the explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so that the estimated is closer to the actual.  I run a vacuum and an analyze on all tables nightly.

I have attached a file containing table/index descriptions and our postgresql file for your reference.
Thank you for your help!  This is quite urgent.
(One note - this morning to try to fix this we rebooted and increased the shared_buffers from 8 GB to 14 GB.
 Our hope is that we find the real solution for this optimization and
 revert back to 8 GB.)

--
Angular momentum makes the world go 'round.

Re: Help needed on optimizing query

От
Erik Serrano
Дата:
Dear ,

The first thing I would do is recreate the indexes of the table (s). Since I see very high times when searching
Take that action and first and tell me how it went.

goodbye



Erik R. Serrano Saavedra
Ingeniero de Sistemas Informáticos
      Data Base Administrator
        eserranos@gmail.com
                 


El lun., 13 abr. 2020 a las 10:08, Mark Steben (<mark.steben@drivedominion.com>) escribió:
Good morning,
We run postgres 9.4.21.  Last week we moved our database from an expedient datacenter to an azure vm.  
I submit this query for your consideration:

 SELECT DISTINCT ON (permissions.id)
        permissions.section,
        permissions.subject,
        permissions.action
       FROM
         permissions
         LEFT JOIN permissions_roles ON permissions.id = permissions_roles.permission_id
         LEFT JOIN roles_users ON permissions_roles.role_id = roles_users.role_id AND roles_users.user_id = 13949
         LEFT JOIN accounts_users_permissions ON permissions.id = accounts_users_permissions.permission_id
         LEFT JOIN accounts_users ON accounts_users_permissions.accounts_user_id = accounts_users.id
             AND accounts_users.user_id = 13949
             AND accounts_users.account_id = 0
         LEFT JOIN access_keys_permissions ON permissions.id = access_keys_permissions.permission_id
         LEFT JOIN access_keys_users ON access_keys_permissions.access_key_id = access_keys_users.access_key_id AND access_keys_users.user_id = 13949
       WHERE
        accounts_users.deleted_at IS NULL AND
         13949 IS NOT NULL AND
         (
           roles_users.id IS NOT NULL OR
           access_keys_users.id IS NOT NULL OR
           accounts_users.id IS NOT NULL
          )

Here is the explain analyze:

 Unique  (cost=1.84..3153.19 rows=322 width=29) (actual time=1.289..996.395 rows=76 loops=1)
   ->  Merge Left Join  (cost=1.84..2856.84 rows=118541 width=29) (actual time=1.288..985.494 rows=183907 loops=1)
         Merge Cond: (permissions.id = permissions_roles.permission_id)
         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT NULL))
         Rows Removed by Filter: 5613165
         ->  Merge Left Join  (cost=1.27..418.98 rows=7353 width=37) (actual time=0.044..45.830 rows=185714 loops=1)
               Merge Cond: (permissions.id = access_keys_permissions.permission_id)
               ->  Nested Loop Left Join  (cost=0.72..214.69 rows=1422 width=33) (actual time=0.020..2.614 rows=3223 loops=1)
                     Join Filter: (accounts_users_permissions.accounts_user_id = accounts_users.id)
                     Filter: (accounts_users.deleted_at IS NULL)
                     ->  Merge Left Join  (cost=0.43..162.80 rows=3172 width=33) (actual time=0.013..1.844 rows=3223 loops=1)
                           Merge Cond: (permissions.id = accounts_users_permissions.permission_id)
                           ->  Index Scan using permissions_pkey on permissions  (cost=0.15..14.01 rows=322 width=29) (actual time=0.007..0.099 rows=322 loops=1)
                           ->  Index Scan using accounts_users_permissions_permission_id on accounts_users_permissions  (cost=0.28..108.33 rows=3172 width=8) (actual time=0.004..1.140 rows=3172 loops=1)
                     ->  Materialize  (cost=0.29..4.31 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=3223)
                           ->  Index Scan using index_stores_users_on_store_id_and_user_id on accounts_users  (cost=0.29..4.31 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1)
                                 Index Cond: ((account_id = 0) AND (user_id = 13949))
               ->  Materialize  (cost=0.55..94.61 rows=1665 width=8) (actual time=0.022..9.459 rows=182891 loops=1)
                     ->  Nested Loop Left Join  (cost=0.55..90.44 rows=1665 width=8) (actual time=0.020..0.818 rows=1665 loops=1)
                           Join Filter: (access_keys_permissions.access_key_id = access_keys_users.access_key_id)
                           ->  Index Scan using index_access_keys_permissions_on_permission_id on access_keys_permissions  (cost=0.28..61.18 rows=1665 width=8) (actual time=0.011..0.455 rows=1665 loops=1)
                           ->  Materialize  (cost=0.28..4.29 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1665)
                                 ->  Index Scan using index_access_keys_users_on_user_id on access_keys_users  (cost=0.28..4.29 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                                       Index Cond: (user_id = 13949)
         ->  Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual time=0.738..269.930 rows=5797068 loops=1)
               ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190 width=8) (actual time=0.736..6.183 rows=5190 loops=1)
                     Join Filter: (permissions_roles.role_id = roles_users.role_id)
                     Rows Removed by Join Filter: 20636
                     ->  Index Scan using permissions_roles_permission_id on permissions_roles  (cost=0.28..150.89 rows=5190 width=8) (actual time=0.005..1.669 rows=5190 loops=1)
                     ->  Materialize  (cost=0.29..335.15 rows=2 width=8) (actual time=0.000..0.000 rows=4 loops=5190)
                           ->  Index Scan using index_roles_users_on_role_id_and_user_id on roles_users  (cost=0.29..335.14 rows=2 width=8) (actual time=0.179..0.724 rows=4 loops=1)
                                 Index Cond: (user_id = 13949)
 Planning time: 1.782 ms
 Execution time: 996.545 ms
(34 rows)

(I hope the boldened lines show up as boldened in your email reception to make this easier.)

This query runs thousands of times a day, specifically many hundreds of times between 3 am and 6 am causing hi stress load and consequently dropped connections.

My specific question to you is: I notice in the boldened lines the discrepancy between the 'estimated row count' and 'actual row count' Most blatant is one Materialize where estimated rowcount as calculated in the explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so that the estimated is closer to the actual.  I run a vacuum and an analyze on all tables nightly.

I have attached a file containing table/index descriptions and our postgresql file for your reference.
Thank you for your help!  This is quite urgent.
(One note - this morning to try to fix this we rebooted and increased the shared_buffers from 8 GB to 14 GB.
 Our hope is that we find the real solution for this optimization and
 revert back to 8 GB.)





--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com





Re: Help needed on optimizing query

От
Mark Steben
Дата:
Given that There Is No Free Lunch, and therefore fewer IOPS means -- by definition -- slower queries in cases like this, are you saying that the query was not tuned properly when running in the Expedient DC?
  I tuned this query to the best of my knowledge and ability when running in Expedient. What I am asking here is help from postgresql experts who know more than I to help with further tuning if possible.

(Also, since the query is has "13949 IS NOT NULL" in it, can we presume that the query is really parameterized, and you just put in some numbers to make it work from psql?)  
No I did not.  In fact I took out the '13949 is not null' out, reran the explain and it made no difference.
I will take Eriks suggestion, rebuild the indexes and see if that makes a difference.  But I am still curious as to the differences in the 'planned row' count vs the 'actual row' count.

On Mon, Apr 13, 2020 at 11:02 AM Ron <ronljohnsonjr@gmail.com> wrote:
Given that There Is No Free Lunch, and therefore fewer IOPS means -- by definition -- slower queries in cases like this, are you saying that the query was not tuned properly when running in the Expedient DC?

(Also, since the query is has "13949 IS NOT NULL" in it, can we presume that the query is really parameterized, and you just put in some numbers to make it work from psql?)

On 4/13/20 9:48 AM, Mark Steben wrote:
Ron, the IOPS is very similar now with the size increase to 64GB memory. We don't wish to incur the cost of the extra memory if possible.
That is why I would like to tune the individual query that is the primary cause of this problem first before we decide to incur the cost permanently.

On Mon, Apr 13, 2020 at 10:24 AM Ron <ronljohnsonjr@gmail.com> wrote:
Does the Azure VM have the same IOPS rating as the Expedient DC? (That would immediately explain the slowdown, as would the fact that Azure has been heavily strained by new services spun up during the COVID-19 crisis.)

On 4/13/20 9:07 AM, Mark Steben wrote:
Good morning,
We run postgres 9.4.21.  Last week we moved our database from an expedient datacenter to an azure vm.  
I submit this query for your consideration:

 SELECT DISTINCT ON (permissions.id)
        permissions.section,
        permissions.subject,
        permissions.action
       FROM
         permissions
         LEFT JOIN permissions_roles ON permissions.id = permissions_roles.permission_id
         LEFT JOIN roles_users ON permissions_roles.role_id = roles_users.role_id AND roles_users.user_id = 13949
         LEFT JOIN accounts_users_permissions ON permissions.id = accounts_users_permissions.permission_id
         LEFT JOIN accounts_users ON accounts_users_permissions.accounts_user_id = accounts_users.id
             AND accounts_users.user_id = 13949
             AND accounts_users.account_id = 0
         LEFT JOIN access_keys_permissions ON permissions.id = access_keys_permissions.permission_id
         LEFT JOIN access_keys_users ON access_keys_permissions.access_key_id = access_keys_users.access_key_id AND access_keys_users.user_id = 13949
       WHERE
        accounts_users.deleted_at IS NULL AND
         13949 IS NOT NULL AND
         (
           roles_users.id IS NOT NULL OR
           access_keys_users.id IS NOT NULL OR
           accounts_users.id IS NOT NULL
          )

Here is the explain analyze:

 Unique  (cost=1.84..3153.19 rows=322 width=29) (actual time=1.289..996.395 rows=76 loops=1)
   ->  Merge Left Join  (cost=1.84..2856.84 rows=118541 width=29) (actual time=1.288..985.494 rows=183907 loops=1)
         Merge Cond: (permissions.id = permissions_roles.permission_id)
         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT NULL))
         Rows Removed by Filter: 5613165
         ->  Merge Left Join  (cost=1.27..418.98 rows=7353 width=37) (actual time=0.044..45.830 rows=185714 loops=1)
               Merge Cond: (permissions.id = access_keys_permissions.permission_id)
               ->  Nested Loop Left Join  (cost=0.72..214.69 rows=1422 width=33) (actual time=0.020..2.614 rows=3223 loops=1)
                     Join Filter: (accounts_users_permissions.accounts_user_id = accounts_users.id)
                     Filter: (accounts_users.deleted_at IS NULL)
                     ->  Merge Left Join  (cost=0.43..162.80 rows=3172 width=33) (actual time=0.013..1.844 rows=3223 loops=1)
                           Merge Cond: (permissions.id = accounts_users_permissions.permission_id)
                           ->  Index Scan using permissions_pkey on permissions  (cost=0.15..14.01 rows=322 width=29) (actual time=0.007..0.099 rows=322 loops=1)
                           ->  Index Scan using accounts_users_permissions_permission_id on accounts_users_permissions  (cost=0.28..108.33 rows=3172 width=8) (actual time=0.004..1.140 rows=3172 loops=1)
                     ->  Materialize  (cost=0.29..4.31 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=3223)
                           ->  Index Scan using index_stores_users_on_store_id_and_user_id on accounts_users  (cost=0.29..4.31 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1)
                                 Index Cond: ((account_id = 0) AND (user_id = 13949))
               ->  Materialize  (cost=0.55..94.61 rows=1665 width=8) (actual time=0.022..9.459 rows=182891 loops=1)
                     ->  Nested Loop Left Join  (cost=0.55..90.44 rows=1665 width=8) (actual time=0.020..0.818 rows=1665 loops=1)
                           Join Filter: (access_keys_permissions.access_key_id = access_keys_users.access_key_id)
                           ->  Index Scan using index_access_keys_permissions_on_permission_id on access_keys_permissions  (cost=0.28..61.18 rows=1665 width=8) (actual time=0.011..0.455 rows=1665 loops=1)
                           ->  Materialize  (cost=0.28..4.29 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1665)
                                 ->  Index Scan using index_access_keys_users_on_user_id on access_keys_users  (cost=0.28..4.29 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                                       Index Cond: (user_id = 13949)
         ->  Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual time=0.738..269.930 rows=5797068 loops=1)
               ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190 width=8) (actual time=0.736..6.183 rows=5190 loops=1)
                     Join Filter: (permissions_roles.role_id = roles_users.role_id)
                     Rows Removed by Join Filter: 20636
                     ->  Index Scan using permissions_roles_permission_id on permissions_roles  (cost=0.28..150.89 rows=5190 width=8) (actual time=0.005..1.669 rows=5190 loops=1)
                     ->  Materialize  (cost=0.29..335.15 rows=2 width=8) (actual time=0.000..0.000 rows=4 loops=5190)
                           ->  Index Scan using index_roles_users_on_role_id_and_user_id on roles_users  (cost=0.29..335.14 rows=2 width=8) (actual time=0.179..0.724 rows=4 loops=1)
                                 Index Cond: (user_id = 13949)
 Planning time: 1.782 ms
 Execution time: 996.545 ms
(34 rows)

(I hope the boldened lines show up as boldened in your email reception to make this easier.)

This query runs thousands of times a day, specifically many hundreds of times between 3 am and 6 am causing hi stress load and consequently dropped connections.

My specific question to you is: I notice in the boldened lines the discrepancy between the 'estimated row count' and 'actual row count' Most blatant is one Materialize where estimated rowcount as calculated in the explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so that the estimated is closer to the actual.  I run a vacuum and an analyze on all tables nightly.

I have attached a file containing table/index descriptions and our postgresql file for your reference.
Thank you for your help!  This is quite urgent.
(One note - this morning to try to fix this we rebooted and increased the shared_buffers from 8 GB to 14 GB.
 Our hope is that we find the real solution for this optimization and
 revert back to 8 GB.)

--
Angular momentum makes the world go 'round.


--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com





Re: Help needed on optimizing query

От
Tom Lane
Дата:
Mark Steben <mark.steben@drivedominion.com> writes:
>          ->  *Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual
> time=0.738..269.930 rows=5797068 loops=1)*
>                ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190
> width=8) (actual time=0.736..6.183 rows=5190 loops=1)

> My specific question to you is: I notice in the boldened lines the
> discrepancy between the 'estimated row count' and 'actual row count' Most
> blatant is one Materialize where estimated rowcount as calculated in the
> explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so
> that the estimated is closer to the actual.

I don't think there's any estimation failure here at all: notice that the
input left-join node's estimate is dead on.  The reason the Materialize's
output row count is higher is that the same rows are being read from it
over and over, which is something that a Merge Join will do to its
right-hand input when the left-hand input has a lot of duplicate join
keys.  The reason the planner stuck a Materialize here is exactly to
make that as cheap as it could.

It looks to me like the really bad aspect of this plan is that
the most aggressive filter can't be applied till the very end:

         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT
NULL))
         Rows Removed by Filter: 5613165

You're likely not going to be able to do much about that without
a significant rethinking of your table layout.  But having to do
a seven-way join in a performance-critical query is already a sign
that you're in for pain.

            regards, tom lane