Обсуждение: Long running query in new production, not so long in old

Поиск

Список

Период

Сортировка

Long running query in new production, not so long in old

От

Mark Steben

Дата:

28 марта 2019 г., 02:27:06

Good evening

We are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion

in the new environment and only 445 thousand in the old. I have sent the explains along with the table descriptions, row counts, the one function that I know causes the bottleneck, the query, some relevant configuration settings in postgresql conf (identical in both environments)

and a listing from top in both environments, showing memory, shared memory, and cpu.

Everything seems to be identical or close, except for the shared buffer count in the explain.

Any insight would be appreciated.

Thank you,

Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com

Вложения

Re: Long running query in new production, not so long in old

От

Alexander Shaburov

Дата:

28 марта 2019 г., 10:45:07

Hi!

Make the same shared buffer value for example. Disable any activity except you client query. Make query twice. Second one should be during about < 1 min.

If not, so it isnt problem shared buffer. My be processor count. If query will be faster, so in real situation you have any activity on db, that take away buffers from using query.
Alex

28.03.2019 1:27, Mark Steben пишет:

Good evening

We are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion
in the new environment and only 445 thousand in the old. I have sent the explains along with the table descriptions, row counts, the one function that I know causes the bottleneck, the query, some relevant configuration settings in postgresql conf (identical in both environments)
and a listing from top in both environments, showing memory, shared memory, and cpu.

Everything seems to be identical or close, except for the shared buffer count in the explain.
Any insight would be appreciated.

Thank you,

--
Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com

Re: Long running query in new production, not so long in old

От

Mark Steben

Дата:

28 марта 2019 г., 15:39:29

Thank you Alex for your response. I do appreciate your taking the time as I am under a time schedule.

My question has more to do with an Interpretion of what the BUFFERS: SHARED HIT notification means within the explain and some insight as to why it can be so different when run in similar environments. I am running postgres 9.4 btw.

On Thu, Mar 28, 2019 at 3:45 AM Alexander Shaburov <shab2@mail.ru> wrote:

Hi!
Make the same shared buffer value for example. Disable any activity except you client query. Make query twice. Second one should be during about < 1 min.
If not, so it isnt problem shared buffer. My be processor count. If query will be faster, so in real situation you have any activity on db, that take away buffers from using query.
Alex
28.03.2019 1:27, Mark Steben пишет:
Good evening

We are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion
in the new environment and only 445 thousand in the old. I have sent the explains along with the table descriptions, row counts, the one function that I know causes the bottleneck, the query, some relevant configuration settings in postgresql conf (identical in both environments)
and a listing from top in both environments, showing memory, shared memory, and cpu.

Everything seems to be identical or close, except for the shared buffer count in the explain.
Any insight would be appreciated.

Thank you,

--
Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com

Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com

Re: Long running query in new production, not so long in old

От

Laurenz Albe

Дата:

28 марта 2019 г., 15:56:16

On Wed, 2019-03-27 at 19:27 -0400, Mark Steben wrote:
> We are moving to a new VM environment (expedient) and have one query  that typically runs in 22 - 25
> seconds in our old environment, but is running in about 1 hour, 20 minutes in our new.
> I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion
> in the new environment and only 445 thousand in the old.  I have sent the explains along with the
> table descriptions, row counts, the one function that I know causes the bottleneck,  the query,
> some relevant configuration settings in postgresql conf (identical in both environments)
> and a listing from top in both environments, showing memory, shared memory, and cpu.
> 
> Everything seems to be identical or close, except for the shared buffer count in the explain.  
> Any insight would be appreciated.

Slow plan:

 ->  Index Scan using emailrcpts_4columns on emailrcpts  (cost=0.56..119078.48 rows=5 width=29) (actual
time=4873080.765..4873080.765rows=0 loops=1)
 
       Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3))
       Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text)
<>0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND
 
(number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) =
'ohgoshnonotthebees!!!'::text)AND (NOT (SubPlan 1)))
 
       Rows Removed by Filter: 19952
       Buffers: shared hit=113768530 read=6244
       SubPlan 1
         ->  Limit  (cost=0.28..15.58 rows=1 width=0) (never executed)

Fast plan:

 ->  Index Scan using emailrcpts_4columns on emailrcpts  (cost=0.56..113162.26 rows=5 width=29) (actual
time=21086.555..21086.555rows=0 loops=1)
 
       Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3))
       Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text)
<>0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND
 
(number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) =
'ohgoshnonotthebees!!!'::text)AND (NOT (SubPlan 1)))
 
       Rows Removed by Filter: 19952
       Buffers: shared hit=445188 read=61756
       SubPlan 1
         ->  Limit  (cost=0.28..15.58 rows=1 width=0) (never executed)

Hmm. These are the ideas I can come up with:

1. There are many index tuples belonging to dead heap tuples.
   Then re-running the query should produce way fewer buffer hits.
   VACUUM would fix that issue.

2. The index is terribly fragmented.
   REINDEX INDEX emailrcpts_4columns
   would improve that one.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: Long running query in new production, not so long in old

От

MichaelDBA

Дата:

28 марта 2019 г., 16:23:55

My question is about the new VM environment...

What vendor for the VMs?
Make sure memory and CPU are not overallocated in the hypervisor.
Evaluate the resource needs of the VM group/cluster to which it belongs.
Regarding VMWare, I dealt with a problem where the "cpu ready percentage" was too high (above 5-10%), which caused cpu cycle delays.

Regards,
Michael Vitale

Laurenz Albe wrote on 3/28/2019 8:56 AM:

On Wed, 2019-03-27 at 19:27 -0400, Mark Steben wrote:

We are moving to a new VM environment (expedient) and have one query  that typically runs in 22 - 25
seconds in our old environment, but is running in about 1 hour, 20 minutes in our new.
I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion
in the new environment and only 445 thousand in the old.  I have sent the explains along with the
table descriptions, row counts, the one function that I know causes the bottleneck,  the query,
some relevant configuration settings in postgresql conf (identical in both environments)
and a listing from top in both environments, showing memory, shared memory, and cpu.

Everything seems to be identical or close, except for the shared buffer count in the explain.  
Any insight would be appreciated.

Slow plan:
->  Index Scan using emailrcpts_4columns on emailrcpts  (cost=0.56..119078.48 rows=5 width=29) (actual time=4873080.765..4873080.765 rows=0 loops=1)      Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3))      Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND
(number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1)))      Rows Removed by Filter: 19952      Buffers: shared hit=113768530 read=6244      SubPlan 1        ->  Limit  (cost=0.28..15.58 rows=1 width=0) (never executed)

Fast plan:
->  Index Scan using emailrcpts_4columns on emailrcpts  (cost=0.56..113162.26 rows=5 width=29) (actual time=21086.555..21086.555 rows=0 loops=1)      Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3))      Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND
(number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1)))      Rows Removed by Filter: 19952      Buffers: shared hit=445188 read=61756      SubPlan 1        ->  Limit  (cost=0.28..15.58 rows=1 width=0) (never executed)

Hmm. These are the ideas I can come up with:

1. There are many index tuples belonging to dead heap tuples.  Then re-running the query should produce way fewer buffer hits.  VACUUM would fix that issue.

2. The index is terribly fragmented.  REINDEX INDEX emailrcpts_4columns  would improve that one.

Yours,
Laurenz Albe

Re: Long running query in new production, not so long in old

От

Mark Steben

Дата:

05 апреля 2019 г., 18:23:29

Hello again,

I wish to make another couple observations about my query.

First our IT team did their due diligence to ensure that the cpu housing the VM is not overallocated.

The cpu ready percentage is allocated properly. I am convinced that the issue resides with the way the query is written.

I also, per the suggestion of this post, ensured that fragmentation was minimized by vacuuming more aggressively.

Now I want to compare the explains and see if you have any recommendations.

This is the query:

explain (analyze, buffers) SELECT 147624,id,fname,lname,email, phone_number,addr1,zip,service_date,(send_date::date + 0)::date

from (select id,fname,lname,email,home_phone as phone_number,addr1,zip,service_date,timeclause_combine(((null)::date - '0 days'::interval)::date,

(ss1.last_transaction_date+timespan('90 days'))::date,(null)::date,(null)::date,9,false) as send_date,last_send from

(select *, (select max(send_date) from queuenodes where qid=333887088 and queuenodes.id=emailrcpts.id and sent=true) as last_send from emailrcpts

FULL OUTER JOIN (SELECT i as iterator FROM series WHERE i BETWEEN 1 AND 1) as i ON true

where cid=333887088 and number_of_sends(emailrcpts.id,30,1)<3

and not exists (select 1 from client_subscribed_products where cid=333887088

and background_id=emailrcpts.id and product_id=1)

and removed=false and active=true

and not ((fname is null or length(fname)=0) and (lname is null or length(lname)=0))

AND NOT EXISTS (SELECT 1 FROM bounces WHERE email=emailrcpts.email AND bounce_count >= 3) and length(email)>0

and email not in

(select email from bogus_email_list

where (cid=333887088 or cid=0))

and NOT EXISTS (SELECT 1 FROM opt_outs oo INNER JOIN opt_out_reasons oor ON oo.reason = oor.id WHERE oo.sid=emailrcpts.id

AND oor.service_reminders_only=false

AND (oor.global = true OR oo.product_id = 1))

AND not exists (select 1 from opt_outs oo inner join opt_out_reasons oor on oo.reason=oor.id where target=emailrcpts.email

and (oor.global = true or oo.product_id = 1)

and (oor.global_by_delivery_type=true

or (oo.opt_out_cid=emailrcpts.cid and oor.global_by_client=true) ))

and (lower(emailrcpt_name(fname,lname)) not in

(select lower(name) from bogus_name_list where (cid=333887088 or cid=0)))

AND email NOT IN

(SELECT email FROM customer_preferences WHERE client_id=333887088

AND service_offers=false)

AND email NOT IN

(SELECT email FROM customer_preferences cp

WHERE client_id=333887088

and this is the first (longrunning) explain analyze with buffers. Notice I highlighted the buffers report, it seems to me that the large number of buffers (over 25 million)

compared to the buffers read from disk (less than 200) would be a good thing But the query runs in almost 20 minutes. The second explain follows.

Subquery Scan on ss2 (cost=2210.07..90750.90 rows=1 width=71) (actual time=173042.965..1130286.738 rows=964 loops=1)

Filter: ((ss2.send_date IS NOT NULL) AND (ss2.send_date <= ('now'::cstring)::date))

Rows Removed by Filter: 252

Buffers: shared hit=25190549 read=175 dirtied=24

-> Nested Loop Left Join (cost=2210.07..90750.88 rows=1 width=71) (actual time=556.742..1130283.136 rows=1216 loops=1)

Buffers: shared hit=25190549 read=175 dirtied=24

-> Nested Loop Anti Join (cost=2210.07..90738.52 rows=1 width=71) (actual time=556.193..1130206.938 rows=1216 loops=1)

Buffers: shared hit=25184433 read=175 dirtied=24

-> Nested Loop Anti Join (cost=2209.51..90729.94 rows=1 width=71) (actual time=556.152..1130174.060 rows=1304 loops=1)

Join Filter: (client_products.background_id = emailrcpts.id)

Rows Removed by Join Filter: 1304

Buffers: shared hit=25179121 read=171 dirtied=24

-> Nested Loop Anti Join (cost=2209.23..90721.62 rows=1 width=71) (actual time=556.107..1130170.188 rows=1304 loops=1)

Buffers: shared hit=25179116 read=171 dirtied=24

-> Nested Loop Anti Join (cost=2200.51..87968.62 rows=1 width=71) (actual time=556.091..1130156.848 rows=1305 loops=1)

Buffers: shared hit=25175197 read=171 dirtied=24

-> Index Scan using emailrcpts_4columns on emailrcpts (cost=2192.17..85033.36 rows=1 width=75) (actual time=556.044..1130113.008 rows=1308 loops=1)

Index Cond: ((cid = 333887088) AND (removed = false) AND (active = true))

Filter: ((NOT removed) AND active AND (last_transaction_date IS NOT NULL) AND (NOT (hashed SubPlan 2)) AND (NOT (hashed SubPlan 4)) AND (NOT (hashed SubPlan 6)) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND (number_of_sends(id, 30, 1) < 3) AND (NOT (hashed SubPlan 3)) AND ((lower(get_make(cid, (vin)::text, (make_purchased)::text)) ~~ 'for%'::text) OR (lower(get_make(cid, (vin)::text, (make_purchased)::text)) ~~ 'lin%'::text)) AND ((SubPlan 7) IS NULL))

Rows Removed by Filter: 1214

Buffers: shared hit=25169950 read=127 dirtied=24

SubPlan 2

-> Bitmap Heap Scan on bogus_email_list (cost=109.48..566.19 rows=3380 width=19) (actual time=0.365..1.418 rows=3384 loops=1)

Recheck Cond: ((cid = 333887088) OR (cid = 0))

Heap Blocks: exact=97

Buffers: shared hit=116

-> BitmapOr (cost=109.48..109.48 rows=3381 width=0) (actual time=0.351..0.351 rows=0 loops=1)

Buffers: shared hit=19

-> Bitmap Index Scan on be_cid_lower_email (cost=0.00..4.31 rows=5 width=0) (actual time=0.036..0.036 rows=8 loops=1)

Index Cond: (cid = 333887088)

Buffers: shared hit=2

-> Bitmap Index Scan on be_cid_lower_email (cost=0.00..103.48 rows=3376 width=0) (actual time=0.314..0.314 rows=3376 loops=1)

Index Cond: (cid = 0)

Buffers: shared hit=17

SubPlan 4

-> Index Scan using customer_preferences_cid_email_idx on customer_preferences (cost=0.14..8.16 rows=1 width=516) (actual time=0.002..0.002 rows=0 loops=1)

Index Cond: (client_id = 333887088)

Filter: (NOT service_offers)

Buffers: shared hit=1

SubPlan 6

-> Nested Loop Anti Join (cost=0.14..247.78 rows=1 width=516) (actual time=0.001..0.001 rows=0 loops=1)

Join Filter: (NOT (SubPlan 5))

Buffers: shared hit=1

-> Index Scan using customer_preferences_cid_email_idx on customer_preferences cp (cost=0.14..8.16 rows=1 width=520) (actual time=0.001..0.001 rows=0 loops=1)

Index Cond: (client_id = 333887088)

Buffers: shared hit=1

-> Materialize (cost=0.00..202.33 rows=2 width=4) (never executed)

-> Seq Scan on client_manufacturers cm (cost=0.00..202.32 rows=2 width=4) (never executed)

Filter: (cid = 333887088)

SubPlan 5

-> Seq Scan on customer_preference_makes (cost=0.00..37.25 rows=3 width=4) (never executed)

Filter: ((NOT include) AND (customer_preference_id = cp.id))

SubPlan 3

-> Bitmap Heap Scan on bogus_name_list (cost=19.00..1359.94 rows=432 width=20) (actual time=0.066..0.255 rows=238 loops=1)

Recheck Cond: ((cid = 333887088) OR (cid = 0))

Heap Blocks: exact=10

Buffers: shared hit=17

-> BitmapOr (cost=19.00..19.00 rows=433 width=0) (actual time=0.053..0.053 rows=0 loops=1)

Buffers: shared hit=7

-> Bitmap Index Scan on bn_cid_lower_name (cost=0.00..9.39 rows=216 width=0) (actual time=0.023..0.023 rows=0 loops=1)

Index Cond: (cid = 333887088)

Buffers: shared hit=3

-> Bitmap Index Scan on bn_cid_lower_name (cost=0.00..9.39 rows=216 width=0) (actual time=0.029..0.029 rows=238 loops=1)

Index Cond: (cid = 0)

Buffers: shared hit=4

SubPlan 7

-> Aggregate (cost=8.59..8.60 rows=1 width=4) (actual time=0.012..0.012 rows=1 loops=1308)

Buffers: shared hit=5232

-> Index Scan using queuenodes_qid_id on queuenodes queuenodes_1 (cost=0.57..8.59 rows=1 width=4) (actual time=0.010..0.010 rows=0 loops=1308)

Index Cond: ((qid = 336652327) AND (id = emailrcpts.id))

Filter: sent

Buffers: shared hit=5232

-> Hash Join (cost=8.34..1471.66 rows=29 width=27) (actual time=0.031..0.031 rows=0 loops=1308)

Hash Cond: (ooh_1.reason = oor_1.id)

Join Filter: ((oor_1.global_by_delivery_type OR ((ooh_1.opt_out_cid = emailrcpts.cid) AND oor_1.global_by_client)) AND (oor_1.global OR (ooh_1.product_id = 1)))

Rows Removed by Join Filter: 0

Buffers: shared hit=5247 read=44

-> Index Scan using opt_out_history_target on opt_out_history ooh_1 (cost=0.56..1462.73 rows=90 width=33) (actual time=0.028..0.029 rows=0 loops=1308)

Index Cond: ((target)::text = (emailrcpts.email)::text)

Filter: active

Rows Removed by Filter: 0

Buffers: shared hit=5241 read=44

-> Hash (cost=7.02..7.02 rows=61 width=7) (actual time=0.052..0.052 rows=74 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 3kB

Buffers: shared hit=3

-> Seq Scan on opt_out_reasons oor_1 (cost=0.00..7.02 rows=61 width=7) (actual time=0.013..0.043 rows=74 loops=1)

Filter: (global_by_delivery_type OR global_by_client)

Rows Removed by Filter: 28

Buffers: shared hit=3

-> Hash Join (cost=8.72..1380.41 rows=91 width=4) (actual time=0.008..0.008 rows=0 loops=1305)

Hash Cond: (ooh.reason = oor.id)

Join Filter: (oor.global OR (ooh.product_id = 1))

Buffers: shared hit=3919

-> Index Scan using opt_out_history_sid on opt_out_history ooh (cost=0.44..1370.49 rows=101 width=12) (actual time=0.007..0.007 rows=0 loops=1305)

Index Cond: (sid = emailrcpts.id)

Filter: active

Buffers: shared hit=3916

-> Hash (cost=7.02..7.02 rows=101 width=5) (actual time=0.038..0.038 rows=101 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 4kB

Buffers: shared hit=3

-> Seq Scan on opt_out_reasons oor (cost=0.00..7.02 rows=101 width=5) (actual time=0.004..0.027 rows=101 loops=1)

Filter: (NOT service_reminders_only)

Rows Removed by Filter: 1

Buffers: shared hit=3

-> Materialize (cost=0.28..8.30 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1304)

Buffers: shared hit=5

-> Index Scan using client_products_active_cid on client_products (cost=0.28..8.30 rows=1 width=4) (actual time=0.037..0.040 rows=1 loops=1)

Index Cond: ((cid = 333887088) AND (cancel_date IS NULL))

Filter: (product_id = 1)

Rows Removed by Filter: 2

Buffers: shared hit=5

-> Index Only Scan using bounces_email_bounce_count on bounces (cost=0.56..8.57 rows=1 width=21) (actual time=0.023..0.023 rows=0 loops=1304)

Index Cond: ((email = (emailrcpts.email)::text) AND (bounce_count >= 3))

Heap Fetches: 85

Buffers: shared hit=5312 read=4

-> Seq Scan on series (cost=0.00..3.48 rows=1 width=0) (actual time=0.004..0.012 rows=1 loops=1216)

Filter: ((i >= 1) AND (i <= 1))

Rows Removed by Filter: 98

Buffers: shared hit=1216

SubPlan 1

-> Aggregate (cost=8.59..8.60 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1216)

Buffers: shared hit=4864

-> Index Scan using queuenodes_qid_id on queuenodes (cost=0.57..8.59 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1216)

Index Cond: ((qid = 336652327) AND (id = emailrcpts.id))

Filter: sent

Buffers: shared hit=4864

Planning time: 8.362 ms

Execution time: 1130287.638 ms

(128 rows)

Here is the second explain

Obviously there is cache benefit.

But the number of shared hit buffers is much lower and the pages read is much higher. It seems to me that would make it run longer, no?

It runs in less than 20 seconds.

In both cases there is not a lot of other contending queries.: It has the entire server pretty much to itself. Could someone have any insight as to why the query runs longer when there is a larger buffer pool allocated? We are running postgres 9.4

Subquery Scan on ss2 (cost=2210.07..90750.90 rows=1 width=71) (actual time=2609.111..19846.425 rows=964 loops=1)

Filter: ((ss2.send_date IS NOT NULL) AND (ss2.send_date <= ('now'::cstring)::date))

Rows Removed by Filter: 252

Buffers: shared hit=124118 read=68829 dirtied=2353

-> Nested Loop Left Join (cost=2210.07..90750.88 rows=1 width=71) (actual time=10.733..19843.724 rows=1216 loops=1)

Buffers: shared hit=124118 read=68829 dirtied=2353

-> Nested Loop Anti Join (cost=2210.07..90738.52 rows=1 width=71) (actual time=10.111..19781.534 rows=1216 loops=1)

Buffers: shared hit=118002 read=68829 dirtied=2353

-> Nested Loop Anti Join (cost=2209.51..90729.94 rows=1 width=71) (actual time=10.068..19751.075 rows=1304 loops=1)

Join Filter: (client_products.background_id = emailrcpts.id)

Rows Removed by Join Filter: 1304

Buffers: shared hit=112686 read=68829 dirtied=2353

-> Nested Loop Anti Join (cost=2209.23..90721.62 rows=1 width=71) (actual time=10.008..19748.119 rows=1304 loops=1)

Buffers: shared hit=112681 read=68829 dirtied=2353

-> Nested Loop Anti Join (cost=2200.51..87968.62 rows=1 width=71) (actual time=9.987..19738.478 rows=1305 loops=1)

Buffers: shared hit=108762 read=68829 dirtied=2353

-> Index Scan using emailrcpts_4columns on emailrcpts (cost=2192.17..85033.36 rows=1 width=75) (actual time=9.913..19701.271 rows=1308 loops=1)

Index Cond: ((cid = 333887088) AND (removed = false) AND (active = true))

Rows Removed by Filter: 1214

Buffers: shared hit=103471 read=68829 dirtied=2353

SubPlan 2

-> Bitmap Heap Scan on bogus_email_list (cost=109.48..566.19 rows=3380 width=19) (actual time=0.457..1.788 rows=3384 loops=1)

Recheck Cond: ((cid = 333887088) OR (cid = 0))

Heap Blocks: exact=97

Buffers: shared hit=116

-> BitmapOr (cost=109.48..109.48 rows=3381 width=0) (actual time=0.440..0.440 rows=0 loops=1)

Buffers: shared hit=19

-> Bitmap Index Scan on be_cid_lower_email (cost=0.00..4.31 rows=5 width=0) (actual time=0.036..0.036 rows=8 loops=1)

Index Cond: (cid = 333887088)

Buffers: shared hit=2

-> Bitmap Index Scan on be_cid_lower_email (cost=0.00..103.48 rows=3376 width=0) (actual time=0.403..0.403 rows=3376 loops=1)

Index Cond: (cid = 0)

Buffers: shared hit=17

SubPlan 4

-> Index Scan using customer_preferences_cid_email_idx on customer_preferences (cost=0.14..8.16 rows=1 width=516) (actual time=0.003..0.003 rows=0 loops=1)

Index Cond: (client_id = 333887088)

Filter: (NOT service_offers)

Buffers: shared hit=1

SubPlan 6

-> Nested Loop Anti Join (cost=0.14..247.78 rows=1 width=516) (actual time=0.001..0.001 rows=0 loops=1)

Join Filter: (NOT (SubPlan 5))

Buffers: shared hit=1

-> Index Scan using customer_preferences_cid_email_idx on customer_preferences cp (cost=0.14..8.16 rows=1 width=520) (actual time=0.000..0.000 rows=0 loops=1)

Index Cond: (client_id = 333887088)

Buffers: shared hit=1

-> Materialize (cost=0.00..202.33 rows=2 width=4) (never executed)

-> Seq Scan on client_manufacturers cm (cost=0.00..202.32 rows=2 width=4) (never executed)

Filter: (cid = 333887088)

SubPlan 5

-> Seq Scan on customer_preference_makes (cost=0.00..37.25 rows=3 width=4) (never executed)

Filter: ((NOT include) AND (customer_preference_id = cp.id))

SubPlan 3

-> Bitmap Heap Scan on bogus_name_list (cost=19.00..1359.94 rows=432 width=20) (actual time=0.098..0.352 rows=238 loops=1)

Recheck Cond: ((cid = 333887088) OR (cid = 0))

Heap Blocks: exact=10

Buffers: shared hit=17

-> BitmapOr (cost=19.00..19.00 rows=433 width=0) (actual time=0.083..0.083 rows=0 loops=1)

Buffers: shared hit=7

-> Bitmap Index Scan on bn_cid_lower_name (cost=0.00..9.39 rows=216 width=0) (actual time=0.039..0.039 rows=0 loops=1)

Index Cond: (cid = 333887088)

Buffers: shared hit=3

-> Bitmap Index Scan on bn_cid_lower_name (cost=0.00..9.39 rows=216 width=0) (actual time=0.042..0.042 rows=238 loops=1)

Index Cond: (cid = 0)

Buffers: shared hit=4

SubPlan 7

-> Aggregate (cost=8.59..8.60 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1308)

Buffers: shared hit=5232

-> Index Scan using queuenodes_qid_id on queuenodes queuenodes_1 (cost=0.57..8.59 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1308)

Index Cond: ((qid = 336652327) AND (id = emailrcpts.id))

Filter: sent

Buffers: shared hit=5232

-> Hash Join (cost=8.34..1471.66 rows=29 width=27) (actual time=0.027..0.027 rows=0 loops=1308)

Hash Cond: (ooh_1.reason = oor_1.id)

Join Filter: ((oor_1.global_by_delivery_type OR ((ooh_1.opt_out_cid = emailrcpts.cid) AND oor_1.global_by_client)) AND (oor_1.global OR (ooh_1.product_id = 1)))

Rows Removed by Join Filter: 0

Buffers: shared hit=5291

-> Index Scan using opt_out_history_target on opt_out_history ooh_1 (cost=0.56..1462.73 rows=90 width=33) (actual time=0.025..0.025 rows=0 loops=1308)

Index Cond: ((target)::text = (emailrcpts.email)::text)

Filter: active

Rows Removed by Filter: 0

Buffers: shared hit=5285

-> Hash (cost=7.02..7.02 rows=61 width=7) (actual time=0.041..0.041 rows=74 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 3kB

Buffers: shared hit=3

-> Seq Scan on opt_out_reasons oor_1 (cost=0.00..7.02 rows=61 width=7) (actual time=0.008..0.034 rows=74 loops=1)

Filter: (global_by_delivery_type OR global_by_client)

Rows Removed by Filter: 28

Buffers: shared hit=3

-> Hash Join (cost=8.72..1380.41 rows=91 width=4) (actual time=0.006..0.006 rows=0 loops=1305)

Hash Cond: (ooh.reason = oor.id)

Join Filter: (oor.global OR (ooh.product_id = 1))

Buffers: shared hit=3919

-> Index Scan using opt_out_history_sid on opt_out_history ooh (cost=0.44..1370.49 rows=101 width=12) (actual time=0.005..0.005 rows=0 loops=1305)

Index Cond: (sid = emailrcpts.id)

Filter: active

Buffers: shared hit=3916

-> Hash (cost=7.02..7.02 rows=101 width=5) (actual time=0.047..0.047 rows=101 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 4kB

Buffers: shared hit=3

-> Seq Scan on opt_out_reasons oor (cost=0.00..7.02 rows=101 width=5) (actual time=0.003..0.032 rows=101 loops=1)

Filter: (NOT service_reminders_only)

Rows Removed by Filter: 1

Buffers: shared hit=3

-> Materialize (cost=0.28..8.30 rows=1 width=4) (actual time=0.000..0.001 rows=1 loops=1304)

Buffers: shared hit=5

-> Index Scan using client_products_active_cid on client_products (cost=0.28..8.30 rows=1 width=4) (actual time=0.049..0.054 rows=1 loops=1)

Index Cond: ((cid = 333887088) AND (cancel_date IS NULL))

Filter: (product_id = 1)

Rows Removed by Filter: 2

Buffers: shared hit=5

-> Index Only Scan using bounces_email_bounce_count on bounces (cost=0.56..8.57 rows=1 width=21) (actual time=0.022..0.022 rows=0 loops=1304)

Index Cond: ((email = (emailrcpts.email)::text) AND (bounce_count >= 3))

Heap Fetches: 85

Buffers: shared hit=5316

-> Seq Scan on series (cost=0.00..3.48 rows=1 width=0) (actual time=0.003..0.011 rows=1 loops=1216)

Filter: ((i >= 1) AND (i <= 1))

Rows Removed by Filter: 98

Buffers: shared hit=1216

SubPlan 1

-> Aggregate (cost=8.59..8.60 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1216)

Buffers: shared hit=4864

-> Index Scan using queuenodes_qid_id on queuenodes (cost=0.57..8.59 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1216)

Index Cond: ((qid = 336652327) AND (id = emailrcpts.id))

Filter: sent

Buffers: shared hit=4864

Planning time: 10.516 ms

Execution time: 19847.302 ms

(128 rows)

On Thu, Mar 28, 2019 at 9:24 AM MichaelDBA <MichaelDBA@sqlexec.com> wrote:

My question is about the new VM environment...

What vendor for the VMs?
Make sure memory and CPU are not overallocated in the hypervisor.
Evaluate the resource needs of the VM group/cluster to which it belongs.
Regarding VMWare, I dealt with a problem where the "cpu ready percentage" was too high (above 5-10%), which caused cpu cycle delays.

Regards,
Michael Vitale

Laurenz Albe wrote on 3/28/2019 8:56 AM:

On Wed, 2019-03-27 at 19:27 -0400, Mark Steben wrote:

We are moving to a new VM environment (expedient) and have one query  that typically runs in 22 - 25
seconds in our old environment, but is running in about 1 hour, 20 minutes in our new.
I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion
in the new environment and only 445 thousand in the old.  I have sent the explains along with the
table descriptions, row counts, the one function that I know causes the bottleneck,  the query,
some relevant configuration settings in postgresql conf (identical in both environments)
and a listing from top in both environments, showing memory, shared memory, and cpu.

Everything seems to be identical or close, except for the shared buffer count in the explain.  
Any insight would be appreciated.

Slow plan:
->  Index Scan using emailrcpts_4columns on emailrcpts  (cost=0.56..119078.48 rows=5 width=29) (actual time=4873080.765..4873080.765 rows=0 loops=1)      Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3))      Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND
(number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1)))      Rows Removed by Filter: 19952      Buffers: shared hit=113768530 read=6244      SubPlan 1        ->  Limit  (cost=0.28..15.58 rows=1 width=0) (never executed)

Fast plan:
->  Index Scan using emailrcpts_4columns on emailrcpts  (cost=0.56..113162.26 rows=5 width=29) (actual time=21086.555..21086.555 rows=0 loops=1)      Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3))      Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND
(number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1)))      Rows Removed by Filter: 19952      Buffers: shared hit=445188 read=61756      SubPlan 1        ->  Limit  (cost=0.28..15.58 rows=1 width=0) (never executed)

Hmm. These are the ideas I can come up with:

1. There are many index tuples belonging to dead heap tuples.  Then re-running the query should produce way fewer buffer hits.  VACUUM would fix that issue.

2. The index is terribly fragmented.  REINDEX INDEX emailrcpts_4columns  would improve that one.

Yours,
Laurenz Albe

Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Long running query in new production, not so long in old

Long running query in new production, not so long in old

Вложения

Re: Long running query in new production, not so long in old

Re: Long running query in new production, not so long in old

Re: Long running query in new production, not so long in old

Re: Long running query in new production, not so long in old

Re: Long running query in new production, not so long in old