Обсуждение: Long running query in new production, not so long in old
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com
Вложения
Hi!
Make the same shared buffer value for example. Disable any activity except you client query. Make query twice. Second one should be during about < 1 min.
If not, so it isnt problem shared buffer. My be processor count. If query will be faster, so in real situation you have any activity on db, that take away buffers from using query.
Alex
Good eveningWe are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliionin the new environment and only 445 thousand in the old. I have sent the explains along with the table descriptions, row counts, the one function that I know causes the bottleneck, the query, some relevant configuration settings in postgresql conf (identical in both environments)and a listing from top in both environments, showing memory, shared memory, and cpu.Everything seems to be identical or close, except for the shared buffer count in the explain.Any insight would be appreciated.Thank you,--Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com
Hi!
Make the same shared buffer value for example. Disable any activity except you client query. Make query twice. Second one should be during about < 1 min.
If not, so it isnt problem shared buffer. My be processor count. If query will be faster, so in real situation you have any activity on db, that take away buffers from using query.
Alex28.03.2019 1:27, Mark Steben пишет:Good eveningWe are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliionin the new environment and only 445 thousand in the old. I have sent the explains along with the table descriptions, row counts, the one function that I know causes the bottleneck, the query, some relevant configuration settings in postgresql conf (identical in both environments)and a listing from top in both environments, showing memory, shared memory, and cpu.Everything seems to be identical or close, except for the shared buffer count in the explain.Any insight would be appreciated.Thank you,--Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com
On Wed, 2019-03-27 at 19:27 -0400, Mark Steben wrote: > We are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 > seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. > I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion > in the new environment and only 445 thousand in the old. I have sent the explains along with the > table descriptions, row counts, the one function that I know causes the bottleneck, the query, > some relevant configuration settings in postgresql conf (identical in both environments) > and a listing from top in both environments, showing memory, shared memory, and cpu. > > Everything seems to be identical or close, except for the shared buffer count in the explain. > Any insight would be appreciated. Slow plan: -> Index Scan using emailrcpts_4columns on emailrcpts (cost=0.56..119078.48 rows=5 width=29) (actual time=4873080.765..4873080.765rows=0 loops=1) Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3)) Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <>0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND (number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text)AND (NOT (SubPlan 1))) Rows Removed by Filter: 19952 Buffers: shared hit=113768530 read=6244 SubPlan 1 -> Limit (cost=0.28..15.58 rows=1 width=0) (never executed) Fast plan: -> Index Scan using emailrcpts_4columns on emailrcpts (cost=0.56..113162.26 rows=5 width=29) (actual time=21086.555..21086.555rows=0 loops=1) Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3)) Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <>0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND (number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text)AND (NOT (SubPlan 1))) Rows Removed by Filter: 19952 Buffers: shared hit=445188 read=61756 SubPlan 1 -> Limit (cost=0.28..15.58 rows=1 width=0) (never executed) Hmm. These are the ideas I can come up with: 1. There are many index tuples belonging to dead heap tuples. Then re-running the query should produce way fewer buffer hits. VACUUM would fix that issue. 2. The index is terribly fragmented. REINDEX INDEX emailrcpts_4columns would improve that one. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
- What vendor for the VMs?
- Make sure memory and CPU are not overallocated in the hypervisor.
- Evaluate the resource needs of the VM group/cluster to which it belongs.
- Regarding VMWare, I dealt with a problem where the "cpu ready percentage" was too high (above 5-10%), which caused cpu cycle delays.
Regards,
Michael Vitale
Laurenz Albe wrote on 3/28/2019 8:56 AM:
On Wed, 2019-03-27 at 19:27 -0400, Mark Steben wrote:We are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion in the new environment and only 445 thousand in the old. I have sent the explains along with the table descriptions, row counts, the one function that I know causes the bottleneck, the query, some relevant configuration settings in postgresql conf (identical in both environments) and a listing from top in both environments, showing memory, shared memory, and cpu. Everything seems to be identical or close, except for the shared buffer count in the explain. Any insight would be appreciated.Slow plan: -> Index Scan using emailrcpts_4columns on emailrcpts (cost=0.56..119078.48 rows=5 width=29) (actual time=4873080.765..4873080.765 rows=0 loops=1) Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3)) Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND (number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1))) Rows Removed by Filter: 19952 Buffers: shared hit=113768530 read=6244 SubPlan 1 -> Limit (cost=0.28..15.58 rows=1 width=0) (never executed) Fast plan: -> Index Scan using emailrcpts_4columns on emailrcpts (cost=0.56..113162.26 rows=5 width=29) (actual time=21086.555..21086.555 rows=0 loops=1) Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3)) Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND (number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1))) Rows Removed by Filter: 19952 Buffers: shared hit=445188 read=61756 SubPlan 1 -> Limit (cost=0.28..15.58 rows=1 width=0) (never executed) Hmm. These are the ideas I can come up with: 1. There are many index tuples belonging to dead heap tuples. Then re-running the query should produce way fewer buffer hits. VACUUM would fix that issue. 2. The index is terribly fragmented. REINDEX INDEX emailrcpts_4columns would improve that one. Yours, Laurenz Albe
My question is about the new VM environment...
- What vendor for the VMs?
- Make sure memory and CPU are not overallocated in the hypervisor.
- Evaluate the resource needs of the VM group/cluster to which it belongs.
- Regarding VMWare, I dealt with a problem where the "cpu ready percentage" was too high (above 5-10%), which caused cpu cycle delays.
Regards,
Michael Vitale
Laurenz Albe wrote on 3/28/2019 8:56 AM:On Wed, 2019-03-27 at 19:27 -0400, Mark Steben wrote:We are moving to a new VM environment (expedient) and have one query that typically runs in 22 - 25 seconds in our old environment, but is running in about 1 hour, 20 minutes in our new. I'd like some insight as to why the explain is showing shared buffer hits numbering over 113 milliion in the new environment and only 445 thousand in the old. I have sent the explains along with the table descriptions, row counts, the one function that I know causes the bottleneck, the query, some relevant configuration settings in postgresql conf (identical in both environments) and a listing from top in both environments, showing memory, shared memory, and cpu. Everything seems to be identical or close, except for the shared buffer count in the explain. Any insight would be appreciated.Slow plan: -> Index Scan using emailrcpts_4columns on emailrcpts (cost=0.56..119078.48 rows=5 width=29) (actual time=4873080.765..4873080.765 rows=0 loops=1) Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3)) Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND (number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1))) Rows Removed by Filter: 19952 Buffers: shared hit=113768530 read=6244 SubPlan 1 -> Limit (cost=0.28..15.58 rows=1 width=0) (never executed) Fast plan: -> Index Scan using emailrcpts_4columns on emailrcpts (cost=0.56..113162.26 rows=5 width=29) (actual time=21086.555..21086.555 rows=0 loops=1) Index Cond: ((cid = 1784539) AND (removed = false) AND (active = true) AND (bounce < 3)) Filter: ((NOT removed) AND active AND (email IS NOT NULL) AND (((fname IS NOT NULL) AND (length((fname)::text) <> 0)) OR ((lname IS NOT NULL) AND (length((lname)::text) <> 0))) AND (number_of_sends(id, 30, 1) < 3) AND (lower(get_make(cid, (vin)::text, (make_purchased)::text)) = 'ohgoshnonotthebees!!!'::text) AND (NOT (SubPlan 1))) Rows Removed by Filter: 19952 Buffers: shared hit=445188 read=61756 SubPlan 1 -> Limit (cost=0.28..15.58 rows=1 width=0) (never executed) Hmm. These are the ideas I can come up with: 1. There are many index tuples belonging to dead heap tuples. Then re-running the query should produce way fewer buffer hits. VACUUM would fix that issue. 2. The index is terribly fragmented. REINDEX INDEX emailrcpts_4columns would improve that one. Yours, Laurenz Albe
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com