Обсуждение: extend statistics help reduce index scan a lot of shared buffer hits.

Поиск
Список
Период
Сортировка

extend statistics help reduce index scan a lot of shared buffer hits.

От
James Pang
Дата:
   Postgresql 14.8, Redhat8.   looks like have to create extend statistics on indexed and joined columns to make join filters pushed down to secondary index scan in nestloop, and the shared buffer hits show big difference. 

is it expected ? 


                       SELECT   ....
                        FROM
                                mtgxxxxxxxx a LEFT OUTER JOIN mtgxxxxxxxext b ON a.sssid = b.sssid and a.MMMUUID = b.MMMUUID and a.uuid = b.uuid
                        WHERE
                        a.SSSID=$1
                        AND a.MMMUID=$2
                        ORDER BY a.XXXX asc
                        offset 300 rows
                        FETCH FIRST 51 ROWS ONLY


explain (analyze,buffers) slowsql1(...)

1. with default, join filters just after nestloop,
Limit  (cost=5.61..5.62 rows=1 width=1169) (actual time=2249.443..2249.454 rows=51 loops=1)
   Buffers: shared hit=3864917
   ->  Sort  (cost=5.61..5.61 rows=1 width=1169) (actual time=2249.404..2249.438 rows=351 loops=1)
         Sort Key: a.email
         Sort Method: top-N heapsort  Memory: 174kB
         Buffers: shared hit=3864917
         ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1169) (actual time=1.335..2246.971 rows=2142 loops=1)
               Join Filter: ((a.siteid = b.siteid) AND ((a.mtguuid)::text = (b.mtguuid)::text) AND (a.uuid = b.uuid))
               Rows Removed by Join Filter: 4586022
               Buffers: shared hit=3864917
               ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on mtgxxxxxxxx a  (cost=0.56..2.79 rows=1 width=1093) (actual time=0.026..5.318 rows
=2142 loops=1)
                     Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text))
                     Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text))
                     Buffers: shared hit=2891
               ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxxxxxext b  (cost=0.56..2.78 rows=1 width=133) (actual time=0.016..0.698
rows=2142 loops=2142)
                     Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text))
                     Buffers: shared hit=3862026         <<< here  huge shared hits.
 Planning Time: 0.033 ms
 Execution Time: 2249.527 ms
 
 
 
 create statistics mtgxxxxxxext_sssid_mmmuuid(dependencies,ndistinct) on  sssid, mmmuuid from mtgxxxxxxxext.
 analyze mtgxxxxxxxext.
 
 
 
 2. join filters pushed down to secondary index scan, and reduce a lot of shared blks access.
 
 -------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------
 Limit  (cost=5.61..5.62 rows=1 width=1245) (actual time=12.371..12.380 rows=51 loops=1)
   Buffers: shared hit=12865
   ->  Sort  (cost=5.61..5.61 rows=1 width=1245) (actual time=12.333..12.364 rows=351 loops=1)
         Sort Key: a.email
         Sort Method: top-N heapsort  Memory: 174kB
         Buffers: shared hit=12865
         ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1245) (actual time=0.042..10.819 rows=2142 loops=1)
               Buffers: shared hit=12865
               ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on mtgxxxxxxxx a  (cost=0.56..2.79 rows=1 width=1169) (actual time=0.025..2.492
 rows=2142 loops=1)
                     Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text))
                     Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text))
                     Buffers: shared hit=2891
               ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxxxxxext b  (cost=0.56..2.79 rows=1 width=133) (actual time=0.003..0
.003 rows=1 loops=2142)
                     Index Cond: ((sssid = a.sssid) AND (sssid = $1) AND ((mmmuuid)::text = (a.mmmuuid)::text) AND ((mmmuuid)::text = ($2)::text) AND (uui
d = a.uuid))
                     Buffers: shared hit=10710    <<< here much less shared hits , and Index Cond automatically added sssid = a.sssid ,mmuuid=a.mmmuuid.
 Planning Time: 0.021 ms
 Execution Time: 12.451 ms
(17 rows)

Thanks,

James

Fwd: extend statistics help reduce index scan a lot of shared buffer hits.

От
James Pang
Дата:
   Postgresql 14.8, Redhat8.   looks like have to create extend statistics on indexed and joined columns to make join filters pushed down to secondary index scan in nestloop, and the shared buffer hits show big difference. 

is it expected ? 


                       SELECT   ....
                        FROM
                                mtgxxxxxxxx a LEFT OUTER JOIN mtgxxxxxxxext b ON a.sssid = b.sssid and a.MMMUUID = b.MMMUUID and a.uuid = b.uuid
                        WHERE
                        a.SSSID=$1
                        AND a.MMMUID=$2
                        ORDER BY a.XXXX asc
                        offset 300 rows
                        FETCH FIRST 51 ROWS ONLY


explain (analyze,buffers) slowsql1(...)

1. with default, join filters just after nestloop,
Limit  (cost=5.61..5.62 rows=1 width=1169) (actual time=2249.443..2249.454 rows=51 loops=1)
   Buffers: shared hit=3864917
   ->  Sort  (cost=5.61..5.61 rows=1 width=1169) (actual time=2249.404..2249.438 rows=351 loops=1)
         Sort Key: a.email
         Sort Method: top-N heapsort  Memory: 174kB
         Buffers: shared hit=3864917
         ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1169) (actual time=1.335..2246.971 rows=2142 loops=1)
               Join Filter: ((a.sssid = b.sssid) AND ((a.mmmuuid)::text = (b.mmmuuid)::text) AND (a.uuid = b.uuid))
               Rows Removed by Join Filter: 4586022
               Buffers: shared hit=3864917
               ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on mtgxxxxxxxx a  (cost=0.56..2.79 rows=1 width=1093) (actual time=0.026..5.318 rows
=2142 loops=1)
                     Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text))
                     Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text))
                     Buffers: shared hit=2891
               ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxxxxxext b  (cost=0.56..2.78 rows=1 width=133) (actual time=0.016..0.698
rows=2142 loops=2142)
                     Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text))
                     Buffers: shared hit=3862026         <<< here  huge shared hits.
 Planning Time: 0.033 ms
 Execution Time: 2249.527 ms
 
 
 
 create statistics mtgxxxxxxext_sssid_mmmuuid(dependencies,ndistinct) on  sssid, mmmuuid from mtgxxxxxxxext.
 analyze mtgxxxxxxxext.
 
 
 
 2. join filters pushed down to secondary index scan, and reduce a lot of shared blks access.
 
 -------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------
 Limit  (cost=5.61..5.62 rows=1 width=1245) (actual time=12.371..12.380 rows=51 loops=1)
   Buffers: shared hit=12865
   ->  Sort  (cost=5.61..5.61 rows=1 width=1245) (actual time=12.333..12.364 rows=351 loops=1)
         Sort Key: a.email
         Sort Method: top-N heapsort  Memory: 174kB
         Buffers: shared hit=12865
         ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1245) (actual time=0.042..10.819 rows=2142 loops=1)
               Buffers: shared hit=12865
               ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on mtgxxxxxxxx a  (cost=0.56..2.79 rows=1 width=1169) (actual time=0.025..2.492
 rows=2142 loops=1)
                     Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text))
                     Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text))
                     Buffers: shared hit=2891
               ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxxxxxext b  (cost=0.56..2.79 rows=1 width=133) (actual time=0.003..0
.003 rows=1 loops=2142)
                     Index Cond: ((sssid = a.sssid) AND (sssid = $1) AND ((mmmuuid)::text = (a.mmmuuid)::text) AND ((mmmuuid)::text = ($2)::text) AND (uui
d = a.uuid))
                     Buffers: shared hit=10710    <<< here much less shared hits , and Index Cond automatically added sssid = a.sssid ,mmuuid=a.mmmuuid.
 Planning Time: 0.021 ms
 Execution Time: 12.451 ms
(17 rows)

Thanks,

James

Re: Fwd: extend statistics help reduce index scan a lot of shared buffer hits.

От
Tomas Vondra
Дата:
On 2/27/24 14:58, James Pang wrote:
>    Postgresql 14.8, Redhat8.   looks like have to create extend statistics
> on indexed and joined columns to make join filters pushed down to secondary
> index scan in nestloop, and the shared buffer hits show big difference.
> 
> is it expected ?
> 

It's hard to say what exactly is happening in the example query (I'd
have to do some debugging, but that's impossible without a reproducer),
but I think it's mostly expected.

My guess is that without the stats the optimizer sees this:

->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxxxxxext b
(cost=0.56..2.78 rows=1 width=133) (actual time=0.016..0.698 rows=2142
loops=2142)

and so decides there's no point in pushing down more conditions to the
index scan (because it already returns just 1 row). But there's some
sort of correlation / skew, and it returns 2142 rows.

With the extended stats it realizes pushing down more conditions makes
sense, because doing that in index scan is cheaper than having to read
the heap pages etc. So it does that.

So yeah, this seems exactly the improvement you'd expect from extended
stats. Why do you think this would not be expected?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company