Обсуждение: Postgresql equal join on function with columns not use index

Поиск
Список
Период
Сортировка

Postgresql equal join on function with columns not use index

От
"James Pang (chaolpan)"
Дата:

Hi,

       We migrate from Oracle to Postgresql14.8, one SQL has regression in Postgres  run in 5800 milliseconds in Postgresql v14.8,  but the same SQL got done in several hundred milliseconds in Oracle database.

       With multiple table JOINs, if the join condition is  tablea.column1=tableb.column1, optimizer will use the index to filter data in nest loops, but if  tablea.column1=regexp_replace(tableb.column1….),

Optimizer will not be able to use the index on tablea.column1, then it do a table scan and nestloop to produce a lot rows then use tablea.column1=regexp_replace(tableb.column1….) as a filter.  As a workaround we create a view then use tablea.column1=view.column1 that works.

     Is it expected ?    details as below.

 

           SELECT DISTINCT a.xxx, b.xxx as TOLLFREE FROM tableA a, tableB b

                        WHERE a.column1 = regexp_replace(b.column1,'[^0-9]','') AND b.column2 = $1 AND b.column3= $2

                                AND NOT EXISTS (SELECT 1 FROM tableC c

                                WHERE c.xxid = b.xxid

                                AND c.xxtype = case when b.col4 = 1 then 'TollFree' else 'Toll' end

                                AND c.xxid = $3)

 

     Unique  (cost=423374.60..423377.87 rows=436 width=21) (actual time=6070.963..6071.054 rows=395 loops=1)

   Buffers: shared hit=148

   ->  Sort  (cost=423374.60..423375.69 rows=436 width=21) (actual time=6070.963..6070.992 rows=397 loops=1)

         Sort Key: a.xx, b.xx

         Sort Method: quicksort  Memory: 56kB

         Buffers: shared hit=148

         ->  Nested Loop  (cost=0.69..423355.48 rows=436 width=21) (actual time=120.338..6070.669 rows=397 loops=1)

               Join Filter: ((a.column1)::text = regexp_replace((b.column1)::text, '[^0-9]'::text, ''::text))                                                 <<<optimizer only do filter after nest loops with a lot of rows

               Rows Removed by Join Filter: 1511155

               Buffers: shared hit=145

               ->  Seq Scan on tableA a  (cost=0.00..161.12 rows=7712 width=25) (actual time=0.022..1.380 rows=7712 loops=1)

                     Buffers: shared hit=84

               ->  Materialize  (cost=0.69..153.12 rows=207 width=21) (actual time=0.000..0.011 rows=196 loops=7712)

                     Buffers: shared hit=58

                     ->  Nested Loop Anti Join  (cost=0.69..152.09 rows=207 width=21) (actual time=0.069..0.278 rows=196 loops=1)

                           Join Filter: ((c.xxid = b.xxid) AND ((c.xxxx)::text = CASE WHEN (b.column2 = 1) THEN 'aaa'::text ELSE 'bbb'::t

ext END))

                           Buffers: shared hit=58

                           ->  Index Scan using idx_xxx on tableB b  (cost=0.42..146.55 rows=207 width=29) (actual time=0.047..0.207 rows=196 loops=1)

                                 Index Cond: ((colum3 = 40957) AND (column2 = 1))

                                 Buffers: shared hit=56

                           ->  Materialize  (cost=0.27..1.40 rows=1 width=15) (actual time=0.000..0.000 rows=0 loops=196)

                                 Buffers: shared hit=2

                                 ->  Index Only Scan using pk_xxxx on tableC c  (cost=0.27..1.39 rows=1 width=15

) (actual time=0.020..0.020 rows=0 loops=1)

                                       Index Cond: (xxxid = 12407262)

                                       Heap Fetches: 0

                                       Buffers: shared hit=2

 

If we create a view ,the SQL got done in several million seconds,

CREATE VIEW tableBREGXP as (select xx,column2,column3,xxid,regexp_replace(column1,'[^0-9]','') as column1 from tableB);

           SELECT DISTINCT a.xxx, b.xxx as TOLLFREE FROM tableA a, tableBREGXP b                                <<< replace the tableB with view name.

                        WHERE a.column1 = b.column1 AND b.column2 = $1 AND b.column3= $2                   <<< use b.column1 to replace regexp_replace((b.column1)::text, '[^0-9]'::text, ''::text))

                                AND NOT EXISTS (SELECT 1 FROM tableC c

                                WHERE c.xxid = b.xxid

                                AND c.xxtype = case when b.col4 = 1 then 'TollFree' else 'Toll' end

                                AND c.xxid = $3)

 

HashAggregate  (cost=408.19..412.76 rows=457 width=21) (actual time=4.524..4.644 rows=395 loops=1)

   Group Key: a.xxx, b.xx

   Batches: 1  Memory Usage: 61kB

   Buffers: shared hit=693

   ->  Nested Loop  (cost=0.97..405.90 rows=457 width=21) (actual time=0.154..4.205 rows=397 loops=1)

         Buffers: shared hit=693

         ->  Nested Loop Anti Join  (cost=0.69..214.97 rows=217 width=40) (actual time=0.137..2.877 rows=196 loops=1)

               Join Filter: ((c.xxyid = b.xxid) AND ((c.xxxxx)::text = CASE WHEN (b.column2 = 1) THEN 'To

llFree'::text ELSE 'Toll'::text END))

               Buffers: shared hit=55

               ->  Index Scan using idx_xxx on b  (cost=0.42..207.06 rows=217 width=64) (actual time=0.123..2.725 rows=196 loops=1)

                     Index Cond: ((column2 = 40957) AND (column3 = 1))

                     Buffers: shared hit=53

               ->  Materialize  (cost=0.27..1.40 rows=1 width=15) (actual time=0.000..0.000 rows=0 loops=196)

                     Buffers: shared hit=2

                     ->  Index Only Scan using pk_xxxx on tableC c  (cost=0.27..1.39 rows=1 width=15) (actual time=0.010..0.010 rows=0 loops=1)

                           Index Cond: (siteid = 12407262)

                           Heap Fetches: 0

                           Buffers: shared hit=2

         ->  Index Scan using idx_xxx on tableA a  (cost=0.28..0.86 rows=2 width=25) (actual time=0.004..0.005 rows=2 loops=196)

               Index Cond: ((xxxx)::text = (regexp_replace((b.phonenumber)::text, '[^0-9]'::text, ''::text)))                                                            <<< it use the index to filter a lot of rows here,

               Buffers: shared hit=638

Planning Time: 0.619 ms

Execution Time: 4.762 ms

 

 

Thanks,

 

James

Re: Postgresql equal join on function with columns not use index

От
Tom Lane
Дата:
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
>        We migrate from Oracle to Postgresql14.8, one SQL has regression in Postgres  run in 5800 milliseconds in
Postgresqlv14.8,  but the same SQL got done in several hundred milliseconds in Oracle database. 
>        With multiple table JOINs, if the join condition is  tablea.column1=tableb.column1, optimizer will use the
indexto filter data in nest loops, but if  tablea.column1=regexp_replace(tableb.column1....), 
> Optimizer will not be able to use the index on tablea.column1, then it do a table scan and nestloop to produce a lot
rowsthen use tablea.column1=regexp_replace(tableb.column1....) as a filter.  As a workaround we create a view then use
tablea.column1=view.column1that works. 
>      Is it expected ?    details as below.

It's impossible to comment on this usefully with such a fragmentary
description of the problem.  Please send a complete, self-contained
test case if you want anybody to look at it carefully.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

            regards, tom lane



RE: Postgresql equal join on function with columns not use index

От
"James Pang (chaolpan)"
Дата:
Hi,
    Looks like it's the function "regexp_replace" volatile and restrict=false make the difference,  we have our
applicationrole with default search_path=oracle,$user,public,pg_catalog.     
     =#    select oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from pg_proc where
proname='regexp_replace'order by oid; 
  oid  |    proname     | pronamespace | prosecdef | proisstrict | provolatile
-------+----------------+--------------+-----------+-------------+-------------
  2284 | regexp_replace | pg_catalog   | f         | t           | i
  2285 | regexp_replace | pg_catalog   | f         | t           | i
 17095 | regexp_replace | oracle       | f         | f           | v
 17096 | regexp_replace | oracle       | f         | f           | v
 17097 | regexp_replace | oracle       | f         | f           | v
 17098 | regexp_replace | oracle       | f         | f           | v

--with default it use orafce, oracle.regexp_replace function,
Select  a.phonenumber,... from tableA a, tableB b where a.phonenumber=oracle. regexp_replace(b.PHONENUMBER,'[^0-9]','')
, 
  --index on a.phonenumber not used

Switch to pg_catalog.regexp_replace(b.PHONENUMBER,'[^0-9]',''),
  Index on a.phonenumber got used.

Thanks,

James Pang

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Monday, June 12, 2023 9:19 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Postgresql equal join on function with columns not use index

"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
>        We migrate from Oracle to Postgresql14.8, one SQL has regression in Postgres  run in 5800 milliseconds in
Postgresqlv14.8,  but the same SQL got done in several hundred milliseconds in Oracle database. 
>        With multiple table JOINs, if the join condition is
> tablea.column1=tableb.column1, optimizer will use the index to filter
> data in nest loops, but if
> tablea.column1=regexp_replace(tableb.column1....),
> Optimizer will not be able to use the index on tablea.column1, then it do a table scan and nestloop to produce a lot
rowsthen use tablea.column1=regexp_replace(tableb.column1....) as a filter.  As a workaround we create a view then use
tablea.column1=view.column1that works. 
>      Is it expected ?    details as below.

It's impossible to comment on this usefully with such a fragmentary description of the problem.  Please send a
complete,self-contained test case if you want anybody to look at it carefully. 

https://wiki.postgresql.org/wiki/Slow_Query_Questions

            regards, tom lane



Re: Postgresql equal join on function with columns not use index

От
Tom Lane
Дата:
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
>     Looks like it's the function "regexp_replace" volatile and restrict=false make the difference,  we have our
applicationrole with default search_path=oracle,$user,public,pg_catalog.     
>      =#    select oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from pg_proc where
proname='regexp_replace'order by oid; 
>   oid  |    proname     | pronamespace | prosecdef | proisstrict | provolatile
> -------+----------------+--------------+-----------+-------------+-------------
>   2284 | regexp_replace | pg_catalog   | f         | t           | i
>   2285 | regexp_replace | pg_catalog   | f         | t           | i
>  17095 | regexp_replace | oracle       | f         | f           | v
>  17096 | regexp_replace | oracle       | f         | f           | v
>  17097 | regexp_replace | oracle       | f         | f           | v
>  17098 | regexp_replace | oracle       | f         | f           | v

Why in the world are the oracle ones marked volatile?  That's what's
preventing them from being used in index quals.

            regards, tom lane



Re: Postgresql equal join on function with columns not use index

От
Pavel Stehule
Дата:


út 13. 6. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
>     Looks like it's the function "regexp_replace" volatile and restrict=false make the difference,  we have our application role with default search_path=oracle,$user,public,pg_catalog.   
>      =#    select oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from pg_proc where proname='regexp_replace' order by oid;
>   oid  |    proname     | pronamespace | prosecdef | proisstrict | provolatile
> -------+----------------+--------------+-----------+-------------+-------------
>   2284 | regexp_replace | pg_catalog   | f         | t           | i
>   2285 | regexp_replace | pg_catalog   | f         | t           | i
>  17095 | regexp_replace | oracle       | f         | f           | v
>  17096 | regexp_replace | oracle       | f         | f           | v
>  17097 | regexp_replace | oracle       | f         | f           | v
>  17098 | regexp_replace | oracle       | f         | f           | v

Why in the world are the oracle ones marked volatile?  That's what's
preventing them from being used in index quals.

It looks like orafce issue

I'll fix it

Regards

Pavel
 

                        regards, tom lane


Re: Postgresql equal join on function with columns not use index

От
Pavel Stehule
Дата:


út 13. 6. 2023 v 16:17 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


út 13. 6. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
>     Looks like it's the function "regexp_replace" volatile and restrict=false make the difference,  we have our application role with default search_path=oracle,$user,public,pg_catalog.   
>      =#    select oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from pg_proc where proname='regexp_replace' order by oid;
>   oid  |    proname     | pronamespace | prosecdef | proisstrict | provolatile
> -------+----------------+--------------+-----------+-------------+-------------
>   2284 | regexp_replace | pg_catalog   | f         | t           | i
>   2285 | regexp_replace | pg_catalog   | f         | t           | i
>  17095 | regexp_replace | oracle       | f         | f           | v
>  17096 | regexp_replace | oracle       | f         | f           | v
>  17097 | regexp_replace | oracle       | f         | f           | v
>  17098 | regexp_replace | oracle       | f         | f           | v

Why in the world are the oracle ones marked volatile?  That's what's
preventing them from being used in index quals.

It looks like orafce issue

I'll fix it

should be fixed in orafce 4.4.

Regards

Pavel
 

Regards

Pavel
 

                        regards, tom lane


RE: Postgresql equal join on function with columns not use index

От
"James Pang (chaolpan)"
Дата:

   Thanks a lot, we use orafce 3.17, and there some varchar2 columns and function indexes depends on oracle.substr too.   Is it ok to upgrade to orafce version 4.4 by  “alter extension update to ‘4.4’?  it’s online to do that ?

 

Thanks,

 

James

 

From: Pavel Stehule <pavel.stehule@gmail.com>
Sent: Tuesday, June 13, 2023 11:01 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: James Pang (chaolpan) <chaolpan@cisco.com>; pgsql-performance@lists.postgresql.org
Subject: Re: Postgresql equal join on function with columns not use index

 

 

 

út 13. 6. 2023 v 16:17 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

 

 

út 13. 6. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
>     Looks like it's the function "regexp_replace" volatile and restrict=false make the difference,  we have our application role with default search_path=oracle,$user,public,pg_catalog.   
>      =#    select oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from pg_proc where proname='regexp_replace' order by oid;
>   oid  |    proname     | pronamespace | prosecdef | proisstrict | provolatile
> -------+----------------+--------------+-----------+-------------+-------------
>   2284 | regexp_replace | pg_catalog   | f         | t           | i
>   2285 | regexp_replace | pg_catalog   | f         | t           | i
>  17095 | regexp_replace | oracle       | f         | f           | v
>  17096 | regexp_replace | oracle       | f         | f           | v
>  17097 | regexp_replace | oracle       | f         | f           | v
>  17098 | regexp_replace | oracle       | f         | f           | v

Why in the world are the oracle ones marked volatile?  That's what's
preventing them from being used in index quals.

 

It looks like orafce issue

 

I'll fix it

 

should be fixed in orafce 4.4.

 

Regards

 

Pavel

 

 

Regards

 

Pavel

 


                        regards, tom lane

Re: Postgresql equal join on function with columns not use index

От
Pavel Stehule
Дата:
Hi

čt 15. 6. 2023 v 10:32 odesílatel James Pang (chaolpan) <chaolpan@cisco.com> napsal:

   Thanks a lot, we use orafce 3.17, and there some varchar2 columns and function indexes depends on oracle.substr too.   Is it ok to upgrade to orafce version 4.4 by  “alter extension update to ‘4.4’?  it’s online to do that ?


I didn't release 4.4,  but it is available on github. Orafce supports online upgrades


Regards

Pavel


 

Thanks,

 

James

 

From: Pavel Stehule <pavel.stehule@gmail.com>
Sent: Tuesday, June 13, 2023 11:01 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: James Pang (chaolpan) <chaolpan@cisco.com>; pgsql-performance@lists.postgresql.org
Subject: Re: Postgresql equal join on function with columns not use index

 

 

 

út 13. 6. 2023 v 16:17 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

 

 

út 13. 6. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
>     Looks like it's the function "regexp_replace" volatile and restrict=false make the difference,  we have our application role with default search_path=oracle,$user,public,pg_catalog.   
>      =#    select oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from pg_proc where proname='regexp_replace' order by oid;
>   oid  |    proname     | pronamespace | prosecdef | proisstrict | provolatile
> -------+----------------+--------------+-----------+-------------+-------------
>   2284 | regexp_replace | pg_catalog   | f         | t           | i
>   2285 | regexp_replace | pg_catalog   | f         | t           | i
>  17095 | regexp_replace | oracle       | f         | f           | v
>  17096 | regexp_replace | oracle       | f         | f           | v
>  17097 | regexp_replace | oracle       | f         | f           | v
>  17098 | regexp_replace | oracle       | f         | f           | v

Why in the world are the oracle ones marked volatile?  That's what's
preventing them from being used in index quals.

 

It looks like orafce issue

 

I'll fix it

 

should be fixed in orafce 4.4.

 

Regards

 

Pavel

 

 

Regards

 

Pavel

 


                        regards, tom lane