Обсуждение: Odd Choice of seq scan

Поиск
Список
Период
Сортировка

Odd Choice of seq scan

От
Paul McGarry
Дата:
Hi there,

I'm wondering if anyone has any insight into what might make the database choose a sequential scan for a query (table defs and plan below) like :

SELECT orders.orderid FROM orders
WHERE (
orders.orderid IN ('546111')
  OR
orders.orderid IN  (select orderid FROM orderstotrans WHERE (transid IN ('546111')))
);

I have a couple of environments, all on Postgresql 13.7 and:
- on one the query executes with an sequential scan on the orders table 
- on the other sequential scan on an index (ie walks index and filters, rather than looking up ids on the index as an index condition.)

Plan and tables are below, but it seems to me that the planner knows the subplan is going to return 1 row (max) and should "know" that there is a max of 2 IDs to look up an indexes would be faster than a sequential scan (of either table or index) and filter. I've tried re analyzing to make sure stats are good and it hasn't helped

I can get a good plan that does use the index efficiently by using a union, eg:

select orders.orderid FROM orders
WHERE (
orders.orderid IN (
  SELECT '546111'
  UNION
  SELECT orderid FROM orderstotrans WHERE (transid IN ('546111'))
)
);

but I want to understand what warning signs I should be aware of with the original query that put it on the path of a bad plan, so I don't do it again.


Plan - seq scan of table:
=====
> explain                                                  
select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR orders.orderid IN  (select orderid FROM orderstotrans WHERE (transid IN ('546111'))));

                                              QUERY PLAN                                              
-------------------------------------------------------------------------------------------------------
 Seq Scan on orders  (cost=8.45..486270.87 rows=4302781 width=8)
   Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
   SubPlan 1
     ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.43..8.45 rows=1 width=8)
           Index Cond: (transid = '546111'::bigint)
(5 rows)
=====

Plan - Seq scan and filter of index:
=====
> explain select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR orders.orderid IN  (select orderid FROM orderstotrans WHERE (transid IN ('546111'))));
                                              QUERY PLAN                                              
-------------------------------------------------------------------------------------------------------
 Index Only Scan using orders_pkey on orders  (cost=9.16..4067888.60 rows=64760840 width=8)
   Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
   SubPlan 1
     ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.57..8.59 rows=1 width=8)
           Index Cond: (transid = '546111'::bigint)
(5 rows)
=====


Tables:
=====
                                   Table "test.orders"
        Column        |            Type             | Collation | Nullable |   Default    
----------------------+-----------------------------+-----------+----------+--------------
 orderid              | bigint                      |           | not null |
 istest               | smallint                    |           | not null | 0
 orderstatusid        | integer                     |           |          |
 customername         | text                        |           |          |
 customeraddress      | text                        |           |          |
 customercountry      | text                        |           |          |
 customercity         | text                        |           |          |
 customerstate        | text                        |           |          |
 customerzip          | text                        |           |          |
    "orders_pkey" PRIMARY KEY, btree (orderid)

              Table "test.orderstotrans"
   Column    |  Type   | Collation | Nullable | Default
-------------+---------+-----------+----------+---------
 orderid     | bigint  |           |          |
 transid     | bigint  |           |          |
 orderitemid | integer |           |          |
Indexes:
    "orderstotrans_orderid_idx" btree (orderid)
    "orderstotrans_orderitemid_idx" btree (orderitemid)
    "orderstotrans_transid_key" UNIQUE, btree (transid)


Happier plan for the union version:
====
explain select orders.orderid FROM orders
WHERE (
orders.orderid IN (
  SELECT '3131275553'
  UNION
  select orderid FROM orderstotrans WHERE (transid IN ('3131275553'))
)
);
                                                                           QUERY PLAN                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=9.21..21.84 rows=2 width=8) (actual time=0.034..0.043 rows=1 loops=1)
   ->  Unique  (cost=8.64..8.65 rows=2 width=8) (actual time=0.024..0.026 rows=2 loops=1)
         ->  Sort  (cost=8.64..8.64 rows=2 width=8) (actual time=0.023..0.024 rows=2 loops=1)
               Sort Key: ('3131275553'::bigint)
               Sort Method: quicksort  Memory: 25kB
               ->  Append  (cost=0.00..8.63 rows=2 width=8) (actual time=0.001..0.019 rows=2 loops=1)
                     ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
                     ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.57..8.59 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=1)
                           Index Cond: (transid = '3131275553'::bigint)
   ->  Index Only Scan using orders_pkey on orders  (cost=0.57..6.58 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=2)
         Index Cond: (orderid = ('3131275553'::bigint))
         Heap Fetches: 0
 Planning Time: 0.165 ms
 Execution Time: 0.065 ms
(14 rows)
====
(though that plan is a bit misleading, as that index condition isn't exactly what is used, ie with:

select orders.orderid FROM orders
WHERE (                          
orders.orderid IN (
  SELECT '3131275553'
  UNION              
  select orderid FROM orderstotrans WHERE (transid IN ('3131275553'))
)                                                                    
);
  orderid  
-----------
 439155713
(1 row)

the orderid it matches, isn't the one the planner showed, but it works)

Re: Odd Choice of seq scan

От
Justin Pryzby
Дата:
On Fri, Dec 02, 2022 at 11:52:19AM +1100, Paul McGarry wrote:
> Hi there,
> 
> I'm wondering if anyone has any insight into what might make the database
> choose a sequential scan for a query (table defs and plan below) like :

> Plan - seq scan of table:
> =====
> > explain select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR orders.orderid IN  (select orderid
FROMorderstotrans WHERE (transid IN ('546111'))));
 

> Plan - Seq scan and filter of index:
> =====
> > explain select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR orders.orderid IN  (select orderid
FROMorderstotrans WHERE (transid IN ('546111'))));
 

Could you show explain analyze ?

Show the size of the table and its indexes 
And GUC settings
And the "statistics" here:
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
Maybe on both a well-behaving instance and a badly-beving instance.

-- 
Justin



Re: Odd Choice of seq scan

От
Ronuk Raval
Дата:
On Thu, Dec 1, 2022 at 8:21 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Could you show explain analyze ?
>
> Maybe on both a well-behaving instance and a badly-beving instance.

Apologies for barging into this thread with a potentially unrelated
"me too" but here's a similar OR-causes-seqscan from 2018:
https://www.postgresql.org/message-id/CAPhHnhpc6bdGbRBa9hG7FQiKByVqR3s37VoY64DSMUxjeJGOjQ%40mail.gmail.com

I don't have other versions handy but can confirm that the problem
exists on Postgres 11.17 (dated but newer than the 10.1 in that post).

We've been working around the problem by rewriting queries to use UNION instead.



Re: Odd Choice of seq scan

От
Tom Lane
Дата:
Ronuk Raval <ronuk.raval@gmail.com> writes:
> We've been working around the problem by rewriting queries to use UNION instead.

Yeah, that.  The real issue here is that the seqscan and indexscan plans
both suck, because they will both run that sub-select for every row
in the table.  The index-only plan might fetch fewer blocks along the
way, because it only has to read the index not the table proper ...
but that's only true if the table's pages are mostly marked all-visible.
(My bet about the plan instability is that the planner might choose
differently depending on how much of the table it believes is
all-visible.)  That only helps a bit, though.

What you really want to have happen, assuming there are not too many
interesting orderid values, is to do a point indexscan for each one
of them.  Currently the planner won't think of that by itself when
faced with OR'd conditions in WHERE.  You have to help it along with
UNION or some similar locution.

            regards, tom lane



Re: Odd Choice of seq scan

От
Paul McGarry
Дата:


On Fri, 2 Dec 2022 at 12:21, Justin Pryzby <pryzby@telsasoft.com> wrote:
Could you show explain analyze ?

Show the size of the table and its indexes
And GUC settings
And the "statistics" here:
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
Maybe on both a well-behaving instance and a badly-beving instance.


Analyzes below, but they are both "badly" behaved and the plans. The index scan is presumably a marginally better option when the magic that allows "index only" lines up, but it's still a scan of the whole index rather than an index lookup. Both plans fetch "everything" and then filter out all but the 0 to 2 rows that match.

In my head the stats should be simple, as
1) The
    "orderstotrans_transid_key" UNIQUE, btree (transid)
means the subquery can return at most one order_id when I look up by trans_id (and the query plan does seem to know that, ie says rows=1)
2) The other OR'd clause is exactly one order_id.

So the worst case scenario is effectively the same as:
select orders.orderid FROM orders WHERE (orders.orderid IN ('546111','436345353'));
which would be:
====
 Index Only Scan using orders_pkey on orders  (cost=0.57..13.17 rows=2 width=8) (actual time=0.038..0.039 rows=1 loops=1)
   Index Cond: (orderid = ANY ('{546111,436345353}'::bigint[]))
====
ie "Index Cond" rather than "filter"

Anyway, maybe that insight is more naturally obvious to a human than something the planner can determine cheaply and easily.

The alternate "union" phrasing of the query works and as Ronuk and Tom said in other replies (thanks) seems to be the way to go and for now at least I just need to remember that ORs like this don't help the planner and should be avoided.

Thanks all.


====
 explain analyze
 select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR orders.orderid IN  (select orderid FROM orderstotrans WHERE (transid IN ('546111'))));

=====
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on orders  (cost=8.45..486499.59 rows=4304805 width=8) (actual time=9623.981..20796.568 rows=1 loops=1)
   Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
   Rows Removed by Filter: 8615097
   SubPlan 1
     ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.43..8.45 rows=1 width=8) (actual time=1.105..1.105 rows=0 loops=1)
           Index Cond: (transid = '546111'::bigint)
 Planning Time: 0.199 ms
 Execution Time: 20796.613 ms
==== 

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using orders_pkey on orders  (cost=9.16..4070119.84 rows=64770768 width=8) (actual time=21011.157..21011.158 rows=0 loops=1)
   Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
   Rows Removed by Filter: 130888763
   Heap Fetches: 3171118
   SubPlan 1
     ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.57..8.59 rows=1 width=8) (actual time=1.113..1.113 rows=0 loops=1)
           Index Cond: (transid = '546111'::bigint)
 Planning Time: 0.875 ms
 Execution Time: 21011.224 ms

Re: Odd Choice of seq scan

От
Chris Hoover
Дата:
I don’t have a database running the versions you are, but what I’ve had to do to get around thing like is it to write
thequery something like this: 

WITH orderids AS (
SELECT ‘546111’ AS orderid
UNION
SELECT orderid
  FROM orderstotrans
 WHERE transid IN ('546111')
)
select orders.orderid
  FROM orderids
  JOIN orders USING (orderid);

Hope this helps your situation.

Thanks,


Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269
Email: chrish@aweber.com



> On Dec 1, 2022, at 7:52 PM, Paul McGarry <paul@paulmcgarry.com> wrote:
>
> Hi there,
>
> I'm wondering if anyone has any insight into what might make the database choose a sequential scan for a query (table
defsand plan below) like : 
>
> SELECT orders.orderid FROM orders
> WHERE (
> orders.orderid IN ('546111')
>   OR
> orders.orderid IN  (select orderid FROM orderstotrans WHERE (transid IN ('546111')))
> );
>
> I have a couple of environments, all on Postgresql 13.7 and:
> - on one the query executes with an sequential scan on the orders table
> - on the other sequential scan on an index (ie walks index and filters, rather than looking up ids on the index as an
indexcondition.) 
>
> Plan and tables are below, but it seems to me that the planner knows the subplan is going to return 1 row (max) and
should"know" that there is a max of 2 IDs to look up an indexes would be faster than a sequential scan (of either table
orindex) and filter. I've tried re analyzing to make sure stats are good and it hasn't helped 
>
> I can get a good plan that does use the index efficiently by using a union, eg:
>
> select orders.orderid FROM orders
> WHERE (
> orders.orderid IN (
>   SELECT '546111'
>   UNION
>   SELECT orderid FROM orderstotrans WHERE (transid IN ('546111'))
> )
> );
>
> but I want to understand what warning signs I should be aware of with the original query that put it on the path of a
badplan, so I don't do it again. 
>
>
> Plan - seq scan of table:
> =====
> > explain
> select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR orders.orderid IN  (select orderid FROM
orderstotransWHERE (transid IN ('546111')))); 
>
>                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on orders  (cost=8.45..486270.87 rows=4302781 width=8)
>    Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
>    SubPlan 1
>      ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.43..8.45 rows=1 width=8)
>            Index Cond: (transid = '546111'::bigint)
> (5 rows)
> =====
>
> Plan - Seq scan and filter of index:
> =====
> > explain select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR orders.orderid IN  (select orderid
FROMorderstotrans WHERE (transid IN ('546111')))); 
>                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Index Only Scan using orders_pkey on orders  (cost=9.16..4067888.60 rows=64760840 width=8)
>    Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
>    SubPlan 1
>      ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.57..8.59 rows=1 width=8)
>            Index Cond: (transid = '546111'::bigint)
> (5 rows)
> =====
>
>
> Tables:
> =====
>                                    Table "test.orders"
>         Column        |            Type             | Collation | Nullable |   Default
> ----------------------+-----------------------------+-----------+----------+--------------
>  orderid              | bigint                      |           | not null |
>  istest               | smallint                    |           | not null | 0
>  orderstatusid        | integer                     |           |          |
>  customername         | text                        |           |          |
>  customeraddress      | text                        |           |          |
>  customercountry      | text                        |           |          |
>  customercity         | text                        |           |          |
>  customerstate        | text                        |           |          |
>  customerzip          | text                        |           |          |
>     "orders_pkey" PRIMARY KEY, btree (orderid)
>
>               Table "test.orderstotrans"
>    Column    |  Type   | Collation | Nullable | Default
> -------------+---------+-----------+----------+---------
>  orderid     | bigint  |           |          |
>  transid     | bigint  |           |          |
>  orderitemid | integer |           |          |
> Indexes:
>     "orderstotrans_orderid_idx" btree (orderid)
>     "orderstotrans_orderitemid_idx" btree (orderitemid)
>     "orderstotrans_transid_key" UNIQUE, btree (transid)
>
>
> Happier plan for the union version:
> ====
> explain select orders.orderid FROM orders
> WHERE (
> orders.orderid IN (
>   SELECT '3131275553'
>   UNION
>   select orderid FROM orderstotrans WHERE (transid IN ('3131275553'))
> )
> );
>                                                                            QUERY PLAN
                                           
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=9.21..21.84 rows=2 width=8) (actual time=0.034..0.043 rows=1 loops=1)
>    ->  Unique  (cost=8.64..8.65 rows=2 width=8) (actual time=0.024..0.026 rows=2 loops=1)
>          ->  Sort  (cost=8.64..8.64 rows=2 width=8) (actual time=0.023..0.024 rows=2 loops=1)
>                Sort Key: ('3131275553'::bigint)
>                Sort Method: quicksort  Memory: 25kB
>                ->  Append  (cost=0.00..8.63 rows=2 width=8) (actual time=0.001..0.019 rows=2 loops=1)
>                      ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
>                      ->  Index Scan using orderstotrans_transid_key on orderstotrans  (cost=0.57..8.59 rows=1
width=8)(actual time=0.015..0.016 rows=1 loops=1) 
>                            Index Cond: (transid = '3131275553'::bigint)
>    ->  Index Only Scan using orders_pkey on orders  (cost=0.57..6.58 rows=1 width=8) (actual time=0.007..0.007 rows=0
loops=2)
>          Index Cond: (orderid = ('3131275553'::bigint))
>          Heap Fetches: 0
>  Planning Time: 0.165 ms
>  Execution Time: 0.065 ms
> (14 rows)
> ====
> (though that plan is a bit misleading, as that index condition isn't exactly what is used, ie with:
>
> select orders.orderid FROM orders
> WHERE (
> orders.orderid IN (
>   SELECT '3131275553'
>   UNION
>   select orderid FROM orderstotrans WHERE (transid IN ('3131275553'))
> )
> );
>   orderid
> -----------
>  439155713
> (1 row)
>
> the orderid it matches, isn't the one the planner showed, but it works)
>