Discussion: Query run in 27s with 15.2 vs 37ms with 14.6


Query run in 27s with 15.2 vs 37ms with 14.6

From: Charles
Date:
Hi,

I have just upgraded my PostgreSQL from 14.6 to 15.2, and a query that runs in 37ms on 14.6 now requires 27 seconds to complete. The table is a typical OHLC table (date date, code text, open int, high int, low int, close int, volume bigint, value bigint), with 2725207 rows.

This is the query

with vol_avg as (
  select
    code,
    avg(value) as value
  from
    stock_price
  where
    value > 0 and
    date > (select date from stock_date order by date desc limit 1 offset 5)
  group by
    code
)
select
  s.code,
  s.close,
  100.0 * (s.close - s.open) / s.open as chg,
  s.value,
  s.volume,
  va.value as value_avg
from
  stock_price s,
  vol_avg va
where
  length(s.code) = 4 and
  s.date = '2023-02-20' and
  s.open > 0 and
  s.value > 0 and
  s.code = va.code

Run individually, both the query inside the CTE and the bottom query without the CTE are fast. Only when they are joined does the query take 27 seconds to complete.

The plan from 15.2:

Nested Loop  (cost=63003.26..64440.14 rows=1 width=89) (actual time=23.234..27407.834 rows=779 loops=1)
  Join Filter: (s.code = stock_price.code)
  Rows Removed by Join Filter: 349117
  ->  Index Scan using idx_stock_price_date on stock_price s  (cost=0.43..1152.53 rows=1 width=29) (actual time=0.180..4.060 rows=779 loops=1)
        Index Cond: (date = '2023-02-20'::date)
        Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
        Rows Removed by Filter: 210
  ->  Finalize GroupAggregate  (cost=63002.83..63264.98 rows=1005 width=37) (actual time=34.783..35.149 rows=449 loops=779)
        Group Key: stock_price.code
        InitPlan 1 (returns $0)
          ->  Limit  (cost=0.41..0.43 rows=1 width=4) (actual time=0.094..0.095 rows=1 loops=1)
                ->  Index Only Scan Backward using stock_date_pkey on stock_date  (cost=0.28..182.90 rows=7359 width=4) (actual time=0.091..0.092 rows=6 loops=1)
                      Heap Fetches: 6
        ->  Gather Merge  (cost=63002.40..63236.91 rows=2010 width=37) (actual time=34.772..34.843 rows=450 loops=779)
              Workers Planned: 2
              Params Evaluated: $0
              Workers Launched: 2
              ->  Sort  (cost=62002.37..62004.89 rows=1005 width=37) (actual time=0.907..0.919 rows=150 loops=2337)
                    Sort Key: stock_price.code
                    Sort Method: quicksort  Memory: 95kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Partial HashAggregate  (cost=61939.70..61952.26 rows=1005 width=37) (actual time=0.706..0.786 rows=302 loops=2337)
                          Group Key: stock_price.code
                          Batches: 1  Memory Usage: 577kB
                          Worker 0:  Batches: 1  Memory Usage: 73kB
                          Worker 1:  Batches: 1  Memory Usage: 73kB
                          ->  Parallel Bitmap Heap Scan on stock_price  (cost=8799.81..61483.32 rows=91275 width=13) (actual time=0.068..0.333 rows=1427 loops=2337)
                                Recheck Cond: (date > $0)
                                Filter: (value > 0)
                                Rows Removed by Filter: 222
                                Heap Blocks: exact=138662
                                ->  Bitmap Index Scan on idx_stock_price_date  (cost=0.00..8745.05 rows=908402 width=0) (actual time=0.176..0.176 rows=8917 loops=779)
                                      Index Cond: (date > $0)
Planning Time: 1.122 ms
Execution Time: 27408.516 ms

Plan generated by 14.6:

Merge Join  (cost=61456.65..61716.09 rows=1 width=89) (actual time=25.509..37.185 rows=779 loops=1)
  Merge Cond: (s.code = stock_price.code)
  ->  Sort  (cost=319.77..319.78 rows=1 width=29) (actual time=0.745..0.771 rows=779 loops=1)
        Sort Key: s.code
        Sort Method: quicksort  Memory: 85kB
        ->  Index Scan using idx_stock_price_date on stock_price s  (cost=0.43..319.76 rows=1 width=29) (actual time=0.197..0.514 rows=779 loops=1)
              Index Cond: (date = '2023-02-20'::date)
              Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
              Rows Removed by Filter: 210
  ->  Finalize GroupAggregate  (cost=61136.88..61384.43 rows=949 width=37) (actual time=24.756..35.979 rows=906 loops=1)
        Group Key: stock_price.code
        InitPlan 1 (returns $0)
          ->  Limit  (cost=0.41..0.43 rows=1 width=4) (actual time=0.116..0.116 rows=1 loops=1)
                ->  Index Only Scan Backward using stock_date_pkey on stock_date  (cost=0.28..181.48 rows=7261 width=4) (actual time=0.113..0.115 rows=6 loops=1)
                      Heap Fetches: 6
        ->  Gather Merge  (cost=61136.45..61357.90 rows=1898 width=37) (actual time=24.736..35.284 rows=907 loops=1)
              Workers Planned: 2
              Params Evaluated: $0
              Workers Launched: 2
              ->  Sort  (cost=60136.42..60138.80 rows=949 width=37) (actual time=1.286..1.311 rows=302 loops=3)
                    Sort Key: stock_price.code
                    Sort Method: quicksort  Memory: 95kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Partial HashAggregate  (cost=60077.63..60089.50 rows=949 width=37) (actual time=1.098..1.179 rows=302 loops=3)
                          Group Key: stock_price.code
                          Batches: 1  Memory Usage: 577kB
                          Worker 0:  Batches: 1  Memory Usage: 73kB
                          Worker 1:  Batches: 1  Memory Usage: 73kB
                          ->  Parallel Index Scan using idx_stock_price_date on stock_price  (cost=0.43..59671.39 rows=81248 width=13) (actual time=0.017..0.745 rows=1427 loops=3)
                                Index Cond: (date > $0)
                                Filter: (value > 0)
                                Rows Removed by Filter: 222
Planning Time: 1.331 ms
Execution Time: 37.413 ms

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Tom Lane
Date:
Charles <peacech@gmail.com> writes:
> I have just upgraded my postgresql from 14.6 to 15.2 and my query that runs
> in 37ms in 14.6 run requires 27 seconds to complete. The table is a typical
> OHLC table (date date, code text, open int, high int, low int, close int,
> volume bigint, value bigint), 2725207 rows in table.

You need to do something to fix this poor row-count estimate:

>   ->  Index Scan using idx_stock_price_date on stock_price s
>  (cost=0.43..1152.53 rows=1 width=29) (actual time=0.180..4.060 rows=779 loops=1)
>         Index Cond: (date = '2023-02-20'::date)
>         Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
>         Rows Removed by Filter: 210

Considering that 14.x also estimated this as returning only one row,
I'm fairly surprised that you didn't get the same poor plan choice there.

The length(code) condition suggests a fairly poor choice of data
representation; can you change that?  In any case, the planner does
not keep statistics that would help it estimate that.  It might be
useful to try defining extended statistics on length(code) as such,
or if that doesn't help then on the combination of open, value,
and length(code).  But you need to get that estimate up to at least
several rows to discourage the choice of nestloop join.
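
For reference, a sketch of what those statistics objects could look like (the object names here are made up; expression statistics require PostgreSQL 14 or later, and ANALYZE is needed before they take effect):

```sql
-- expression statistics on length(code) alone
create statistics stock_price_code_len
  on (length(code)) from stock_price;

-- if that alone doesn't improve the estimate, statistics on the
-- combination of open, value and length(code)
create statistics stock_price_filter_stats
  on open, value, (length(code)) from stock_price;

-- extended statistics are only populated by ANALYZE
analyze stock_price;
```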

            regards, tom lane



Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Charles
Date:
On Mon, Feb 20, 2023 at 11:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Charles <peacech@gmail.com> writes:
> I have just upgraded my postgresql from 14.6 to 15.2 and my query that runs
> in 37ms in 14.6 run requires 27 seconds to complete. The table is a typical
> OHLC table (date date, code text, open int, high int, low int, close int,
> volume bigint, value bigint), 2725207 rows in table.

You need to do something to fix this poor row-count estimate:

>   ->  Index Scan using idx_stock_price_date on stock_price s
>  (cost=0.43..1152.53 rows=1 width=29) (actual time=0.180..4.060 rows=779 loops=1)
>         Index Cond: (date = '2023-02-20'::date)
>         Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
>         Rows Removed by Filter: 210

Considering that 14.x also estimated this as returning only one row,
I'm fairly surprised that you didn't get the same poor plan choice there.

The length(code) condition suggests a fairly poor choice of data
representation; can you change that?

Thanks, removing the length filter does restore the execution time to 30ms.

There are only around 900 rows per date, so it looks like 14.6 filters on the date first (which is indexed), while 15.2 applies the length filter first.

Although, when not joined with the CTE, the query runs fast even with the length filter:

select
  s.code,
  s.close,
  100.0 * (s.close - s.open) / s.open as chg,
  s.value,
  s.volume
from
  stock_price s
where
  s.date = '2023-02-20' and
  length(s.code) = 4 and
  s.open > 0 and
  s.value > 0

Index Scan using idx_stock_price_date on stock_price s  (cost=0.43..1152.55 rows=1 width=57) (actual time=0.074..0.461 rows=779 loops=1)
  Index Cond: (date = '2023-02-20'::date)
  Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
  Rows Removed by Filter: 210
Planning Time: 0.098 ms
Execution Time: 0.487 ms





Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Charles
Date:

On Mon, Feb 20, 2023 at 11:44 PM Charles <peacech@gmail.com> wrote:
On Mon, Feb 20, 2023 at 11:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Charles <peacech@gmail.com> writes:
> I have just upgraded my postgresql from 14.6 to 15.2 and my query that runs
> in 37ms in 14.6 run requires 27 seconds to complete. The table is a typical
> OHLC table (date date, code text, open int, high int, low int, close int,
> volume bigint, value bigint), 2725207 rows in table.

You need to do something to fix this poor row-count estimate:

>   ->  Index Scan using idx_stock_price_date on stock_price s
>  (cost=0.43..1152.53 rows=1 width=29) (actual time=0.180..4.060 rows=779 loops=1)
>         Index Cond: (date = '2023-02-20'::date)
>         Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
>         Rows Removed by Filter: 210

Considering that 14.x also estimated this as returning only one row,
I'm fairly surprised that you didn't get the same poor plan choice there.

The length(code) condition suggests a fairly poor choice of data
representation; can you change that?

Thanks, removing the length filter does restore the execution time to 30ms.

Wrapping the query with a select * from (...) t where length(code) = 4 puts the execution time back to 27 seconds.

This is a bit unexpected, since I expected the result of the inner query to be computed first and then filtered.

select * from (
  with vol_avg as (
    select
      code,
      avg(value) as value
    from
      stock_price
    where
      value > 0 and
      date > (select date from stock_date order by date desc limit 1 offset 5)
    group by
      code
  )
  select
    s.code,
    s.close,
    100.0 * (s.close - s.open) / s.open as chg,
    s.value,
    s.volume,
    va.value as value_avg
  from
    stock_price s
    inner join vol_avg va on
      s.code = va.code
  where
    s.date = '2023-02-20' and
    s.open > 0 and
    s.value > 0
) t where length(code) = 4

Nested Loop  (cost=63003.26..64440.14 rows=1 width=89) (actual time=22.859..26784.170 rows=779 loops=1)
  Join Filter: (s.code = stock_price.code)
  Rows Removed by Join Filter: 349117
  ->  Index Scan using idx_stock_price_date on stock_price s  (cost=0.43..1152.53 rows=1 width=29) (actual time=0.084..4.024 rows=779 loops=1)
        Index Cond: (date = '2023-02-20'::date)
        Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
        Rows Removed by Filter: 210
  ->  Finalize GroupAggregate  (cost=63002.83..63264.98 rows=1005 width=37) (actual time=33.983..34.347 rows=449 loops=779)
        Group Key: stock_price.code
        InitPlan 1 (returns $0)
          ->  Limit  (cost=0.41..0.43 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=1)
                ->  Index Only Scan Backward using stock_date_pkey on stock_date  (cost=0.28..182.90 rows=7359 width=4) (actual time=0.012..0.012 rows=6 loops=1)
                      Heap Fetches: 6
        ->  Gather Merge  (cost=63002.40..63236.91 rows=2010 width=37) (actual time=33.972..34.045 rows=450 loops=779)
              Workers Planned: 2
              Params Evaluated: $0
              Workers Launched: 2
              ->  Sort  (cost=62002.37..62004.89 rows=1005 width=37) (actual time=0.885..0.896 rows=150 loops=2337)
                    Sort Key: stock_price.code
                    Sort Method: quicksort  Memory: 95kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Partial HashAggregate  (cost=61939.70..61952.26 rows=1005 width=37) (actual time=0.696..0.776 rows=302 loops=2337)
                          Group Key: stock_price.code
                          Batches: 1  Memory Usage: 577kB
                          Worker 0:  Batches: 1  Memory Usage: 73kB
                          Worker 1:  Batches: 1  Memory Usage: 73kB
                          ->  Parallel Bitmap Heap Scan on stock_price  (cost=8799.81..61483.32 rows=91275 width=13) (actual time=0.068..0.335 rows=1427 loops=2337)
                                Recheck Cond: (date > $0)
                                Filter: (value > 0)
                                Rows Removed by Filter: 222
                                Heap Blocks: exact=138662
                                ->  Bitmap Index Scan on idx_stock_price_date  (cost=0.00..8745.05 rows=908402 width=0) (actual time=0.177..0.177 rows=8917 loops=779)
                                      Index Cond: (date > $0)
Planning Time: 0.362 ms
Execution Time: 26784.929 ms

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Jeff Janes
Date:
On Mon, Feb 20, 2023 at 10:56 AM Charles <peacech@gmail.com> wrote:

The plan from 15.2:

Nested Loop  (cost=63003.26..64440.14 rows=1 width=89) (actual time=23.234..27407.834 rows=779 loops=1)

Plan generated by 14.6:

Merge Join  (cost=61456.65..61716.09 rows=1 width=89) (actual time=25.509..37.185 rows=779 loops=1)

Given how close those plan estimates are to each other, I would say your query was very fragile under 14.6, and it was just a matter of luck in how the statistics were computed that you got the better plan on the older version, as opposed to some important coding change between versions. To verify that, can you force each version to choose the other plan, for example by fiddling with enable_nestloop on one and enable_mergejoin on the other?
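
That kind of experiment can be done per session; a minimal sketch (the relevant GUC names are enable_nestloop and enable_mergejoin):

```sql
-- on 15.2: rule out the nested loop, then re-run the query
set enable_nestloop = off;
-- explain (analyze, buffers) <the original query>;
reset enable_nestloop;

-- on 14.6: rule out the merge join instead
-- set enable_mergejoin = off;
```

These settings only discourage a plan shape for the current session; they do not persist and are safe to toggle for diagnosis.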

Cheers,

Jeff

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Stephen Frost
Date:
Greetings,

* Charles (peacech@gmail.com) wrote:
> Wrapping the query with a select * from (...) t where length(code) = 4 puts
> the execution time back to 27 seconds.
>
> This is a bit unexpected since I expect that the result from the inner
> query to be executed first and then filtered.

It's really not; PG will (correctly) attempt to pull such subselects into the overall optimization, which is generally better for everyone. If you want to force it, you can use a WITH MATERIALIZED CTE, or throw an 'OFFSET 0' hack into your sub-select, but it's really a much better idea to generate extended stats on what you're filtering, as has been suggested, or to come up with a better data representation where you're not searching on a length() expression.
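
Applied to the query from this thread, the two fences mentioned above would look roughly like this (a sketch for illustration, not a recommendation):

```sql
-- WITH ... AS MATERIALIZED forces the CTE to be computed first,
-- restoring the "compute the inner result, then filter" ordering
with vol_avg as materialized (
  select code, avg(value) as value
  from stock_price
  where value > 0 and
        date > (select date from stock_date order by date desc limit 1 offset 5)
  group by code
)
select s.code, s.close, s.value, s.volume, va.value as value_avg
from stock_price s
  inner join vol_avg va on s.code = va.code
where s.date = '2023-02-20' and s.open > 0 and s.value > 0;

-- the OFFSET 0 hack plays the same role inside a plain sub-select:
--   select * from (... inner query ... offset 0) t where length(code) = 4
```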

Thanks,

Stephen


Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Charles
Date:
On Tue, Feb 21, 2023 at 2:25 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Feb 20, 2023 at 10:56 AM Charles <peacech@gmail.com> wrote:

The plan from 15.2:

Nested Loop  (cost=63003.26..64440.14 rows=1 width=89) (actual time=23.234..27407.834 rows=779 loops=1)

Plan generated by 14.6:

Merge Join  (cost=61456.65..61716.09 rows=1 width=89) (actual time=25.509..37.185 rows=779 loops=1)

Given how close those plan estimates are to each other, I would say your query was very fragile under 14.6, and it was just a matter of luck in how the statistics were computed that you got the better plan on the older version, as opposed to some important coding change between versions. To verify that, can you force each version to choose the other plan, for example by fiddling with enable_nestloop on one and enable_mergejoin on the other?


Disabling mergejoin on 14.6 and disabling nestloop on 15.2 causes both to use a hash join, which runs in 37ms on 14.6 and 208ms on 15.2.

14.6:
Hash Join  (cost=61456.66..61716.21 rows=1 width=89) (actual time=23.391..36.458 rows=779 loops=1)
  Hash Cond: (stock_price.code = s.code)
  ->  Finalize GroupAggregate  (cost=61136.88..61384.43 rows=949 width=37) (actual time=22.698..35.350 rows=907 loops=1)
        Group Key: stock_price.code
        InitPlan 1 (returns $0)
          ->  Limit  (cost=0.41..0.43 rows=1 width=4) (actual time=0.235..0.278 rows=1 loops=1)
                ->  Index Only Scan Backward using stock_date_pkey on stock_date  (cost=0.28..181.48 rows=7261 width=4) (actual time=0.231..0.233 rows=6 loops=1)
                      Heap Fetches: 6
        ->  Gather Merge  (cost=61136.45..61357.90 rows=1898 width=37) (actual time=22.681..34.642 rows=907 loops=1)
              Workers Planned: 2
              Params Evaluated: $0
              Workers Launched: 2
              ->  Sort  (cost=60136.42..60138.80 rows=949 width=37) (actual time=1.235..1.257 rows=302 loops=3)
                    Sort Key: stock_price.code
                    Sort Method: quicksort  Memory: 95kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Partial HashAggregate  (cost=60077.63..60089.50 rows=949 width=37) (actual time=1.038..1.127 rows=302 loops=3)
                          Group Key: stock_price.code
                          Batches: 1  Memory Usage: 577kB
                          Worker 0:  Batches: 1  Memory Usage: 73kB
                          Worker 1:  Batches: 1  Memory Usage: 73kB
                          ->  Parallel Index Scan using idx_stock_price_date on stock_price  (cost=0.43..59671.39 rows=81248 width=13) (actual time=0.021..0.689 rows=1427 loops=3)
                                Index Cond: (date > $0)
                                Filter: (value > 0)
                                Rows Removed by Filter: 222
  ->  Hash  (cost=319.76..319.76 rows=1 width=29) (actual time=0.650..0.652 rows=779 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 57kB
        ->  Index Scan using idx_stock_price_date on stock_price s  (cost=0.43..319.76 rows=1 width=29) (actual time=0.212..0.542 rows=779 loops=1)
              Index Cond: (date = '2023-02-20'::date)
              Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
              Rows Removed by Filter: 210
Planning Time: 1.303 ms
Execution Time: 36.671 ms

15.2:
Hash Join  (cost=65863.57..69710.36 rows=1 width=89) (actual time=196.886..208.079 rows=779 loops=1)
  Hash Cond: (s.code = va.code)
  ->  Bitmap Heap Scan on stock_price s  (cost=20.61..3867.38 rows=1 width=29) (actual time=0.209..0.554 rows=779 loops=1)
        Recheck Cond: (date = '2023-02-20'::date)
        Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
        Rows Removed by Filter: 210
        Heap Blocks: exact=34
        ->  Bitmap Index Scan on idx_stock_price_date  (cost=0.00..20.60 rows=1090 width=0) (actual time=0.082..0.082 rows=1827 loops=1)
              Index Cond: (date = '2023-02-20'::date)
  ->  Hash  (cost=65830.40..65830.40 rows=1005 width=37) (actual time=196.635..207.098 rows=907 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 49kB
        ->  Subquery Scan on va  (cost=65558.20..65830.40 rows=1005 width=37) (actual time=194.141..206.923 rows=907 loops=1)
              ->  Finalize GroupAggregate  (cost=65558.20..65820.35 rows=1005 width=37) (actual time=194.139..206.872 rows=907 loops=1)
                    Group Key: stock_price.code
                    InitPlan 1 (returns $0)
                      ->  Limit  (cost=0.53..0.58 rows=1 width=4) (actual time=0.117..0.117 rows=1 loops=1)
                            ->  Index Only Scan Backward using stock_date_pkey on stock_date  (cost=0.28..363.63 rows=7359 width=4) (actual time=0.114..0.116 rows=6 loops=1)
                                  Heap Fetches: 6
                    ->  Gather Merge  (cost=65557.62..65792.14 rows=2010 width=37) (actual time=194.123..205.341 rows=1992 loops=1)
                          Workers Planned: 2
                          Params Evaluated: $0
                          Workers Launched: 2
                          ->  Sort  (cost=64557.60..64560.11 rows=1005 width=37) (actual time=176.452..176.494 rows=664 loops=3)
                                Sort Key: stock_price.code
                                Sort Method: quicksort  Memory: 95kB
                                Worker 0:  Sort Method: quicksort  Memory: 43kB
                                Worker 1:  Sort Method: quicksort  Memory: 90kB
                                ->  Partial HashAggregate  (cost=64494.92..64507.48 rows=1005 width=37) (actual time=175.830..176.168 rows=664 loops=3)
                                      Group Key: stock_price.code
                                      Batches: 1  Memory Usage: 577kB
                                      Worker 0:  Batches: 1  Memory Usage: 193kB
                                      Worker 1:  Batches: 1  Memory Usage: 577kB
                                      ->  Parallel Seq Scan on stock_price  (cost=0.00..64038.54 rows=91275 width=13) (actual time=130.043..173.124 rows=1427 loops=3)
                                            Filter: ((value > 0) AND (date > $0))
                                            Rows Removed by Filter: 906975
Planning Time: 1.310 ms
Execution Time: 208.262 ms

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Charles
Date:


On Tue, Feb 21, 2023 at 3:58 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Charles (peacech@gmail.com) wrote:
> Wrapping the query with a select * from (...) t where length(code) = 4 puts
> the execution time back to 27 seconds.
>
> This is a bit unexpected since I expect that the result from the inner
> query to be executed first and then filtered.

It's really not- PG will (correctly) attempt to pull in such subselects
into the overall optimization, which is generally better for everyone.
If you want to force it, you can use a WITH MATERIALIZED CTE, or throw
in an 'OFFSET 0' as a hack into your sub-select, but really it's a much
better idea to generate extended stats on what you're filtering as has
been suggested, or come up with a better data representation where
you're not doing a search on a 'length()' as you are.



Generating extended statistics on the expression (length(code)) causes the planner to generate a hash join, which runs in 183ms (still slower than the 37ms on 14.6). Using a materialized CTE (no extended stats) results in a nested loop that runs in 229ms. I think I'll revert to PostgreSQL 14 for now. Thank you for the suggestions.

with extended stats:
Hash Join  (cost=65865.64..69478.05 rows=244 width=89) (actual time=171.915..183.674 rows=779 loops=1)
  Hash Cond: (s.code = va.code)
  ->  Bitmap Heap Scan on stock_price s  (cost=16.12..3624.83 rows=244 width=29) (actual time=0.079..0.238 rows=779 loops=1)
        Recheck Cond: (date = '2023-02-20'::date)
        Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
        Rows Removed by Filter: 210
        Heap Blocks: exact=34
        ->  Bitmap Index Scan on idx_stock_price_date  (cost=0.00..16.06 rows=1017 width=0) (actual time=0.042..0.042 rows=1827 loops=1)
              Index Cond: (date = '2023-02-20'::date)
  ->  Hash  (cost=65836.90..65836.90 rows=1010 width=37) (actual time=171.827..183.109 rows=907 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 49kB
        ->  Subquery Scan on va  (cost=65563.34..65836.90 rows=1010 width=37) (actual time=169.957..182.987 rows=907 loops=1)
              ->  Finalize GroupAggregate  (cost=65563.34..65826.80 rows=1010 width=37) (actual time=169.957..182.947 rows=907 loops=1)
                    Group Key: stock_price.code
                    InitPlan 1 (returns $0)
                      ->  Limit  (cost=0.53..0.58 rows=1 width=4) (actual time=0.012..0.012 rows=1 loops=1)
                            ->  Index Only Scan Backward using stock_date_pkey on stock_date  (cost=0.28..363.63 rows=7359 width=4) (actual time=0.010..0.011 rows=6 loops=1)
                                  Heap Fetches: 6
                    ->  Gather Merge  (cost=65562.76..65798.45 rows=2020 width=37) (actual time=169.947..181.671 rows=2616 loops=1)
                          Workers Planned: 2
                          Params Evaluated: $0
                          Workers Launched: 2
                          ->  Sort  (cost=64562.74..64565.26 rows=1010 width=37) (actual time=156.536..156.575 rows=872 loops=3)
                                Sort Key: stock_price.code
                                Sort Method: quicksort  Memory: 94kB
                                Worker 0:  Sort Method: quicksort  Memory: 93kB
                                Worker 1:  Sort Method: quicksort  Memory: 90kB
                                ->  Partial HashAggregate  (cost=64499.71..64512.34 rows=1010 width=37) (actual time=156.002..156.246 rows=872 loops=3)
                                      Group Key: stock_price.code
                                      Batches: 1  Memory Usage: 577kB
                                      Worker 0:  Batches: 1  Memory Usage: 577kB
                                      Worker 1:  Batches: 1  Memory Usage: 577kB
                                      ->  Parallel Seq Scan on stock_price  (cost=0.00..64038.54 rows=92234 width=13) (actual time=68.301..155.471 rows=1427 loops=3)
                                            Filter: ((value > 0) AND (date > $0))
                                            Rows Removed by Filter: 906975
Planning Time: 0.340 ms
Execution Time: 183.833 ms

materialized cte:
Nested Loop  (cost=65839.48..69480.90 rows=1 width=89) (actual time=156.363..229.143 rows=779 loops=1)
  Join Filter: (s.code = va.code)
  Rows Removed by Join Filter: 705774
  CTE vol_avg
    ->  Finalize GroupAggregate  (cost=65561.00..65823.42 rows=1006 width=37) (actual time=154.575..156.254 rows=907 loops=1)
          Group Key: stock_price.code
          InitPlan 1 (returns $0)
            ->  Limit  (cost=0.53..0.58 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=1)
                  ->  Index Only Scan Backward using stock_date_pkey on stock_date  (cost=0.28..363.63 rows=7359 width=4) (actual time=0.011..0.012 rows=6 loops=1)
                        Heap Fetches: 6
          ->  Gather Merge  (cost=65560.42..65795.17 rows=2012 width=37) (actual time=154.564..155.143 rows=2625 loops=1)
                Workers Planned: 2
                Params Evaluated: $0
                Workers Launched: 2
                ->  Sort  (cost=64560.40..64562.92 rows=1006 width=37) (actual time=138.777..138.822 rows=875 loops=3)
                      Sort Key: stock_price.code
                      Sort Method: quicksort  Memory: 92kB
                      Worker 0:  Sort Method: quicksort  Memory: 93kB
                      Worker 1:  Sort Method: quicksort  Memory: 93kB
                      ->  Partial HashAggregate  (cost=64497.65..64510.23 rows=1006 width=37) (actual time=138.193..138.455 rows=875 loops=3)
                            Group Key: stock_price.code
                            Batches: 1  Memory Usage: 577kB
                            Worker 0:  Batches: 1  Memory Usage: 577kB
                            Worker 1:  Batches: 1  Memory Usage: 577kB
                            ->  Parallel Seq Scan on stock_price  (cost=0.00..64038.54 rows=91822 width=13) (actual time=133.465..137.738 rows=1427 loops=3)
                                  Filter: ((value > 0) AND (date > $0))
                                  Rows Removed by Filter: 906975
  ->  Bitmap Heap Scan on stock_price s  (cost=16.06..3624.77 rows=1 width=29) (actual time=0.071..0.570 rows=779 loops=1)
        Recheck Cond: (date = '2023-02-20'::date)
        Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
        Rows Removed by Filter: 210
        Heap Blocks: exact=34
        ->  Bitmap Index Scan on idx_stock_price_date  (cost=0.00..16.06 rows=1017 width=0) (actual time=0.036..0.036 rows=1827 loops=1)
              Index Cond: (date = '2023-02-20'::date)
  ->  CTE Scan on vol_avg va  (cost=0.00..20.12 rows=1006 width=64) (actual time=0.198..0.244 rows=907 loops=779)
Planning Time: 0.499 ms
Execution Time: 229.381 ms

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Stephen Frost
Date:
Greetings,

On Mon, Feb 20, 2023 at 20:54 Charles <peacech@gmail.com> wrote:
On Tue, Feb 21, 2023 at 3:58 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Charles (peacech@gmail.com) wrote:
> Wrapping the query with a select * from (...) t where length(code) = 4 puts
> the execution time back to 27 seconds.
>
> This is a bit unexpected since I expect that the result from the inner
> query to be executed first and then filtered.

It's really not- PG will (correctly) attempt to pull in such subselects
into the overall optimization, which is generally better for everyone.
If you want to force it, you can use a WITH MATERIALIZED CTE, or throw
in an 'OFFSET 0' as a hack into your sub-select, but really it's a much
better idea to generate extended stats on what you're filtering as has
been suggested, or come up with a better data representation where
you're not doing a search on a 'length()' as you are.



Generating extended statistics on the expression (length(code)) causes the planner to generate a hash join, which runs in 183ms (still slower than the 37ms on 14.6). Using a materialized CTE (no extended stats) results in a nested loop that runs in 229ms. I think I'll revert to PostgreSQL 14 for now. Thank you for the suggestions.

As pointed out elsewhere, the query costs seem pretty close, so you're likely going to get hit with the bad plan at some point on 14 as well. Have you tried tweaking the statistics target? It might help. Though really, as was said before, modeling the data better would likely help a lot.
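
Raising the statistics target could be tried per column; a sketch (the values below are illustrative; the default target is 100):

```sql
alter table stock_price alter column code set statistics 1000;
alter table stock_price alter column date set statistics 1000;
analyze stock_price;  -- required for the new target to take effect
```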

Thanks,

Stephen

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Charles
Date:


On Tue, Feb 21, 2023 at 9:03 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

On Mon, Feb 20, 2023 at 20:54 Charles <peacech@gmail.com> wrote:
On Tue, Feb 21, 2023 at 3:58 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Charles (peacech@gmail.com) wrote:
> Wrapping the query with a select * from (...) t where length(code) = 4 puts
> the execution time back to 27 seconds.
>
> This is a bit unexpected since I expect that the result from the inner
> query to be executed first and then filtered.

It's really not- PG will (correctly) attempt to pull in such subselects
into the overall optimization, which is generally better for everyone.
If you want to force it, you can use a WITH MATERIALIZED CTE, or throw
in an 'OFFSET 0' as a hack into your sub-select, but really it's a much
better idea to generate extended stats on what you're filtering as has
been suggested, or come up with a better data representation where
you're not doing a search on a 'length()' as you are.



Generating extended statistics on the expression (length(code)) causes the planner to generate a hash join, which runs in 183ms (still slower than 37ms on 14.6). Using a materialized CTE (no extended stats) results in a nested loop that runs in 229ms. I think I'll revert to PostgreSQL 14 for now. Thank you for the suggestions.

As pointed out elsewhere, the query costs seem pretty close, so you're likely going to end up with the bad plan at some point on 14 also. Have you tried tweaking the statistics target? It might help. Though really, as was said before, modeling the data better would likely help a lot.


I have used 14.6 for probably a year running this query multiple times a day (>50x) and have never experienced a slow response time.

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From
David Rowley
Date:
On Tue, 21 Feb 2023 at 14:38, Charles <peacech@gmail.com> wrote:
> Disabling mergejoin on 14.6 and disabling nestedloop on 15.2 causes both to use hashjoin where it runs for 37ms in 14.6 and 208ms in 15.2.
>
> 14.6:
>                           ->  Parallel Index Scan using idx_stock_price_date on stock_price  (cost=0.43..59671.39 rows=81248 width=13) (actual time=0.021..0.689 rows=1427 loops=3)
>                                 Index Cond: (date > $0)
>                                 Filter: (value > 0)
>                                 Rows Removed by Filter: 222

> 15.2:
>                                       ->  Parallel Seq Scan on stock_price  (cost=0.00..64038.54 rows=91275 width=13) (actual time=130.043..173.124 rows=1427 loops=3)
>                                             Filter: ((value > 0) AND (date > $0))
>                                             Rows Removed by Filter: 906975

The difference in the two above fragments likely accounts for the
majority of the remaining performance difference.  Possibly 15.2 is
using a Seq Scan because it's estimating slightly more rows from
stock_price for these two quals. For the date > $0 qual, the stats
don't really help as the planner does not know what $0 will be during
planning, so it'll just assume that the selectivity is 1/3rd of rows.
For the value > 0, there could be some variation there just between
ANALYZE runs. That might be enough to account for the difference in
estimate between 14 and 15.

You might also want to check that effective_cache_size is set to
something realistic on 15. random_page_cost is also a factor for index
scan vs seq scan.  However, ultimately, the estimate is bad here and
adjusting those two GUCs to make this plan do what you want might be
to the detriment of other plans where the statistics are more
accurate.   You could maybe get the planner to give you the v14 plan
if you changed the CTE qual to "where value + 0 > 0", that would mean
the planner would be unable to use the statistics, which presumably
say that most values are > 0, and it would apply the 0.3333 to both
those quals which should reduce the row estimate.  However, that's not
to say it won't make things bad again for some other "date" value.
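For reference, the rewrite described above only touches the `value` qual in the CTE; a sketch:

```sql
with vol_avg as (
  select code, avg(value) as value
  from stock_price
  where value + 0 > 0  -- opaque to the column statistics, so the planner
                       -- falls back to its default selectivity here too
    and date > (select date from stock_date order by date desc limit 1 offset 5)
  group by code
)
select s.code, s.close, va.value as value_avg  -- rest of the query unchanged
from stock_price s
join vol_avg va on va.code = s.code
where length(s.code) = 4 and s.date = '2023-02-20'
  and s.open > 0 and s.value > 0;
```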

David



Re: Query run in 27s with 15.2 vs 37ms with 14.6

From
Charles
Date:


On Tue, Feb 21, 2023 at 9:22 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 21 Feb 2023 at 14:38, Charles <peacech@gmail.com> wrote:
> Disabling mergejoin on 14.6 and disabling nestedloop on 15.2 causes both to use hashjoin where it runs for 37ms in 14.6 and 208ms in 15.2.
>
> 14.6:
>                           ->  Parallel Index Scan using idx_stock_price_date on stock_price  (cost=0.43..59671.39 rows=81248 width=13) (actual time=0.021..0.689 rows=1427 loops=3)
>                                 Index Cond: (date > $0)
>                                 Filter: (value > 0)
>                                 Rows Removed by Filter: 222

> 15.2:
>                                       ->  Parallel Seq Scan on stock_price  (cost=0.00..64038.54 rows=91275 width=13) (actual time=130.043..173.124 rows=1427 loops=3)
>                                             Filter: ((value > 0) AND (date > $0))
>                                             Rows Removed by Filter: 906975

The difference in the two above fragments likely accounts for the
majority of the remaining performance difference.  Possibly 15.2 is
using a Seq Scan because it's estimating slightly more rows from
stock_price for these two quals. For the date > $0 qual, the stats
don't really help as the planner does not know what $0 will be during
planning, so it'll just assume that the selectivity is 1/3rd of rows.
For the value > 0, there could be some variation there just between
ANALYZE runs. That might be enough to account for the difference in
estimate between 14 and 15.

You might also want to check that effective_cache_size is set to
something realistic on 15. random_page_cost is also a factor for index
scan vs seq scan. 

Thank you for the hint. I think this is it. random_page_cost was set to 1.1 in 14.6 (the data directory is on an SSD). When upgrading to 15.2 I thought that SSD random access time is not actually that close to sequential access time, so I reverted it to 4.

The new timings are
random_page_cost = 1.1 with no extended stats = still 27 seconds
random_page_cost = 1.1 with materialized cte = 92ms
random_page_cost = 1.1 with extended stats = 33ms
random_page_cost = 1.1 with extended stats and materialized cte = 33ms
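For reference, the combination behind the 33ms timings can be sketched as (the statistics name is illustrative):

```sql
-- Restore the SSD-appropriate cost model on 15.2
alter system set random_page_cost = 1.1;
select pg_reload_conf();

-- Extended statistics on the filtered expression (PostgreSQL 14+)
create statistics stock_price_code_len on (length(code)) from stock_price;
analyze stock_price;
```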


Re: Query run in 27s with 15.2 vs 37ms with 14.6

From
Stephen Frost
Date:
Greetings,

* Charles (peacech@gmail.com) wrote:
> Thank you for the hint. I think this is it. random_page_cost was set to 1.1
> in 14.6 (the data directory is located in a ssd). When upgrading to 15.2 I
> thought that ssd random seek time is not that close to sequential seek
> time, so I revert it back to 4.

In the future, you might want to be sure and include 'settings' in your
EXPLAIN options, eg:

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) query ...

so that you see what non-default planner settings are set when the query
is being planned.

Thanks,

Stephen

Attachments

Re: Query run in 27s with 15.2 vs 37ms with 14.6

From
Jeff Janes
Date:
On Mon, Feb 20, 2023 at 8:38 PM Charles <peacech@gmail.com> wrote:
On Tue, Feb 21, 2023 at 2:25 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Feb 20, 2023 at 10:56 AM Charles <peacech@gmail.com> wrote:
 
The plan from 15.2:

Nested Loop  (cost=63003.26..64440.14 rows=1 width=89) (actual time=23.234..27407.834 rows=779 loops=1)
 
Plan generated by 14.6

Merge Join  (cost=61456.65..61716.09 rows=1 width=89) (actual time=25.509..37.185 rows=779 loops=1)

Given how close those plan estimates are to each other, I would say your query was very fragile under 14.6, and it was just a matter of luck in how the statistics were computed that you got the better plan on the older version, as opposed to some important coding change that happened between versions. To verify that, can you force each version to choose the other plan, for example by fiddling with enable_nestloop on one and enable_mergejoin on the other?


Disabling mergejoin on 14.6 and disabling nestedloop on 15.2 causes both to use hashjoin where it runs for 37ms in 14.6 and 208ms in 15.2.

So please also disable hashjoin on each (in addition to what you already disabled) and see if that forces at least one of them to switch to the other one's plan. Once you get at least one version to show both plans, the question is whether there was just a small difference in cost estimates that was still large enough to change the rank order of the plans, or a large difference.

The difference in performance of the hashjoin plans is also interesting and probably worth investigating, but it is not obviously related to the original performance difference.  So how many different things are you willing to investigate, and in what order?

If you could offer up a dataset which reproduces the problem but can be shared without confidentiality problems, that could help.  Especially if you can provide a generator for the data which uses random() and generate_series(), rather than the data itself.
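A generator along those lines might look like this; everything here (row counts, code alphabet, price and volume distributions) is invented purely to produce a similarly shaped dataset:

```sql
create table stock_price (
  date date, code text, open int, high int, low int,
  close int, volume bigint, value bigint
);

insert into stock_price
select date, code,
       open,
       open + (random() * 50)::int               as high,
       greatest(1, open - (random() * 50)::int)  as low,
       greatest(1, open + (random() * 20 - 10)::int) as close,
       volume,
       volume * open                             as value
from (
  select d::date as date,
         -- ~1000 synthetic four-letter codes: AAAA, AAAB, ...
         chr(65 + c / 17576 % 26) || chr(65 + c / 676 % 26) ||
         chr(65 + c / 26 % 26)    || chr(65 + c % 26) as code,
         (100 + random() * 900)::int as open,
         (random() * 1e6)::bigint    as volume
  from generate_series('2013-01-03'::date, '2023-02-20'::date,
                       interval '1 day') d,
       generate_series(0, 999) c
) base;

create index idx_stock_price_date on stock_price (date);
analyze stock_price;
```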

Cheers,

Jeff