Re: Todo: Teach planner to evaluate multiple windows in the optimal order

Поиск
Список
Период
Сортировка
От Ankit Kumar Pandey
Тема Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Дата
Msg-id 8599211e-1ea9-8827-4b9b-adeb865656bd@gmail.com
обсуждение исходный текст
Ответ на Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
> On 18/01/23 15:12, David Rowley wrote:

> I also thought I'd better test that foreach_delete_current() works
> with foreach_reverse(). I can confirm that it *does not* work
> correctly.  I guess maybe you only tested the fact that it deleted the
> current item and not that the subsequent loop correctly went to the
> item directly before the deleted item. There's a problem with that. We
> skip an item.

Hmm, not really sure why did I miss that. I tried this again (added 
following in postgres.c above

PortalStart)

List* l = NIL;
l = lappend(l, 1);
l = lappend(l, 2);
l = lappend(l, 3);
l = lappend(l, 4);

ListCell *lc;
foreach_reverse(lc, l)
{
    if (foreach_current_index(lc) == 2) // delete 3
    {
        foreach_delete_current(l, lc);
    }
}

foreach(lc, l)
{
    int i = (int) lfirst(lc);
    ereport(LOG,(errmsg("%d", i)));
}

Got result:
2023-01-18 20:23:28.115 IST [51007] LOG:  1
2023-01-18 20:23:28.115 IST [51007] STATEMENT:  select pg_backend_pid();
2023-01-18 20:23:28.115 IST [51007] LOG:  2
2023-01-18 20:23:28.115 IST [51007] STATEMENT:  select pg_backend_pid();
2023-01-18 20:23:28.115 IST [51007] LOG:  4
2023-01-18 20:23:28.115 IST [51007] STATEMENT:  select pg_backend_pid();

I had expected list_delete_cell to take care of rest.

> Instead of fixing that, I think it's likely better just to loop
> backwards manually with a for() loop, so I've adjusted the patch to
> work that way.  It's quite likely that the additional code in
> foreach() and what was in foreach_reverse() is slower than looping
> manually due to the additional work those macros do to set the cell to
> NULL when we run out of cells to loop over.

Okay, current version looks fine as well.

> I made another pass over the v7 patch and fixed a bug that was
> disabling the optimization when the deepest WindowAgg had a
> runCondition.  This should have been using llast_node instead of
> linitial_node.  The top-level WindowAgg is the last in the list.

> I also made a pass over the tests and added a test case for the
> runCondition check to make sure we disable the optimization when the
> top-level WindowAgg has one of those.  

> I wasn't sure what your test comments case numbers were meant to represent. 
> They were not aligned with the code comments that document when the optimisation is
> disabled, they started out aligned, but seemed to go off the rails at
> #3. I've now made them follow the comments in create_one_window_path()
> and made it more clear what we expect the test outcome to be in each
> case.

Those were just numbering for exceptional cases, making them in sync 
with comments

wasn't really on my mind, but now they looks better.

> I've attached the v9 patch. I feel like this patch is quite
> self-contained and I'm quite happy with it now.  If there are no
> objections soon, I'm planning on pushing it.

Patch is already rebased with latest master, tests are all green.

Tried some basic profiling and it looked good.


I also tried a bit unrealistic case.

create table abcdefgh(a int, b int, c int, d int, e int, f int, g int, h int);

insert into abcdefgh select a,b,c,d,e,f,g,h from
generate_series(1,7) a,
generate_series(1,7) b,
generate_series(1,7) c,
generate_series(1,7) d,
generate_series(1,7) e,
generate_series(1,7) f,
generate_series(1,7) g,
generate_series(1,7) h;

explain analyze select count(*) over (order by a),
row_number() over (partition by a order by b) from abcdefgh order by a,b,c,d,e,f,g,h;

In patch version

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------
---------------
  WindowAgg  (cost=1023241.14..1225007.67 rows=5764758 width=48) (actual 
time=64957.894..81950.352 rows=5764801 loops=1)
    ->  WindowAgg  (cost=1023241.14..1138536.30 rows=5764758 width=40) 
(actual time=37959.055..60391.799 rows=5764801 loops
=1)
          ->  Sort  (cost=1023241.14..1037653.03 rows=5764758 width=32) 
(actual time=37959.045..52968.791 rows=5764801 loop
s=1)
                Sort Key: a, b, c, d, e, f, g, h
                Sort Method: external merge  Disk: 237016kB
                ->  Seq Scan on abcdefgh  (cost=0.00..100036.58 
rows=5764758 width=32) (actual time=0.857..1341.107 rows=57
64801 loops=1)
  Planning Time: 0.168 ms
  Execution Time: 82748.789 ms
(8 rows)

In Master

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------
---------------------
  Incremental Sort  (cost=1040889.72..1960081.97 rows=5764758 width=48) 
(actual time=23461.815..69654.700 rows=5764801 loop
s=1)
    Sort Key: a, b, c, d, e, f, g, h
    Presorted Key: a, b
    Full-sort Groups: 49  Sort Method: quicksort  Average Memory: 30kB  
Peak Memory: 30kB
    Pre-sorted Groups: 49  Sort Method: external merge  Average Disk: 
6688kB  Peak Disk: 6688kB
    ->  WindowAgg  (cost=1023241.14..1225007.67 rows=5764758 width=48) 
(actual time=22729.171..40189.407 rows=5764801 loops
=1)
          ->  WindowAgg  (cost=1023241.14..1138536.30 rows=5764758 
width=40) (actual time=8726.562..18268.663 rows=5764801
loops=1)
                ->  Sort  (cost=1023241.14..1037653.03 rows=5764758 
width=32) (actual time=8726.551..11291.494 rows=5764801
  loops=1)
                      Sort Key: a, b
                      Sort Method: external merge  Disk: 237016kB
                      ->  Seq Scan on abcdefgh (cost=0.00..100036.58 
rows=5764758 width=32) (actual time=0.029..1600.042 r
ows=5764801 loops=1)
  Planning Time: 2.742 ms
  Execution Time: 71172.586 ms
(13 rows)


Patch version is approx 11 sec slower.

Patch version sort took around 15 sec, whereas master sort took ~2 + 46 
= around 48 sec

BUT somehow master version is faster as Window function took 10 sec less 
time.

Maybe I am interpreting it wrong but still wanted to bring this to notice.


Thanks,

Ankit






В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Implement missing join selectivity estimation for range types
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences