Discussion: slow count in window query
Hello,

I did some tests computing a median via window functions and found what is probably some badly optimised code. I compared two methods - Celko's and Itzik Ben-Gan's. Ben-Gan's method should be faster - it needs one sort less - but in practice it is two times slower.

create table x(a integer);
insert into x select (random()*10000)::int from generate_series(1,10000);

Celko's method:

postgres=# explain select avg(a) from (select a, row_number() over (order by a asc) as hi, row_number() over (order by a desc) as lo from x) s where hi in (lo-1,lo+1);
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Aggregate  (cost=2144.02..2144.03 rows=1 width=4)
   ->  Subquery Scan s  (cost=1643.77..2143.77 rows=100 width=4)
         Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
         ->  WindowAgg  (cost=1643.77..1943.77 rows=10000 width=4)
               ->  WindowAgg  (cost=1643.77..1818.77 rows=10000 width=4)
                     ->  Sort  (cost=1643.77..1668.77 rows=10000 width=4)
                           Sort Key: x.a
                           ->  WindowAgg  (cost=804.39..979.39 rows=10000 width=4)
                                 ->  Sort  (cost=804.39..829.39 rows=10000 width=4)
                                       Sort Key: x.a
                                       ->  Seq Scan on x  (cost=0.00..140.00 rows=10000 width=4)
(11 rows)

Ben-Gan's method:

postgres=# explain select avg(a) from (select a, row_number() over (order by a) as r, count(*) over () as rc from x) p where r in ((rc+1)/2,(rc+2)/2);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=1354.64..1354.65 rows=1 width=4)
   ->  Subquery Scan p  (cost=804.39..1354.39 rows=100 width=4)
         Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
         ->  WindowAgg  (cost=804.39..1104.39 rows=10000 width=4)
               ->  WindowAgg  (cost=804.39..979.39 rows=10000 width=4)
                     ->  Sort  (cost=804.39..829.39 rows=10000 width=4)
                           Sort Key: x.a
                           ->  Seq Scan on x  (cost=0.00..140.00 rows=10000 width=4)
(8 rows)

but:

postgres=# select avg(a) from (select a, row_number() over (order by a) as r, count(*) over () as rc from x) p where r in ((rc+1)/2,(rc+2)/2);
          avg
-----------------------
 5027.0000000000000000
(1 row)

Time: 179,310 ms

postgres=# select avg(a) from (select a, row_number() over (order by a asc) as hi, row_number() over (order by a desc) as lo from x) s where hi in (lo-1,lo+1);
          avg
-----------------------
 5027.0000000000000000
(1 row)

Time: 78,791 ms

When I checked the difference, I found that the problem is the count() function: count(*) over () is very slow. Maybe because it is a standard aggregate?

Regards
Pavel Stehule
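The row-number arithmetic the two queries rely on can be checked outside SQL. Below is a toy Python model of both tricks (just the index math over a sorted list, not the queries themselves). Note the model also accepts hi = lo, which the query above omits - that case only matters for odd row counts:

```python
# Toy model of the two median-by-row-number tricks discussed above.
# xs stands in for table x; any list of numbers works.

def median_celko(xs):
    """Celko: number rows ascending (hi) and descending (lo); the
    median rows are those where hi is lo, lo-1 or lo+1, then average."""
    s = sorted(xs)
    n = len(s)
    picked = []
    for i, v in enumerate(s):
        hi = i + 1          # row_number() over (order by a asc)
        lo = n - i          # row_number() over (order by a desc)
        if hi in (lo, lo - 1, lo + 1):   # the thread's query omits hi = lo
            picked.append(v)
    return sum(picked) / len(picked)

def median_bengan(xs):
    """Ben-Gan: keep rows whose ascending row number r equals
    (n+1)/2 or (n+2)/2 with integer division, then average."""
    s = sorted(xs)
    n = len(s)
    picked = [v for r, v in enumerate(s, start=1)
              if r in ((n + 1) // 2, (n + 2) // 2)]
    return sum(picked) / len(picked)

print(median_celko([1, 2, 3, 4]))   # 2.5
print(median_bengan([1, 2, 3, 4]))  # 2.5
```

Both formulas pick the same one or two middle rows, which is why the two queries return identical averages.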
On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
> postgres=# select avg(a) from (select a, row_number() over (order by
> a) as r, count(*) over () as rc from x ) p where r in
> ((rc+1)/2,(rc+2)/2) ;

How does this compare to the plain non-windowing SQL implementation:

select a from x order by a offset (select trunc(count(*)/2) from x) limit 1

(except that that only works if count(*) is odd).

Interestingly, finding the median is actually O(n) using Quickselect. Maybe we should provide a C implementation of quickselect as a window function. I'm not sure how to wedge in the concept that the sort is unnecessary even though the ORDER BY is specified, though.

I'm also not sure how to handle this if the set has to be spooled to disk. Quicksort and Quickselect do a lot of scans through the data and wouldn't perform well on disk.

--
greg
http://mit.edu/~gsstark/resume.pdf
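The O(n) expected-time idea can be sketched in a few lines of plain Python (this is only an illustration of the algorithm, not the proposed C window function); `median` handles an even row count by running two selects:

```python
import random

def quickselect(xs, k):
    """Return the k-th smallest element (0-based) of xs in expected O(n)
    time, by partitioning around a random pivot and recursing one side."""
    pivot = random.choice(xs)
    lt = [v for v in xs if v < pivot]
    eq = [v for v in xs if v == pivot]
    gt = [v for v in xs if v > pivot]
    if k < len(lt):
        return quickselect(lt, k)
    if k < len(lt) + len(eq):
        return pivot
    return quickselect(gt, k - len(lt) - len(eq))

def median(xs):
    """Median without a full sort: one select for odd n, two for even n."""
    n = len(xs)
    if n % 2:
        return quickselect(xs, n // 2)
    return (quickselect(xs, n // 2 - 1) + quickselect(xs, n // 2)) / 2

print(median([4, 1, 3, 2]))  # 2.5
```

Unlike a sort, each recursion level discards one side of the partition, which is where the expected O(n) total work comes from - and also why it wants random access to the whole data set, as noted above for the spooled-to-disk case.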
> I'm also not sure how to handle this if the set has to be spooled to
> disk. Quicksort and Quickselect do a lot of scans through the data and
> wouldn't perform well on disk.

I think the problem is in the aggregate function used as a window function - or some missing optimisation.

When I replaced count(*) over () with a subselect (SELECT count(*) FROM ...) I got the expected speed.

Pavel
2009/7/16 Greg Stark <gsstark@mit.edu>:
> On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
>> postgres=# select avg(a) from (select a, row_number() over (order by
>> a) as r, count(*) over () as rc from x ) p where r in
>> ((rc+1)/2,(rc+2)/2) ;
>
> How does this compare to the plain non-windowing SQL implementation:
>
> select a from x order by a offset (select trunc(count(*)/2) from x) limit 1
>
> (except that that only works if count(*) is odd).
>
> Interestingly finding the median is actually O(n) using Quickselect.
> Maybe we should provide a C implementation of quickselect as a window
> function. I'm not sure how to wedge in the concept that the sort is
> unnecessary even though the ORDER BY is specified though.

median() should be an aggregate, not a window function, shouldn't it?

> I'm also not sure how to handle this if the set has to be spooled to
> disk. Quicksort and Quickselect do a lot of scans through the data and
> wouldn't perform well on disk.

The WindowAgg spools rows into the tuplestore, which holds the data in memory as long as it fits. Do you have any idea how to store millions and millions of rows without a tuplestore?

Regards,

--
Hitoshi Harada
2009/7/16 Pavel Stehule <pavel.stehule@gmail.com>:
>> I'm also not sure how to handle this if the set has to be spooled to
>> disk. Quicksort and Quickselect do a lot of scans through the data and
>> wouldn't perform well on disk.
>
> I think the problem is in the aggregate function used as a window
> function - or some missing optimisation.
>
> When I replaced count(*) over () with a subselect (SELECT count(*)
> FROM ...) I got the expected speed.

WindowAgg always spools its input into the buffer, although (in your case) it throws rows away one by one, so compared with a pure aggregate it has overhead. I think this is a reasonable approach for the large-data situation and for other types of windows. But yes, we must improve the current model:

1) There should be some kind of lightweight approach for such small-data/simple-window situations.
2) tuplestore_puttupleslot() seems heavy to me (copy, check, etc.) even when the data fits in memory thanks to trimming rows. We want more flexible temporary storage on the fly.

Regards,

--
Hitoshi Harada
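The buffering point can be illustrated with a toy Python model (nothing to do with the actual executor code): a function like row_number() can stream with no buffer at all, while count(*) OVER () cannot emit its first row until the whole input has been spooled:

```python
def streaming_row_number(rows):
    """row_number() needs no buffer: each input row is emitted at once."""
    for i, row in enumerate(rows, start=1):
        yield row, i

def buffered_count(rows):
    """count(*) OVER (): the final count is unknown until the end, so the
    whole input must be spooled (PostgreSQL's tuplestore) before any row
    can be paired with it and emitted."""
    buf = list(rows)  # the spooling step whose cost the thread is chasing
    n = len(buf)
    for row in buf:
        yield row, n
```

The spooling itself is unavoidable for an unordered whole-partition aggregate; the complaint in this thread is about the per-row copy cost of that spool, not its existence.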
2009/7/16 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2009/7/16 Greg Stark <gsstark@mit.edu>:
>> On Wed, Jul 15, 2009 at 11:18 AM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
>>> postgres=# select avg(a) from (select a, row_number() over (order by
>>> a) as r, count(*) over () as rc from x ) p where r in
>>> ((rc+1)/2,(rc+2)/2) ;
>>
>> How does this compare to the plain non-windowing SQL implementation:
>>
>> select a from x order by a offset (select trunc(count(*)/2) from x) limit 1
>>
>> (except that that only works if count(*) is odd).
>>
>> Interestingly finding the median is actually O(n) using Quickselect.
>> Maybe we should provide a C implementation of quickselect as a window
>> function. I'm not sure how to wedge in the concept that the sort is
>> unnecessary even though the ORDER BY is specified though.
>
> median() should be aggregate, not window function, shouldn't it?

Yes - the core of my topic is the significant slowness of a query that uses window functions when an aggregate function is used. This case could be optimised simply.

The case itself isn't important to me. I was simply playing with window functions and found Celko's query - and I was surprised, because this query was faster than the other one; I expected something else.

>> I'm also not sure how to handle this if the set has to be spooled to
>> disk. Quicksort and Quickselect do a lot of scans through the data and
>> wouldn't perform well on disk.
>
> The WindowAgg spools rows into the tuplestore, which holds the data in
> memory as long as it fits. Do you have any idea how to store
> millions and millions of rows without a tuplestore?
>
> Regards,
>
> --
> Hitoshi Harada
Hello

Look at this:

postgres=# explain select count(*) over () from x;
                          QUERY PLAN
-------------------------------------------------------------
 WindowAgg  (cost=0.00..265.00 rows=10000 width=0)
   ->  Seq Scan on x  (cost=0.00..140.00 rows=10000 width=0)
(2 rows)

Time: 1,473 ms
postgres=# explain select count(*) over (order by a) from x;
                               QUERY PLAN
------------------------------------------------------------------------
 WindowAgg  (cost=0.00..556.25 rows=10000 width=4)
   ->  Index Scan using gg on x  (cost=0.00..406.25 rows=10000 width=4)
(2 rows)

but:

query1: 160ms
query2: 72ms

regards
Pavel Stehule
2009/7/17 Pavel Stehule <pavel.stehule@gmail.com>:
> Hello
>
> Look at this:
>
> postgres=# explain select count(*) over () from x;
>                           QUERY PLAN
> -------------------------------------------------------------
>  WindowAgg  (cost=0.00..265.00 rows=10000 width=0)
>    ->  Seq Scan on x  (cost=0.00..140.00 rows=10000 width=0)
> (2 rows)
>
> Time: 1,473 ms
> postgres=# explain select count(*) over (order by a) from x;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  WindowAgg  (cost=0.00..556.25 rows=10000 width=4)
>    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=10000 width=4)
> (2 rows)
>
> but:
>
> query1: 160ms
> query2: 72ms

Well, how about "select a from x order by a"?
I wonder if the index scan affects this more than the windowagg.

--
Hitoshi Harada
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> postgres=# explain select count(*) over () from x;

>  WindowAgg  (cost=0.00..265.00 rows=10000 width=0)
>    ->  Seq Scan on x  (cost=0.00..140.00 rows=10000 width=0)

> postgres=# explain select count(*) over (order by a) from x;

>  WindowAgg  (cost=0.00..556.25 rows=10000 width=4)
>    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=10000 width=4)

> query1: 160ms
> query2: 72ms

EXPLAIN ANALYZE is more telling than just EXPLAIN.

Did you run both several times, or flush caches carefully between the runs, to eliminate caching effects?

Is it possible that there are a lot of dead rows in the table (from UPDATEs or DELETEs) and the table hasn't been vacuumed? (Output from VACUUM VERBOSE on the table would show that.)

-Kevin
2009/7/17 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> postgres=# explain select count(*) over () from x;
>
>>  WindowAgg  (cost=0.00..265.00 rows=10000 width=0)
>>    ->  Seq Scan on x  (cost=0.00..140.00 rows=10000 width=0)
>
>> postgres=# explain select count(*) over (order by a) from x;
>
>>  WindowAgg  (cost=0.00..556.25 rows=10000 width=4)
>>    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=10000 width=4)
>
>> query1: 160ms
>> query2: 72ms
>
> EXPLAIN ANALYZE is more telling than just EXPLAIN.

query1:

 Aggregate  (cost=931.50..931.51 rows=1 width=4) (actual time=274.423..274.425 rows=1 loops=1)
   ->  Subquery Scan p  (cost=0.00..931.25 rows=100 width=4) (actual time=220.220..274.388 rows=2 loops=1)
         Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
         ->  WindowAgg  (cost=0.00..681.25 rows=10000 width=4) (actual time=120.622..247.618 rows=10000 loops=1)
               ->  WindowAgg  (cost=0.00..556.25 rows=10000 width=4) (actual time=0.088..89.848 rows=10000 loops=1)
                     ->  Index Scan using gg on x  (cost=0.00..406.25 rows=10000 width=4) (actual time=0.066..33.962 rows=10000 loops=1)
 Total runtime: 274.934 ms
(7 rows)

query2:

postgres=# explain analyze select avg(a) from (select a, row_number() over (order by a asc) as hi, row_number() over (order by a desc) as lo from x) s where hi in (lo-1,lo+1);

 Aggregate  (cost=1595.89..1595.90 rows=1 width=4) (actual time=215.101..215.103 rows=1 loops=1)
   ->  Subquery Scan s  (cost=1220.63..1595.63 rows=100 width=4) (actual time=175.159..215.073 rows=1 loops=1)
         Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
         ->  WindowAgg  (cost=1220.63..1395.63 rows=10000 width=4) (actual time=136.985..191.231 rows=10000 loops=1)
               ->  Sort  (cost=1220.63..1245.63 rows=10000 width=4) (actual time=136.970..151.905 rows=10000 loops=1)
                     Sort Key: x.a
                     Sort Method:  quicksort  Memory: 686kB
                     ->  WindowAgg  (cost=0.00..556.25 rows=10000 width=4) (actual time=0.078..106.927 rows=10000 loops=1)
                           ->  Index Scan using gg on x  (cost=0.00..406.25 rows=10000 width=4) (actual time=0.058..33.594 rows=10000 loops=1)
 Total runtime: 215.845 ms
(10 rows)

> Did you run both several times or flush caches carefully between the
> runs to eliminate caching effects?

Yes - in both variants the data was read from cache.

> Is it possible that there are a lot of dead rows in the table (from
> UPDATEs or DELETEs) and the table hasn't been vacuumed? (Output from
> VACUUM VERBOSE on the table would show that.)

The table was filled with random numbers and analyzed - you can simply check it - look at the beginning of the thread. This table wasn't updated.

Pavel
2009/7/17 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2009/7/17 Pavel Stehule <pavel.stehule@gmail.com>:
>> Hello
>>
>> Look at this:
>>
>> postgres=# explain select count(*) over () from x;
>>                           QUERY PLAN
>> -------------------------------------------------------------
>>  WindowAgg  (cost=0.00..265.00 rows=10000 width=0)
>>    ->  Seq Scan on x  (cost=0.00..140.00 rows=10000 width=0)
>> (2 rows)
>>
>> Time: 1,473 ms
>> postgres=# explain select count(*) over (order by a) from x;
>>                                QUERY PLAN
>> ------------------------------------------------------------------------
>>  WindowAgg  (cost=0.00..556.25 rows=10000 width=4)
>>    ->  Index Scan using gg on x  (cost=0.00..406.25 rows=10000 width=4)
>> (2 rows)
>>
>> but:
>>
>> query1: 160ms
>> query2: 72ms
>
> Well, how about "select a from x order by a"?
> I wonder if the index scan affects this more than the windowagg.

select a from x -> 42ms
select a from x order by a -> 50ms

All data are from cache.

> --
> Hitoshi Harada
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> The table was filled with random numbers and analyzed - you can simply
> check it - look at the beginning of the thread. This table wasn't updated.

Confirmed. The ORDER BY consistently speeds up the query. Odd....

Sort speed varied based on the random sequence generated, but a typical plan and timings:

test=# explain analyze select count(*) over () from x;
 WindowAgg  (cost=0.00..229.00 rows=10000 width=0) (actual time=32.435..97.448 rows=10000 loops=1)
   ->  Seq Scan on x  (cost=0.00..104.00 rows=10000 width=0) (actual time=0.007..14.818 rows=10000 loops=1)
 Total runtime: 112.526 ms

test=# explain analyze select count(*) over (order by a) from x;
 WindowAgg  (cost=768.39..943.39 rows=10000 width=4) (actual time=34.982..87.803 rows=10000 loops=1)
   ->  Sort  (cost=768.39..793.39 rows=10000 width=4) (actual time=34.962..49.533 rows=10000 loops=1)
         Sort Key: a
         Sort Method:  quicksort  Memory: 491kB
         ->  Seq Scan on x  (cost=0.00..104.00 rows=10000 width=4) (actual time=0.006..14.682 rows=10000 loops=1)
 Total runtime: 102.023 ms

-Kevin
2009/7/18 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> The table was filled with random numbers and analyzed - you can simply
>> check it - look at the beginning of the thread. This table wasn't updated.
>
> Confirmed. The ORDER BY consistently speeds up the query. Odd....
>
> Sort speed varied based on the random sequence generated, but a typical
> plan and timings:

Kevin's result is quite odd. I confirmed that using an IndexScan looked faster in Pavel's result, but yours is with a Sort node. I found that those results appear on relatively small sets: I increased the source table to 100000 rows and the OVER (ORDER BY a) case got slower.

What really surprised me is that in every case without an ORDER BY clause in the window, the WindowAgg node starts quite a bit later than the lower node finishes.

> test=# explain analyze select count(*) over () from x;
>  WindowAgg  (cost=0.00..229.00 rows=10000 width=0) (actual
> time=32.435..97.448 rows=10000 loops=1)
>    ->  Seq Scan on x  (cost=0.00..104.00 rows=10000 width=0) (actual
> time=0.007..14.818 rows=10000 loops=1)
>  Total runtime: 112.526 ms

I had thought the WindowAgg actual time would be 14.xxx..97.448, but it is actually 32.435..97.448.

The ORDER BY case returns its first result as soon as the underlying Sort (or IndexScan) returns the first row (actually the second), because the window frame then contains only one row. But even when the frame contains all the rows (i.e. the OVER () case), the first row could be returned not much later than the underlying node returns its last.

If I understand explain analyze correctly and it tells us the truth, WindowAgg without an ORDER BY clause gets unreasonably slow. Let me see.

Regards,

--
Hitoshi Harada
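For reference, the two window semantics being compared can be modelled in a few lines of Python (a sketch of the specified behaviour, not of the WindowAgg executor): with ORDER BY, the default frame runs from the partition start to the current row including its peers, so the running count can be produced as rows stream in; with an empty OVER (), every row needs the final partition size:

```python
from bisect import bisect_right

def running_count(sorted_vals):
    """Model of count(*) OVER (ORDER BY a): for each row of the sorted
    input, count the rows with value <= current (the default RANGE frame
    includes peers), which is computable incrementally."""
    return [bisect_right(sorted_vals, v) for v in sorted_vals]

def whole_partition_count(vals):
    """Model of count(*) OVER (): every row sees the full partition size,
    so no output row is determined until the whole input has been read."""
    return [len(vals)] * len(vals)

print(running_count([1, 2, 2, 3]))      # [1, 3, 3, 4]
print(whole_partition_count([5, 6, 7])) # [3, 3, 3]
```

This is why the ORDER BY variant can stream while the empty-frame variant is expected to pay at least one full pass before its first row; the surprise in this thread is how much more than one pass it seems to pay.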
2009/7/18 Hitoshi Harada <umi.tanuki@gmail.com>:
> If I understand explain analyze correctly and it tells us the truth,
> WindowAgg without an ORDER BY clause gets unreasonably slow. Let me see.

I haven't determined the difference between with and without an ORDER BY clause in OVER(), but I ran a benchmark that produced an interesting result.

$ bin/psql regression -c 'explain analyze select count(*) over() from x'
 WindowAgg  (cost=0.00..2741.00 rows=100000 width=0) (actual time=3725.294..4559.828 rows=100000 loops=1)
   ->  Seq Scan on x  (cost=0.00..1491.00 rows=100000 width=0) (actual time=0.112..310.349 rows=100000 loops=1)
 Total runtime: 4811.115 ms
(3 rows)

The query is quite slow because the profiling hook function calls gettimeofday() each time. And here's the result, counting calls to eval_windowaggregates() and its child functions. Elapsed time is in seconds, with the total gettimeofday() overhead subtracted.

eval_windowaggregates: Count 100000 Elapse 0.588426
Address  |Name                          |Count |Elapse(Total)
0x8204067|initialize_windowaggregate    |     1| 0.000277
0x8204d4a|spool_tuples                  |100002| 0.620092
0x83dcd08|tuplestore_select_read_pointer|100001| 0.011080
0x83dda2f|tuplestore_gettupleslot       |100001| 0.049005
0x8204fdd|row_is_in_frame               |100000| 0.014978
0x8204168|advance_windowaggregate       |100000| 0.025675
0x81ead8a|ExecClearTuple                |100000| 0.022105
0x8204462|finalize_windowaggregate      |     1| 0.000015
0x8204120|MemoryContextSwitchTo         |     2| 0.000000

spool_tuples() is dominant in eval_windowaggregates(). I think it would not be needed if the query contained only a simple aggregate like count(*) OVER (), but currently we copy all the rows from the source table into the tuplestore. Even when it fits in memory, the copy operation costs too much. I am thinking about how to avoid the unnecessary copy overhead...

Regards,

--
Hitoshi Harada