[PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
I've been working on speeding up aggregate functions when used in the context of a window's with non fixed frame heads.

Currently if the top of the window is not in a fixed position then when the window frame moves down the aggregates must be recalculated by looping over each record in the new scope of the window frame.

Take the following as an example:

SELECT SUM(n::int) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1,20000) g(n);

Here the frame head moves along with the current row and always finishes with the last row in the partition or the last row in the frame if no partitioning exists.
Currently PostgreSQL must recalculate the aggregate each time, this means looping from the current row to the end of the frame for each row produced... So the first row needs 20000 iterations, the 2nd row 19999, 3rd row 19998 etc.

The above query on the unpatched head, on my laptop takes 36676 ms. With the attached patch it takes 73 ms. So a speed increase of about 500 times for 20,000 rows. Of course this performance gap will increase with more rows and decrease with less rows, but it's even seeing a performance improvement with as little as 50 rows.


The attached patch is not quite finished yet, I've not yet fully covered SUM and AVG for all types. I just need to look into numeric a bit more before I figure that one out.
Here is a list of things still todo:

1. Fully implement negative transition functions for SUM and AVG.
2. CREATE AGGREGATE support for user defined aggregates.
3. Documentation.
4. Look to see which other aggregates apart from COUNT, SUM and AVG that can make use of this.

Please note that if the aggregate function does not have a negative transition function setup, e.g MAX and MIN, then the old behaviour will take place.

One thing that is currently on my mind is what to do when passing volatile functions to the aggregate. Since the number of times we execute a volatile function will much depend on the window frame options, I think we should include some sort of warning in the documentation that "The number of times that the expression is evaluated within the aggregate function when used in the context of a WINDOW is undefined". The other option would be to disable this optimisation if the aggregate expression contained a volatile function, but doing this, to me seems a bit weird as is anyone actually going to be depending on a volatile function being executed so many times?

If anyone can see any road blocking issues to why this optimisation cannot be done please let me know.

Otherwise I'll continue to work on this so that it is ready for CF4.

Regards

David Rowley
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: Extension Templates S03E11
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH] Negative Transition Aggregate Functions (WIP)