Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

Поиск
Список
Период
Сортировка
От Andrew Gierth
Тема Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Дата
Msg-id 2b8b55b8ba82f83ef4e6070b95fb0cd0@news-out.riddles.org.uk
обсуждение исходный текст
Ответы Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)  (David Fetter <david@fetter.org>)
Список pgsql-hackers
The spec defines two types of aggregate function classed as "ordered set
function", as follows:
1. An "inverse distribution function" taking one argument (which must be  a grouped column or otherwise constant within
groups)plus a sorted  group with exactly one column:  =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ...  The
motivatingexample for this (and the only ones in the spec) are  percentile_cont and percentile_disc, to return a
percentileresult  from a continuous or discrete distribution. (Thus  percentile_cont(0.5) within group (order by x) is
thespec's version  of a median(x) function.)
 
2. A "hypothetical set function" taking N arguments of arbitrary types  (a la VARIADIC "any", rather than a fixed list)
plusa sorted group  with N columns of matching types:  =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...))
from...  (where typeof(p1)==typeof(q1) and so on, at least up to trivial  conversions)  The motivating example here is
tobe able to do rank(p1,p2,...) to  return the rank that the specified values would have had if they were  added to the
group.
As usual, we do not want to constrain ourselves to supporting only the
specific cases in the spec, but would prefer a general solution.
We (meaning myself and Atri) have an implementation that basically
works, though it is not yet complete, but before taking it any further
we need to resolve the design question of how to represent these two
types of function in the system catalogs. The fact that there are in
effect two parts to the parameter list, which are either independent
(for inverse distribution funcs) or closely related (for hypothetical
set functions), doesn't seem to point to an obvious way to represent
this in pg_proc/pg_aggregate.
I'm not yet satisfied with the method used in our implementation, so
we're throwing this open for suggestions. We will post the
work-in-progress patch along with a description of its current
implementation shortly.
One of the major complications is that we ideally want to be able to do
polymorphism based on the type of the sorted group, specifically in
order to be able to do
percentile_disc(float8) within group (order by anyelement)
returning anyelement. (i.e. we should be able to get a discrete
percentile from any type that is orderable.) The question here is how to
resolve the return type both of the aggregate itself and of the finalfn.
We've also had an expression of interest in extending this to allow
percentile_disc(float8[]) and percentile_cont(float8[]) returning
arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return an
array containing the bounds, median and quartiles in one go. This is an
extension to the spec but it seems sufficiently obviously useful to be
worth supporting.

Comments?

-- 
Andrew (irc:RhodiumToad)



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)