Обсуждение: Generalizing max and min

Поиск
Список
Период
Сортировка

Generalizing max and min

От
Bruno Wolff III
Дата:
I was thinking about how to generalize max and min and came up with
something that I think conceptually provides a better view of the
problem, but may not be of any practical help.

The generalization is that both max and min can be looked at as unions
over a set with a partial order operator. (You can also look at them
as intesection operators which may be more natural in other cases.)
The union operator for max is that the union of two items is the
greater of the two items according to the < operator. We can use an
index on < as a short cut because the union of two sets A and B is
equal to A if (and only if) B < A.

Another type of operator that would work similarly would be the union
of two dimensional boxes. If we had an index that would tell us
that a bunch of the boxes in the union were already contained in the
partially computed union then we know that they could all be safely
skipped.

So if you were to try to make use of this situation, I would think you
would want to define a subtype of aggregates for unions. They would
be defined by a supplied union function and an optional partial order.
The initial condition would always be null. The union function would
really need to be a function that had the appropiate properties
(include null union anything is equal to anything).

The hard problem is still trying to decide how to take advantage of the
index when it is available. Sometimes an index scan should be used,
sometimes a sequential scan and sometimes a query with multiple
aggregates should get broken up into multiple subqueries so that
multiple indexes can be used.

P.S. A good way to see the max can be viewed is a union is to consider
a number as representing the set of all numbers of that domain less than
or equal to that number.

Re: Generalizing max and min

От
"Jim C. Nasby"
Дата:
On Wed, Jun 11, 2003 at 07:31:22PM -0500, Bruno Wolff III wrote:
> The hard problem is still trying to decide how to take advantage of the
> index when it is available. Sometimes an index scan should be used,
> sometimes a sequential scan and sometimes a query with multiple
> aggregates should get broken up into multiple subqueries so that
> multiple indexes can be used.

Keep in mind that with a BTREE you don't need to scan the index, you
only need to run down the left or right side of the tree (granted, an
over-simplification in the case of MVCC, but you get the point).

Even if you don't have the ideal index, you can still avoid a full index
scan. For example:
CREATE INDEX .. (a, b);
SELECT max(b) ...;

Instead of scanning the entire index, you only need to get the last
(assuming asc index) tuple for each value of a, and take the maximum of
those (of course you don't have to store the set of max for each a; you
can do a simple comparison each time you get a new last tuple).
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"