Generalized Top Queries on PostgreSQL

Поиск
Список
Период
Сортировка
От Roberto Cornacchia
Тема Generalized Top Queries on PostgreSQL
Дата
Msg-id Pine.GSO.3.96.1000219022048.17025B-100000@adina.cs.unibo.it
обсуждение исходный текст
Список pgsql-hackers
Hello everyone, 

Dr. P. Ciaccia, A. Ghidini and I (R. Cornacchia), have recently
developed, and implemented on PostgreSQL, a class of TOP Queries. 

Here I would like to briefly present our main results.

The proposed sintax is the following:

SELECT
FROM
WHERE
STOP AFTER <N> FOR EACH <stop-grouping attribute list> RANK BY <ranking specification>
ORDER BY 

where both FOR EACH and RANK BY (and STOP AFTER clause as a whole) are
optional.

Here is an example:

----
"Retrieve the 10 highest paid employees for each department.Order the results first on department name and then on
employeename"
 

SELECT Dept.name, Emp.name, Emp.salary
FROM Emp, Dept
WHERE Emp.dno = Dept.dno
STOP AFTER 10 FOR EACH Dept.dno RANK BY Emp.salary DESC
ORDER BY Dept.name, Emp.name
----

We called such a query "Generalized Top Query".

The semantics introduced by the example is derived from the one
proposed by Carey and Kossmann ("On Saying 'Enougth Already' in
SQL", 1997). The main points of our extension are: 

1) You can obtain the <N> Top rows as dictated by the RANK BY
specification and then produce the results according to the 
ORDER BY specification. This means much more flexibility.

2) You can obtain the Top N rows for each of the "groups" which are 
individuated by the FOR EACH specification 
(from here "Generalized" top query).


Please consider that our semantics completely includes the current 
LIMIT clause capabilities, offering at the same time some additional 
ones.

For what concerns PostgreSQL, one of our first issues were to keep
basically unchanged
the current framework. 

Here is a brief list of major changes we operated:

- We provided 2 new physical operators: ScanStop: stops the stream to <N> rows           SortStop: performs an ad-hoc
sorton the stream,            retaining only the <N> top rows. 
 

- The optimizer can operate a push-down in the path-tree of  those two operators. It means we reduce the  stream
cardinalityas soon as possible, leading to a great improvement on the performances of the subsequent operators.
 

- A larger number of optimizable operators force the optimizer to generate more    plans to handle, so we extended the
operatorsproperties and the pruning rules.
 

- We extended the cost model as well, introducing estimates for the cost of producing the FIRST N tuples.

- The rules involved in the Stop operators placement are mainly based upon referential integrity constraints. In this
waywe  provided a *temporary* solution to this Postgres lack, storing the informations on constraints in two new system
catalogs.

- The FOR EACH generalization leads to natural generalization of all the above concepts.

- The evaluation of GROUP BY  clause can be performed before and after the STOP AFTER, and both makes sense. In order
tooptimize the Stop After operator, we choose to evaluate it before the group by  (and the order by).
 


* Let us summarize the current state of our work. * 

- Our extension is highly optimized and can lead to performance improvements of several orders of magnitude in
comparisonwith the current "LIMIT approach"
 
- It has a low impact on the original PostgreSQL code
- It does not affect the usual processing of classical queries
- It is soon expected to work with views. 
- It works with subqueries
- It works with cursors
- It efficently makes use of indices
- It is updated to a 6.6 snapshot of November 1999 (we are waiting  for a more stable release)

--------------------------------------------------------------------

More details on this subject are going to be presented in a
forthcoming paper, available if interested. 

We would be glad to receive your comments on this work. 

Best regards,  

R. Cornacchia (cornacch@cs.unibo.it) Computer Science, University of
Bologna

A. Ghidini (ghidini@cs.unibo.it) Computer Science, University of Bologna 

Dr. Paolo Ciaccia (ciaccia@cs.unibo.it) DEIS CSITE-CNR, University of
Bologna 





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level ?
Следующее
От: Hannu Krosing
Дата:
Сообщение: Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level ?