Generalized Top Queries on PostgreSQL
От | Roberto Cornacchia |
---|---|
Тема | Generalized Top Queries on PostgreSQL |
Дата | |
Msg-id | Pine.GSO.3.96.1000219022048.17025B-100000@adina.cs.unibo.it обсуждение исходный текст |
Список | pgsql-hackers |
Hello everyone, Dr. P. Ciaccia, A. Ghidini and I (R. Cornacchia), have recently developed, and implemented on PostgreSQL, a class of TOP Queries. Here I would like to briefly present our main results. The proposed sintax is the following: SELECT FROM WHERE STOP AFTER <N> FOR EACH <stop-grouping attribute list> RANK BY <ranking specification> ORDER BY where both FOR EACH and RANK BY (and STOP AFTER clause as a whole) are optional. Here is an example: ---- "Retrieve the 10 highest paid employees for each department.Order the results first on department name and then on employeename" SELECT Dept.name, Emp.name, Emp.salary FROM Emp, Dept WHERE Emp.dno = Dept.dno STOP AFTER 10 FOR EACH Dept.dno RANK BY Emp.salary DESC ORDER BY Dept.name, Emp.name ---- We called such a query "Generalized Top Query". The semantics introduced by the example is derived from the one proposed by Carey and Kossmann ("On Saying 'Enougth Already' in SQL", 1997). The main points of our extension are: 1) You can obtain the <N> Top rows as dictated by the RANK BY specification and then produce the results according to the ORDER BY specification. This means much more flexibility. 2) You can obtain the Top N rows for each of the "groups" which are individuated by the FOR EACH specification (from here "Generalized" top query). Please consider that our semantics completely includes the current LIMIT clause capabilities, offering at the same time some additional ones. For what concerns PostgreSQL, one of our first issues were to keep basically unchanged the current framework. Here is a brief list of major changes we operated: - We provided 2 new physical operators: ScanStop: stops the stream to <N> rows SortStop: performs an ad-hoc sorton the stream, retaining only the <N> top rows. - The optimizer can operate a push-down in the path-tree of those two operators. It means we reduce the stream cardinalityas soon as possible, leading to a great improvement on the performances of the subsequent operators. - A larger number of optimizable operators force the optimizer to generate more plans to handle, so we extended the operatorsproperties and the pruning rules. - We extended the cost model as well, introducing estimates for the cost of producing the FIRST N tuples. - The rules involved in the Stop operators placement are mainly based upon referential integrity constraints. In this waywe provided a *temporary* solution to this Postgres lack, storing the informations on constraints in two new system catalogs. - The FOR EACH generalization leads to natural generalization of all the above concepts. - The evaluation of GROUP BY clause can be performed before and after the STOP AFTER, and both makes sense. In order tooptimize the Stop After operator, we choose to evaluate it before the group by (and the order by). * Let us summarize the current state of our work. * - Our extension is highly optimized and can lead to performance improvements of several orders of magnitude in comparisonwith the current "LIMIT approach" - It has a low impact on the original PostgreSQL code - It does not affect the usual processing of classical queries - It is soon expected to work with views. - It works with subqueries - It works with cursors - It efficently makes use of indices - It is updated to a 6.6 snapshot of November 1999 (we are waiting for a more stable release) -------------------------------------------------------------------- More details on this subject are going to be presented in a forthcoming paper, available if interested. We would be glad to receive your comments on this work. Best regards, R. Cornacchia (cornacch@cs.unibo.it) Computer Science, University of Bologna A. Ghidini (ghidini@cs.unibo.it) Computer Science, University of Bologna Dr. Paolo Ciaccia (ciaccia@cs.unibo.it) DEIS CSITE-CNR, University of Bologna
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Bruce MomjianДата:
Сообщение: Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level ?
Следующее
От: Hannu KrosingДата:
Сообщение: Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level ?