Re: iceberg queries
От | Jan Wieck |
---|---|
Тема | Re: iceberg queries |
Дата | |
Msg-id | 3E40011C.94F11C92@Yahoo.com обсуждение исходный текст |
Ответ на | Re: iceberg queries (Christoph Haller <ch@rodos.fzk.de>) |
Ответы |
Re: iceberg queries
|
Список | pgsql-sql |
Christoph Haller wrote: > > > > > Does PostgreSQL optimizer handle iceberg queries well? > > > What do you mean by "iceberg query" ? > I've never heard this term. Iceberg queries compute one or more aggregate functions to find aggregate values above a specified threshold. A typical iceberg query would be SELECT a, count(a) FROM tab GROUP BY a HAVING count(a) >= 100; This base form can easily be made more complicated by doing self joins and the like. This type of query is often found in market research, data warehousing and search engines. As to the original question, if an index is available that returns the rows in the sort order of the GROUP BY clause, PostgreSQL defaults to an index scan, otherwise it will do a sort of the rows matching an optional WHERE clause. This sorted set is then grouped and aggregated and filtered by the HAVING clause after aggregation. It is well known that this approach does not scale well for large data sets. But in contrast to a specialized statistical software, PostgreSQL has to answer the query precisely. So sampling or bucket methods aren't options. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
В списке pgsql-sql по дате отправления: