Re: Extracting superlatives - SQL design philosophy

Поиск

Список

Период

Сортировка

От	Dave Crooke
Тема	Re: Extracting superlatives - SQL design philosophy
Дата	24 февраля 2010 г. 19:37:24
Msg-id	ca24673e1002241537o5ce1295egee3e2154a9f1b8a2@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Extracting superlatives - SQL design philosophy (Richard Huxton <dev@archonet.com>)
Ответы	Re: Extracting superlatives - SQL design philosophy
Список	pgsql-performance

Дерево обсуждения

1. The city temps table is a toy example, not meant to be realistic :-)

2. Yes, my (Java) algorithm is deterministic ... it will return
exactly one row per city, and that will be the row (or strictly, *a*
row) containing the highest temp. Temp value ties will break in favour
of earlier rows in Guinness Book of Records tradition :-) It's
equivalent to a HashAggregate implementation.


The following two query plans (from my real schema) illustrate the
itch I am trying to scratch .... I want the functionality of the 2nd
one, but with the execution plan structure of the first:

# explain analyse select a, max(b) from perf_raw_2010_02_23 group by a;
                                                              QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=117953.09..117961.07 rows=639 width=8) (actual
time=10861.845..10863.008 rows=1023 loops=1)
   ->  Seq Scan on perf_raw_2010_02_23  (cost=0.00..91572.39
rows=5276139 width=8) (actual time=0.038..4459.222 rows=5276139
loops=1)
 Total runtime: 10863.856 ms
(3 rows)

Time: 10864.817 ms
# explain analyse select distinct on (a) * from perf_raw_2010_02_23
order by a, b desc ;
                                                                 QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1059395.04..1085775.73 rows=639 width=28) (actual
time=46011.204..58428.210 rows=1023 loops=1)
   ->  Sort  (cost=1059395.04..1072585.39 rows=5276139 width=28)
(actual time=46011.200..53561.112 rows=5276139 loops=1)
         Sort Key: a, b
         Sort Method:  external merge  Disk: 247584kB
-- actually OS RAM buffers
         ->  Seq Scan on perf_raw_2010_02_23  (cost=0.00..91572.39
rows=5276139 width=28) (actual time=0.047..6491.036 rows=5276139
loops=1)
 Total runtime: 58516.185 ms
(6 rows)

Time: 58517.233 ms

The only difference between these two is that the second query returns
the whole row. The *ratio* in cost between these two plans increases
in proportion to log(n) of the table size ... at 5.5m rows its
livable, at 500m it's probably not :-!

Cheers
Dave

On Wed, Feb 24, 2010 at 5:12 PM, Richard Huxton <dev@archonet.com> wrote:
> On 24/02/10 22:47, Dave Crooke wrote:
>>
>> I'd imagine it would be possible to have a query planner optimization
>> that would convert Garrett's DISTINCT ON syntax to do what I was
>> trying to, by realizing that DISTINCT ON X ... ORDER BY Y DESC is
>> going to return the the one row for each X which has the highest value
>> of Y, and so use a MAX-structured accumulation instead of a sort.
>
> Why is there only one row? For city temperatures, that seems unlikely.
>
> In the event of more than one row does your algorithm give repeatable
> results?
>
> --
>  Richard Huxton
>  Archonet Ltd
>

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Extracting superlatives - SQL design philosophy