Re: Why percent_rank is so slower than rank?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Why percent_rank is so slower than rank?
Дата
Msg-id 29707.1291928483@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Why percent_rank is so slower than rank?  (Jie Li <jay23jack@gmail.com>)
Ответы Re: Why percent_rank is so slower than rank?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Jie Li <jay23jack@gmail.com> writes:
> I'm new to window functions. Recently I run some simple queries but
> surprised to find percent_rank is so slower than rank, could anybody tell me
> why?

Huh, interesting.  I can reproduce this with toy data, such as

create table inventory1 (inv_date_sk int, inv_item_sk int);
insert into inventory1 select 1, random()* 100000 from generate_series(1,189000);
explain analyze select inv_date_sk,inv_item_sk, percent_rank()over(partition by inv_date_sk order by inv_item_sk) from
inventory1;

The example is *not* particularly slow if you leave work_mem at default.
But if you bump up work_mem enough so that the WindowAgg's internal
tuplestore fits into memory, it slows down like crazy.  A bit of quality
time with oprofile shows that all the time is going into this memmove()
in tuplestore_trim():
   /*    * Slide the array down and readjust pointers.  This may look pretty    * stupid, but we expect that there will
usuallynot be very many    * tuple-pointers to move, so this isn't that expensive; and it keeps a    * lot of other
logicsimple.    *    * In fact, in the current usage for merge joins, it's demonstrable that    * there will always be
exactlyone non-removed tuple; so optimize that    * case.    */   if (nremove + 1 == state->memtupcount)
state->memtuples[0]= state->memtuples[nremove];   else       memmove(state->memtuples, state->memtuples + nremove,
        (state->memtupcount - nremove) * sizeof(void *));
 

We're throwing away one tuple at a time as we advance forward through
the tuplestore, and moving 100000+ tuple pointers each time.  Ugh.
This code was all right when written, because (IIRC) the mergejoin
case was actually the only caller.  But it's not all right for
WindowAgg's less-predictable usage patterns.

I thought for a bit about changing things around so that the first-used
tuple slot isn't necessarily state->memtuples[0], but just like the
comment says, that complicates a lot of other logic.  And there isn't
any easy place to reclaim the wasted slots later.

What seems like the best bet is to put in a heuristic to make
tuplestore_trim simply not do anything until nremove reaches some
reasonably large amount, perhaps 10% of the number of stored tuples.
This wastes up to 10% of the alloted memory, but that seems tolerable.
We could complicate things a bit more by remembering that so-and-so
many slots are authorized to be removed, and forcing a trim operation
to discard them if we find ourselves in memory trouble.  I'm not sure
that extra complication is worthwhile though.  Comments?
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Patch to add a primary key using an existing index
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: initdb failure with Postgres 8.4.4