mostly null slots in hash-aggs cause performance degradation

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	mostly null slots in hash-aggs cause performance degradation
Дата	22 июля 2016 г. 03:05:39
Msg-id	20160722000510.6wcduni5oe7mmjb4@alap3.anarazel.de обсуждение исходный текст
Список	pgsql-hackers

Дерево обсуждения

Hi,

To build the representative tuple for each group in hash-aggregates we
do:
static AggHashEntry
lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot)
{
.../* if first time through, initialize hashslot by cloning input slot */if (hashslot->tts_tupleDescriptor == NULL){
MemoryContextoldContext = MemoryContextSwitchTo(hashslot->tts_mcxt);    /* get rid of constraints */
ExecSetSlotDescriptor(hashslot,CreateTupleDescCopy(inputslot->tts_tupleDescriptor));
MemoryContextSwitchTo(oldContext);   /* Make sure all unused columns are NULLs */    ExecStoreAllNullTuple(hashslot);}
 
/* transfer just the needed columns into hashslot */slot_getsomeattrs(inputslot,
linitial_int(aggstate->hash_needed));foreach(l,aggstate->hash_needed){    int            varNumber = lfirst_int(l) -
1;
    hashslot->tts_values[varNumber] = inputslot->tts_values[varNumber];    hashslot->tts_isnull[varNumber] =
inputslot->tts_isnull[varNumber];}

i.e. we have a tuple that's all null, except for the group-by columns. I
n LookupTupleHashEntry(), we form a minimal tuple of that, if the tuple
represents a new group, which is stored in the hash-table.  Then, for
comparisons, we'll deform that again in execTuplesMatch, whenever a
hash-lookup finds a pre-existing tuple (i.e. hash conflict, or
additional row in group)..


If a tuple has a couple columns, and the group by column isn't leading,
that means we'll spend a considerable amount of time forming and
deforming NULL columns. I've seen the position of the grouping column
make as much as 40% performance difference, *even if* input to the
aggregates refers a later column.


Thus it seems like we instead should have a separate targetlist for the
hash slot, only containing the grouped-by columns.


I'm not entirely sure what the best way to do that is, though.  The
simpler, and hackier, way would be to do that locally in nodeAgg.c, and
reconstruct the expected tuple again in agg_retrieve_hash_table() (where
we currently do the ExecStoreMinimalTuple()).  Alternatively we could
build a separate targetlist at createplan.c time; but the details aren't
entirely clear to me.

Comments?

Greetings,

Andres Freund

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 22 июля 2016 г., 03:02:52
Сообщение: Re: Password identifiers, protocol aging and SCRAM protocol

Следующее

От: Michael Paquier
Дата: 22 июля 2016 г., 03:06:20
Сообщение: Re: Password identifiers, protocol aging and SCRAM protocol

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

mostly null slots in hash-aggs cause performance degradation

Предыдущее

Следующее