Re: Infinite Interval

Поиск
Список
Период
Сортировка
От jian he
Тема Re: Infinite Interval
Дата
Msg-id CACJufxHFmB6uOWd8Rz6cppqG2-UOdqTZ6_2DG0FJ9FCfvErpHQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Infinite Interval  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: Infinite Interval  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On Tue, Sep 19, 2023 at 7:14 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
>
> I haven't reviewed this part in any detail yet, but I can confirm that
> there are some impressive performance improvements for avg(). However,
> for me, sum() seems to be consistently a few percent slower with this
> patch.

Now there should be no degradation.

> The introduction of an internal transition state struct seems like a
> promising approach, but I think there is more to be gained by
> eliminating per-row pallocs, and IntervalAggState's MemoryContext
> (interval addition, unlike numeric addition, doesn't require memory
> allocation, right?).

"eliminating per-row pallocs"
I guess I understand. If not , please point it out.

> IntervalAggState's MemoryContext
> (interval addition, unlike numeric addition, doesn't require memory
> allocation, right?).

if I remove IntervalAggState's element: MemoryContext, it will not work.
so I don't understand what the above sentence means...... Sorry. (it's
my problem)

> Also, this needs to include serialization and deserialization
> functions, otherwise these aggregates will no longer be able to use
> parallel workers. That makes a big difference to queryE, if the size
> of the test data is scaled up.
>
I tried, but failed. sum(interval) result is correct, but
avg(interval) result is wrong.

Datum
interval_avg_serialize(PG_FUNCTION_ARGS)
{
IntervalAggState *state;
StringInfoData buf;
bytea    *result;
/* Ensure we disallow calling when not in aggregate context */
if (!AggCheckCallContext(fcinfo, NULL))
elog(ERROR, "aggregate function called in non-aggregate context");
state = (IntervalAggState *) PG_GETARG_POINTER(0);
pq_begintypsend(&buf);
/* N */
pq_sendint64(&buf, state->N);
/* Interval struct elements, one by one. */
pq_sendint64(&buf, state->sumX.time);
pq_sendint32(&buf, state->sumX.day);
pq_sendint32(&buf, state->sumX.month);
/* pInfcount */
pq_sendint64(&buf, state->pInfcount);
/* nInfcount */
pq_sendint64(&buf, state->nInfcount);
result = pq_endtypsend(&buf);
PG_RETURN_BYTEA_P(result);
}

SELECT sum(b) ,avg(b)
        ,avg(b) = sum(b)/count(*) as should_be_true
        ,avg(b) * count(*) = sum(b) as should_be_true_too
from interval_aggtest_1m; --1million row.
The above query expects two bool columns to return true, but actually
both returns false.(spend some time found out parallel mode will make
the number of rows to 1_000_002, should be 1_000_0000).


> This comment:
>
> +   int64       N;              /* count of processed numbers */
>
> should be "count of processed intervals".

fixed.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Следующее
От: David Rowley
Дата:
Сообщение: Re: Comment about set_join_pathlist_hook()