Re: MAP syntax for arrays

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: MAP syntax for arrays
Дата
Msg-id CAFjFpRek0WWatOLUJ4jbt0JWhF4_AnevwNoBza21wR7JJLka4g@mail.gmail.com
обсуждение исходный текст
Ответ на MAP syntax for arrays  (Ildar Musin <i.musin@postgrespro.ru>)
Ответы Re: MAP syntax for arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Fri, May 4, 2018 at 6:38 PM, Ildar Musin <i.musin@postgrespro.ru> wrote:
> Hello hackers,
>
> Recently I was working with sql arrays in postgres and it turned out
> that postgres doesn't have such very convinient functional constructions
> as map, reduce and filter. Currently to map function over array user has
> to make a subquery like:
>
> select u.* from
>         my_table,
>         lateral (
>                 select array_agg(lower(elem))
>                 from unnest(arr) as elem
>         ) as u;
>
> Which is not only inconvenient but not very efficient as well (see
> 'Demo' section below).

Is there a way we can improve unnest() and array_agg() to match the
performance you have specified by let's say optimizing the cases
specially when those two are used together. Identifying that may be
some work, but will not require introducing new syntax.

>
> When I dug into the code I found that postgres already has the needed
> infrastructure for implementing map for arrays; actually array coercing
> already works that way (it basically maps cast function).
>
> In the attached patch there is a simple map implementation which
> introduces new expression type and syntax:
>
>         MAP(<func_name> OVER <array_expression>)
>
> For example:
>
>         SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
>             ?column?
>         -----------------
>          {ONE,TWO,THREE}
>         (1 row)
>
> This is probably not the most useful notation and it would be better to
> have syntax for mapping arbitrary expressions over array, not just
> function. I'm struggling to come up with a good idea of how it should
> look like. It could look something like following:
>
>         MAP(<expr> FOR <placeholder> IN <array_expressin>)
>
> For instance:
>
>         SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);
>
> Looking forward for community's suggestions!

What if the expression has more than one variable, each mapping to a
different array? What if the arrays have different lengths or worse
different dimensions? This looks like the way SRFs used to work.

Instead of introducing a new syntax, is it possible to detect that
argument to a function is an array of the same type as the argument
and apply MAP automatically? In your example, upper(arr) would detect
that the input is an array of the same type as the scalar argument
type and do array_agg(upper(arr[id1], arr[id2], ...).

>
>  elements per array |  map (tps) | unnest/aggregate (tps)
> --------------------+------------+------------------------
>                   5 | 139.105359 |       74.434010
>                  10 |  74.089743 |       43.622554
>                 100 |   7.693000 |        5.325805
>
> Apparently map is more efficient for small arrays. And as the size of
> array increases the difference decreases.

I am afraid that the way difference is diminishing with increase in
the number of elements, unnest, array_agg combination might win for
large number of elements. Have you tried that?
If we try to improve unnest, array_agg combination for small array, we
will get consistent performance without any additional syntax.
Although, I admit that query involving unnest and array_agg is not
very readable.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sergey Mirvoda
Дата:
Сообщение: Re: citext function overloads for text parameters
Следующее
От: Alexander Kuzmenkov
Дата:
Сообщение: Re: Reopen logfile on SIGHUP