MAP syntax for arrays

Поиск
Список
Период
Сортировка
От Ildar Musin
Тема MAP syntax for arrays
Дата
Msg-id f86e5d8b-20a7-36ce-acd6-ac58b290c8f0@postgrespro.ru
обсуждение исходный текст
Ответы Re: MAP syntax for arrays  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Список pgsql-hackers
Hello hackers,

Recently I was working with sql arrays in postgres and it turned out
that postgres doesn't have such very convinient functional constructions
as map, reduce and filter. Currently to map function over array user has
to make a subquery like:

select u.* from
    my_table,
    lateral (
        select array_agg(lower(elem))
        from unnest(arr) as elem
    ) as u;

Which is not only inconvenient but not very efficient as well (see
'Demo' section below).

When I dug into the code I found that postgres already has the needed
infrastructure for implementing map for arrays; actually array coercing
already works that way (it basically maps cast function).

In the attached patch there is a simple map implementation which
introduces new expression type and syntax:

    MAP(<func_name> OVER <array_expression>)

For example:

    SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
        ?column?
    -----------------
     {ONE,TWO,THREE}
    (1 row)

This is probably not the most useful notation and it would be better to
have syntax for mapping arbitrary expressions over array, not just
function. I'm struggling to come up with a good idea of how it should
look like. It could look something like following:

    MAP(<expr> FOR <placeholder> IN <array_expressin>)

For instance:

    SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);

Looking forward for community's suggestions!

Demo
----

Here is a small comparison between map and unnest/aggregate ways for
per-element processing of arrays. Given a table with 1K rows which
contains single column of text[] type. Each array contains 5/10/100
elements.

create table my_table (arr text[]);
insert into my_table
    select array_agg(md5(random()::text))
    from generate_series(1, 1000) as rows,
         generate_series(1, 10) as elements
    group by rows;

There are two scripts for pgbench. One for 'map' syntax:

    select map(upper over arr) from my_table;

And one for unnest/aggregate:

    select u.* from my_table,
        lateral (
            select array_agg(upper(elem))
            from unnest(arr) as elem
        ) as u;

Results are:

  elements per array |  map (tps) | unnest/aggregate (tps)
--------------------+------------+------------------------
                   5 | 139.105359 |       74.434010
                  10 |  74.089743 |       43.622554
                 100 |   7.693000 |        5.325805

Apparently map is more efficient for small arrays. And as the size of
array increases the difference decreases.

I'll be glad to any input from the community. Thanks!

-- 
Ildar Musin
i.musin@postgrespro.ru

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: pg_rewind and postgresql.conf
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: pg_rewind and postgresql.conf