Thread: [GENERAL] Performance appending to an array column

[GENERAL] Performance appending to an array column

From: Paul A Jungwirth
Date: September 21, 2017, 23:06:06

I'm considering a table structure where I'd be continuously appending
to long arrays of floats (10 million elements or more). Keeping the
data in arrays gives me much faster SELECT performance vs keeping it
in millions of rows.

But since these arrays keep growing, I'm wondering about the UPDATE
performance. I was reading this commit message about improving
performance of *overwriting* individual array elements, and I was
wondering if there is anything similar for growing an array column?:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1dc5ebc9077ab742

Is there a faster way to append to an array than just this?:
   UPDATE measurements SET vals = vals || ARRAY[5.0, 4.2, 9.9]::float[];

Thanks!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Performance appending to an array column

From: Tom Lane
Date: September 21, 2017, 23:25:32

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> I'm considering a table structure where I'd be continuously appending
> to long arrays of floats (10 million elements or more). Keeping the
> data in arrays gives me much faster SELECT performance vs keeping it
> in millions of rows.

> But since these arrays keep growing, I'm wondering about the UPDATE
> performance.

It's going to suck big-time :-(.  You'd be constantly replacing all
of a multi-megabyte toasted field.  Even if the UPDATE speed per se
seemed tolerable, this would be pretty nasty in terms of the
vacuuming overhead and/or bloat it would impose.

My very first use of Postgres, twenty years ago, involved time series
data which perhaps is much like what you're doing.  We ended up keeping
the time series data outside the DB; I doubt the conclusion would be
different today.  I seem to recall having heard about a commercial fork
of PG that is less bad for this type of data, but the community code
is not the weapon you want.
        regards, tom lane
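
To put rough numbers on the rewrite cost described above, here is a minimal sketch. It assumes every append rewrites the whole array value and counts only the 8-byte float8 payload; the array header, TOAST compression, and WAL traffic are ignored. The function names and the batch sizes are illustrative assumptions, not measurements from this thread.

```python
FLOAT8_BYTES = 8  # a PostgreSQL float8 (double precision) element is 8 bytes


def bytes_rewritten(initial_elems: int, batch: int, appends: int) -> int:
    """Payload bytes written if each append rewrites the full array."""
    total = 0
    n = initial_elems
    for _ in range(appends):
        n += batch                  # array grows by one batch
        total += n * FLOAT8_BYTES   # the whole new value is written out
    return total


def bytes_appended(batch: int, appends: int) -> int:
    """Payload bytes of genuinely new data over the same appends."""
    return batch * appends * FLOAT8_BYTES


if __name__ == "__main__":
    rewritten = bytes_rewritten(10_000_000, 3, 1000)
    new_data = bytes_appended(3, 1000)
    print(f"rewritten: {rewritten:,} bytes; new data: {new_data:,} bytes")
```

Under these assumptions, 1,000 three-element appends to a 10-million-element array rewrite about 80 GB of payload in order to store 24 KB of new data, and every superseded copy becomes dead space for vacuum to reclaim.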



Re: [GENERAL] Performance appending to an array column

From: Paul A Jungwirth
Date: September 22, 2017, 00:05:13

> It's going to suck big-time :-(.

Ha ha that's what I thought, but thank you for confirming. :-)

> We ended up keeping
> the time series data outside the DB; I doubt the conclusion would be
> different today.

Interesting. That seems a little radical to me, but I'll consider it
more seriously now. I also tried cstore_fdw for this, but my queries
(building a 2-D histogram) were taking 4+ seconds, compared to 500ms
using arrays. Putting everything into regular files gives up filtering
and other SQL built-ins, but maybe I could write my own extension to
load regular files into Postgres arrays, sort of getting the best of
both worlds.

Anyway, thanks for sharing your experience!

Yours,
Paul
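
A minimal sketch of the "regular files" option mentioned above, assuming each series lives in a flat file of native-endian float64 values. Appending writes only the new values, so its cost does not grow with the length of the series. The helper names and the file layout are hypothetical, and the sketch does not cover loading the data into Postgres.

```python
import os
import tempfile
from array import array


def append_values(path, values):
    """Append float64 values to the end of a flat binary file."""
    with open(path, "ab") as f:
        array("d", values).tofile(f)


def load_values(path):
    """Read the whole file back as an array of float64 values."""
    vals = array("d")
    count = os.path.getsize(path) // vals.itemsize
    with open(path, "rb") as f:
        vals.fromfile(f, count)
    return vals


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "measurements.f64")
        append_values(path, [5.0, 4.2, 9.9])
        append_values(path, [1.5])
        print(list(load_values(path)))
```

The trade-off is the one raised in the message: the file has no filtering, transactions, or other SQL built-ins, so anything beyond append and bulk read has to be written by hand.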



Re: [GENERAL] Performance appending to an array column

From: Thomas Kellerer
Date: September 22, 2017, 00:29:18

Paul A Jungwirth wrote on 21.09.2017 at 23:05:
> but maybe I could write my own extension to
> load regular files into Postgres arrays, sort of getting the best of
> both worlds.

There is a foreign data wrapper for that:
   https://github.com/adunstan/file_text_array_fdw

but it's pretty old and seems unmaintained.


Re: [GENERAL] Performance appending to an array column

From: Imre Samu
Date: September 22, 2017, 16:22:01

>I also tried cstore_fdw for this, but my queries
>(building a 2-D histogram) were taking 4+ seconds,
>compared to 500ms using arrays.
> ...
> but maybe I could write my own extension

Have you checked the new TimescaleDB extension? [ https://github.com/timescale/timescaledb ]
"TimescaleDB is packaged as a PostgreSQL extension and released under the Apache 2 open-source license."

"TimescaleDB is an open-source database designed to make SQL scalable for time-series data.

It is engineered up from PostgreSQL, providing automatic partitioning across time and space (partitioning key), as well as full SQL support."

It also has a built-in histogram function: https://docs.timescale.com/latest/api/api-timescaledb#histogram

Regards,

Imre

