Обсуждение: [GENERAL] Performance appending to an array column

Поиск
Список
Период
Сортировка

[GENERAL] Performance appending to an array column

От
Paul A Jungwirth
Дата:
I'm considering a table structure where I'd be continuously appending
to long arrays of floats (10 million elements or more). Keeping the
data in arrays gives me much faster SELECT performance vs keeping it
in millions of rows.

But since these arrays keep growing, I'm wondering about the UPDATE
performance. I was reading this commit message about improving
performance of *overwriting* individual array elements, and I was
wondering if there is anything similar for growing an array column?:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1dc5ebc9077ab742

Is there a faster way to append to an array than just this?:
   UPDATE measurements   SET vals = vals || ARRAY[5.0, 4.2, 9.9]::float[]   ;

Thanks!
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Performance appending to an array column

От
Tom Lane
Дата:
Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> I'm considering a table structure where I'd be continuously appending
> to long arrays of floats (10 million elements or more). Keeping the
> data in arrays gives me much faster SELECT performance vs keeping it
> in millions of rows.

> But since these arrays keep growing, I'm wondering about the UPDATE
> performance.

It's going to suck big-time :-(.  You'd be constantly replacing all
of a multi-megabyte toasted field.  Even if the UPDATE speed per se
seemed tolerable, this would be pretty nasty in terms of the
vacuuming overhead and/or bloat it would impose.

My very first use of Postgres, twenty years ago, involved time series
data which perhaps is much like what you're doing.  We ended up keeping
the time series data outside the DB; I doubt the conclusion would be
different today.  I seem to recall having heard about a commercial fork
of PG that is less bad for this type of data, but the community code
is not the weapon you want.
        regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Performance appending to an array column

От
Paul A Jungwirth
Дата:
> It's going to suck big-time :-(.

Ha ha that's what I thought, but thank you for confirming. :-)

> We ended up keeping
> the time series data outside the DB; I doubt the conclusion would be
> different today.

Interesting. That seems a little radical to me, but I'll consider it
more seriously now. I also tried cstore_fdw for this, but my queries
(building a 2-D histogram) were taking 4+ seconds, compared to 500ms
using arrays. Putting everything into regular files gives up filtering
and other SQL built-ins, but maybe I could write my own extension to
load regular files into Postgres arrays, sort of getting the best of
both worlds.

Anyway, thanks for sharing your experience!

Yours,
Paul


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Performance appending to an array column

От
Thomas Kellerer
Дата:
Paul A Jungwirth schrieb am 21.09.2017 um 23:05:
> but maybe I could write my own extension to
> load regular files into Postgres arrays, sort of getting the best of
> both worlds.

There is a foreign data wrapper for that:
   https://github.com/adunstan/file_text_array_fdw

but it's pretty old and seems un-maintained.









-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Performance appending to an array column

От
Imre Samu
Дата:
>I also tried cstore_fdw for this, but my queries
>(building a 2-D histogram) were taking 4+ seconds,
>compared to 500ms  
using arrays. 
> ...
but maybe I could write my own extension

Have you checked the new TimescaleDB extension? [ https://github.com/timescale/timescaledb ]  
"TimescaleDB is packaged as a PostgreSQL extension and released under the Apache 2 open-source license."

"TimescaleDB is an open-source database designed to make SQL scalable for time-series data. 
It is engineered up from PostgreSQL, providing automatic partitioning across time and space (partitioning key), as well as full SQL support."

and it has a built in histogram function:   https://docs.timescale.com/latest/api/api-timescaledb#histogram

Regards,
Imre





2017-09-21 23:05 GMT+02:00 Paul A Jungwirth <pj@illuminatedcomputing.com>:
> It's going to suck big-time :-(.

Ha ha that's what I thought, but thank you for confirming. :-)

> We ended up keeping
> the time series data outside the DB; I doubt the conclusion would be
> different today.

Interesting. That seems a little radical to me, but I'll consider it
more seriously now. I also tried cstore_fdw for this, but my queries
(building a 2-D histogram) were taking 4+ seconds, compared to 500ms
using arrays. Putting everything into regular files gives up filtering
and other SQL built-ins, but maybe I could write my own extension to
load regular files into Postgres arrays, sort of getting the best of
both worlds.

Anyway, thanks for sharing your experience!

Yours,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general