Discussion: Is it reasonable to store double[] arrays of 30K elements
Every row of my table has a double[] array of approximately 30K numbers. I have run a few tests, and so far everything looks good.

I am not pushing the limits here, right? It should be perfectly fine to store arrays of 30K double numbers, correct?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Is-it-reasonable-to-store-double-arrays-of-30K-elements-tp5790562.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 02/04/2014 01:52 PM, AlexK wrote:
What sorts of tests and what sorts of results?
Each record has something like 30000*16 + 30000*(per-cell overhead, which could be zero) bytes, but that is definitely spilling over to TOAST. Have you done any large-scale deletes?
No large deletes, just inserts/updates/selects. What are the potential problems with deletes?
On Tue, Feb 4, 2014 at 2:59 PM, Rob Sargent <robjsargent@gmail.com> wrote:
> On 02/04/2014 01:52 PM, AlexK wrote:
>> Every row of my table has a double[] array of approximately 30K numbers. I have run a few tests, and so far everything looks good. I am not pushing the limits here, right? It should be perfectly fine to store arrays of 30K double numbers, correct?
> What sorts of tests and what sorts of results?
> Each record has something like 30000*16 + 30000*(per-cell overhead, which could be zero) but that is definitely spilling over to toast. Have you done any large scale deletes?

My take: depends on your definition of 'fine'. Your single datum will be pushing hundreds of kilobytes, which have to be dealt with in total if you want to read or write any single element, basically. This works out well if your application always reads and writes the entire array as a block (so that it behaves as a single complete structure) and poorly for any other use case. In particular, if you tend to update random elements in the array one by one, this approach will tend to fall over.

Also, that's a bit pessimal on the size estimate: it's 30000 * 8 (the size of float8). Any solution storing arrays is going to be much more compact than one value per row, since you amortize the MVCC tracking across all the elements (notwithstanding the flexibility you give up to do that). Point being: however many blocks the array takes up toasted, it will take up a lot more as standard records.

merlin
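Merlin's point about the array behaving as one toasted datum can be sketched concretely. This is a minimal illustration, not the poster's actual schema; the `series` table and `samples` column names are hypothetical:

```sql
-- One row per series, the whole ~30K-element array stored as a
-- single value that will be compressed and/or moved out to TOAST.
CREATE TABLE series (
    id      bigint PRIMARY KEY,
    samples double precision[] NOT NULL   -- ~30000 * 8 bytes ≈ 240 KB raw
);

-- pg_column_size() reports the stored (possibly compressed, toasted)
-- size of the datum, which you can compare against the raw estimate:
SELECT id, pg_column_size(samples) AS stored_bytes
FROM series
LIMIT 5;
```

Whether the value actually lands in the TOAST table, and whether it compresses, depends on the column's storage setting (EXTENDED is the default for array columns).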
I will always be reading/writing the whole array. The table is about 40GB. It replaces two tables, parent and child, which together used about 160GB.
Would 10K elements of float[3] make any difference in terms of read/write performance? Or a 240K byte array? Or are these all functionally the same issue for the server? If so, intriguing possibilities abound. :)
Hello
I worked with 80K float fields without any problem.
There are possible issues:
* it needs a lot of memory for detoasting - this can be a problem with many parallel queries
* there is a risk of repeated detoasting - some unhappy usage patterns in plpgsql can be slow - it is solvable, but you have to identify this issue
* any update of a large array is slow - so these arrays are best suited for write-once data
Regards
Pavel
2014-02-14 23:07 GMT+01:00 lup <robjsargent@gmail.com>:
Would 10K elements of float[3] make any difference in terms of read/write
performance?
Or 240K byte array?
Or are these all functionally the same issue for the server? If so,
intriguing possibilities abound. :)
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi Pavel,
1. I believe we have lots of memory. How much is needed to read one array of 30K float numbers?
2. What do we need to do to avoid repeated detoasting, and what is it?
3. We are not going to update individual elements of the arrays. We might occasionally replace the whole thing. When we benchmarked, we did not notice slowness. Can you explain how to reproduce the slowness?
TIA!
2014-02-15 21:52 GMT+01:00 AlexK <alkuzo@gmail.com>:
> Hi Pavel,
>
> 1. I believe we have lots of memory. How much is needed to read one array of 30K float numbers?
It is not too much - about 240KB (30000 * 8 bytes), less if it compresses.
> 2. What do we need to do to avoid repeated detoasting, and what is it?
Any access to an array element emits a detoast - so repeated access to individual fields of the array is the wrong pattern. You can read the complete array as one block, or you can evaluate the array as a table (via unnest) - then the detoast is processed only once.
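Pavel's "evaluate the array as a table" advice can be sketched as follows; the `series`/`samples` names are hypothetical, and the exact cost of the slow pattern varies by PostgreSQL version:

```sql
-- Slow pattern: element-by-element access in plpgsql. Each a[i]
-- subscript can force the (compressed) array value to be detoasted
-- again, so the loop pays the detoast cost repeatedly.
DO $$
DECLARE
  a double precision[] := (SELECT samples FROM series WHERE id = 1);
  s double precision := 0;
BEGIN
  FOR i IN 1 .. coalesce(array_length(a, 1), 0) LOOP
    s := s + a[i];
  END LOOP;
END;
$$;

-- Fast pattern: unnest the array once and let the executor stream
-- the elements; the detoast happens a single time.
SELECT sum(x) AS total
FROM series, unnest(samples) AS x
WHERE id = 1;
```

The second form is also the shape Pavel's later `unnest`/`max` benchmark uses: one block operation over the whole array instead of many subscript lookups.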
> 3. We are not going to update individual elements of the arrays. We might occasionally replace the whole thing. When we benchmarked, we did not notice slowness. Can you explain how to reproduce the slowness?
You can see it in this example:
postgres=# do
$$
declare a int[] := '{}';
begin
for i in 1..1000
loop
a := a || i;
end loop;
end;
$$;
This code is fast only for small arrays (each append copies the whole array, so the loop is quadratic):
10K ~ 100ms
100K ~ 10000ms
postgres=# do
$$
declare a int := 0;
begin
for i in 1..100000
loop
a := a + 1;
end loop;
end;
$$;
DO
Time: 88.481 ms
The overhead of the plpgsql loop itself is about 100ms.
But you can generate an array in a fast way (just not by repeated updates):
postgres=# select array_upper(array(select generate_series(1,100000)),1);
array_upper
─────────────
100000
(1 row)
Time: 19.441 ms
Pg can manipulate large arrays relatively quickly:
postgres=# select max(unnest) from (select unnest(array(select generate_series(1,100000)))) x;
max
────────
100000
(1 row)
Time: 96.644 ms
but these should be block operations.
Regards
Pavel
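Since an in-place element update rewrites the whole toasted value anyway, the write pattern the thread converges on is whole-array replacement. A sketch, with hypothetical table/column names:

```sql
-- Replace the whole array in one statement (the write-once /
-- replace-whole pattern the thread recommends):
UPDATE series
SET samples = $1          -- bind the complete new 30K-element array
WHERE id = $2;

-- By contrast, a single-element update still detoasts, modifies, and
-- re-toasts the entire array value, so it costs roughly as much as a
-- full replacement while doing far less useful work per statement:
UPDATE series
SET samples[17] = 3.14
WHERE id = $2;
```

Either way the old row version remains until VACUUM, which is why frequent updates of such wide rows are expensive regardless of how little of the array logically changes.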