Discussion: Is it reasonable to store double[] arrays of 30K elements


Is it reasonable to store double[] arrays of 30K elements

From:
AlexK
Date:
Every row of my table has a double[] array of approximately 30K numbers. I
have run a few tests, and so far everything looks good.

I am not pushing the limits here, right? It should be perfectly fine to
store arrays of 30k double numbers, correct?





Re: Is it reasonable to store double[] arrays of 30K elements

From:
Rob Sargent
Date:
On 02/04/2014 01:52 PM, AlexK wrote:
Every row of my table has a double[] array of approximately 30K numbers. I
have run a few tests, and so far everything looks good.

I am not pushing the limits here, right? It should be perfectly fine to
store arrays of 30k double numbers, correct?





What sorts of tests and what sorts of results?
Each record is something like 30000*16 + 30000*(per-cell overhead, which could be zero) bytes, but that is definitely spilling over into TOAST.  Have you done any large-scale deletes?
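
A quick way to check the actual stored size rather than estimating it (a sketch; the table name t and column arr are placeholders, not from this thread):

-- size of one stored array value as kept on disk (after compression/TOAST)
SELECT pg_column_size(arr) FROM t LIMIT 1;
-- total size of the table, including its TOAST data and indexes
SELECT pg_size_pretty(pg_total_relation_size('t'));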


Re: Is it reasonable to store double[] arrays of 30K elements

From:
AlexK
Date:
No large deletes, just inserts/updates/selects. What are the potential
problems with deletes?





Re: Is it reasonable to store double[] arrays of 30K elements

From:
Merlin Moncure
Date:
On Tue, Feb 4, 2014 at 2:59 PM, Rob Sargent <robjsargent@gmail.com> wrote:
> On 02/04/2014 01:52 PM, AlexK wrote:
>
> Every row of my table has a double[] array of approximately 30K numbers. I
> have run a few tests, and so far everything looks good.
>
> I am not pushing the limits here, right? It should be perfectly fine to
> store arrays of 30k double numbers, correct?
>
> What sorts of tests and what sorts of results?
> Each record has something like 30000*16 + 30000*(per cell overhead, which
> could be zero) but that is definitely spilling over to toast.  Have you done
> any large scale deletes?

My take:
Depends on your definition of 'fine'.  Your single datum will be
pushing hundreds of kilobytes, all of which has to be dealt with
whenever you want to read or write any single element.  This works out
well if your application always reads and writes the entire array as a
block (so that it behaves as a single complete structure) and poorly for
any other use case.  In particular, if you tend to update random
elements in the array one by one, this approach will tend to fall over.

Also, that size estimate is a bit pessimistic (high): it's 30000 * 8 (the
size of float8).  Any solution storing arrays is going to be much more
compact than one value per row, since you amortize MVCC tracking across
all the elements (notwithstanding the flexibility you give up to do that).
Point being: however many blocks the array takes up toasted, it will
take up a lot more as standard records.

merlin
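
A minimal sketch of the two access patterns described above (the table name series and its columns are hypothetical, not from this thread):

-- Whole-array read/write: one detoast/retoast per statement; the good case.
SELECT arr FROM series WHERE id = 42;
UPDATE series SET arr = $1 WHERE id = 42;  -- array supplied by the client as a parameter

-- Per-element update: the entire multi-hundred-KB datum is rewritten just to
-- change one element; the case that tends to fall over.
UPDATE series SET arr[17] = 3.14 WHERE id = 42;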


Re: Is it reasonable to store double[] arrays of 30K elements

From:
AlexK
Date:
I will always be reading/writing the whole array. The table is about 40 GB. It
replaces two tables, parent and child, which together used about 160 GB.





Re: Is it reasonable to store double[] arrays of 30K elements

From:
lup
Date:
Would 10K elements of float[3] make any difference in terms of read/write
performance?
Or 240K byte array?

Or are these all functionally the same issue for the server? If so,
intriguing possibilities abound. :)







Re: Is it reasonable to store double[] arrays of 30K elements

From:
Pavel Stehule
Date:
Hello


I worked with 80K float fields without any problem.

There are some possible issues:

* It needs a lot of memory for detoast - this can be a problem with many parallel queries.
* There is a risk of repeated detoast - some unlucky usage in plpgsql can be slow. It is solvable, but you have to identify the issue.
* Any update of a large array is slow - so these arrays are best suited to write-once data.

Regards

Pavel


2014-02-14 23:07 GMT+01:00 lup <robjsargent@gmail.com>:
Would 10K elements of float[3] make any difference in terms of read/write
performance?
Or 240K byte array?

Or are these all functionally the same issue for the server? If so,
intriguing possibilities abound. :)






Re: Is it reasonable to store double[] arrays of 30K elements

From:
AlexK
Date:
Hi Pavel,

1. I believe we have lots of memory. How much is needed to read one array of 30K float numbers?
2. What do we need to do to avoid possible repeated detoast, and what is it?
3. We are not going to update individual elements of the arrays. We might occasionally replace the whole thing. When we benchmarked, we did not notice any slowness. Can you explain how to reproduce the slowness?

TIA!



Re: Is it reasonable to store double[] arrays of 30K elements

From:
Pavel Stehule
Date:



2014-02-15 21:52 GMT+01:00 AlexK <alkuzo@gmail.com>:
Hi Pavel,

1. I believe we have lots of memory. How much is needed to read one array of 30K float numbers?

It is not too much - about 120 KB.
 
2. What do we need to do to avoid possible repeated detoast, and what is it?

Any access to the array triggers a detoast, so repeated access to individual fields of the array is wrong. You can read the complete array as one block, or you can evaluate the array as a table (unnest it) - then detoast is processed only once.
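
A minimal sketch of the difference, assuming a hypothetical table t with an id column and a double precision[] column arr:

-- Repeated detoast (slow): every evaluation of arr[i] fetches and decompresses
-- the whole datum again.
SELECT (SELECT arr[i] FROM t WHERE id = 1)
FROM generate_series(1, 30000) AS i;

-- Detoast once (fast): read the array as one block and stream its elements.
SELECT unnest(arr) FROM t WHERE id = 1;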
 
3. We are not going to update individual elements of the arrays. We might occasionally replace the whole thing. When we benchmarked, we did not notice any slowness. Can you explain how to reproduce the slowness?

You can see this example:

postgres=# do
$$
declare a int[] := '{}';
begin
  for i in 1..1000
  loop
    a := a || i;
  end loop;
end;
$$;

This code is fast only for small arrays

10K ~ 100ms
100K ~ 10000ms

postgres=# do
$$
declare a int := 0;
begin
  for i in 1..100000
  loop
    a := a + 1;
  end loop;
end;
$$;
DO
Time: 88.481 ms

The overhead of the plpgsql loop itself is about 100 ms.

But you can generate an array in a fast way (though it should not be done by repeated updates):

postgres=# select array_upper(array(select generate_series(1,100000)),1);
 array_upper
─────────────
      100000
(1 row)
Time: 19.441 ms

Pg can manipulate large arrays relatively fast:

postgres=# select max(unnest) from (select unnest(array(select generate_series(1,100000)))) x;
  max  
────────
 100000
(1 row)

Time: 96.644 ms

But these should be block operations.
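
A sketch of a block-style write, again assuming a hypothetical table t with columns id and arr:

-- Build and store the whole 30K-element array in one statement instead of
-- appending element by element in a loop.
UPDATE t
   SET arr = ARRAY(SELECT random() FROM generate_series(1, 30000))
 WHERE id = 1;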

Regards

Pavel

 
