Discussion: Performance of a large array access by position (tested version 9.1.3)


Performance of a large array access by position (tested version 9.1.3)

From:
Maxim Boguk
Date:
Hi all,

Maybe I am completely wrong, but I always assumed that access to an array element in PostgreSQL takes close to constant time.
But in my tests I found that access time degrades as O(N) of the array size.

Test case (performed on a large, otherwise idle server with 1GB work_mem to make sure I am working in memory only):

WITH
t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000) as g(i);

Results for N between 1 and 10,000,000 (measured with a locally connected psql session using \timing):

N:          Time:
1           5.8ms
10          5.8ms
100         5.8ms
1000        6.7ms
-- up to here everything is reasonable
5k         21ms
10k        34ms
50k       177ms
100k      321ms
500k     4100ms
1M       8100ms
2M      22000ms
5M      61000ms
10M    220000ms = 22ms per single array element access.


Is that behaviour correct?

PS: What I am actually looking for is a tuplestore with constant-time access by position, for use with recursive queries and/or PL/pgSQL, but without resorting to C programming.

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: Performance of a large array access by position (tested version 9.1.3)

From:
Jesper Krogh
Date:
On 22/06/12 09:02, Maxim Boguk wrote:
> Hi all,
>
> Maybe I am completely wrong, but I always assumed that access to an array element in PostgreSQL takes close to constant time.
> But in my tests I found that access time degrades as O(N) of the array size.
>
> Test case (performed on a large, otherwise idle server with 1GB work_mem to make sure I am working in memory only):
>
> WITH
> t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
> SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000) as g(i);
>
> Results for N between 1 and 10,000,000 (measured with a locally connected psql session using \timing):
>
> N:          Time:
> 1           5.8ms
> 10          5.8ms
> 100         5.8ms
> 1000        6.7ms
> -- up to here everything is reasonable
> 5k         21ms
> 10k        34ms
> 50k       177ms
> 100k      321ms
> 500k     4100ms
> 1M       8100ms
> 2M      22000ms
> 5M      61000ms
> 10M    220000ms = 22ms per single array element access.
>
> Is that behaviour correct?
>
> PS: What I am actually looking for is a tuplestore with constant-time access by position, for use with recursive queries and/or PL/pgSQL, but without resorting to C programming.

Default column storage for large values is "compress it and store it in TOAST".
That is what is causing the shift. Try changing the column's storage
to EXTERNAL and rerun the test.

ALTER TABLE <tablename> ALTER COLUMN <column name> SET STORAGE EXTERNAL

The default is EXTENDED, which compresses the value; that in turn makes it hard to
seek to a position without reading and decompressing everything.

http://www.postgresql.org/docs/9.1/static/sql-altertable.html
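
A rough sketch of what that could look like (my_table / my_array are just placeholder names, not from the test above):

-- check how the array column is currently stored
-- (attstorage: 'x' = EXTENDED, i.e. compress + TOAST; 'e' = EXTERNAL; 'm' = MAIN; 'p' = PLAIN)
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
  AND attname = 'my_array';

-- switch to uncompressed out-of-line storage
ALTER TABLE my_table ALTER COLUMN my_array SET STORAGE EXTERNAL;

-- SET STORAGE only affects values stored from now on, so change the storage
-- before loading the data (or rewrite the existing rows afterwards)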

Let us know what you get.

Jesper

Re: Performance of a large array access by position (tested version 9.1.3)

From:
"Marc Mamin"
Date:
>> On 22/06/12 09:02, Maxim Boguk wrote: 

>> Maybe I am completely wrong, but I always assumed that access to an array element in PostgreSQL takes close to constant time.

>> But in my tests I found that access time degrades as O(N) of the array size.

>> Is that behaviour correct?


> From: pgsql-performance-owner@postgresql.org On Behalf Of Jesper Krogh

> Default column storage for large values is "compress it and store it in TOAST".
> That is what is causing the shift. Try changing the column's storage
> to EXTERNAL and rerun the test.


Hello,

I've repeated your test in a simplified form:
you are right :-(

create table t1 ( _array int[]);
alter table t1 alter _array set storage external;
insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));

create table t2 ( _array int[]);
alter table t2 alter _array set storage external;
insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));

explain analyze SELECT _array[1] FROM t1;
Total runtime: 0.125 ms

explain analyze SELECT _array[1] FROM t2;
Total runtime: 8.649 ms


best regards,

Marc Mamin



Re: Performance of a large array access by position (tested version 9.1.3)

From:
Pavel Stehule
Date:
2012/6/26 Marc Mamin <M.Mamin@intershop.de>:
>
>>> On 22/06/12 09:02, Maxim Boguk wrote:
>
>>> Maybe I am completely wrong, but I always assumed that access to an array element in PostgreSQL takes close to constant time.
>>> But in my tests I found that access time degrades as O(N) of the array size.
>
>>> Is that behaviour correct?

yes - accessing position n in PostgreSQL means skipping n-1 elements
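
One quick way to get a feeling for how much the position itself costs (just a sketch, reusing the t2 table from Marc's test with 5,000,000 elements):

\timing on
SELECT _array[1]       FROM t2;   -- fetch the array, read the first element
SELECT _array[5000000] FROM t2;   -- fetch the array, read the last element
-- if both run in about the same time, fetching/detoasting the array dominates;
-- a big difference would point at the per-position skip cost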

Regards

Pavel

>
>
>> From: pgsql-performance-owner@postgresql.org On Behalf Of Jesper Krogh
>
>> Default column storage is to "compress it, and store in TOAST" with large values.
>> This it what is causing the shift. Try to change the column storage of the column
>> to EXTERNAL instead and rerun the test.
>
>
> Hello,
>
> I've repeated your test in a simplified form:
> you are right :-(
>
> create table t1 ( _array int[]);
> alter table t1 alter _array set storage external;
> insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));
>
> create table t2 ( _array int[]);
> alter table t2 alter _array set storage external;
> insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));
>
> explain analyze SELECT _array[1] FROM t1;
> Total runtime: 0.125 ms
>
> explain analyze SELECT _array[1] FROM t2;
> Total runtime: 8.649 ms
>
>
> best regards,
>
> Marc Mamin
>
>
>

Re: Performance of a large array access by position (tested version 9.1.3)

From:
"Marc Mamin"
Date:

> -----Original Message-----
> From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
> 
> 2012/6/26 Marc Mamin <M.Mamin@intershop.de>:
> >
> >>> On 22/06/12 09:02, Maxim Boguk wrote:
> >
> >>> Maybe I am completely wrong, but I always assumed that access to an array element in PostgreSQL takes close to constant time.
> >>> But in my tests I found that access time degrades as O(N) of the array size.
> >
> >>> Is that behaviour correct?
>
> yes - accessing position n in PostgreSQL means skipping n-1 elements


Hmmm...

how many elements have to be skipped here?

SELECT _array[1] FROM t2;

I wonder whether the time is rather spent first retrieving the array itself, before accessing its elements.

regards,

Marc Mamin

> 
> Regards
> 
> Pavel
> 
> >
> >
> >> From: pgsql-performance-owner@postgresql.org On Behalf Of Jesper Krogh
> >
> >> Default column storage for large values is "compress it and store it in TOAST".
> >> That is what is causing the shift. Try changing the column's storage
> >> to EXTERNAL and rerun the test.
> >
> >
> > Hello,
> >
> > I've repeated your test in a simplified form:
> > you are right :-(
> >
> > create table t1 ( _array int[]);
> > alter table t1 alter _array set storage external;
> > insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));
> >
> > create table t2 ( _array int[]);
> > alter table t2 alter _array set storage external;
> > insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));
> >
> > explain analyze SELECT _array[1] FROM t1;
> > Total runtime: 0.125 ms
> >
> > explain analyze SELECT _array[1] FROM t2;
> > Total runtime: 8.649 ms
> >
> >
> > best regards,
> >
> > Marc Mamin
> >
> >
> >

Re: Performance of a large array access by position (tested version 9.1.3)

From:
Pavel Stehule
Date:
2012/6/26 Marc Mamin <M.Mamin@intershop.de>:
>
>
>> -----Original Message-----
>> From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
>>
>> 2012/6/26 Marc Mamin <M.Mamin@intershop.de>:
>> >
>> >>> On 22/06/12 09:02, Maxim Boguk wrote:
>> >
>> >>> Maybe I am completely wrong, but I always assumed that access to an array element in PostgreSQL takes close to constant time.
>> >>> But in my tests I found that access time degrades as O(N) of the array size.
>> >
>> >>> Is that behaviour correct?
>>
>> yes - accessing position n in PostgreSQL means skipping n-1 elements
>
>
> Hmmm...
>
> how many elements have to be skipped here?

there are two independent stages:

a) detoast - loading and decompression (the complete array is detoasted)
b) access

if you have very large arrays, then (a) is significant
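
A small sketch to see how much data stage a) has to load, again using Marc's t2 table:

SELECT pg_column_size(_array)  AS stored_bytes,   -- on-disk size, possibly compressed
       array_length(_array, 1) AS elements
FROM t2;
-- with the default EXTENDED storage the value is also compressed,
-- so stage a) additionally pays for decompression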

Regards

Pavel


>
> SELECT _array[1] FROM t2;
>
> I wonder if the time rather get spent in first retrieving the array itself before accessing its elements.
>
> regards,
>
> Marc Mamin
>
>>
>> Regards
>>
>> Pavel
>>
>> >
>> >
>> >> From: pgsql-performance-owner@postgresql.org On Behalf Of Jesper Krogh
>> >
>> >> Default column storage for large values is "compress it and store it in TOAST".
>> >> That is what is causing the shift. Try changing the column's storage
>> >> to EXTERNAL and rerun the test.
>> >
>> >
>> > Hello,
>> >
>> > I've repeated your test in a simplified form:
>> > you are right :-(
>> >
>> > create table t1 ( _array int[]);
>> > alter table t1 alter _array set storage external;
>> > insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));
>> >
>> > create table t2 ( _array int[]);
>> > alter table t2 alter _array set storage external;
>> > insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));
>> >
>> > explain analyze SELECT _array[1] FROM t1;
>> > Total runtime: 0.125 ms
>> >
>> > explain analyze SELECT _array[1] FROM t2;
>> > Total runtime: 8.649 ms
>> >
>> >
>> > best regards,
>> >
>> > Marc Mamin
>> >
>> >
>> >

Re: Performance of a large array access by position (tested version 9.1.3)

From:
Pavel Stehule
Date:
2012/6/26 Maxim Boguk <maxim.boguk@gmail.com>:
>
>
> On Tue, Jun 26, 2012 at 6:04 PM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>>
>> 2012/6/26 Marc Mamin <M.Mamin@intershop.de>:
>> >
>> >>> On 22/06/12 09:02, Maxim Boguk wrote:
>> >
>> >>> Maybe I am completely wrong, but I always assumed that access to an array
>> >>> element in PostgreSQL takes close to constant time.
>> >>> But in my tests I found that access time degrades as O(N) of the array size.
>> >
>> >>> Is that behaviour correct?
>>
>> yes - accessing position n in PostgreSQL means skipping n-1 elements
>>
>
> Hi,
>
> I understand what you mean, but in my test, for all values of N, the test
> accessed only the first 10000 elements of the array, independent of
> the array size....
> So I still can't see why access time degrades so fast for N>10000...:
>
> WITH
> --GENERATE a single-entry table with a single ARRAY field of size N
> t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
> --iterate over the first 10000 elements of that ARRAY
> SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000) as g(i);
>
> ... if access time depended only on position, then beyond 10k there should not
> be any serious degradation; in fact the performance degradation is almost
> linear.
> 10k        34ms
> 50k       177ms
> 100k      321ms
> 500k     4100ms
> 1M       8100ms
> 2M      22000ms
> 5M      61000ms
> 10M    220000ms = 22ms per single array element access.

in this use case TOAST/detoast is not used, but there is probably a
problem with copying the array. Taking the n-th element of an array means calling a
function, and PostgreSQL passes parameters only by value - so
copying a large array takes more time.

this issue is partially solved in 9.2, where you can use a FOREACH
cycle: http://www.postgresql.org/docs/9.1/interactive/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY
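
For illustration, a small (untested) PL/pgSQL sketch comparing an indexed loop with FOREACH - the names and the array size are arbitrary:

DO $$
DECLARE
  a int[] := ARRAY(SELECT g FROM generate_series(1, 100000) g);
  x int;
  i int;
  s bigint := 0;
BEGIN
  -- indexed access: a[i] is a positional lookup on every iteration
  FOR i IN 1 .. array_length(a, 1) LOOP
    s := s + a[i];
  END LOOP;
  RAISE NOTICE 'indexed sum = %', s;

  -- FOREACH walks the array value in a single pass
  s := 0;
  FOREACH x IN ARRAY a LOOP
    s := s + x;
  END LOOP;
  RAISE NOTICE 'foreach sum = %', s;
END
$$;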

Regards

Pavel

>
> And I think no toasting/detoasting happens in my test case.
>
> Kind Regards,
> Maksym

Re: Performance of a large array access by position (tested version 9.1.3)

From:
Cédric Villemain
Date:
> >> >>> Is that behaviour correct?
> >>
> >> yes - accessing position n in PostgreSQL means skipping n-1 elements
> >
> > Hmmm...
> >
> > how many elements have to be skipped here?
>
> there are two independent stages:
>
> a) detoast - loading and decompression (the complete array is detoasted)
> b) access
>
> if you have very large arrays, then (a) is significant

There is a place to add PG_GETARG_ARRAY_P_SLICE; the code just has not been written yet.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
