Thread: Strange plpgsql performance -- arithmetic, numeric() type, arrays
FYI
PostgreSQL 8.0.1
$ uname -a
Linux example.example.com 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 19 01:53:23 GMT 2005 i686 i686 i386 GNU/Linux
Increasing the loop count by a factor of 10, from 1000 to 10000,
makes the function take more than 100 times longer.  (I only saw
this when I happened to use a numeric() datatype and divided
i/100 to avoid overflow.  It does not happen without the array,
and with other, much larger, arrays of other data types
I see no slowdown.)
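create or replace function baz()
returns void
language plpgsql
as $func$
declare
  a numeric(4,2)[] := '{}';
begin
  -- change the upper bound to 10000 for the second timing
  for i in 1..1000 loop
    a[i] := -9.0;
    -- integer division keeps the result within numeric(4,2)
    a[i] := a[i] + i/100;
  end loop;
  return;
end;
$func$;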
------loop size of 1000
=> explain analyze select baz();
                                       QUERY PLAN
------------------------------------------------------------------------------------------
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=1116.873..1116.874 rows=1 loops=1)
Total runtime: 1116.894 ms
(2 rows)
-----loop size of 10000
=> explain analyze select baz();
                                         QUERY PLAN
----------------------------------------------------------------------------------------------
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=134312.457..134312.458 rows=1 loops=1)
Total runtime: 134312.487 ms
(2 rows)
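As a quick check on the scaling, the two total runtimes above can be compared directly. The following is a minimal sketch in Python, not part of the original post; the variable names are illustrative, and the numbers are copied from the two EXPLAIN ANALYZE outputs.

```python
import math

# Total runtimes in milliseconds, copied from the EXPLAIN ANALYZE output above.
runtime_1000_ms = 1116.894
runtime_10000_ms = 134312.487

# How much longer the 10000-iteration run took than the 1000-iteration run.
ratio = runtime_10000_ms / runtime_1000_ms

# If runtime grows like n**k, then k = log(runtime ratio) / log(size ratio).
exponent = math.log(ratio) / math.log(10000 / 1000)

print(f"slowdown: {ratio:.1f}x, empirical exponent: {exponent:.2f}")
```

With only two data points this is just an estimate, but an exponent close to 2 points to roughly quadratic growth in the loop size rather than linear growth.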
PostgreSQL was recompiled from the source RPM
with rpmbuild --target=i686-centos-linux
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.3/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --host=i386-redhat-linux
Thread model: posix
gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-42)
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 1
cpu MHz         : 2992.800
cache size      : 1024 KB
physical id     : 0
siblings        : 2
runqueue        : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm nx lm
bogomips        : 5976.88
processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 1
cpu MHz         : 2992.800
cache size      : 1024 KB
physical id     : 0
siblings        : 2
runqueue        : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm nx lm
bogomips        : 5976.88
Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                -- Robert A. Heinlein
			
> Incrementing the loop counter by a factor of 10, from 1000 to 10000
> makes the process take more than 100 times longer.  (I only saw
> this happen when I happened upon using a numeric() datatype
> and then dividing i/100 to avoid overflow.  It does not happen
> without the array and working with other, much larger, arrays
> of other data types you see no slowdown.)
>
It's not a bug, it's a feature ;-). plpgsql isn't a good language for
initialising big arrays. If possible, use plperl, for example:
CREATE OR REPLACE FUNCTION speed(integer) RETURNS numeric(7,2)[] AS $$
$i = 0.00;
@myarray = ();
while ($i<$_[0]) {
  push @myarray, $i;
  $i = $i + 1;
}
return '{'.join(',',@myarray).'}';
$$ LANGUAGE plperlu;
select speed(100);
CREATE OR REPLACE FUNCTION speed2(integer) RETURNS numeric(7,2)[] AS $$
DECLARE a numeric(7,2)[] = '{}';
BEGIN
  FOR _i IN 1..$1 LOOP
    a[_i] := _i;
  END LOOP;
  RETURN a;
END;
$$ LANGUAGE plpgsql;
tarif=# select speed(10000);
Time: 28,269 ms
tarif=# select speed2(10000);
Time: 91186,199 ms
Regards
Pavel Stehule
			
Hello,
I found that an array initialised from plperl behaves differently from
one initialised in plpgsql. I am using the code from my previous mail:
tarif=# select speed(10);
         speed
-----------------------
 {0,1,2,3,4,5,6,7,8,9}
(1 row)
Time: 2,304 ms
tarif=# select speed2(10);
                        speed2
------------------------------------------------------
 {1.00,2.00,3.00,4.00,5.00,6.00,7.00,8.00,9.00,10.00}
(1 row)
Time: 0,863 ms
The array from speed2 is OK, but the array from speed is malformed.
I declared both functions as returning numeric(7,2)[].
When I change the initial value to 0.01, the results are equal.
regards
Pavel Stehule
			
Pavel Stehule <stehule@kix.fsv.cvut.cz> writes:
> The array from speed2 is OK, but the array from speed is malformed.
They both look OK to me.
> I declared both functions as returning numeric(7,2)[].
Type modifiers applied to function arguments and results are generally
ignored.  What you have here is functions returning numeric[], and not
anything else.  In the plpgsql example the coercion to numeric(7,2)
happens because you stored into a local variable declared that way,
but there's nothing to make it happen in the plperl example.
            regards, tom lane
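The difference between the two results can be mimicked outside the database. This is only a rough analogy in Python using the standard decimal module, not PostgreSQL's actual coercion code: `coerce_7_2` is a hypothetical helper that rounds to two decimal places, and neither the precision limit of 7 digits nor PostgreSQL's rounding mode is modeled.

```python
from decimal import Decimal

def coerce_7_2(value):
    """Hypothetical stand-in for coercion to numeric(7,2): round to 2 decimal places."""
    return Decimal(value).quantize(Decimal("0.01"))

# Like the plperl function: values are turned into text with no coercion applied.
uncoerced = [str(Decimal(i)) for i in range(3)]

# Like the plpgsql function: each value passes through a numeric(7,2) variable first.
coerced = [str(coerce_7_2(i + 1)) for i in range(3)]

print("{" + ",".join(uncoerced) + "}")
print("{" + ",".join(coerced) + "}")
```

Both lists hold valid numeric values; only the one that went through the declared scale carries the trailing ".00".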
			
		"Karl O. Pinc" <kop@meme.com> writes:
> Incrementing the loop counter by a factor of 10, from 1000 to 10000
> makes the process take more than 100 times longer.  (I only saw
> this happen when I happened upon using a numeric() datatype
> and then dividing i/100 to avoid overflow.  It does not happen
> without the array and working with other, much larger, arrays
> of other data types you see no slowdown.)
It's the array access, not the arithmetic, that's getting you.
Since numeric is not a fixed-width datatype, accessing the N'th
element of the array requires O(N) time to find that element.
            regards, tom lane
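To see why an O(N) lookup makes the whole loop quadratic, here is a toy model in Python. It is a sketch under stated assumptions, not PostgreSQL's actual array storage format: elements are packed back to back, each behind a 4-byte length header, and the names `pack`, `nth`, and `fill_cost` are invented for illustration.

```python
import struct

def pack(values):
    """Pack byte strings back to back, each preceded by a 4-byte length header."""
    return b"".join(struct.pack("<I", len(v)) + v for v in values)

def nth(buf, n):
    """Return (element n, number of headers skipped); n is zero-based."""
    offset = 0
    steps = 0
    # Elements have different widths, so every earlier header must be read
    # to learn where element n starts.
    for _ in range(n):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4 + length
        steps += 1
    (length,) = struct.unpack_from("<I", buf, offset)
    return buf[offset + 4 : offset + 4 + length], steps

def fill_cost(n):
    """Total headers skipped when each of n elements is looked up once."""
    values = [str(i).encode() for i in range(n)]
    buf = pack(values)
    total = 0
    for i in range(n):
        element, steps = nth(buf, i)
        assert element == values[i]
        total += steps
    return total

small = fill_cost(100)
large = fill_cost(1000)
print(small, large, large / small)
```

In this model the total work for n lookups is n(n-1)/2 skipped headers, so ten times as many elements costs about a hundred times as much.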
			
		
On 04/03/2005 08:04:27 PM, Tom Lane wrote:
> "Karl O. Pinc" <kop@meme.com> writes:
> > Incrementing the loop counter by a factor of 10, from 1000 to 10000
> > makes the process take more than 100 times longer.  (I only saw
> > this happen when I happened upon using a numeric() datatype
> > and then dividing i/100 to avoid overflow.  It does not happen
> > without the array and working with other, much larger, arrays
> > of other data types you see no slowdown.)
>
> It's the array access, not the arithmetic, that's getting you.
> Since numeric is not a fixed-width datatype, accessing the N'th
> element of the array requires O(N) time to find that element.
Makes sense.  Thanks.  (Makes me think that these sorts of arrays
should be implemented with an extra level of indirection,
an array of pointers to the varying data, though that may not be the
best way to represent arrays on disk, because storage requirements
go up and disk is slow.  Anyhow, I leave it
to the coders.)
Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
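The indirection idea from the last message can be sketched as an offset table kept next to the packed data. This is a minimal Python illustration of the trade-off only, not a description of how PostgreSQL stores arrays: `build` and `nth_indexed` are hypothetical names, and the 4-byte offset size is an assumption.

```python
def build(values):
    """Concatenate byte strings and record where each one starts."""
    offsets = [0]
    body = bytearray()
    for v in values:
        body += v
        offsets.append(len(body))  # end of this element = start of the next
    return offsets, bytes(body)

def nth_indexed(offsets, body, n):
    """Fetch element n (zero-based) with two offset lookups and one slice, no scan."""
    return body[offsets[n] : offsets[n + 1]]

values = [b"-9.00", b"12.5", b"3"]
offsets, body = build(values)

# Extra storage paid for constant-time lookup,
# assuming each offset is stored as a 32-bit integer.
overhead_bytes = 4 * len(offsets)

print(nth_indexed(offsets, body, 1), overhead_bytes, len(body))
```

Lookup no longer depends on the element's position, but the offset table can be larger than the data itself when the elements are short, which is the storage cost the message worries about.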