Обсуждение: Slow function in queries SELECT clause.

Поиск
Список
Период
Сортировка

Slow function in queries SELECT clause.

От
"Davor J."
Дата:
I think I have read what is to be read about queries being prepared in
plpgsql functions, but I still can not explain the following, so I thought
to post it here:

Suppose 2 functions: factor(int,int) and offset(int, int).
Suppose a third function: convert(float,int,int) which simply returns
$1*factor($2,$3)+offset($2,$3)
All three functions are IMMUTABLE.

Very simple, right? Now I have very fast AND very slow executing queries on
some 150k records:

VERY FAST (half a second):
----------------
SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;

VERY SLOW (a minute):
----------------
SELECT convert(data, 1, 2) FROM tbl_data;

The slowness cannot be due to calling a function 150k times. If I define
convert2(float,int,int) to return a constant value, then it executes in
about a second. (still half as slow as the VERY FAST query).

I assume that factor and offset are cached in the VERY FAST query, and not
in the slow one? If so, why not and how can I "force" it? Currently I need
only one function for conversions.

Regards,
Davor



Re: Slow function in queries SELECT clause.

От
Szymon Guz
Дата:


2010/6/19 Davor J. <DavorJ@live.com>
I think I have read what is to be read about queries being prepared in
plpgsql functions, but I still can not explain the following, so I thought
to post it here:

Suppose 2 functions: factor(int,int) and offset(int, int).
Suppose a third function: convert(float,int,int) which simply returns
$1*factor($2,$3)+offset($2,$3)
All three functions are IMMUTABLE.

Very simple, right? Now I have very fast AND very slow executing queries on
some 150k records:

VERY FAST (half a second):
----------------
SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;

VERY SLOW (a minute):
----------------
SELECT convert(data, 1, 2) FROM tbl_data;

The slowness cannot be due to calling a function 150k times. If I define
convert2(float,int,int) to return a constant value, then it executes in
about a second. (still half as slow as the VERY FAST query).

I assume that factor and offset are cached in the VERY FAST query, and not
in the slow one? If so, why not and how can I "force" it? Currently I need
only one function for conversions.

Regards,
Davor




Hi,
show us the code of those two functions and explain analyze of those queries.

regards
Szymon Guz

Re: Slow function in queries SELECT clause.

От
"Davor J."
Дата:

I didn't consider them to be important as they showed the same, only the execution time was different. Also, they are a bit more complex than the ones put in the previous post. But here they are:
 
Definitions:
-----------------------------------------------------------
CREATE OR REPLACE FUNCTION appfunctions.fnc_unit_conversion_factor(_tree_id integer, _unit_to_id integer)
  RETURNS real AS
$BODY$
DECLARE
BEGIN
RETURN (SELECT unit_conv_factor AS factor
  FROM vew_unit_conversions AS c
  INNER JOIN tbl_sensors AS s ON (s.unit_id = c.unit_id_from)
  INNER JOIN tbl_trees USING (sens_id)
  WHERE tree_id = _tree_id AND unit_id_to = _unit_to_id)::real;
END;
$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE
--------------------------
CREATE OR REPLACE FUNCTION appfunctions.fnc_unit_conversion_offset(_tree_id integer, _unit_to_id integer)
  RETURNS real AS
$BODY$
DECLARE
BEGIN
RETURN (SELECT unit_conv_offset AS offset
  FROM vew_unit_conversions AS c
  INNER JOIN tbl_sensors AS s ON (s.unit_id = c.unit_id_from)
  INNER JOIN tbl_trees USING (sens_id)
  WHERE tree_id = _tree_id AND unit_id_to = _unit_to_id)::real;
END;
$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE
--------------------------
CREATE OR REPLACE FUNCTION appfunctions.fnc_unit_convert(_rawdata real, _tree_id integer, _unit_to_id integer)
  RETURNS real AS
$BODY$
DECLARE
BEGIN
RETURN _rawdata
 * fnc_unit_conversion_factor(_tree_id, _unit_to_id)
 + fnc_unit_conversion_offset(_tree_id, _unit_to_id);
END;
$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE
 
 
 
Executions:
-----------------------------------------------------------
EXPLAIN ANALYSE SELECT timestamp,
 
data_from_tree_id_70 AS "flow_11"
 
 FROM
(SELECT sens_chan_data_timestamp AS timestamp, sens_chan_data_data AS data_from_tree_id_70 FROM tbl_sensor_channel_data WHERE tree_id = 70 AND sens_chan_data_timestamp >= '2008-06-11T00:00:00' AND sens_chan_data_timestamp <= '2008-06-18T00:00:00' ) AS "70"
 
 ORDER BY timestamp;
 
"Sort  (cost=175531.00..175794.64 rows=105456 width=12) (actual time=598.454..638.400 rows=150678 loops=1)"
"  Sort Key: tbl_sensor_channel_data.sens_chan_data_timestamp"
"  Sort Method:  external sort  Disk: 3240kB"
"  ->  Bitmap Heap Scan on tbl_sensor_channel_data  (cost=3005.29..166732.66 rows=105456 width=12) (actual time=34.810..371.099 rows=150678 loops=1)"
"        Recheck Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= '2008-06-11 00:00:00'::timestamp without time zone) AND (sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time zone))"
"        ->  Bitmap Index Scan on tbl_sensor_channel_data_pkey  (cost=0.00..2978.92 rows=105456 width=0) (actual time=28.008..28.008 rows=150678 loops=1)"
"              Index Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= '2008-06-11 00:00:00'::timestamp without time zone) AND (sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time zone))"
"Total runtime: 663.478 ms"
-----------------------------------------------------------
EXPLAIN ANALYSE SELECT timestamp,
 
fnc_unit_convert(data_from_tree_id_70, 70, 7) AS "flow_11"
 
 FROM
(SELECT sens_chan_data_timestamp AS timestamp, sens_chan_data_data AS data_from_tree_id_70 FROM tbl_sensor_channel_data WHERE tree_id = 70 AND sens_chan_data_timestamp >= '2008-06-11T00:00:00' AND sens_chan_data_timestamp <= '2008-06-18T00:00:00' ) AS "70"
 
 ORDER BY timestamp;
 
"Sort  (cost=201895.00..202158.64 rows=105456 width=12) (actual time=35334.017..35372.977 rows=150678 loops=1)"
"  Sort Key: tbl_sensor_channel_data.sens_chan_data_timestamp"
"  Sort Method:  external sort  Disk: 3240kB"
"  ->  Bitmap Heap Scan on tbl_sensor_channel_data  (cost=3005.29..193096.66 rows=105456 width=12) (actual time=60.012..35037.129 rows=150678 loops=1)"
"        Recheck Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= '2008-06-11 00:00:00'::timestamp without time zone) AND (sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time zone))"
"        ->  Bitmap Index Scan on tbl_sensor_channel_data_pkey  (cost=0.00..2978.92 rows=105456 width=0) (actual time=21.884..21.884 rows=150678 loops=1)"
"              Index Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= '2008-06-11 00:00:00'::timestamp without time zone) AND (sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time zone))"
"Total runtime: 35397.841 ms"
-----------------------------------------------------------
EXPLAIN ANALYSE SELECT timestamp,
 
data_from_tree_id_70*fnc_unit_conversion_factor(70, 7)+ fnc_unit_conversion_offset(70, 7) AS "flow_11"
 
 FROM
(SELECT sens_chan_data_timestamp AS timestamp, sens_chan_data_data AS data_from_tree_id_70 FROM tbl_sensor_channel_data WHERE tree_id = 70 AND sens_chan_data_timestamp >= '2008-06-11T00:00:00' AND sens_chan_data_timestamp <= '2008-06-18T00:00:00' ) AS "70"
 
 ORDER BY timestamp;
 
EXPLAIN ANALYSE SELECT timestamp,
 
"Sort  (cost=176058.28..176321.92 rows=105456 width=12) (actual time=630.350..669.843 rows=150678 loops=1)"
"  Sort Key: tbl_sensor_channel_data.sens_chan_data_timestamp"
"  Sort Method:  external sort  Disk: 3240kB"
"  ->  Bitmap Heap Scan on tbl_sensor_channel_data  (cost=3005.29..167259.94 rows=105456 width=12) (actual time=35.498..399.726 rows=150678 loops=1)"
"        Recheck Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= '2008-06-11 00:00:00'::timestamp without time zone) AND (sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time zone))"
"        ->  Bitmap Index Scan on tbl_sensor_channel_data_pkey  (cost=0.00..2978.92 rows=105456 width=0) (actual time=27.433..27.433 rows=150678 loops=1)"
"              Index Cond: ((tree_id = 70) AND (sens_chan_data_timestamp >= '2008-06-11 00:00:00'::timestamp without time zone) AND (sens_chan_data_timestamp <= '2008-06-18 00:00:00'::timestamp without time zone))"
"Total runtime: 694.968 ms"
 
 
 
 
 


2010/6/19 Davor J. <DavorJ@live.com>
I think I have read what is to be read about queries being prepared in
plpgsql functions, but I still can not explain the following, so I thought
to post it here:

Suppose 2 functions: factor(int,int) and offset(int, int).
Suppose a third function: convert(float,int,int) which simply returns
$1*factor($2,$3)+offset($2,$3)
All three functions are IMMUTABLE.

Very simple, right? Now I have very fast AND very slow executing queries on
some 150k records:

VERY FAST (half a second):
----------------
SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;

VERY SLOW (a minute):
----------------
SELECT convert(data, 1, 2) FROM tbl_data;

The slowness cannot be due to calling a function 150k times. If I define
convert2(float,int,int) to return a constant value, then it executes in
about a second. (still half as slow as the VERY FAST query).

I assume that factor and offset are cached in the VERY FAST query, and not
in the slow one? If so, why not and how can I "force" it? Currently I need
only one function for conversions.

Regards,
Davor




Hi,
show us the code of those two functions and explain analyze of those queries.

regards
Szymon Guz

Re: Slow function in queries SELECT clause.

От
Tom Lane
Дата:
"Davor J." <DavorJ@live.com> writes:
> Suppose 2 functions: factor(int,int) and offset(int, int).
> Suppose a third function: convert(float,int,int) which simply returns
> $1*factor($2,$3)+offset($2,$3)
> All three functions are IMMUTABLE.

You should write the third function as a SQL function, which'd allow it
to be inlined.

> VERY FAST (half a second):
> ----------------
> SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;

In this case both factor() calls are folded to constants, hence executed
only once.

> VERY SLOW (a minute):
> ----------------
> SELECT convert(data, 1, 2) FROM tbl_data;

Without inlining, there's no hope of any constant-folding here.
The optimizer just sees the plpgsql function as a black box and
can't do anything with it.

BTW, your later mail shows that the factor() functions are not really
IMMUTABLE, since they select from tables that presumably are subject to
change.  The "correct" declaration would be STABLE.  If you're relying
on constant-folding to get reasonable application performance, you're
going to have to continue to mislabel them as IMMUTABLE; but be aware
that you're likely to have issues any time you do change the table
contents.  The changes won't get reflected into existing query plans.

            regards, tom lane

Re: Slow function in queries SELECT clause.

От
"Davor J."
Дата:
Thanks Tom,

Your concepts of "inlining" and "black box" really cleared things up for me.
With fnc_unit_convert() written in SQL and declared as STABLE I indeed have
fast performance now.

I appreciate the note on the IMMUTABLE part. The table contents should not
change in a way to affect the functions. So, as far as I understand the
Postgres workings, this shouldn't pose a problem.

Regards,
Davor

"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:25116.1277047267@sss.pgh.pa.us...
> "Davor J." <DavorJ@live.com> writes:
>> Suppose 2 functions: factor(int,int) and offset(int, int).
>> Suppose a third function: convert(float,int,int) which simply returns
>> $1*factor($2,$3)+offset($2,$3)
>> All three functions are IMMUTABLE.
>
> You should write the third function as a SQL function, which'd allow it
> to be inlined.
>
>> VERY FAST (half a second):
>> ----------------
>> SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;
>
> In this case both factor() calls are folded to constants, hence executed
> only once.
>
>> VERY SLOW (a minute):
>> ----------------
>> SELECT convert(data, 1, 2) FROM tbl_data;
>
> Without inlining, there's no hope of any constant-folding here.
> The optimizer just sees the plpgsql function as a black box and
> can't do anything with it.
>
> BTW, your later mail shows that the factor() functions are not really
> IMMUTABLE, since they select from tables that presumably are subject to
> change.  The "correct" declaration would be STABLE.  If you're relying
> on constant-folding to get reasonable application performance, you're
> going to have to continue to mislabel them as IMMUTABLE; but be aware
> that you're likely to have issues any time you do change the table
> contents.  The changes won't get reflected into existing query plans.
>
> regards, tom lane
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



Re: Slow function in queries SELECT clause.

От
"Davor J."
Дата:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> news:25116.1277047267@sss.pgh.pa.us...
>> "Davor J." <DavorJ@live.com> writes:
>>> Suppose 2 functions: factor(int,int) and offset(int, int).
>>> Suppose a third function: convert(float,int,int) which simply returns
>>> $1*factor($2,$3)+offset($2,$3)
>>> All three functions are IMMUTABLE.
>>
>> You should write the third function as a SQL function, which'd allow it
>> to be inlined.
>>
>>> VERY FAST (half a second):
>>> ----------------
>>> SELECT data*factor(1,2)+offset(1,2) FROM tbl_data;
>>
>> In this case both factor() calls are folded to constants, hence executed
>> only once.
>>
>>> VERY SLOW (a minute):
>>> ----------------
>>> SELECT convert(data, 1, 2) FROM tbl_data;
>>
>> Without inlining, there's no hope of any constant-folding here.
>> The optimizer just sees the plpgsql function as a black box and
>> can't do anything with it.
>>
> Your concepts of "inlining" and "black box" really cleared things up for
> me. With fnc_unit_convert() written in SQL and declared as STABLE I indeed
> have fast performance now.

A note on performance here: If I declare the fast SQL function
fnc_unit_convert() as STRICT or as SECURITY DEFINER, then I suddenly get
slow performance again (i.e. no apparent inlining).