R: 14.1 immutable function, bad performance if check number = 'NaN'

Поиск
Список
Период
Сортировка
От Federico Travaglini
Тема R: 14.1 immutable function, bad performance if check number = 'NaN'
Дата
Msg-id 43e497193ed4de4bfc503e5221b843d3@mail.gmail.com
обсуждение исходный текст
Ответ на Re: 14.1 immutable function, bad performance if check number = 'NaN'  (Merlin Moncure <mmoncure@gmail.com>)
Ответы Re: 14.1 immutable function, bad performance if check number = 'NaN'
Re: R: 14.1 immutable function, bad performance if check number = 'NaN'
Список pgsql-bugs

Good morning, thank you very much for the time you spent for my question.

 

Yes inlining could be the problem, because maybe does not allow to use the IMMUTABLE feature?

 

The context of the query is quite complex, therefore I avoided to provide it in previous email

 

 

Here it is what I tested. I’s a code fragment from a bigger procedure. The strings in green are passed as parameters, as well as the thresholds 1,2,3,4,5. To test just this fragment of code I replaced them with fixed values

 

SET random_page_cost = 0.1; (otherwise it takes more than 4 minutes in place of 33 sec)

 

EXPLAIN (ANALYZE, BUFFERS, verbose)

select

*    from

    (

            select

                tms,

                fh.file_id,

                (e.measure_list #>> ('{' || 'cluster_comuni_italiani' || ',s}')::text[])   as value_s_1,

                (e.measure_list #> ('{' || 'cluster_comuni_italiani' || ',n}')::text[])::numeric   as value_n_1,

                (e.measure_list #> ('{' || 'cluster_comuni_italiani' || ',o}')::text[])::numeric   as value_o_1,

                antsgeo_get_severity_thr((e.measure_list #> ('{' || 'cluster_comuni_italiani' || ',o}')::text[])::numeric, 1, 2, 3, 4, 5) AS severity_1,

               

                (e.measure_list #>> ('{' || 'act_geoposition_pers_act_confidence' || ',s}')::text[])   as value_s_2,

                (e.measure_list #> ('{' || 'act_geoposition_pers_act_confidence' || ',n}')::text[])::numeric   as value_n_2,

                (e.measure_list #> ('{' || 'act_geoposition_pers_act_confidence' || ',o}')::text[])::numeric   as value_o_2,

                antsgeo_get_severity_thr((e.measure_list #> ('{' || 'act_geoposition_pers_act_confidence' || ',o}')::text[])::numeric1, 2, 3, 4, 5) AS severity_2,

               

                (e.measure_list #>> ('{' || 'act_coverage_band_pcell' || ',s}')::text[])   as value_s_3,

                (e.measure_list #> ('{' || 'act_coverage_band_pcell' || ',n}')::text[])::numeric   as value_n_3,

                (e.measure_list #> ('{' || 'act_coverage_band_pcell' || ',o}')::text[])::numeric   as value_o_3,

                antsgeo_get_severity_thr((e.measure_list #> ('{' || 'act_coverage_band_pcell' || ',o}')::text[])::numeric, 1, 2, 3, 4, 5) AS severity_3,

               

                (e.measure_list #>> ('{' || null::text || ',s}')::text[])   as value_s_4,

                (e.measure_list #> ('{' || null::text || ',n}')::text[])::numeric   as value_n_4,

                (e.measure_list #> ('{' || null::text || ',o}')::text[])::numeric   as value_o_4,

                antsgeo_get_severity_thr((e.measure_list #> ('{' || null::text || ',o}')::text[])::numeric1, 2, 3, 4, 5) AS severity_4

 

            from

                file_hist fh,

                geo_measr_sample e

            where

                (

                    (fh.agn_group_id = 21)

                    and fh.data_min_tms <= '2022-04-25 00:00:00' and fh.data_max_tms >= '2022-02-28 00:00:00' --lo usa

                )

                and fh.act_id = e.act_id                                

                and (e.tms >= '2022-02-28 00:00:00' and e.tms <= '2022-04-25 00:00:00')

                and (e.measure_list #>> ('{act_edit,s}')::text[] not in ('excld') or e.measure_list #>> ('{act_edit,s}')::text[] is null)

                )t1

 

e.measure_list is a jsonb, with a variable structure

{

  "act_plmn": {

    "s": "222/1"

  },

  "struct_day": {

    "s": "2022-04-22"

  },

  "struct_week": {

    "s": "2022-04-18"

  },

  "act_plmn_name": {

    "s": "Tim.Ita (222-01)"

  },

  "struct_act_id": {

    "s": "1809464"

  },

  "struct_tc_name": {

    "s": "VoiceCall_MO"

  },

  "struct_yyyy_mm": {

    "s": "2022-04"

  },

  "act_coverage_ci": {

    "s": "63"

  },

  "act_coverage_ta": {

    "n": 4,

    "o": 4

  },

  "act_environment": {

    "s": "in-door"

  },

  "cell_code_pcell": {

    "s": "FE23E3"

  },

  "struct_act_code": {

    "s": "20220422_164238_SDTU100010.01"

  },

  "struct_act_name": {

    "s": "20220422_164238_SDTU100010.01. Copy of Voice MO 0687201815"

  },…

 

Nested Loop  (cost=0.43..2055500.00 rows=1441783 width=524) (actual time=0.761..33647.744 rows=415401 loops=1)

  Output: e.tms, fh.file_id, (e.measure_list #>> ('{cluster_comuni_italiani,s}'::cstring)::text[]), ((e.measure_list #> ('{cluster_comuni_italiani,n}'::cstring)::text[]))::numeric, ((e.measure_list #> ('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric, CASE WHEN ((((e.measure_list #> ('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double precision >= '4'::double precision) THEN '1 Clear'::text WHEN ((((e.measure_list #> ('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double precision >= '3'::double precision) THEN '2 Warning'::text WHEN ((((e.measure_list #> ('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double precision >= '2'::double precision) THEN '3 Minor'::text WHEN ((((e.measure_list #> ('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double precision >= '1'::double precision) THEN '4 Major'::text WHEN ((((e.measure_list #> ('{cluster_comuni_italiani,o}'::cstring)::text[]))::numeric)::double precision < '1'::double precision) THEN '5 Critical'::text ELSE '6 Unk'::text END, (e.measure_list #>> ('{act_geoposition_pers_act_confidence,s}'::cstring)::text[]), ((e.measure_list #> ('{act_geoposition_pers_act_confidence,n}'::cstring)::text[]))::numeric, ((e.measure_list #> ('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric, CASE WHEN ((((e.measure_list #> ('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double precision >= '4'::double precision) THEN '1 Clear'::text WHEN ((((e.measure_list #> ('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double precision >= '3'::double precision) THEN '2 Warning'::text WHEN ((((e.measure_list #> ('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double precision >= '2'::double precision) THEN '3 Minor'::text WHEN ((((e.measure_list #> ('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double precision >= '1'::double precision) THEN '4 Major'::text WHEN ((((e.measure_list #> ('{act_geoposition_pers_act_confidence,o}'::cstring)::text[]))::numeric)::double precision < '1'::double precision) THEN '5 Critical'::text ELSE '6 Unk'::text END, (e.measure_list #>> ('{act_coverage_band_pcell,s}'::cstring)::text[]), ((e.measure_list #> ('{act_coverage_band_pcell,n}'::cstring)::text[]))::numeric, ((e.measure_list #> ('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric, CASE WHEN ((((e.measure_list #> ('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double precision >= '4'::double precision) THEN '1 Clear'::text WHEN ((((e.measure_list #> ('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double precision >= '3'::double precision) THEN '2 Warning'::text WHEN ((((e.measure_list #> ('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double precision >= '2'::double precision) THEN '3 Minor'::text WHEN ((((e.measure_list #> ('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double precision >= '1'::double precision) THEN '4 Major'::text WHEN ((((e.measure_list #> ('{act_coverage_band_pcell,o}'::cstring)::text[]))::numeric)::double precision < '1'::double precision) THEN '5 Critical'::text ELSE '6 Unk'::text END, NULL::text, NULL::numeric, NULL::numeric, '6 Unk'::text

  Buffers: shared hit=365255

  ->  Seq Scan on geo_ants.file_hist fh  (cost=0.00..443.28 rows=311 width=8) (actual time=0.698..1.434 rows=315 loops=1)

        Output: fh.file_id, fh.file_name, fh.rtu, fh.port, fh.act_code, fh.file_size, fh.file_tms, fh.loaded_tms, fh.update_tms, fh.status, fh.data_min_tms, fh.data_max_tms, fh.enh_tms, fh.file_type, fh.partial_output_flag, fh.record_count, fh.status_description, fh.act_lenght, fh.act_id, fh.file_act_done, fh.enh_start_tms, fh.agn_code, fh.agn_group_id, fh.ts_sched_id, fh.ts_sched_ver, fh.enh_attempt, fh.act_done_list, fh.data_max_proc_tms, fh.data_max_loaded_tms, fh.error_count, fh.dbg_mode

        Filter: ((fh.data_min_tms <= '2022-04-25 00:00:00'::timestamp without time zone) AND (fh.data_max_tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (fh.agn_group_id = 21))

        Rows Removed by Filter: 3358

        Buffers: shared hit=379

  ->  Append  (cost=0.43..4609.77 rows=57257 width=1552) (actual time=0.012..9.971 rows=1319 loops=315)

        Buffers: shared hit=106416

        ->  Index Scan using geo_measr_sample_2022_02_act_id_tms_idx on geo_ants.geo_measr_sample_2022_02 e_1  (cost=0.43..14.42 rows=166 width=1362) (actual time=0.003..0.003 rows=0 loops=315)

              Output: e_1.tms, e_1.measure_list, e_1.act_id

              Index Cond: ((e_1.act_id = fh.act_id) AND (e_1.tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (e_1.tms <= '2022-04-25 00:00:00'::timestamp without time zone))

              Filter: (((e_1.measure_list #>> '{act_edit,s}'::text[]) <> 'excld'::text) OR ((e_1.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

              Buffers: shared hit=946

        ->  Index Scan using geo_measr_sample_2022_03_act_id_tms_idx on geo_ants.geo_measr_sample_2022_03 e_2  (cost=0.56..2333.98 rows=30845 width=1552) (actual time=0.006..7.586 rows=1061 loops=315)

              Output: e_2.tms, e_2.measure_list, e_2.act_id

              Index Cond: ((e_2.act_id = fh.act_id) AND (e_2.tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (e_2.tms <= '2022-04-25 00:00:00'::timestamp without time zone))

              Filter: (((e_2.measure_list #>> '{act_edit,s}'::text[]) <> 'excld'::text) OR ((e_2.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

              Rows Removed by Filter: 3

              Buffers: shared hit=75873

        ->  Index Scan using geo_measr_sample_2022_04_act_id_tms_idx on geo_ants.geo_measr_sample_2022_04 e_3  (cost=0.43..1975.08 rows=26246 width=1557) (actual time=0.005..2.232 rows=258 loops=315)

              Output: e_3.tms, e_3.measure_list, e_3.act_id

              Index Cond: ((e_3.act_id = fh.act_id) AND (e_3.tms >= '2022-02-28 00:00:00'::timestamp without time zone) AND (e_3.tms <= '2022-04-25 00:00:00'::timestamp without time zone))

              Filter: (((e_3.measure_list #>> '{act_edit,s}'::text[]) <> 'excld'::text) OR ((e_3.measure_list #>> '{act_edit,s}'::text[]) IS NULL))

              Buffers: shared hit=29597

Query Identifier: -6803725219970975357

Planning:

  Buffers: shared hit=933

Planning Time: 2.057 ms

Execution Time: 33677.292 ms

 

CREATE OR REPLACE FUNCTION geo_ants.antsgeo_get_severity_thr(v_measure_value double precision, thr_value_1 double precision, thr_value_2 double precision, thr_value_3 double precision, thr_value_4 double precision, thr_value_5 double precision)

RETURNS text

LANGUAGE sql

IMMUTABLE

AS $function$

----------------------------------------------------------------------------------------------------------------------

-- Author: Federico Travaglini

-- Date: 2020

-- Description:

-- Change Hist: please mark changes in code as "yyyy-mm-dd, Author, change request id in Merant, brief description"

----------------------------------------------------------------------------------------------------------------------

       

-- 20220426 non so perchè ma in questa versione non è efifciente

    select

        case

            --WHEN v_measure_value= 'NaN' THEN '6 Unk'::text non scommentare o le performance per qualche motivo iragionevole degradano di molto.

            when thr_value_1 = thr_value_4 then -- colorazione disabilitata, ad esempio per lat, long...

                '6 none'::text

            when thr_value_1 > thr_value_4 then -- valori critical > clear

                 -- SIAMO NEL CASO: valori critical > clear ( thr_5 clear thr_4  warning thr_3 minor thr_2 major thr_1 critical)

                CASE

                    WHEN v_measure_value >= thr_value_1 THEN '5 Critical'::text --critical

                    WHEN v_measure_value < thr_value_1 AND v_measure_value >= thr_value_2 THEN '4 Major'::text --major

                    WHEN v_measure_value < thr_value_2 AND v_measure_value >= thr_value_3 THEN '3 Minor'::text --minor

                    WHEN v_measure_value < thr_value_3 AND v_measure_value >= thr_value_4 THEN '2 Warning'::text --warning

                    WHEN v_measure_value < thr_value_4 THEN '1 Clear'::text --clear

                    ELSE '6 Unk'::text -- null values

                end

            else

                 -- SIAMO NEL CASO: valori critical < clear (critical thr_1 maj thr_2  minor thr_3 war thr_4 clear thr_5)

                CASE

                    WHEN v_measure_value < thr_value_1 THEN '5 Critical'::text --critical

                    WHEN v_measure_value >= thr_value_1 AND v_measure_value < thr_value_2 THEN '4 Major'::text --major

                    WHEN v_measure_value >= thr_value_2 AND v_measure_value < thr_value_3 THEN '3 Minor'::text --minor

                    WHEN v_measure_value >= thr_value_3 AND v_measure_value < thr_value_4 THEN '2 Warning'::text --warning

                    WHEN v_measure_value >= thr_value_4 THEN '1 Clear'::text --clear

                    ELSE '6 Unk'::text -- null values

                end               

            end::text          

$function$

;

 

 

 

By the way, if I call the overall function where it is this code fragment, I get much better performance (22 sec in place of 41) re-writing function case without nesting sub-cases, unfortunately I’m not so cleaver to get the query plan for a query executed inside a function

CREATE OR REPLACE FUNCTION geo_ants.antsgeo_get_severity_thr(v_measure_value double precision, thr_value_1 double precision, thr_value_2 double precision, thr_value_3 double precision, thr_value_4 double precision, thr_value_5 double precision)

RETURNS text

LANGUAGE sql

IMMUTABLE

AS $function$

----------------------------------------------------------------------------------------------------------------------

-- Author: Federico Travaglini

-- Date: 2020

-- Description:

-- Change Hist: please mark changes in code as "yyyy-mm-dd, Author, change request id in Merant, brief description"

----------------------------------------------------------------------------------------------------------------------

    select

        case

            --WHEN v_measure_value= 'NaN' THEN '6 Unk'::text this must be commented, it is not a problem because the semantic does not change (same case of the ELSE), but I don’t understand why it changes performance.

            when thr_value_1 = thr_value_4 then '6 Unk'::text -- colorazione disabilitata, ad esempio per lat, long...

            -- SIAMO NEL CASO: valori critical > clear ( thr_5 clear thr_4  warning thr_3 minor thr_2 major thr_1 critical)

            WHEN thr_value_1 > thr_value_4 and v_measure_value <  thr_value_4  THEN '1 Clear'::text --clear

            WHEN thr_value_1 > thr_value_4 and v_measure_value <  thr_value_3  THEN '2 Warning'::text --warning

            WHEN thr_value_1 > thr_value_4 and v_measure_value <  thr_value_2  THEN '3 Minor'::text --minor

            WHEN thr_value_1 > thr_value_4 and v_measure_value <  thr_value_1  THEN '4 Major'::text --major

            WHEN thr_value_1 > thr_value_4 and v_measure_value >= thr_value_1  THEN '5 Critical'::text --major

            -- SIAMO NEL CASO: valori critical < clear (critical thr_1 maj thr_2  minor thr_3 war thr_4 clear thr_5)

            WHEN thr_value_1 < thr_value_4 and v_measure_value >= thr_value_4 THEN '1 Clear'::text --clear

            WHEN thr_value_1 < thr_value_4 and v_measure_value >= thr_value_3 THEN '2 Warning'::text --warning

            WHEN thr_value_1 < thr_value_4 and v_measure_value >= thr_value_2 THEN '3 Minor'::text --minor

            WHEN thr_value_1 < thr_value_4 and v_measure_value >= thr_value_1 THEN '4 Major'::text --major

            WHEN thr_value_1 < thr_value_4 and v_measure_value <  thr_value_1 THEN '5 Critical'::text --critical

            ELSE '6 Unk'::text -- null values

        end::text          

$function$

;

 

Da: Merlin Moncure <mmoncure@gmail.com>
Inviato: lunedì 25 aprile 2022 21:24
A: Federico Travaglini <federico.travaglini@aubay.it>
Cc: pgsql-bugs <pgsql-bugs@lists.postgresql.org>
Oggetto: Re: 14.1 immutable function, bad performance if check number = 'NaN'

 

On Mon, Apr 25, 2022 at 11:58 AM Federico Travaglini <federico.travaglini@aubay.it> wrote:

Good evening, and thanks to your excellent Postgres.

 

This funcion in used as a column in a select on about 400k records

If I leave the highlighted row it takes 27 seconds, otherwise 14 seconds! Such behaviour looks not to be reasonable.

 

lightly testing this, I got 10million iterations in about two seconds, about the same after commenting the NaN test.  Given that, problem is probably failure to inline query.  Careful examination of explain of wrapping query should prove that.

 

merlin




This message is confidential and solely for the intended address(es). If you are not the intended recipient of this message, please notify the sender immediately and delete it from your system. Unauthorised reproduction, disclosure, modification and or distribution of this e-mail is strictly prohibited. The contents of this e-mail do not constitute a commitment by Aubay S.p.A., except where expressly provided for in a written agreement between you and Aubay.

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: BUG #17468: Ranking of search results: ts_rank_cd with normalization variant 4
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: 14.1 immutable function, bad performance if check number = 'NaN'