Re: Performance bottleneck due to array manipulation

Поиск
Список
Период
Сортировка
От Igor Neyman
Тема Re: Performance bottleneck due to array manipulation
Дата
Msg-id A76B25F2823E954C9E45E32FA49D70ECCD511A63@mail.corp.perceptron.com
обсуждение исходный текст
Ответ на Performance bottleneck due to array manipulation  (Genc, Ömer <Oemer.Genc@iais.fraunhofer.de>)
Ответы Re: Performance bottleneck due to array manipulation
Список pgsql-performance

 

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Genc, Ömer
Sent: Friday, August 21, 2015 8:49 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance bottleneck due to array manipulation

 

Hey,

 

i have a very long running stored procedure, due to array manipulation in a stored procedure. The following procedure takes 13 seconds to finish.

 

BEGIN

    point_ids_older_than_one_hour := '{}';

    object_ids_to_be_invalidated := '{}';

 

    select ARRAY(SELECT

                    point_id

                from ONLY

                    public.ims_point as p

                where

                    p.timestamp < m_before_one_hour

                )

    into point_ids_older_than_one_hour ; -- this array has a size of 20k

 

    select ARRAY(SELECT

                        object_id

                  from

                        public.ims_object_header h

                  WHERE

                        h.last_point_id= ANY(point_ids_older_than_one_hour)

                 )

    into object_ids_to_be_invalidated; -- this array has a size of 100

 

    --    current_last_point_ids will have a size of 100k

    current_last_point_ids := ARRAY( SELECT

                                            last_point_id

                                      from

                                            public.ims_object_header h

                                     );                                    

    -- START OF PERFORMANCE BOTTLENECK

    IF(array_length(current_last_point_ids, 1) > 0)

    THEN   

        FOR i IN 0 .. array_upper(current_last_point_ids, 1)

        LOOP

            point_ids_older_than_one_hour = array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);

        END LOOP;

    END IF;

    -- END OF PERFORMANCE BOTTLENECK

END;

 

The array manipulation part is the performance bottleneck. I am pretty sure, that there is a better way of doing this, however I couldn’t find one.

What I have is two table, lets call them ims_point and ims_object_header. ims_object_header references some entries of ims_point in the column last_point_id.

Now I want to delete all entries from ims_point, where the timestamp is older than one hour. The currently being referenced ids of the table ims_object_header should be excluded from this deletion. Therefore I stored the ids in arrays and iterate over those arrays to exclude the referenced values from being deleted.

 

However, I not sure if using an array for an operation like this is the best approach.

 

Can anyone give me some advice how this could be enhanced.

 

Thanks in advance.

 

 

I think in this case (as is in many other cases) “pure” SQL does the job much better than procedural language:

 

DELETE FROM public.ims_point as P

WHERE  P.timestamp < m_before_one_hour

     AND NOT EXISTS (SELECT 1 FROM  public.ims_object_header OH

                                                WHERE OH.last_point_id = P.object_id);

 

Is that what you are trying to accomplish?

 

Regards,

Igor Neyman

 

 

 

 

В списке pgsql-performance по дате отправления:

Предыдущее
От: Félix GERZAGUET
Дата:
Сообщение: Re: Performance bottleneck due to array manipulation
Следующее
От: "Guo, Yun"
Дата:
Сообщение: query not using GIN index