Обсуждение: Slow query in trigger function

Поиск
Список
Период
Сортировка

Slow query in trigger function

От
Guido Niewerth
Дата:

Hello,

 

I´ve got a table custom_data which essentially contains a number of key/value pairs. This table holds a large number (about 40M) of records and I need the distinct keys and values for some reasons. Selecting those distinct data takes a couple of seconds, so I decided to maintain a separate lookup table for both the key and value data. The lookup tables are maintained by a trigger that reacts on inserts/updates/deletes on the original table. While checking the correctness of my trigger function I noticed that the SQL query in the trigger function is surprisingly slow, taking about 5-6 seconds. When I ran the SQL query outside the trigger function it showed the expected performance and returned in a couple of milliseconds. Though the original table is very large it holds only a small number of distinct key / value values:

 

SELECT DISTINCT key FROM custom_data;

>> 12 rows returned

 

SELECT DISTINCT value FROM custom_data;

>> 13 rows returned

 

 

Here are the relveant information (function body of the trigger function reduced to show the behaviour):

 

PostgreSQL Version:

PostgreSQL 9.1.13, compiled by Visual C++ build 1500, 64-bit

 

OS Version:

Windows 7 64bit

 

Scenario to reproduce the behaviour:

EMS Solution SQL Manager: SQL Editor used to run SQL commands from an editor

 

Server configuration:

name                                                   current_setting                               source

DateStyle                                           ISO, DMY                                           session

default_text_search_config      pg_catalog.german                        configuration file

effective_cache_size                    8GB                                                      configuration file

lc_messages                                     German_Germany.1252              configuration file

lc_monetary                                     German_Germany.1252              configuration file

lc_numeric                                        German_Germany.1252              configuration file

lc_time                                                German_Germany.1252              configuration file

listen_addresses                             *                                                            configuration file

log_destination                               stderr                                                  configuration file

log_line_prefix                                %t                                                         configuration file

log_timezone                                  CET                                                       environment variable

logging_collector                             on                                                         configuration file

max_connections                           100                                                        configuration file

max_stack_depth                          2MB                                                     environment variable

port                                                      5432                                                     configuration file

shared_buffers                               4GB                                                      configuration file

statement_timeout                       0                                                            session

TimeZone                                          CET                                                       environment variable

work_mem                                       64MB                                                   configuration file

 

custom_data table definition:

CREATE TABLE public.custom_data (

  custom_data_id SERIAL,

  file_id INTEGER DEFAULT 0 NOT NULL,

  user_id INTEGER DEFAULT 0 NOT NULL,

  "timestamp" TIMESTAMP(0) WITHOUT TIME ZONE DEFAULT '1970-01-01 00:00:00'::timestamp without time zone NOT NULL,

  key TEXT DEFAULT ''::text NOT NULL,

  value TEXT DEFAULT ''::text NOT NULL,

  CONSTRAINT pkey_custom_data PRIMARY KEY(custom_data_id),

) WITHOUT OIDS;

 

CREATE INDEX idx_custom_data_key ON public.custom_data USING btree (key);

 

CREATE INDEX idx_custom_data_value ON public.custom_data  USING btree (value);

 

CREATE TRIGGER on_custom_data_changed AFTER INSERT OR UPDATE OR DELETE

ON public.custom_data FOR EACH ROW

EXECUTE PROCEDURE public.on_change_custom_data();

 

CREATE OR REPLACE FUNCTION public.on_change_custom_data () 
RETURNS trigger AS 
$body$ 
BEGIN 
   IF TG_OP = 'UPDATE' THEN 

      RAISE NOTICE 'Check custom data key start  : %', timeofday(); 
      IF NOT EXISTS (SELECT 1 FROM custom_data WHERE key = OLD.key ) THEN      

      END IF; 
      RAISE NOTICE 'Check custom data key end    : %', timeofday(); 
    END IF; 
    RETURN NULL; 
END; 
$body$ 
LANGUAGE 'plpgsql' 
VOLATILE CALLED ON NULL INPUT SECURITY INVOKER COST 100;

 

postgreSQL log:

HINWEIS:  Check custom data key start  : Fri Oct 30 11:56:41.785000 2015 CET << start of IF NOT EXIST (...)

HINWEIS:  Check custom data key end    : Fri Oct 30 11:56:47.145000 2015 CET << end of IF NOT EXISTS (...) : ~5.4 seconds

 
Query OK, 1 rows affected (5,367 sec)

 

Same query run in SQL editor:

SELECT 1 FROM custom_data WHERE key='key-1' 
1 rows returned (16 ms)

 

As you can see there´s a huge runtime difference between the select query used in the trigger function and the one run from the SQL editor.

 

 

Guido Niewerth

25 years inspired synergy
OCS Optical Control Systems GmbH
Wullener Feld 24
58454 Witten
Germany

Tel: +49 (0) 2302 95622-0
Fax: +49 (0) 2302 95622-33

Email: gniewerth@ocsgmbh.com
Web: http://www.ocsgmbh.com

HRB 8442 (Bochum) | VAT-ID: DE 124 084 990
Directors: Hans Gloeckler, Fatah Najaf, Merdan Sariboga


Re: Slow query in trigger function

От
Tom Lane
Дата:
Guido Niewerth <gniewerth@ocsgmbh.com> writes:
> As you can see there�s a huge runtime difference between the select query used in the trigger function and the one
runfrom the SQL editor. 

contrib/auto_explain might be useful in seeing what's going on, in
particular it would tell us whether or not a different plan is being
used for the query inside the function.

            regards, tom lane


Re: Slow query in trigger function

От
Guido Niewerth
Дата:

I needed to set up the trigger function again, so here it is:

 

CREATE OR REPLACE FUNCTION public.fn_trigger_test ()

RETURNS trigger AS

$body$

DECLARE

                start TIMESTAMP;

BEGIN

   start := timeofday();

   IF TG_OP = 'UPDATE' THEN

      IF NOT EXISTS( SELECT key FROM custom_data WHERE key = old.key LIMIT 1 ) THEN

                DELETE FROM lookup_custom_data_keys WHERE key = old.key;

      END IF;

      IF NOT EXISTS( SELECT 1 FROM lookup_custom_data_keys WHERE key = new.key LIMIT 1 ) THEN

                INSERT INTO lookup_custom_data_keys (key) VALUES (new.key);

      END IF;

   END IF;

   RAISE NOTICE 'Trigger % ran: %', TG_OP, age( timeofday() ::TIMESTAMP, start );

   RETURN NULL;

END;

$body$

LANGUAGE 'plpgsql'

VOLATILE

CALLED ON NULL INPUT

SECURITY INVOKER

COST 100;

 

And this is the execution plan. It looks like it does a slow sequential scan where it´s able to do an index scan:

 

2015-11-02 17:42:10 CET LOG:  duration: 5195.673 ms  plan:

                Query Text: SELECT NOT EXISTS( SELECT 1 FROM custom_data WHERE key = old.key LIMIT 1 )

                Result  (cost=0.09..0.10 rows=1 width=0) (actual time=5195.667..5195.667 rows=1 loops=1)

                  Output: (NOT $0)

                  Buffers: shared hit=34 read=351750

                  InitPlan 1 (returns $0)

                    ->  Limit  (cost=0.00..0.09 rows=1 width=0) (actual time=5195.662..5195.662 rows=0 loops=1)

                          Output: (1)

                          Buffers: shared hit=34 read=351750

                          ->  Seq Scan on public.custom_data  (cost=0.00..821325.76 rows=9390835 width=0) (actual time=5195.658..5195.658 rows=0 loops=1)

                                Output: 1

                                Filter: (custom_data.key = $15)

                                Buffers: shared hit=34 read=351750

2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »SELECT NOT EXISTS( SELECT 1 FROM custom_data WHERE key = old.key LIMIT 1 )«

                PL/pgSQL function "fn_trigger_test" line 7 at IF

2015-11-02 17:42:10 CET LOG:  duration: 0.014 ms  plan:

                Query Text: DELETE FROM lookup_custom_data_keys WHERE key = old.key

                Delete on public.lookup_custom_data_keys  (cost=0.00..38.25 rows=1 width=6) (actual time=0.013..0.013 rows=0 loops=1)

                  Buffers: shared hit=2

                  ->  Seq Scan on public.lookup_custom_data_keys  (cost=0.00..38.25 rows=1 width=6) (actual time=0.007..0.007 rows=1 loops=1)

                        Output: ctid

                        Filter: (lookup_custom_data_keys.key = $15)

                        Buffers: shared hit=1

2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »DELETE FROM lookup_custom_data_keys WHERE key = old.key«

                PL/pgSQL function "fn_trigger_test" line 8 at SQL-Anweisung

2015-11-02 17:42:10 CET LOG:  duration: 0.005 ms  plan:

                Query Text: SELECT NOT EXISTS( SELECT 1 FROM lookup_custom_data_keys WHERE key = new.key LIMIT 1 )

                Result  (cost=38.25..38.26 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)

                  Output: (NOT $0)

                  Buffers: shared hit=1

                  InitPlan 1 (returns $0)

                    ->  Limit  (cost=0.00..38.25 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1)

                          Output: (1)

                          Buffers: shared hit=1

                          ->  Seq Scan on public.lookup_custom_data_keys  (cost=0.00..38.25 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)

                                Output: 1

                                Filter: (lookup_custom_data_keys.key = $17)

                                Buffers: shared hit=1

2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »SELECT NOT EXISTS( SELECT 1 FROM lookup_custom_data_keys WHERE key = new.key LIMIT 1 )«

                PL/pgSQL function "fn_trigger_test" line 10 at IF

2015-11-02 17:42:10 CET LOG:  duration: 0.116 ms  plan:

                Query Text: INSERT INTO lookup_custom_data_keys (key) VALUES (new.key)

                Insert on public.lookup_custom_data_keys  (cost=0.00..0.01 rows=1 width=0) (actual time=0.115..0.115 rows=0 loops=1)

                  Buffers: shared hit=1

                  ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)

                        Output: $17

2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »INSERT INTO lookup_custom_data_keys (key) VALUES (new.key)«

                PL/pgSQL function "fn_trigger_test" line 11 at SQL-Anweisung

2015-11-02 17:42:10 CET LOG:  duration: 5200.475 ms  plan:

                Query Text: UPDATE custom_data SET key= 'key-2' WHERE key = 'key-1'

                Update on public.custom_data  (cost=0.00..15.35 rows=1 width=34) (actual time=0.369..0.369 rows=0 loops=1)

                  Buffers: shared hit=29

                  ->  Index Scan using idx_custom_data_key on public.custom_data  (cost=0.00..15.35 rows=1 width=34) (actual time=0.088..0.090 rows=1 loops=1)

                        Output: custom_data_id, file_id, user_id, "timestamp", 'key-2'::text, value, ctid

                        Index Cond: (custom_data.key = 'key-1'::text)

                        Buffers: shared hit=6

 

 

 

Execution plan of the normal query "SELECT NOT EXISTS( SELECT 1 FROM custom_data WHERE key='key-1' LIMIT 1 );":

 

2015-11-02 17:44:28 CET LOG:  duration: 0.052 ms  plan:

                Query Text: SELECT NOT EXISTS( SELECT 1 FROM custom_data WHERE key='key-1' LIMIT 1 );

                Result  (cost=15.35..15.36 rows=1 width=0) (actual time=0.047..0.047 rows=1 loops=1)

                  Output: (NOT $0)

                  Buffers: shared hit=6

                  InitPlan 1 (returns $0)

                    ->  Limit  (cost=0.00..15.35 rows=1 width=0) (actual time=0.045..0.045 rows=0 loops=1)

                          Output: (1)

                          Buffers: shared hit=6

                          ->  Index Scan using idx_custom_data_key on public.custom_data  (cost=0.00..15.35 rows=1 width=0) (actual time=0.043..0.043 rows=0 loops=1)

                                Output: 1

                                Index Cond: (custom_data.key = 'key-1'::text)

                                Buffers: shared hit=6

 

Guido Niewerth

25 years inspired synergy
OCS Optical Control Systems GmbH
Wullener Feld 24
58454 Witten
Germany

Tel: +49 (0) 2302 95622-0
Fax: +49 (0) 2302 95622-33

Email: gniewerth@ocsgmbh.com
Web: http://www.ocsgmbh.com

HRB 8442 (Bochum) | VAT-ID: DE 124 084 990
Directors: Hans Gloeckler, Fatah Najaf, Merdan Sariboga


Re: Slow query in trigger function

От
Tom Lane
Дата:
Guido Niewerth <gniewerth@ocsgmbh.com> writes:
> And this is the execution plan. It looks like it does a slow sequential scan where it�s able to do an index scan:

> 2015-11-02 17:42:10 CET LOG:  duration: 5195.673 ms  plan:
>                 Query Text: SELECT NOT EXISTS( SELECT 1 FROM custom_data WHERE key = old.key LIMIT 1 )
>                 Result  (cost=0.09..0.10 rows=1 width=0) (actual time=5195.667..5195.667 rows=1 loops=1)
>                   Output: (NOT $0)
>                   Buffers: shared hit=34 read=351750
>                   InitPlan 1 (returns $0)
>                     ->  Limit  (cost=0.00..0.09 rows=1 width=0) (actual time=5195.662..5195.662 rows=0 loops=1)
>                           Output: (1)
>                           Buffers: shared hit=34 read=351750
>                           ->  Seq Scan on public.custom_data  (cost=0.00..821325.76 rows=9390835 width=0) (actual
time=5195.658..5195.658rows=0 loops=1) 
>                                 Output: 1
>                                 Filter: (custom_data.key = $15)
>                                 Buffers: shared hit=34 read=351750

It looks like you're getting bit by an inaccurate estimate of what will be
the quickest way to satisfy a LIMIT query.  In this particular situation,
I'd advise just dropping the LIMIT, as it contributes nothing useful.

(If memory serves, 9.5 will actually ignore constant-LIMIT clauses inside
EXISTS(), because people keep writing them even though they're useless.
Earlier releases do not have that code though.)

            regards, tom lane


Re: Slow query in trigger function

От
Guido Niewerth
Дата:

These are the queries I used to get the execution planer use the index scan instead of the sequential scan:

 

IF NOT EXISTS (SELECT 1 FROM custom_data WHERE key = old.key) => sequential scan

IF NOT EXISTS (SELECT 1 FROM custom_data WHERE key = old.key LIMIT 1) => sequential scan

IF NOT EXISTS (SELECT max( 1 ) FROM custom_data WHERE key = old.key) => sequential scan

 

After breaking up the code into two statements the execution planer uses the index scan:

 

result INTEGER;

SELECT 1 FROM custom_data where key = old.key INTO result;

IF result ISNULL THEN

   ...

END IF;

 

To me it looks like the execution planer does not choose the optimal strategy. Even small changes in the function body make the execution planer use the slow sequential scan.

Guido Niewerth

25 years inspired synergy
OCS Optical Control Systems GmbH
Wullener Feld 24
58454 Witten
Germany

Tel: +49 (0) 2302 95622-0
Fax: +49 (0) 2302 95622-33

Email: gniewerth@ocsgmbh.com
Web: http://www.ocsgmbh.com

HRB 8442 (Bochum) | VAT-ID: DE 124 084 990
Directors: Hans Gloeckler, Fatah Najaf, Merdan Sariboga