Query much slower when run from postgres function
От | Mario Splivalo |
---|---|
Тема | Query much slower when run from postgres function |
Дата | |
Msg-id | 49B51B60.6030600@megafon.hr обсуждение исходный текст |
Ответы |
Re: Query much slower when run from postgres function
Re: Query much slower when run from postgres function |
Список | pgsql-performance |
I have a function, looking like this: CREATE OR REPLACE FUNCTION get_memo_display_queue_size(a_service_id integer) RETURNS integer AS $BODY$ SELECT COUNT(*)::integer FROM v_messages_memo LEFT JOIN messages_memo_displayed ON id = message_id WHERE service_id = $1 AND state = 1 AND admin_id IS NULL; $BODY$ LANGUAGE 'sql' VOLATILE SECURITY DEFINER COST 100; Now, when I run that function from psql, it takes around 200ms to complete: pulitzer2=# explain analyze select get_memo_display_queue_size(1829); QUERY PLAN ---------------------------------------------------------------------------------------- Result (cost=0.00..0.26 rows=1 width=0) (actual time=219.728..219.730 rows=1 loops=1) Total runtime: 219.758 ms (2 rows) pulitzer2=# And it takes around 200ms each time I run the function! When I rewrite the query so I can see queryplan, I get this: create view _v1 as SELECT * FROM v_messages_memo LEFT JOIN messages_memo_displayed ON id = message_id WHERE state = 1 AND admin_id IS NULL; pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id = 1829; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=23506.14..23506.15 rows=1 width=0) (actual time=6.001..6.002 rows=1 loops=1) -> Nested Loop (cost=150.69..23505.59 rows=216 width=0) (actual time=5.744..5.971 rows=13 loops=1) -> Hash Left Join (cost=150.69..11035.16 rows=2104 width=4) (actual time=5.721..5.793 rows=13 loops=1) Hash Cond: (messages.id = messages_memo_displayed.message_id) Filter: (messages_memo_displayed.admin_id IS NULL) -> Bitmap Heap Scan on messages (cost=97.03..10955.11 rows=4209 width=4) (actual time=0.042..0.075 rows=13 loops=1) Recheck Cond: (service_id = 1829) -> Bitmap Index Scan on messages_uq__service_id__tan (cost=0.00..95.98 rows=4209 width=0) (actual time=0.032..0.032 rows=13 loops=1) Index Cond: (service_id = 1829) -> Hash (cost=28.85..28.85 rows=1985 width=8) (actual time=5.666..5.666 rows=1985 loops=1) -> Seq Scan on messages_memo_displayed (cost=0.00..28.85 rows=1985 width=8) (actual time=0.009..2.697 rows=1985 loops=1) -> Index Scan using messages_memo_pk on messages_memo (cost=0.00..5.91 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=13) Index Cond: (messages_memo.message_id = messages.id) Filter: ((messages_memo.state)::integer = 1) Total runtime: 6.079 ms (15 rows) So I noticed that postgres is using seq_scan on messages_memo_displayed, although there is a PK (and an index) on message_id in messages_memo_displayed (I'll post DDL of the tables at the end of the post). So, I tried EXPLAIN ANALYZE after I forced planner not to use sequential scans: pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id = 1829; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=25403.60..25403.61 rows=1 width=0) (actual time=6.546..6.547 rows=1 loops=1) -> Nested Loop (cost=2048.16..25403.06 rows=216 width=0) (actual time=6.287..6.512 rows=13 loops=1) -> Hash Left Join (cost=2048.16..12932.63 rows=2104 width=4) (actual time=6.268..6.340 rows=13 loops=1) Hash Cond: (messages.id = messages_memo_displayed.message_id) Filter: (messages_memo_displayed.admin_id IS NULL) -> Bitmap Heap Scan on messages (cost=97.03..10955.11 rows=4209 width=4) (actual time=0.043..0.078 rows=13 loops=1) Recheck Cond: (service_id = 1829) -> Bitmap Index Scan on messages_uq__service_id__tan (cost=0.00..95.98 rows=4209 width=0) (actual time=0.032..0.032 rows=13 loops=1) Index Cond: (service_id = 1829) -> Hash (cost=1926.31..1926.31 rows=1985 width=8) (actual time=6.211..6.211 rows=1985 loops=1) -> Index Scan using messages_memo_displayed_pk on messages_memo_displayed (cost=0.00..1926.31 rows=1985 width=8) (actual time=0.069..3.221 rows=1985 loops=1) -> Index Scan using messages_memo_pk on messages_memo (cost=0.00..5.91 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=13) Index Cond: (messages_memo.message_id = messages.id) Filter: ((messages_memo.state)::integer = 1) Total runtime: 6.628 ms (15 rows) No sequential scan. So I 'changed' my function so that first row says 'SET enable_seqscan TO false'. After that, here are the times for the function call: mike@som:~$ psql -U postgres pulitzer2 Welcome to psql 8.3.5, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit pulitzer2=# explain analyze select get_memo_display_queue_size(1829); QUERY PLAN ---------------------------------------------------------------------------------------- Result (cost=0.00..0.26 rows=1 width=0) (actual time=300.448..300.450 rows=1 loops=1) Total runtime: 300.491 ms (2 rows) pulitzer2=# explain analyze select get_memo_display_queue_size(1829); QUERY PLAN ------------------------------------------------------------------------------------ Result (cost=0.00..0.26 rows=1 width=0) (actual time=1.940..1.941 rows=1 loops=1) Total runtime: 1.961 ms (2 rows) pulitzer2=# explain analyze select get_memo_display_queue_size(1829); QUERY PLAN ------------------------------------------------------------------------------------ Result (cost=0.00..0.26 rows=1 width=0) (actual time=1.946..1.947 rows=1 loops=1) Total runtime: 1.973 ms (2 rows) pulitzer2=# explain analyze select get_memo_display_queue_size(1829); QUERY PLAN ------------------------------------------------------------------------------------ Result (cost=0.00..0.26 rows=1 width=0) (actual time=1.936..1.937 rows=1 loops=1) Total runtime: 1.964 ms (2 rows) pulitzer2=# So, first query on the same connection takes 300ms, and any subsequential query on the same connection takes less than 2 ms. If I remove 'SET enable_seqscan TO false' from the top of the function, every call to the function takes around 200-300ms. Now, as I was explained on pg-jdbc mailinglist, that 'SET enable_seqscan TO false' affects all queries on that persistent connection from tomcat, and It's not good solution. So I wanted to post here to ask what other options do I have. While writing this I realized that, without forcing sequential scan out, I get much quicker execution times when I do: SELECT count(*) FROM _v1 WHERE service_id = 1829 then when I do SELECT get_memo_display_queue_size(1829), as seen here: mike@som:~$ psql -U postgres pulitzer2 Welcome to psql 8.3.5, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit pulitzer2=# explain analyze select get_memo_display_queue_size(1829); QUERY PLAN ---------------------------------------------------------------------------------------- Result (cost=0.00..0.26 rows=1 width=0) (actual time=259.090..259.092 rows=1 loops=1) Total runtime: 259.132 ms (2 rows) pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id = 1829; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=23517.98..23517.99 rows=1 width=0) (actual time=5.942..5.943 rows=1 loops=1) -> Nested Loop (cost=150.70..23517.44 rows=216 width=0) (actual time=5.674..5.909 rows=13 loops=1) -> Hash Left Join (cost=150.70..11037.87 rows=2105 width=4) (actual time=5.633..5.706 rows=13 loops=1) Hash Cond: (messages.id = messages_memo_displayed.message_id) Filter: (messages_memo_displayed.admin_id IS NULL) -> Bitmap Heap Scan on messages (cost=97.04..10957.81 rows=4210 width=4) (actual time=0.032..0.063 rows=13 loops=1) Recheck Cond: (service_id = 1829) -> Bitmap Index Scan on messages_uq__service_id__tan (cost=0.00..95.98 rows=4210 width=0) (actual time=0.022..0.022 rows=13 loops=1) Index Cond: (service_id = 1829) -> Hash (cost=28.85..28.85 rows=1985 width=8) (actual time=5.588..5.588 rows=1985 loops=1) -> Seq Scan on messages_memo_displayed (cost=0.00..28.85 rows=1985 width=8) (actual time=0.009..2.690 rows=1985 loops=1) -> Index Scan using messages_memo_pk on messages_memo (cost=0.00..5.92 rows=1 width=4) (actual time=0.008..0.010 rows=1 loops=13) Index Cond: (messages_memo.message_id = messages.id) Filter: ((messages_memo.state)::integer = 1) Total runtime: 6.026 ms (15 rows) pulitzer2=# explain analyze select get_memo_display_queue_size(1829); QUERY PLAN ---------------------------------------------------------------------------------------- Result (cost=0.00..0.26 rows=1 width=0) (actual time=211.712..211.714 rows=1 loops=1) Total runtime: 211.742 ms (2 rows) pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id = 1829; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=23517.98..23517.99 rows=1 width=0) (actual time=5.918..5.920 rows=1 loops=1) -> Nested Loop (cost=150.70..23517.44 rows=216 width=0) (actual time=5.659..5.885 rows=13 loops=1) -> Hash Left Join (cost=150.70..11037.87 rows=2105 width=4) (actual time=5.638..5.711 rows=13 loops=1) Hash Cond: (messages.id = messages_memo_displayed.message_id) Filter: (messages_memo_displayed.admin_id IS NULL) -> Bitmap Heap Scan on messages (cost=97.04..10957.81 rows=4210 width=4) (actual time=0.043..0.078 rows=13 loops=1) Recheck Cond: (service_id = 1829) -> Bitmap Index Scan on messages_uq__service_id__tan (cost=0.00..95.98 rows=4210 width=0) (actual time=0.033..0.033 rows=13 loops=1) Index Cond: (service_id = 1829) -> Hash (cost=28.85..28.85 rows=1985 width=8) (actual time=5.581..5.581 rows=1985 loops=1) -> Seq Scan on messages_memo_displayed (cost=0.00..28.85 rows=1985 width=8) (actual time=0.009..2.678 rows=1985 loops=1) -> Index Scan using messages_memo_pk on messages_memo (cost=0.00..5.92 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=13) Index Cond: (messages_memo.message_id = messages.id) Filter: ((messages_memo.state)::integer = 1) Total runtime: 5.994 ms (15 rows) pulitzer2=# Now I'm confused, why is 'sql' function much slower than 'direct' SELECT? Mike P.S. Here are tables definition, from psql: pulitzer2=# \d messages Table "public.messages" Column | Type | Modifiers --------------------+--------------------------+--------------------------------------------------------------------- id | integer | not null default nextval(('public.message_id_seq'::text)::regclass) from | character varying(15) | not null to | character varying(10) | not null receiving_time | timestamp with time zone | not null default now() raw_text | character varying | not null keyword | character varying | destination_id | integer | not null vpn_id | integer | service_id | integer | status | integer | not null default 2 gateway_message_id | character varying | not null prize_id | integer | tan | character varying | Indexes: "messages_pk" PRIMARY KEY, btree (id) "messages_uq__gateway_message_id" UNIQUE, btree (gateway_message_id) "messages_uq__service_id__tan" UNIQUE, btree (service_id, tan) "messages_ix_from" btree ("from") "messages_ix_receiving_time__service_id__status" btree (receiving_time, service_id, status) "messages_ix_vpn_id" btree (vpn_id) Foreign-key constraints: "messages_fk__destinations_id" FOREIGN KEY (destination_id) REFERENCES destinations(id) "messages_fk__service_prizes_prize_id" FOREIGN KEY (prize_id) REFERENCES service_prizes(prize_id) "messages_fk__services_id" FOREIGN KEY (service_id) REFERENCES services(id) "messages_fk__vpns_id" FOREIGN KEY (vpn_id) REFERENCES vpns(id) pulitzer2=# \d messages_memo Table "public.messages_memo" Column | Type | Modifiers ------------------------+--------------------------+----------- message_id | integer | not null memo | character varying | state | dom_messages_memo_state | not null admin_id | integer | admin_change_timestamp | timestamp with time zone | Indexes: "messages_memo_pk" PRIMARY KEY, btree (message_id) Foreign-key constraints: "messages_memo_fk__messages_id" FOREIGN KEY (message_id) REFERENCES messages(id) pulitzer2=# \d messages_memo_displayed Table "public.messages_memo_displayed" Column | Type | Modifiers ------------+---------+----------- message_id | integer | not null admin_id | integer | not null Indexes: "messages_memo_displayed_pk" PRIMARY KEY, btree (message_id, admin_id) Foreign-key constraints: "messages_memo_displayed_fk__admins_id" FOREIGN KEY (admin_id) REFERENCES admins(id) "messages_memo_displayed_fk__messages_id" FOREIGN KEY (message_id) REFERENCES messages(id) pulitzer2=# \d v_messages_memo View "public.v_messages_memo" Column | Type | Modifiers --------------------+--------------------------+----------- id | integer | from | character varying(15) | to | character varying(10) | receiving_time | timestamp with time zone | raw_text | character varying | keyword | character varying | destination_id | integer | vpn_id | integer | service_id | integer | status | integer | gateway_message_id | character varying | prize_id | integer | tan | character varying | memo | character varying | state | dom_messages_memo_state | displayed | boolean | View definition: SELECT v_messages_full.id, v_messages_full."from", v_messages_full."to", v_messages_full.receiving_time, v_messages_full.raw_text, v_messages_full.keyword, v_messages_full.destination_id, v_messages_full.vpn_id, v_messages_full.service_id, v_messages_full.status, v_messages_full.gateway_message_id, v_messages_full.prize_id, v_messages_full.tan, messages_memo.memo, messages_memo.state, NULL::boolean AS displayed FROM messages v_messages_full JOIN messages_memo ON v_messages_full.id = messages_memo.message_id; pulitzer2=#
В списке pgsql-performance по дате отправления: