Обсуждение: Postgresql Sort cost Poor performance?
1、postgresql version qis3_dp2=> select * from version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit (1 row) qis3_dp2=> 2、postgresql work_mem qis3_dp2=> SHOW work_mem; work_mem ---------- 2GB (1 row) qis3_dp2=> SHOW shared_buffers; shared_buffers ---------------- 4028MB (1 row) qis3_dp2=> 3、Table count qis3_dp2=> select count(*) from QIS_CARPASSEDSTATION; count ---------- 11453079 (1 row) qis3_dp2=> 4、table desc qis3_dp2=> \dS QIS_CARPASSEDSTATION; Table "qis_schema.qis_carpassedstation" Column | Type | Collation | Nullable | Default --------------+-----------------------------+-----------+----------+--------- iid | integer | | not null | scartypecd | character varying(50) | | | svin | character varying(20) | | | sstationcd | character varying(50) | | | dpassedtime | timestamp(6) with time zone | | | dworkdate | date | | | iworkyear | integer | | | iworkmonth | integer | | | iweek | integer | | | sinputteamcd | character varying(20) | | | sinputdutycd | character varying(20) | | | smtoc | character varying(50) | | | slineno | character varying(18) | | | Indexes: "qis_carpassedstation_pkey" PRIMARY KEY, btree (iid) "q_carp_dworkdate" btree (dworkdate) "q_carp_smtoc" btree (smtoc) qis3_dp2=> 5、Execute SQL: qis3_dp2=> EXPLAIN (analyze true,buffers true) SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD') group by SMTOC ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------- GroupAggregate (cost=697738.61..714224.02 rows=372 width=30) (actual time=5908.786..32420.412 rows=410 loops=1) Group Key: smtoc Buffers: shared hit=401 read=184983 I/O Timings: read=1377.762 -> Sort (cost=697738.61..703232.51 rows=2197559 width=40) (actual time=5907.791..6139.351 rows=2142215 loops=1) Sort Key: smtoc Sort Method: quicksort Memory: 265665kB Buffers: shared hit=401 read=184983 I/O Timings: read=1377.762 -> Gather (cost=1000.00..466253.56 rows=2197559 width=40) (actual time=0.641..1934.614 rows=2142215 loops=1) Workers Planned: 5 Workers Launched: 5 Buffers: shared hit=401 read=184983 I/O Timings: read=1377.762 -> Parallel Seq Scan on qis_carpassedstation a (cost=0.00..245497.66 rows=439512 width=40) (actual time=0.245..1940.527 rows=357036 loops=6) Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND ((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >= to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da te('2019-03-11'::text, 'YYYY-MM-DD'::text))) Rows Removed by Filter: 1551811 Buffers: shared hit=401 read=184983 I/O Timings: read=1377.762 Planning Time: 0.393 ms Execution Time: 32439.704 ms (21 rows) qis3_dp2=> 6、Why does sort take a long time to execute and how can you optimize it? Thanks!!! -- Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
>>>>> "tank" == tank zhang <6220104@qq.com> writes: tank> smtoc | character varying(50) | | | tank> Sort Key: smtoc What is the output of SHOW lc_collate; One of the most common reasons for slow sorting is that you're sorting a text/varchar field in a locale other than C. The slowdown for using other locales varies according to the data, the locale, and the operating system, but 8-20x slowdowns are very common, 50-100x slowdowns are not unusual, and there have been reports of even worse cases with unusual script combinations. -- Andrew (irc:RhodiumToad)
Thank you for your reply. qis3_dp2=> SHOW lc_collate; lc_collate ------------- en_US.UTF-8 (1 row) Time: 0.311 ms qis3_dp2=> qis3_dp2=> SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD') group by SMTOC ; checkcarnum | smtoc -------------+----------------------- 90 | HT6LHD700 NH731P A 690 | HT6LHD700 NH788P A 90 | HT6LHD700 R550P A 30 | HT6LHD700 YR615M A 1141 | HT6MHB700 NH731P A Is there any possibility of optimization? -- Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
On 01/04/2019 23:20, Andrew Gierth wrote: >>>>>> "tank" == tank zhang <6220104@qq.com> writes: > tank> smtoc | character varying(50) | | | > tank> Sort Key: smtoc > > What is the output of SHOW lc_collate; > > One of the most common reasons for slow sorting is that you're sorting a > text/varchar field in a locale other than C. The slowdown for using > other locales varies according to the data, the locale, and the > operating system, but 8-20x slowdowns are very common, 50-100x slowdowns > are not unusual, and there have been reports of even worse cases with > unusual script combinations. > Just wondering... Would it be possible to optionally enable the system to create a hidden system column for the text field to be sorted, the new column would be the original column preprocessed to sort correctly & efficiently. This would seem to lead to a massive improvement in performance. Depending relative tradeoffs disk storage vs processing: (A) create hidden system column for each sort invocation (B) at table creation (C) other possibilities (A) could be done automatically, and possibly controlled via a GUC parameter (B) might require a change to the CREATE TABLE syntax Anyhow, just some thoughts... Cheers, Gavin
Re: Gavin Flower 2019-04-01 <3113f2f4-a4da-4862-8596-2e189398c988@archidevsys.co.nz> > Would it be possible to optionally enable the system to create a hidden > system column for the text field to be sorted, the new column would be the > original column preprocessed to sort correctly & efficiently. That's the idea behind the strxfrm(3) optimization that ultimately got disabled again for non-C locales because glibc fails to implement it correctly. Christoph
Hi,
If your problem is the sort, try creating an index on the Field that you consider thst could be needed (you can star with smtoc that is the one you are grouping and sorting)
Another thing that i noticed is your work_mem, I thing is too high for a global config (if you think 2gb can hel for this operation you can set it before execute the query but only for that session), but generally this value most be smaller depending on the commons query every sub query uses that amount of mem (i.e if you have a query that have 3 subqueries and each one with a sort operation and a grouping operation, you can be using 12 gb of mem in that only big query, and it doesn't mean it will be faster).. try to monitor the uses of ram by pgsql maybe you can be suffering paging problems because os the size of you work_mem and that make the dbms slow too
On Mon, Apr 1, 2019, 6:45 AM tank.zhang <6220104@qq.com> wrote:
1、postgresql version
qis3_dp2=> select * from version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)
qis3_dp2=>
2、postgresql work_mem
qis3_dp2=> SHOW work_mem;
work_mem
----------
2GB
(1 row)
qis3_dp2=> SHOW shared_buffers;
shared_buffers
----------------
4028MB
(1 row)
qis3_dp2=>
3、Table count
qis3_dp2=> select count(*) from QIS_CARPASSEDSTATION;
count
----------
11453079
(1 row)
qis3_dp2=>
4、table desc
qis3_dp2=> \dS QIS_CARPASSEDSTATION;
Table "qis_schema.qis_carpassedstation"
Column | Type | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+---------
iid | integer | | not null |
scartypecd | character varying(50) | | |
svin | character varying(20) | | |
sstationcd | character varying(50) | | |
dpassedtime | timestamp(6) with time zone | | |
dworkdate | date | | |
iworkyear | integer | | |
iworkmonth | integer | | |
iweek | integer | | |
sinputteamcd | character varying(20) | | |
sinputdutycd | character varying(20) | | |
smtoc | character varying(50) | | |
slineno | character varying(18) | | |
Indexes:
"qis_carpassedstation_pkey" PRIMARY KEY, btree (iid)
"q_carp_dworkdate" btree (dworkdate)
"q_carp_smtoc" btree (smtoc)
qis3_dp2=>
5、Execute SQL:
qis3_dp2=> EXPLAIN (analyze true,buffers true) SELECT COUNT(DISTINCT SVIN)
AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD
= 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by SMTOC
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
GroupAggregate (cost=697738.61..714224.02 rows=372 width=30) (actual
time=5908.786..32420.412 rows=410 loops=1)
Group Key: smtoc
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
-> Sort (cost=697738.61..703232.51 rows=2197559 width=40) (actual
time=5907.791..6139.351 rows=2142215 loops=1)
Sort Key: smtoc
Sort Method: quicksort Memory: 265665kB
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
-> Gather (cost=1000.00..466253.56 rows=2197559 width=40) (actual
time=0.641..1934.614 rows=2142215 loops=1)
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
-> Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..245497.66 rows=439512 width=40) (actual time=0.245..1940.527
rows=357036 loops=6)
Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
Rows Removed by Filter: 1551811
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
Planning Time: 0.393 ms
Execution Time: 32439.704 ms
(21 rows)
qis3_dp2=>
6、Why does sort take a long time to execute and how can you optimize it?
Thanks!!!
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
1、DISTINCT response time is fast without being added qis3_dp2=# SELECT COUNT(*) AS CHECKCARNUM FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD'); checkcarnum ------------- 2142215 (1 row) *Time: 2237.970 ms (00:02.238)* qis3_dp2=# 2、 Adding a DISTINCT response time was very slow qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD'); checkcarnum ------------- 1071367 (1 row) *Time: 38979.246 ms (00:38.979)* -- Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
On Tue, 2 Apr 2019 at 20:00, tank.zhang <6220104@qq.com> wrote: > 2、 Adding a DISTINCT response time was very slow > > qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM > QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO > IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= > TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= > TO_DATE('2019-03-11','YYYY-MM-DD'); > checkcarnum > ------------- > 1071367 > (1 row) That's because of how DISTINCT is implemented within an aggregate function in PostgreSQL. Internally within the aggregate code in the executor, a sort is performed on the entire input to the aggregate node. The planner is currently unable to make use of any indexes that provide pre-sorted input. One way to work around this would be to perform the DISTINCT and COUNT(*) in separate stages using a subquery. From your original query, something like: SELECT COUNT(SVIN) AS CHECKCARNUM,SMTOC FROM ( SELECT SMTOC,SVIN FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD') GROUP BY SMTOC,SVIN ) A GROUP BY SMTOC; An index something like: CREATE INDEX ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE); Should help speed up the subquery and provide pre-sorted input to the outer aggregate. If you like, you could add SLINENO to the end of the index to allow an index-only scan which may result in further performance improvements. Without the index, you're forced to sort, but at least it's just one sort instead of two. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Thank you replay! I tried to use the TMP table is very fast . thank you qis3_dp2=# explain analyze SELECT COUNT(*),SMTOC FROM ( SELECT DISTINCT(SVIN) AS CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP group by SMTOC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------- HashAggregate (cost=691386.41..691388.41 rows=200 width=30) (actual time=4090.951..4091.027 rows=410 loops=1) Group Key: a.smtoc -> HashAggregate (cost=666561.44..676491.43 rows=992999 width=40) (actual time=3481.712..3794.213 rows=1071367 loops=1) Group Key: a.svin, a.smtoc -> Gather (cost=1000.00..656098.93 rows=2092501 width=40) (actual time=0.657..1722.814 rows=2142215 loops=1) Workers Planned: 4 Workers Launched: 4 -> Parallel Seq Scan on qis_carpassedstation a (cost=0.00..445848.83 rows=523125 width=40) (actual time=65.187..2287.739 rows=428443 loops=5) Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND ((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >= to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da te('2019-03-11'::text, 'YYYY-MM-DD'::text))) Rows Removed by Filter: 1862173 Planning Time: 0.513 ms Execution Time: 4147.542 ms (12 rows) Time: 4148.852 ms (00:04.149) qis3_dp2=# qis3_dp2=# SELECT COUNT(*),SMTOC FROM ( SELECT DISTINCT(SVIN) AS CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP group by SMTOC; **Time: 3223.935 ms (00:03.224)** 2、 Before qis3_dp2=# explain analyze SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD') group by SMTOC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------- GroupAggregate (cost=875778.02..891475.55 rows=377 width=30) (actual time=6400.991..33314.132 rows=410 loops=1) Group Key: smtoc -> Sort (cost=875778.02..881009.28 rows=2092501 width=40) (actual time=6399.993..6626.151 rows=2142215 loops=1) Sort Key: smtoc Sort Method: quicksort Memory: 265665kB -> Gather (cost=1000.00..656098.93 rows=2092501 width=40) (actual time=0.557..2467.778 rows=2142215 loops=1) Workers Planned: 4 Workers Launched: 4 -> Parallel Seq Scan on qis_carpassedstation a (cost=0.00..445848.83 rows=523125 width=40) (actual time=66.908..2428.397 rows=428443 loops=5) Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND ((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >= to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da te('2019-03-11'::text, 'YYYY-MM-DD'::text))) Rows Removed by Filter: 1862173 Planning Time: 0.457 ms Execution Time: 33335.429 ms (13 rows) * Time: 33336.720 ms (00:33.337)* qis3_dp2=# -- Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html