Обсуждение: Postgresql Sort cost Poor performance?

Поиск

Список

Период

Сортировка

Postgresql Sort cost Poor performance?

От

"tank.zhang"

Дата:

01 апреля 2019 г., 12:45:14

1、postgresql version

qis3_dp2=> select * from version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

qis3_dp2=>

2、postgresql work_mem


qis3_dp2=> SHOW work_mem;
 work_mem
----------
 2GB
(1 row)

qis3_dp2=> SHOW  shared_buffers;
 shared_buffers
----------------
 4028MB
(1 row)

qis3_dp2=>

3、Table count

qis3_dp2=> select count(*) from  QIS_CARPASSEDSTATION;
  count
----------
 11453079
(1 row)

qis3_dp2=>

4、table desc

qis3_dp2=> \dS QIS_CARPASSEDSTATION;
                   Table "qis_schema.qis_carpassedstation"
    Column    |            Type             | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+---------
 iid          | integer                     |           | not null |
 scartypecd   | character varying(50)       |           |          |
 svin         | character varying(20)       |           |          |
 sstationcd   | character varying(50)       |           |          |
 dpassedtime  | timestamp(6) with time zone |           |          |
 dworkdate    | date                        |           |          |
 iworkyear    | integer                     |           |          |
 iworkmonth   | integer                     |           |          |
 iweek        | integer                     |           |          |
 sinputteamcd | character varying(20)       |           |          |
 sinputdutycd | character varying(20)       |           |          |
 smtoc        | character varying(50)       |           |          |
 slineno      | character varying(18)       |           |          |
Indexes:
    "qis_carpassedstation_pkey" PRIMARY KEY, btree (iid)
    "q_carp_dworkdate" btree (dworkdate)
    "q_carp_smtoc" btree (smtoc)

qis3_dp2=>

5、Execute SQL:
qis3_dp2=> EXPLAIN (analyze true,buffers true)   SELECT COUNT(DISTINCT SVIN)
AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD
= 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by  SMTOC
;

 
QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 GroupAggregate  (cost=697738.61..714224.02 rows=372 width=30) (actual
time=5908.786..32420.412 rows=410 loops=1)
   Group Key: smtoc
   Buffers: shared hit=401 read=184983
   I/O Timings: read=1377.762
   ->  Sort  (cost=697738.61..703232.51 rows=2197559 width=40) (actual
time=5907.791..6139.351 rows=2142215 loops=1)
         Sort Key: smtoc
         Sort Method: quicksort  Memory: 265665kB
         Buffers: shared hit=401 read=184983
         I/O Timings: read=1377.762
         ->  Gather  (cost=1000.00..466253.56 rows=2197559 width=40) (actual
time=0.641..1934.614 rows=2142215 loops=1)
               Workers Planned: 5
               Workers Launched: 5
               Buffers: shared hit=401 read=184983
               I/O Timings: read=1377.762
               ->  Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..245497.66 rows=439512 width=40) (actual time=0.245..1940.527
rows=357036 loops=6)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1551811
                     Buffers: shared hit=401 read=184983
                     I/O Timings: read=1377.762
 Planning Time: 0.393 ms
 Execution Time: 32439.704 ms
(21 rows)

qis3_dp2=>


6、Why does sort take a long time to execute and how can you optimize it?
Thanks!!!






--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html

Re: Postgresql Sort cost Poor performance?

От

Andrew Gierth

Дата:

01 апреля 2019 г., 13:20:48

>>>>> "tank" == tank zhang <6220104@qq.com> writes:

 tank>  smtoc        | character varying(50)       |           |          | 
 tank>          Sort Key: smtoc

What is the output of  SHOW lc_collate;

One of the most common reasons for slow sorting is that you're sorting a
text/varchar field in a locale other than C. The slowdown for using
other locales varies according to the data, the locale, and the
operating system, but 8-20x slowdowns are very common, 50-100x slowdowns
are not unusual, and there have been reports of even worse cases with
unusual script combinations.

-- 
Andrew (irc:RhodiumToad)

Re: Postgresql Sort cost Poor performance?

От

"tank.zhang"

Дата:

01 апреля 2019 г., 13:30:09

Thank you for your reply. 

qis3_dp2=>  SHOW lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

Time: 0.311 ms
qis3_dp2=> 

qis3_dp2=> SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC FROM
QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND
A.SLINYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')
group by  SMTOC
;
 checkcarnum |         smtoc         
-------------+-----------------------
          90 | HT6LHD700 NH731P    A
         690 | HT6LHD700 NH788P    A
          90 | HT6LHD700 R550P     A
          30 | HT6LHD700 YR615M    A
        1141 | HT6MHB700 NH731P    A



Is there any possibility of optimization?





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html

Re: Postgresql Sort cost Poor performance?

От

Gavin Flower

Дата:

01 апреля 2019 г., 13:32:40

On 01/04/2019 23:20, Andrew Gierth wrote:
>>>>>> "tank" == tank zhang <6220104@qq.com> writes:
>   tank>  smtoc        | character varying(50)       |           |          |
>   tank>          Sort Key: smtoc
>
> What is the output of  SHOW lc_collate;
>
> One of the most common reasons for slow sorting is that you're sorting a
> text/varchar field in a locale other than C. The slowdown for using
> other locales varies according to the data, the locale, and the
> operating system, but 8-20x slowdowns are very common, 50-100x slowdowns
> are not unusual, and there have been reports of even worse cases with
> unusual script combinations.
>
Just wondering...

Would it be possible to optionally enable the system to create a hidden 
system column for the text field to be sorted, the new column would be 
the original column preprocessed to sort correctly & efficiently.  This 
would seem to lead to a massive improvement in performance.

Depending relative tradeoffs disk storage vs processing:
(A) create hidden system column for each sort invocation
(B) at table creation
(C) other possibilities

(A) could be done automatically, and possibly controlled via a GUC parameter
(B) might require a change to the CREATE TABLE syntax

Anyhow, just some thoughts...


Cheers,
Gavin

Re: Postgresql Sort cost Poor performance?

От

Christoph Berg

Дата:

01 апреля 2019 г., 13:54:35

Re: Gavin Flower 2019-04-01 <3113f2f4-a4da-4862-8596-2e189398c988@archidevsys.co.nz>
> Would it be possible to optionally enable the system to create a hidden
> system column for the text field to be sorted, the new column would be the
> original column preprocessed to sort correctly & efficiently.

That's the idea behind the strxfrm(3) optimization that ultimately got
disabled again for non-C locales because glibc fails to implement it
correctly.

Christoph

Re: Postgresql Sort cost Poor performance?

От

Ramón Bastidas

Дата:

02 апреля 2019 г., 08:25:54

Hi,

If your problem is the sort, try creating an index on the Field that you consider thst could be needed (you can star with smtoc that is the one you are grouping and sorting)

Another thing that i noticed is your work_mem, I thing is too high for a global config (if you think 2gb can hel for this operation you can set it before execute the query but only for that session), but generally this value most be smaller depending on the commons query every sub query uses that amount of mem (i.e if you have a query that have 3 subqueries and each one with a sort operation and a grouping operation, you can be using 12 gb of mem in that only big query, and it doesn't mean it will be faster).. try to monitor the uses of ram by pgsql maybe you can be suffering paging problems because os the size of you work_mem and that make the dbms slow too

On Mon, Apr 1, 2019, 6:45 AM tank.zhang <6220104@qq.com> wrote:

1、postgresql version

qis3_dp2=> select * from version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

qis3_dp2=>

2、postgresql work_mem

qis3_dp2=> SHOW work_mem;
work_mem
----------
2GB
(1 row)

qis3_dp2=> SHOW shared_buffers;
shared_buffers
----------------
4028MB
(1 row)

qis3_dp2=>

3、Table count

qis3_dp2=> select count(*) from QIS_CARPASSEDSTATION;
count
----------
11453079
(1 row)

qis3_dp2=>

4、table desc

qis3_dp2=> \dS QIS_CARPASSEDSTATION;
Table "qis_schema.qis_carpassedstation"
Column | Type | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+---------
iid | integer | | not null |
scartypecd | character varying(50) | | |
svin | character varying(20) | | |
sstationcd | character varying(50) | | |
dpassedtime | timestamp(6) with time zone | | |
dworkdate | date | | |
iworkyear | integer | | |
iworkmonth | integer | | |
iweek | integer | | |
sinputteamcd | character varying(20) | | |
sinputdutycd | character varying(20) | | |
smtoc | character varying(50) | | |
slineno | character varying(18) | | |
Indexes:
"qis_carpassedstation_pkey" PRIMARY KEY, btree (iid)
"q_carp_dworkdate" btree (dworkdate)
"q_carp_smtoc" btree (smtoc)

qis3_dp2=>

5、Execute SQL:
qis3_dp2=> EXPLAIN (analyze true,buffers true) SELECT COUNT(DISTINCT SVIN)
AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD
= 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by SMTOC
;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
GroupAggregate (cost=697738.61..714224.02 rows=372 width=30) (actual
time=5908.786..32420.412 rows=410 loops=1)
Group Key: smtoc
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
-> Sort (cost=697738.61..703232.51 rows=2197559 width=40) (actual
time=5907.791..6139.351 rows=2142215 loops=1)
Sort Key: smtoc
Sort Method: quicksort Memory: 265665kB
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
-> Gather (cost=1000.00..466253.56 rows=2197559 width=40) (actual
time=0.641..1934.614 rows=2142215 loops=1)
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
-> Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..245497.66 rows=439512 width=40) (actual time=0.245..1940.527
rows=357036 loops=6)
Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
Rows Removed by Filter: 1551811
Buffers: shared hit=401 read=184983
I/O Timings: read=1377.762
Planning Time: 0.393 ms
Execution Time: 32439.704 ms
(21 rows)

qis3_dp2=>

6、Why does sort take a long time to execute and how can you optimize it?
Thanks!!!

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html

Re: Postgresql Sort cost Poor performance?

От

"tank.zhang"

Дата:

02 апреля 2019 г., 10:00:09

1、DISTINCT response time is fast without being added

qis3_dp2=# SELECT COUNT(*) AS CHECKCARNUM FROM QIS_CARPASSEDSTATION A WHERE
1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD');
 checkcarnum
-------------
     2142215
(1 row)

*Time: 2237.970 ms (00:02.238)*
qis3_dp2=#

2、 Adding a DISTINCT response time was very slow

qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD');
 checkcarnum
-------------
     1071367
(1 row)

*Time: 38979.246 ms (00:38.979)*





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html

Re: Postgresql Sort cost Poor performance?

От

David Rowley

Дата:

02 апреля 2019 г., 10:42:07

On Tue, 2 Apr 2019 at 20:00, tank.zhang <6220104@qq.com> wrote:
> 2、 Adding a DISTINCT response time was very slow
>
> qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
> QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
> IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
> TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
> TO_DATE('2019-03-11','YYYY-MM-DD');
>  checkcarnum
> -------------
>      1071367
> (1 row)

That's because of how DISTINCT is implemented within an aggregate
function in PostgreSQL. Internally within the aggregate code in the
executor, a sort is performed on the entire input to the aggregate
node.  The planner is currently unable to make use of any indexes that
provide pre-sorted input.

One way to work around this would be to perform the DISTINCT and
COUNT(*) in separate stages using a subquery.

From your original query, something like:

SELECT COUNT(SVIN) AS CHECKCARNUM,SMTOC
FROM (
SELECT SMTOC,SVIN
FROM QIS_CARPASSEDSTATION A
WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG'
AND A.SLINENO IN ( '1F' , '2F' , '3F' )
AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD')
AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')
GROUP BY  SMTOC,SVIN
) A GROUP BY SMTOC;

An index something like:
CREATE INDEX ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE);

Should help speed up the subquery and provide pre-sorted input to the
outer aggregate.  If you like, you could add SLINENO to the end of the
index to allow an index-only scan which may result in further
performance improvements.

Without the index, you're forced to sort, but at least it's just one
sort instead of two.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgresql Sort cost Poor performance?

От

"tank.zhang"

Дата:

02 апреля 2019 г., 11:23:50

Thank you  replay!

I tried to use the TMP table is very fast .    thank you


qis3_dp2=# explain analyze SELECT COUNT(*),SMTOC FROM ( SELECT
DISTINCT(SVIN) AS CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1
AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP  group by SMTOC;

 
QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 HashAggregate  (cost=691386.41..691388.41 rows=200 width=30) (actual
time=4090.951..4091.027 rows=410 loops=1)
   Group Key: a.smtoc
   ->  HashAggregate  (cost=666561.44..676491.43 rows=992999 width=40)
(actual time=3481.712..3794.213 rows=1071367 loops=1)
         Group Key: a.svin, a.smtoc
         ->  Gather  (cost=1000.00..656098.93 rows=2092501 width=40) (actual
time=0.657..1722.814 rows=2142215 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..445848.83 rows=523125 width=40) (actual time=65.187..2287.739
rows=428443 loops=5)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1862173
 Planning Time: 0.513 ms
 Execution Time: 4147.542 ms
(12 rows)

Time: 4148.852 ms (00:04.149)
qis3_dp2=#


qis3_dp2=# SELECT COUNT(*),SMTOC FROM ( SELECT DISTINCT(SVIN) AS
CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD =
'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP  group by SMTOC;

**Time: 3223.935 ms (00:03.224)**


2、 Before

qis3_dp2=# explain analyze SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC
FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND
A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by  SMTOC;

 
QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 GroupAggregate  (cost=875778.02..891475.55 rows=377 width=30) (actual
time=6400.991..33314.132 rows=410 loops=1)
   Group Key: smtoc
   ->  Sort  (cost=875778.02..881009.28 rows=2092501 width=40) (actual
time=6399.993..6626.151 rows=2142215 loops=1)
         Sort Key: smtoc
         Sort Method: quicksort  Memory: 265665kB
         ->  Gather  (cost=1000.00..656098.93 rows=2092501 width=40) (actual
time=0.557..2467.778 rows=2142215 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..445848.83 rows=523125 width=40) (actual time=66.908..2428.397
rows=428443 loops=5)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1862173
 Planning Time: 0.457 ms
 Execution Time: 33335.429 ms
(13 rows)
*
Time: 33336.720 ms (00:33.337)*
qis3_dp2=#











--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Postgresql Sort cost Poor performance?