Discussion: Query performance


Query performance

From:
Nagaraj Raj
Date:
Hi, I have a long-running query, and its planner always performs a sequential scan on table2. My goal is to reduce read I/O on disk, because this query runs frequently (it is used in a function for ETL).


table1: transfer_order_header (records: 2782678)
table2: transfer_order_item (records: 15995697)
Here is the query:

set work_mem = '688552kB';
explain (analyze, buffers)
select COALESCE(itm.serialnumber, '')   AS SERIAL_NO,
       COALESCE(itm.ITEM_SKU, '')       AS SKU,
       COALESCE(itm.receivingplant, '') AS RECEIVINGPLANT,
       COALESCE(itm.STO_ID, '')         AS STO,
       supplyingplant,
       COALESCE(itm.deliveryitem, '')   AS DELIVERYITEM,
       min(eventtime)                   AS eventtime
FROM sor_t.transfer_order_header hed, sor_t.transfer_order_item itm
WHERE hed.eventid = itm.eventid
GROUP BY 1, 2, 3, 4, 5, 6;


Query plan [2]:

Finalize GroupAggregate (cost=1930380.06..4063262.11 rows=16004137 width=172) (actual time=56050.500..83268.566 rows=15891873 loops=1)
  Group Key: (COALESCE(itm.serialnumber, ''::character varying)), (COALESCE(itm.item_sku, ''::character varying)), (COALESCE(itm.receivingplant, ''::character varying)), (COALESCE(itm.sto_id, ''::character varying)), hed.supplyingplant, (COALESCE(itm.deliveryitem, ''::character varying))
  Buffers: shared hit=712191 read=3, temp read=38232 written=38233
  ->  Gather Merge (cost=1930380.06..3669827.09 rows=13336780 width=172) (actual time=56050.488..77106.993 rows=15948520 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2213081 read=12, temp read=142840 written=142843
        ->  Partial GroupAggregate (cost=1929380.04..2129431.74 rows=6668390 width=172) (actual time=50031.458..54888.828 rows=5316173 loops=3)
              Group Key: (COALESCE(itm.serialnumber, ''::character varying)), (COALESCE(itm.item_sku, ''::character varying)), (COALESCE(itm.receivingplant, ''::character varying)), (COALESCE(itm.sto_id, ''::character varying)), hed.supplyingplant, (COALESCE(itm.deliveryitem, ''::character varying))
              Buffers: shared hit=2213081 read=12, temp read=142840 written=142843
              ->  Sort (cost=1929380.04..1946051.01 rows=6668390 width=172) (actual time=50031.446..52823.352 rows=5332010 loops=3)
                    Sort Key: (COALESCE(itm.serialnumber, ''::character varying)), (COALESCE(itm.item_sku, ''::character varying)), (COALESCE(itm.receivingplant, ''::character varying)), (COALESCE(itm.sto_id, ''::character varying)), hed.supplyingplant, (COALESCE(itm.deliveryitem, ''::character varying))
                    Sort Method: external merge  Disk: 305856kB
                    Worker 0:  Sort Method: external merge  Disk: 436816kB
                    Worker 1:  Sort Method: external merge  Disk: 400048kB
                    Buffers: shared hit=2213081 read=12, temp read=142840 written=142843
                    ->  Parallel Hash Join (cost=133229.66..603743.97 rows=6668390 width=172) (actual time=762.925..3901.133 rows=5332010 loops=3)
                          Hash Cond: ((itm.eventid)::text = (hed.eventid)::text)
                          Buffers: shared hit=2213027 read=12
                          ->  Parallel Seq Scan on transfer_order_item itm (cost=0.00..417722.90 rows=6668390 width=68) (actual time=0.005..524.359 rows=5332010 loops=3)
                                Buffers: shared hit=351039
                          ->  Parallel Hash (cost=118545.68..118545.68 rows=1174718 width=35) (actual time=755.590..755.590 rows=926782 loops=3)
                                Buckets: 4194304  Batches: 1  Memory Usage: 243808kB
                                Buffers: shared hit=1861964 read=12
                                ->  Parallel Index Only Scan using transfer_order_header_eventid_supplyingplant_eventtime_idx1 on transfer_order_header hed (cost=0.56..118545.68 rows=1174718 width=35) (actual time=0.128..388.436 rows=926782 loops=3)
                                      Heap Fetches: 18322
                                      Buffers: shared hit=1861964 read=12
Planning Time: 1.068 ms
Execution Time: 84274.004 ms


Table DDLs [1] are in dbfiddle.



PG Server:  PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit.
RAM: 456
Mem settings:
  maintenance_work_mem = 8563712 kB
  work_mem             = 688552 kB
  wal_buffers          = 2048 (units of 8kB, i.e. 16 MB)
  shared_buffers       = 44388442 (units of 8kB, i.e. ~339 GB)


Any suggestions would be greatly appreciated.



Thanks,
Rj



Re: Query performance

From:
Justin Pryzby
Date:
On Thu, Oct 22, 2020 at 12:32:29AM +0000, Nagaraj Raj wrote:
> Hi, I have a long-running query, and its planner always performs a sequential scan on table2.
> My goal is to reduce read I/O on disk, because this query runs frequently (it is used in a function for ETL).
>
> table1: transfer_order_header (records: 2782678)
> table2: transfer_order_item (records: 15995697)
> Here is the query:
>
> set work_mem = '688552kB';
> explain (analyze, buffers)
> select COALESCE(itm.serialnumber, '') AS SERIAL_NO, COALESCE(itm.ITEM_SKU, '') AS SKU,
>        COALESCE(itm.receivingplant, '') AS RECEIVINGPLANT, COALESCE(itm.STO_ID, '') AS STO,
>        supplyingplant, COALESCE(itm.deliveryitem, '') AS DELIVERYITEM, min(eventtime) AS eventtime
> FROM sor_t.transfer_order_header hed, sor_t.transfer_order_item itm
> WHERE hed.eventid = itm.eventid
> GROUP BY 1, 2, 3, 4, 5, 6

It spends most of its time writing tempfiles for sorting, so it (still) seems to
be starved for work_mem.
|Sort (cost=1929380.04..1946051.01 rows=6668390 width=172) (actual time=50031.446..52823.352 rows=5332010 loops=3)

First, can you get a better plan with 2GB work_mem, or with enable_sort=off?

If so, maybe you could make it less expensive by moving all the coalesce() calls
into a subquery, like:
| SELECT COALESCE(a,''), COALESCE(b,''), .. FROM (SELECT a, b, .. GROUP BY 1, 2, ..) x;
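Applied to the query from the original post, that rewrite might look like the sketch below (a suggestion, not a tested query; note one caveat: if a column contains both NULL and '' values, grouping on the raw columns keeps them as separate groups, so the outer query can emit duplicate rows where the original collapsed them):

```sql
-- Group on the raw columns first, then apply COALESCE once per output row,
-- so the sort/group keys stay narrow and no expression evaluation happens
-- per input row.
SELECT COALESCE(x.serialnumber, '')   AS serial_no,
       COALESCE(x.item_sku, '')       AS sku,
       COALESCE(x.receivingplant, '') AS receivingplant,
       COALESCE(x.sto_id, '')         AS sto,
       x.supplyingplant,
       COALESCE(x.deliveryitem, '')   AS deliveryitem,
       x.eventtime
FROM (
    SELECT itm.serialnumber, itm.item_sku, itm.receivingplant,
           itm.sto_id, hed.supplyingplant, itm.deliveryitem,
           min(eventtime) AS eventtime
    FROM sor_t.transfer_order_header hed
    JOIN sor_t.transfer_order_item  itm ON hed.eventid = itm.eventid
    GROUP BY 1, 2, 3, 4, 5, 6
) x;
```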

Or, if you have faster disks available, use them for temp_tablespaces.
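Setting that up might look like the following sketch (the path /mnt/fast_ssd/pg_temp and the tablespace name are assumptions for illustration):

```sql
-- As a superuser: create a tablespace on the fast device.
-- The directory must already exist and be owned by the postgres OS user.
CREATE TABLESPACE fast_temp LOCATION '/mnt/fast_ssd/pg_temp';

-- Route temporary files (external sorts, hash spills) there for this session:
SET temp_tablespaces = 'fast_temp';

-- Or persistently, in postgresql.conf:
--   temp_tablespaces = 'fast_temp'
```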

-- 
Justin



Re: Query performance

From:
"David G. Johnston"
Date:
On Wed, Oct 21, 2020 at 5:32 PM Nagaraj Raj <nagaraj.sf@yahoo.com> wrote:
Hi, I have a long-running query, and its planner always performs a sequential scan on table2.

 FROM sor_t.transfer_order_header hed,sor_t.transfer_order_item itm 
 where hed.eventid=itm.eventid group by 1,2,3,4,5,6

Any suggestions would be greatly appreciated.

You aren't filtering out any rows, so it is unsurprising that a sequential scan was chosen to fulfill the request that the entire detail table be consulted. The good news is you have access to parallelism; see if you can increase that factor.
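Raising the parallel degree might look like this sketch (the values are illustrative, not from the thread, and must fit within max_worker_processes on the server):

```sql
-- Allow more workers per Gather/Gather Merge node for this session:
SET max_parallel_workers_per_gather = 4;

-- Ceiling on parallel workers across the whole instance:
SET max_parallel_workers = 8;
```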

Any other suggestions probably require more knowledge of your problem domain than you've provided here.

Finding a way to add a WHERE clause, or computing your desired result during record insertion or update, are two other potential avenues of consideration.
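One way to shift the work off the read path is a materialized view refreshed after each ETL load; the following is a sketch only (the view name is hypothetical, and REFRESH ... CONCURRENTLY additionally requires a unique index on the view):

```sql
-- Precompute the aggregate once per load instead of on every query.
CREATE MATERIALIZED VIEW sor_t.transfer_order_summary AS
SELECT COALESCE(itm.serialnumber, '')   AS serial_no,
       COALESCE(itm.item_sku, '')       AS sku,
       COALESCE(itm.receivingplant, '') AS receivingplant,
       COALESCE(itm.sto_id, '')         AS sto,
       hed.supplyingplant,
       COALESCE(itm.deliveryitem, '')   AS deliveryitem,
       min(eventtime)                   AS eventtime
FROM sor_t.transfer_order_header hed
JOIN sor_t.transfer_order_item  itm ON hed.eventid = itm.eventid
GROUP BY 1, 2, 3, 4, 5, 6;

-- After each ETL batch (CONCURRENTLY avoids blocking readers,
-- but needs a unique index covering the grouping columns):
REFRESH MATERIALIZED VIEW sor_t.transfer_order_summary;
```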

David J.