Slow query - lots of temporary files.

From Johann Spies
Subject Slow query - lots of temporary files.
Date
Msg-id CAGZ55DSB+9m4VxeptJnQOJYkcYpduJkpa5z24Tv3xKOgvdN3pg@mail.gmail.com
Responses Re: Slow query - lots of temporary files.  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-performance
I have stopped this query after about 16 hours.  At the same time I ran an 'explain analyze' on the same query to find out why it took so long.  These two processes generated 173GB of temporary files in /var/lib/postgresql/9.4/main/base/pgsql_tmp.

COPY (SELECT A.ut,
        B.go AS funding_org,
        B.gn AS grant_no,
        C.gt AS thanks,
        D.au
 FROM isi.funding_text C,
      isi.rauthor D,
      isi.africa_uts A
 LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
 WHERE (C.ut IS NOT NULL
        OR B.ut IS NOT NULL)
   AND D.rart_id = C.ut
   AND C.ut = B.ut
 GROUP BY A.ut,
          GO,
          gn,
          gt,
          au
 ORDER BY funding_org) TO '/tmp/africafunding2.csv' WITH csv quote '"' DELIMITER ',';
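(For scale: isi.rauthor holds many rows per article, so this join produces one output row per author per article before the GROUP BY.  A toy sketch of that fan-out, using sqlite3 and hypothetical data in place of the real tables:)

```python
import sqlite3

# Hypothetical miniature of africa_uts (one row per article) and
# rauthor (several rows per article, one per author).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE africa_uts (ut TEXT)")
cur.execute("CREATE TABLE rauthor (rart_id TEXT, au TEXT)")
cur.executemany("INSERT INTO africa_uts VALUES (?)", [("A1",), ("A2",)])
cur.executemany(
    "INSERT INTO rauthor VALUES (?, ?)",
    [(ut, f"author{i}") for ut in ("A1", "A2") for i in range(3)],
)

# 2 articles x 3 authors each -> 6 joined rows: the row count is
# multiplied by the author count before any grouping happens.
n = cur.execute(
    "SELECT count(*) FROM africa_uts a JOIN rauthor d ON d.rart_id = a.ut"
).fetchone()[0]
print(n)  # 6
```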


A modified version of this query finished in 1 min 27 sec:

COPY
(SELECT 'UT'||A.ut,
B.go AS funding_org,
B.gn AS grant_no,
C.gt AS thanks
FROM isi.africa_uts A
LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
LEFT JOIN isi.funding_text C ON (A.ut = C.ut)
WHERE (C.ut IS NOT NULL
OR B.ut IS NOT NULL)
GROUP BY A.ut,
GO,
gn,
gt) TO '/tmp/africafunding.csv' WITH csv quote '"' DELIMITER ',';
As I said, running 'explain analyze' on the problematic query contributed to the 173GB of
temporary files and did not finish in about 16 hours.

A plain explain of the SELECT part of the query produces this:

"Sort (cost=4781458203.46..4798118612.44 rows=6664163593 width=390)"
" Output: a.ut, b.go, b.gn, c.gt, (array_to_string(array_agg(d.au), ';'::text)), b.go, b.gn, d.au"
" Sort Key: b.go"
" -> GroupAggregate (cost=2293037801.73..2509623118.51 rows=6664163593 width=390)"
" Output: a.ut, b.go, b.gn, c.gt, array_to_string(array_agg(d.au), ';'::text), b.go, b.gn, d.au"
" Group Key: a.ut, b.go, b.gn, c.gt, d.au"
" -> Sort (cost=2293037801.73..2309698210.72 rows=6664163593 width=390)"
" Output: a.ut, c.gt, b.go, b.gn, d.au"
" Sort Key: a.ut, b.go, b.gn, c.gt, d.au"
" -> Merge Join (cost=4384310.92..21202716.78 rows=6664163593 width=390)"
" Output: a.ut, c.gt, b.go, b.gn, d.au"
" Merge Cond: ((c.ut)::text = (d.rart_id)::text)"
" -> Merge Join (cost=635890.84..1675389.41 rows=6069238 width=412)"
" Output: c.gt, c.ut, a.ut, b.go, b.gn, b.ut"
" Merge Cond: ((c.ut)::text = (b.ut)::text)"
" Join Filter: ((c.ut IS NOT NULL) OR (b.ut IS NOT NULL))"
" -> Merge Join (cost=635476.30..675071.77 rows=1150354 width=348)"
" Output: c.gt, c.ut, a.ut"
" Merge Cond: ((a.ut)::text = (c.ut)::text)"
" -> Index Only Scan using africa_ut_idx on isi.africa_uts a (cost=0.42..19130.19 rows=628918 width=16)"
" Output: a.ut"
" -> Sort (cost=632211.00..640735.23 rows=3409691 width=332)"
" Output: c.gt, c.ut"
" Sort Key: c.ut"
" -> Seq Scan on isi.funding_text c (cost=0.00..262238.91 rows=3409691 width=332)"
" Output: c.gt, c.ut"
" -> Index Scan using funding_org_ut_idx on isi.funding_org b (cost=0.56..912582.50 rows=9835492 width=64)"
" Output: b.id, b.ut, b.go, b.gn"
" -> Materialize (cost=0.57..17914892.46 rows=159086560 width=26)"
" Output: d.id, d.rart_id, d.au, d.ro, d.ln, d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv"
" -> Index Scan using rauthor_rart_id_idx on isi.rauthor d (cost=0.57..17517176.06 rows=159086560 width=26)"
" Output: d.id, d.rart_id, d.au, d.ro, d.ln, d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv"

Any idea why adding the rauthor table to the query is so problematic?

My system:

768 GB RAM
shared_buffers:  32GB
work_mem:  4608MB
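(In case it helps others reproduce this: to see which operations are spilling, the standard `log_temp_files` setting logs every temporary file with its size.  A sketch, assuming access to postgresql.conf:)

```
# postgresql.conf -- log every temporary file created, with its size in the log line
log_temp_files = 0      # 0 = log all temp files; then reload: SELECT pg_reload_conf();
```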

Regards
Johann

--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)
