Discussion: Sort Method: external merge
Hi,
I have several queries in a *single* transaction and I want to figure
out a reasonable work_mem value. Here is an excerpt from the "explain
analyze" output -- each query has two sorts:
1) Sort Method: quicksort Memory: 6 324kB
Sort Method: quicksort Memory: 1 932 134kB
2) Sort Method: quicksort Memory: 28 806kB
Sort Method: quicksort Memory: 977 183kB
3) Sort Method: quicksort Memory: 103 397kB
Sort Method: external merge Disk: 3 105 728kB
4) Sort Method: quicksort Memory: 12 760kB
Sort Method: quicksort Memory: 3 704 460kB
5) Sort Method: quicksort Memory: 84 862kB
Sort Method: external merge Disk: 3 593 120kB
6) Sort Method: quicksort Memory: 4 828kB
Sort Method: quicksort Memory: 112 472kB
7) Sort Method: quicksort Memory: 1 490kB
Sort Method: quicksort Memory: 81 066kB
8) Sort Method: quicksort Memory: 78 174kB
Sort Method: quicksort Memory: 2 579 739kB
9) Sort Method: quicksort Memory: 101 717kB
Sort Method: quicksort Memory: 2 913 709kB
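(For context: each line above comes from "explain analyze" output; a
minimal sketch of the kind of statement that produces such a line is
below -- the table and column names are placeholders, not the actual
queries.)

  -- placeholder sketch only; the real queries are larger "group by" style queries
  SET work_mem = '4000000kB';   -- the same per-sort budget used for the runs above

  EXPLAIN ANALYZE
  SELECT account_id, sum(amount)
  FROM some_large_table
  GROUP BY account_id
  ORDER BY account_id;

  -- the relevant plan lines look like:
  --   Sort  (cost=... rows=... width=...) (actual time=... rows=... loops=1)
  --     Sort Method: quicksort Memory: 103397kB
  -- or, when the sort spills to disk:
  --     Sort Method: external merge Disk: 3105728kB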
work_mem is set to 4 000 000 kB and I do not understand why a few
queries (3 and 5) used disk while the rest were able to fit the data
into memory. Why was disk used there, while a subsequent query was able
to sort a bigger data set in memory (compare 3 and 4)? The box has 90GB
RAM and no other queries were running during that time.
PG version is: PostgreSQL 8.4.7 on x86_64-redhat-linux-gnu, compiled
by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
Thanks,
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
Ondrej Ivanič <ondrej.ivanic@gmail.com> writes:
> work_mem is set to 4 000 000 kB and I do not understand why a few
> queries (3 and 5) used disk while the rest were able to fit the data
> into memory.
The on-disk representation of sort data is quite a bit more compact than
the in-memory representation. So where it says that 3.5GB of disk were
needed, it's not a surprise that the memory requirement would have
exceeded 4GB to do the sort in-memory.
If you want to know what the conversion factor is for your particular
dataset, try doing the query with small and large work_mem so you can
see what is reported as the amount of space needed each way.
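Concretely, that comparison could look something like the following
(table and column names are placeholders, not from the reported
queries):

  -- small work_mem: force the sort to spill so the Disk figure is reported
  SET work_mem = '64MB';
  EXPLAIN ANALYZE SELECT * FROM some_large_table ORDER BY account_id;
  --   Sort Method: external merge Disk: ...kB

  -- large work_mem: let the sort stay in memory so the Memory figure is reported
  SET work_mem = '8GB';
  EXPLAIN ANALYZE SELECT * FROM some_large_table ORDER BY account_id;
  --   Sort Method: quicksort Memory: ...kB

  -- the ratio of the two figures is the disk-to-memory conversion factor
  -- for that particular data set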
regards, tom lane
Hi,

On 26 August 2011 00:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ondrej Ivanič <ondrej.ivanic@gmail.com> writes:
>> work_mem is set to 4 000 000 kB and I do not understand why a few
>> queries (3 and 5) used disk while the rest were able to fit the data
>> into memory.
>
> The on-disk representation of sort data is quite a bit more compact than
> the in-memory representation. So where it says that 3.5GB of disk were
> needed, it's not a surprise that the memory requirement would have
> exceeded 4GB to do the sort in-memory.

I managed to do the sort in memory by setting work_mem to 9GB; memory
usage peaked at around 6.5GB. The idea behind this exercise was to see
whether the query could perform better:

- original query: around 8 hours
- parallelised query over 4 connections:
  - default work_mem (=256MB): 110 minutes
  - work_mem = 4GB: 99 minutes
  - work_mem = 9GB: 97 minutes

Parallelised query: the original query is a "group by" style query, and
one of the group by attributes has around 40 unique values. I replaced
the original query with 40 queries (each with a different value for that
attribute) and executed them over 4 connections.

> If you want to know what the conversion factor is for your particular
> dataset, try doing the query with small and large work_mem so you can
> see what is reported as the amount of space needed each way.

It looks like the disk representation is about half of the memory usage
in my case. In any case, the test showed that work_mem is "irrelevant"
here (the disk is a fast SSD PCIe card).

Thanks,
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
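For completeness, a rough sketch of the split described above (all names
are invented here; the real query and attribute differ):

  -- original "group by" style query (shape only):
  --   SELECT region, day, sum(amount)
  --   FROM facts
  --   GROUP BY region, day;

  -- with ~40 distinct values of "region" it becomes 40 queries like:
  SELECT region, day, sum(amount)
  FROM facts
  WHERE region = 'APAC'        -- one of the ~40 values
  GROUP BY region, day;

  -- the 40 variants are spread over 4 connections (roughly 10 per
  -- connection) and the result sets are concatenated afterwards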