Thread: external sort performance

external sort performance

From
Jon Nelson
Date:
I have one query which does not run very often. Sometimes it may be
months between runs.
However, when it does get executed, it scans approximately 100
identically-structured tables (a form of partitioning), extracts and
groups on a subset of the columns, and creates a new table. The
individual table queries have no WHERE clauses, so this is a full table
scan of every table.

I've tried all sorts of things to improve the performance of this
query, which can take a /very/ long time.
We are talking about approximately 175GB of data before grouping/summarizing.

This is on PG 8.4.8 on Linux, 16GB of "real" RAM.
Most recently, I enabled trace_sort, disabled hash aggregation[1], and
set a large work_mem (it is normally very small; in this case I tried
anything from 8MB to 256MB, and even 1GB and 2GB).

In the logs, I saw this:

external sort ended, 7708696 disk blocks used: CPU 359.84s/57504.66u
sec elapsed 58966.76 sec

Am I to understand that the CPU portion of the sorting only took 6
minutes but the sort itself took almost 16.5 hours and used approx
60GB of disk space?
The resulting summary table is about 5GB in size as reported by \d+ in
psql (and pg_relation_size).

The underlying storage is ext4 on a hardware RAID 10 with a BBU.

What sorts of things should I be looking at to improve the performance
of this query? Is my interpretation of that log line totally off base?



[1] If I don't disable hash aggregation and work_mem is over 8MB,
the memory allocation explodes to the point where PostgreSQL
wants dozens of gigs of memory. I've tried setting the statistics
target as high as 1000 without benefit.

--
Jon

Re: external sort performance

From
Claudio Freire
Date:
On Thu, Nov 17, 2011 at 2:10 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> What sorts of things should I be looking at to improve the performance
> of this query? Is my interpretation of that log line totally off base?

You'll have to post some more details.
Like a query and an explain/explain analyze.

Memory consumption probably skyrockets since you'll need at least one
sort per table, so if you have 100+, then that's (at least) 100+
sorts.
Without the explain output it's impossible to be sure.

Re: external sort performance

From
Craig James
Date:
On 11/17/11 9:10 AM, Jon Nelson wrote:
> I have one query which does not run very often. Sometimes it may be
> months between runs.
> However, when it does get executed, it scans approximately 100
> identically-structured tables (a form of partitioning), extracts and
> groups on a subset of the columns, and creates a new table. The
> individual table queries have no where clauses, this is a full table
> scan for every table.
>
> I've tried all sorts of things to try to improve the performance,
> which can take a /very/ long time.
> We are talking about approximately 175GB of data before grouping/summarizing.
>
> This is on PG 8.4.8 on Linux, 16GB of "real" RAM.
> Most recently, I enabled trace_sort, disabled hash aggregation[1], and
> set a large work_mem (normally very small, in this case I tried
> anything from 8MB to 256MB. I even tried 1GB and 2GB).
>
> In the logs, I saw this:
>
> external sort ended, 7708696 disk blocks used: CPU 359.84s/57504.66u
> sec elapsed 58966.76 sec
>
> Am I to understand that the CPU portion of the sorting only took 6
> minutes but the sort itself took almost 16.5 hours and used approx
> 60GB of disk space?
> The resulting summary table is about 5GB in size as reported by \d+ in
> psql (and pg_relation_size).
>
> The underlying storage is ext4 on a hardware raid 10 with a BBU.
>
> What sorts of things should I be looking at to improve the performance
> of this query? Is my interpretation of that log line totally off base?
You don't give any details about how and why you are sorting. Are you actually using all of the columns in your
aggregated-data table in the sort operation? Or just a few of them?

You're making the sort operation work with 175 GB of data. If most of that data is only needed for the report (not the
sort), then separate it into two tables - one with just the data that the sorting/grouping needs, and the other with
the rest of the data. Then create a view that joins it all back together for reporting purposes.

Craig

Re: external sort performance

From
Tom Lane
Date:
Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> This is on PG 8.4.8 on Linux, 16GB of "real" RAM.
> Most recently, I enabled trace_sort, disabled hash aggregation[1], and
> set a large work_mem (normally very small, in this case I tried
> anything from 8MB to 256MB. I even tried 1GB and 2GB).

FWIW, I think hash aggregation is your best shot at getting reasonable
performance.  Sorting 175GB of data is going to hurt no matter what.

If the grouped table amounts to 5GB, I wouldn't have expected the hash
table to be more than maybe 2-3X that size (although this does depend on
what aggregates you're running...).  Letting the hash aggregation have
all your RAM might be the best answer.

            regards, tom lane

Re: external sort performance

From
Jon Nelson
Date:
I'll try to compile multiple questions/answers into a single response.

On Thu, Nov 17, 2011 at 11:16 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Nov 17, 2011 at 2:10 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
>> What sorts of things should I be looking at to improve the performance
>> of this query? Is my interpretation of that log line totally off base?

> You'll have to post some more details.
> Like a query and an explain/explain analyze.

Please see below; however, I am also very interested to know whether
I'm interpreting that log line correctly.

> Memory consumption probably skyrockets since you'll need at least one
> sort per table, so if you have 100+, then that's (at least) 100+
> sorts.

Right, that much I had understood.


On Thu, Nov 17, 2011 at 11:28 AM, Craig James
<craig_james@emolecules.com> wrote:
> You don't give any details about how and why you are sorting. Are you
> actually using all of the columns in your aggregated-data table in the sort
> operation?  Or just a few of them?

> You're making the sort operation work with 175 GB of data.  If most of that
> data is only needed for the report (not the sort), then separate it into two
> tables - one of just the data that the sorting/grouping needs, and the other
> with the rest of the data. Then create a view that joins it all back
> together for reporting purposes.

I'm not actually using any ORDER BY at all. This is purely a GROUP BY.
The sort happens because of the group aggregate (vs. hash aggregate).
Two of the columns are used to group; the other two are aggregated (SUM).

On Thu, Nov 17, 2011 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> This is on PG 8.4.8 on Linux, 16GB of "real" RAM.
>> Most recently, I enabled trace_sort, disabled hash aggregation[1], and
>> set a large work_mem (normally very small, in this case I tried
>> anything from 8MB to 256MB. I even tried 1GB and 2GB).
>
> FWIW, I think hash aggregation is your best shot at getting reasonable
> performance.  Sorting 175GB of data is going to hurt no matter what.

> If the grouped table amounts to 5GB, I wouldn't have expected the hash
> table to be more than maybe 2-3X that size (although this does depend on
> what aggregates you're running...).  Letting the hash aggregation have
> all your RAM might be the best answer.

I'm re-running the query with work_mem set to 16GB (for just that query).

The query (with table and column names changed):

SELECT anon_1.columnA, sum(anon_1.columnB) AS columnB,
sum(anon_1.columnC) AS columnC, anon_1.columnD
FROM (
  SELECT columnA, columnB, columnC, columnD FROM tableA
  UNION ALL
  .... same select/union all pattern but from 90-ish other tables
) AS anon_1
GROUP BY anon_1.columnA, anon_1.columnD
HAVING sum(anon_1.columnB) > 0

The explain verbose with work_mem = 16GB:

 HashAggregate  (cost=54692162.83..54692962.83 rows=40000 width=28)
   Output: columnA, sum(columnB), sum(columnC), columnD
   Filter: (sum(columnB) > 0)
   ->  Append  (cost=0.00..34547648.48 rows=1611561148 width=28)
         ->  Seq Scan on tableA  (cost=0.00..407904.40 rows=19045540 width=28)
               Output: columnA, columnB, columnC, columnD
         .... 90-ish more tables here

12 minutes into the query it is consuming 10.1GB of memory.
21 minutes into the query it is consuming 12.9GB of memory.
After just under 34 minutes it completed with about 15GB of memory being used.
That is a rather impressive improvement. Previously, I had been
advised against using a large work_mem value. I had never thought to
use one 3 times the size of the resulting table.

The explain verbose with enable_hashagg = false:

 GroupAggregate  (cost=319560040.24..343734257.46 rows=40000 width=28)
   Output: columnA, sum(columnB), sum(columnC), columnD
   Filter: (sum(columnB) > 0)
   ->  Sort  (cost=319560040.24..323588943.11 rows=1611561148 width=28)
         Output: columnA, columnB, columnC, columnD
         Sort Key: columnA, columnD
         ->  Result  (cost=0.00..34547648.48 rows=1611561148 width=28)
               Output: columnA, columnB, columnC, columnD
               ->  Append  (cost=0.00..34547648.48 rows=1611561148 width=28)
                     ->  Seq Scan on tableA  (cost=0.00..407904.40 rows=19045540 width=28)
                           Output: columnA, columnB, columnC, columnD
                     .... 90-ish more tables here
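For intuition, the two plans differ only in how the groups are formed. A toy Python sketch (not the server's implementation; the column names follow the anonymized query above) of hash aggregation versus sort-then-group over the same (columnA, columnD) keys:

```python
from itertools import groupby
from operator import itemgetter

rows = [  # (columnA, columnB, columnC, columnD)
    ("x", 1, 10, "p"),
    ("y", 2, 20, "q"),
    ("x", 3, 30, "p"),
]

# HashAggregate: one pass over the input, one dict entry per distinct
# group key -- memory scales with the number of groups, not the input.
agg = {}
for a, b, c, d in rows:
    sb, sc = agg.get((a, d), (0, 0))
    agg[(a, d)] = (sb + b, sc + c)
hash_result = {k: v for k, v in agg.items() if v[0] > 0}  # HAVING sum(columnB) > 0

# GroupAggregate: sort the whole input on the grouping key first, then
# scan adjacent runs -- the sort is what spills to disk at 175GB.
key = itemgetter(0, 3)
group_result = {}
for k, grp in groupby(sorted(rows, key=key), key=key):
    grp = list(grp)
    sb = sum(r[1] for r in grp)
    sc = sum(r[2] for r in grp)
    if sb > 0:
        group_result[k] = (sb, sc)

assert hash_result == group_result
```

Both produce identical groups; the hash path just never has to order 1.6 billion rows to find them.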



--
Jon

Re: external sort performance

From
Jon Nelson
Date:
A follow-up question.
Even with both work_mem and maintenance_work_mem equal to 16GB, I see this:

LOG:  00000: begin index sort: unique = f, workMem = 16777216, randomAccess = f
and shortly thereafter:
LOG:  00000: switching to external sort with 59919 tapes: CPU
2.59s/13.20u sec elapsed 16.85 sec
and a while later:
LOG:  00000: finished writing run 1 to tape 0: CPU 8.16s/421.45u sec
elapsed 433.83 sec
LOG:  00000: performsort done (except 2-way final merge): CPU
9.53s/561.56u sec elapsed 576.54 sec
LOG:  00000: external sort ended, 181837 disk blocks used: CPU
12.90s/600.45u sec elapsed 625.05 sec


The first log statement is expected. The second log statement, however, isn't.
The total table size is (as noted earlier) about 5GB and, in fact, fit
into one nice hash table (approx 15GB in size).
Is the sorting that is necessary for index creation unable to use a
hash table? (This is a standard btree index).

--
Jon

Re: external sort performance

From
Tom Lane
Date:
Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> Is the sorting that is necessary for index creation unable to use a
> hash table? (This is a standard btree index).

Hash aggregation isn't sorting --- it's only useful for grouping.

            regards, tom lane

Re: external sort performance

From
Josh Berkus
Date:
> The first log statement is expected. The second log statement, however, isn't.
> The total table size is (as noted earlier) about 5GB and, in fact, fit
> into one nice hash table (approx 15GB in size).
> Is the sorting that is necessary for index creation unable to use a
> hash table? (This is a standard btree index).

How big is the source table?  You're not sorting the *result* table,
you're sorting the source table if you're summarizing it.

If the original source data is only 5GB, I'd check your code for a
Cartesian join.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: external sort performance

From
Jeremy Harris
Date:
On 2011-11-17 17:10, Jon Nelson wrote:
> external sort ended, 7708696 disk blocks used: CPU 359.84s/57504.66u
> sec elapsed 58966.76 sec
>
> Am I to understand that the CPU portion of the sorting only took 6
> minutes but the sort itself took almost 16.5 hours and used approx
> 60GB of disk space?


I realise you've had helpful answers by now, but.... that reads
as 16 hours of CPU time to me; mostly user-mode, but with 6 minutes
of system-mode. That's 98% CPU usage for the 16 hours elapsed.
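The 98% figure falls straight out of the numbers in the quoted line; a small Python sketch of the arithmetic (the s/u/elapsed format is as logged by trace_sort):

```python
import re

line = ("external sort ended, 7708696 disk blocks used: "
        "CPU 359.84s/57504.66u sec elapsed 58966.76 sec")

# "359.84s" is system-mode CPU, "57504.66u" is user-mode CPU,
# "elapsed" is wall-clock time.
m = re.search(r"CPU ([\d.]+)s/([\d.]+)u sec elapsed ([\d.]+) sec", line)
system_s, user_s, elapsed_s = map(float, m.groups())

cpu_total = system_s + user_s        # ~57864.5 s, i.e. ~16 hours of CPU
utilization = cpu_total / elapsed_s  # ~0.98: the sort was CPU-bound

print(f"user {user_s / 3600:.1f}h, system {system_s / 60:.1f}min, "
      f"{utilization:.0%} of elapsed time on CPU")
```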

--
Jeremy

Re: external sort performance

From
Jon Nelson
Date:
On Sun, Nov 20, 2011 at 7:56 AM, Jeremy Harris <jgh@wizmail.org> wrote:
> On 2011-11-17 17:10, Jon Nelson wrote:
>>
>> external sort ended, 7708696 disk blocks used: CPU 359.84s/57504.66u
>> sec elapsed 58966.76 sec
>>
>> Am I to understand that the CPU portion of the sorting only took 6
>> minutes but the sort itself took almost 16.5 hours and used approx
>> 60GB of disk space?
>
>
> I realise you've had helpful answers by now, but.... that reads
> as 16 hours of cpu time to me; mostly user-mode but with 6 minute
> of system-mode.  98% cpu usage for the 16 hours elapsed.

Thank you very much!
I was going to post a followup asking for help interpreting the log
line, but now I don't have to. Do you happen to recall if disk I/O is
counted as user or system time? If it's counted as system time, then I
have more questions, namely:

If using a hash table (hash aggregation) shows that the GROUPing can
take place in 35 minutes, but a Group Aggregation takes 16 hours, how
much of that is CPU and how much is waiting for I/O?


--
Jon

Re: external sort performance

From
Jeremy Harris
Date:
On 2011-11-20 15:00, Jon Nelson wrote:
>  Do you happen to recall if disk I/O is
> counted as user or system time?

Neither, traditionally.  Those times are CPU times;
they only account for what the CPU was doing.
The disks could be working in parallel as a result
of CPU actions, and probably were - but the software
found work to do for the CPU.  You'd want to be
looking at iostat during the run to see how busy the
disks were.

As to why it takes 16 hours of CPU to do the external
version but only 34 minutes for the internal one - some of that
will be down to data-shuffling into and out of disk files,
which is nonetheless accounted to user-mode CPU time,
but some will be the plain inefficiency of the external
version having to effectively do work over many times
because it can't have a complete view of the problem at
hand at any one time.
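That re-touching of data can be sketched with a toy two-pass external merge sort (a simplification; PostgreSQL actually uses polyphase merging across "tapes", as the "59919 tapes" log line shows):

```python
import heapq
import random

# Pretend we can only hold 1000 rows in memory at once.
WORK_MEM = 1000
data = [random.random() for _ in range(10_000)]

# Pass 1: cut the input into memory-sized chunks, sort each, and
# "spill" it as a sorted run -- every row is written out once.
runs = [sorted(data[i:i + WORK_MEM]) for i in range(0, len(data), WORK_MEM)]

# Pass 2: k-way merge of the runs -- every row is read back in again.
# heapq.merge is the simplest stand-in for the final merge step.
result = list(heapq.merge(*runs))

assert result == sorted(data)
```

An in-memory hash table touches each row once; the external sort pays at least one extra write and read per row, plus the comparison work of sorting itself.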

--
Jeremy