Re: Caching (was Re: choosing the right platform)

Поиск
Список
Период
Сортировка
От Ron Mayer
Тема Re: Caching (was Re: choosing the right platform)
Дата
Msg-id POEDIPIPKGJJLDNIEMBEOEAACJAA.ron@intervideo.com
обсуждение исходный текст
Ответ на Re: Caching (was Re: choosing the right platform)  (Josh Berkus <josh@agliodbs.com>)
Ответы Re: Caching (was Re: choosing the right platform)  ("scott.marlowe" <scott.marlowe@ihs.com>)
Re: Caching (was Re: choosing the right platform)  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-performance
Short summary...

  I think sort_mem matters quite a bit (20-40%) on
  my data-warehousing applications.

  Am I doing something wrong to need so much sort_mem?

Josh wrote:
>> ** It's better to not use huge amount of sort-mem...
>
>...However, I have never seen a database use more than 32mb sort mem
>in a single process, so I don't think the 2GB limit will hurt you much ...

Do you think this is true in data warehousing applications as well?

During the ETL part of data warehousing, large sorts are often
used to get the "new" values that need to be inserted
into "dimension" tables, like this:
  INSERT INTO dimension_val (id,val)
       SELECT nextval('val_seq'),val
         FROM (SELECT DISTINCT val FROM import_table
               EXCEPT
               SELECT          val FROM dimension_val) as a;
As far as I can tell, this query typically does two sorts,
one for the distinct, and one for the except.


In a data warehouse we have here, we load about 3 million rows
each week; load time improved from about 9 to 7 hours
by breaking up such queries into expressions that only require
one sort at a time, and surrounding the expressions with
"set sort_mem=something_big" statements to give it enough
space to not hit the disk.

  SET SORT_MEM=300000;
  CREATE TEMPORARY TABLE potential_new_values AS
     SELECT DISTINCT val FROM import_table;
  ...
  SET SORT_MEM=1000;

Anyone else have similar experience, or am I doing something
wrong to need so much SORT_MEM?


    Ron



PS:

Below is an example of another real-world query from the same
reporting system that benefits from a sort_mem over 32M.
Explain analyze (below) shows a 40% improvement by having
the sort fit in memory.

10Meg and 32Meg take over 22 seconds.  100Meg takes 14.

====================================================================================================
logs2=#
logs2=#
logs2=# set sort_mem=10000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=71612.82..72838.69 rows=49035 width=17) (actual time=20315.47..22457.21 rows=2914 loops=1)
  ->  Sort  (cost=71612.82..71612.82 rows=490348 width=17) (actual time=20315.46..21351.42 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.08..2932.72
rows=511368
loops=1)
Total runtime: 22475.63 msec

EXPLAIN
logs2=# set sort_mem=32000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=60442.82..61668.69 rows=49035 width=17) (actual time=22657.31..24794.19 rows=2914 loops=1)
  ->  Sort  (cost=60442.82..60442.82 rows=490348 width=17) (actual time=22657.30..23714.43 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.07..3020.83
rows=511368
loops=1)
Total runtime: 24811.65 msec

EXPLAIN
logs2=# set sort_mem=100000;
SET VARIABLE
logs2=# explain analyze select distinct category from c_transaction_credit;
NOTICE:  QUERY PLAN:

Unique  (cost=60442.82..61668.69 rows=49035 width=17) (actual time=12205.19..14012.57 rows=2914 loops=1)
  ->  Sort  (cost=60442.82..60442.82 rows=490348 width=17) (actual time=12205.18..12710.16 rows=511368 loops=1)
        ->  Seq Scan on c_transaction_credit  (cost=0.00..14096.48 rows=490348 width=17) (actual time=0.08..3001.05
rows=511368
loops=1)
Total runtime: 14187.96 msec

EXPLAIN
logs2=#


В списке pgsql-performance по дате отправления:

Предыдущее
От: Jean-Luc Lachance
Дата:
Сообщение: Re: Caching (was Re: choosing the right platform)
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Help analyzing 7.2.4 EXPLAIN