9.3.2 server creates hundreds of thousands of temporary files

Поиск
Список
Период
Сортировка
I've got a query which causes PostgreSQL to create hundreds of thousands
of temporary files, many of them empty.  The process also needs a lot of
memory.  I suspect this is due to bookkeeping for those files.

The query looks like this:

SELECT ef.arch::text, en.name, file_id, f.name
   FROM symboldb.package_set_member psm
   JOIN symboldb.file f USING (package_id)
   JOIN symboldb.elf_file ef USING (contents_id)
   JOIN symboldb.elf_needed en USING (contents_id)
   WHERE psm.set_id = 36;

The query plan looks like this:

  Hash Join  (cost=3153618.94..7866324.29 rows=86653612 width=84)
    Hash Cond: (f.contents_id = ef.contents_id)
    ->  Hash Join  (cost=2349021.27..5624009.06 rows=5647704 width=68)
          Hash Cond: (psm.package_id = f.package_id)
          ->  Bitmap Heap Scan on package_set_member psm
(cost=495.42..5385.79 rows=33870 width=4)
                Recheck Cond: (set_id = 36)
                ->  Bitmap Index Scan on package_set_member_set_id_idx
(cost=0.00..486.95 rows=33870 width=0)
                      Index Cond: (set_id = 36)
          ->  Hash  (cost=1252310.60..1252310.60 rows=45263060 width=72)
                ->  Seq Scan on file f  (cost=0.00..1252310.60
rows=45263060 width=72)
    ->  Hash  (cost=592698.57..592698.57 rows=10958808 width=28)
          ->  Hash Join  (cost=51493.85..592698.57 rows=10958808 width=28)
                Hash Cond: (en.contents_id = ef.contents_id)
                ->  Seq Scan on elf_needed en  (cost=0.00..202333.08
rows=10958808 width=20)
                ->  Hash  (cost=30622.82..30622.82 rows=1272082 width=8)
                      ->  Seq Scan on elf_file ef  (cost=0.00..30622.82
rows=1272082 width=8)

This happens with postgresql-server-9.3.2-2.fc20.x86_64.

I track this down to a lower-than-usual setting of work_mem, to 1MB,
after the upgrade to 9.3.

With work_mem set to 8MB, the query completes in a reasonable time
frame, with just a few thousand temporary files.  EXPLAIN ANALYZE gives
these numbers:

  Hash Join  (cost=3153618.94..7866324.29 rows=86653612 width=84)
(actual time=56041.178..87956.190 rows=511599 loops=1)
    Output: (ef.arch)::text, en.name, f.file_id, f.name
    Hash Cond: (f.contents_id = ef.contents_id)
    ->  Hash Join  (cost=2349021.27..5624009.06 rows=5647704 width=68)
(actual time=46754.959..75014.287 rows=3820442 loops=1)
          Output: f.file_id, f.name, f.contents_id
          Hash Cond: (psm.package_id = f.package_id)
          ->  Bitmap Heap Scan on symboldb.package_set_member psm
(cost=495.42..5385.79 rows=33870 width=4) (actual time=2.124..8.425
rows=32872 loops=1)
                Output: psm.package_id
                Recheck Cond: (psm.set_id = 36)
                ->  Bitmap Index Scan on package_set_member_set_id_idx
(cost=0.00..486.95 rows=33870 width=0) (actual time=2.089..2.089
rows=33112 loops=1)
                      Index Cond: (psm.set_id = 36)
          ->  Hash  (cost=1252310.60..1252310.60 rows=45263060 width=72)
(actual time=46699.578..46699.578 rows=45327619 loops=1)
                Output: f.file_id, f.name, f.package_id, f.contents_id
                Buckets: 8192  Batches: 2048 (originally 1024)  Memory
Usage: 8193kB
                ->  Seq Scan on symboldb.file f  (cost=0.00..1252310.60
rows=45263060 width=72) (actual time=0.003..34658.946 rows=45327619 loops=1)
                      Output: f.file_id, f.name, f.package_id, f.contents_id
    ->  Hash  (cost=592698.57..592698.57 rows=10958808 width=28) (actual
time=9261.212..9261.212 rows=10974399 loops=1)
          Output: ef.arch, ef.contents_id, en.name, en.contents_id
          Buckets: 16384  Batches: 128  Memory Usage: 5462kB
          ->  Hash Join  (cost=51493.85..592698.57 rows=10958808
width=28) (actual time=449.725..7130.583 rows=10974399 loops=1)
                Output: ef.arch, ef.contents_id, en.name, en.contents_id
                Hash Cond: (en.contents_id = ef.contents_id)
                ->  Seq Scan on symboldb.elf_needed en
(cost=0.00..202333.08 rows=10958808 width=20) (actual
time=0.008..1633.980 rows=10974399 loops=1)
                      Output: en.name, en.contents_id
                ->  Hash  (cost=30622.82..30622.82 rows=1272082 width=8)
(actual time=449.424..449.424 rows=1275149 loops=1)
                      Output: ef.arch, ef.contents_id
                      Buckets: 32768  Batches: 8  Memory Usage: 6253kB
                      ->  Seq Scan on symboldb.elf_file ef
(cost=0.00..30622.82 rows=1272082 width=8) (actual time=0.032..223.748
rows=1275149 loops=1)
                            Output: ef.arch, ef.contents_id
  Total runtime: 87983.826 ms

Is this a bug?

--
Florian Weimer / Red Hat Product Security Team


В списке pgsql-general по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: question on writing a function
Следующее
От: Tom Lane
Дата:
Сообщение: Re: 9.3.2 server creates hundreds of thousands of temporary files