Why is a hash join preferred when it does not fit in work_mem

Поиск
Список
Период
Сортировка
От Dimitrios Apostolou
Тема Why is a hash join preferred when it does not fit in work_mem
Дата
Msg-id 7ee75988-8408-36f8-3024-93dbf45b6f19@gmx.net
обсуждение исходный текст
Ответы Re: Why is a hash join preferred when it does not fit in work_mem  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-general
Hello list,

I have a very simple NATURAL JOIN that does not fit in the work_mem.  Why
does the query planner prefer a hash join that needs 361s, while with a
sort operation and a merge join it takes only 13s?

The server is an old Mac Mini with hard disk drive and only 4GB RAM.
Postgres version info:

  PostgreSQL 15.0 on x86_64-apple-darwin20.6.0, compiled by Apple clang
version 12.0.0 (clang-1200.0.32.29), 64-bit

The low work_mem and the disabled memoization are set on purpose, in order
to simplify a complex query, while reproducing the same problem that I
experienced there. This result is the simplest query I could get, where
the optimizer does not go for a faster merge join.

From my point of view a merge join is clearly faster, because the hash
table does not fit in memory and I expect a hash join to do a lot of
random I/O. But the query planner does not see that, and increasing
random_page_cost does not help either. In fact the opposite happens: the
merge join gets a higher cost difference to the hash join, as I increase
the random page cost!



# EXPLAIN (ANALYZE,VERBOSE,BUFFERS,SETTINGS) SELECT * FROM
tasks_mm_workitems NATURAL JOIN workitem_ids;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=121222.68..257633.01 rows=3702994 width=241) (actual
time=184498.464..360606.257 rows=3702994 loops=1)
    Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n,
workitem_ids.workitem_id
    Inner Unique: true
    Hash Cond: (tasks_mm_workitems.workitem_n = workitem_ids.workitem_n)
    Buffers: shared hit=15068 read=47434, temp read=56309 written=56309
    ->  Seq Scan on public.tasks_mm_workitems  (cost=0.00..53488.94
rows=3702994 width=8) (actual time=0.040..1376.084 rows=3702994 loops=1)
          Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
          Buffers: shared read=16459
    ->  Hash  (cost=59780.19..59780.19 rows=1373719 width=237) (actual
time=184361.874..184361.875 rows=1373737 loops=1)
          Output: workitem_ids.workitem_id, workitem_ids.workitem_n
          Buckets: 4096  Batches: 512  Memory Usage: 759kB
          Buffers: shared hit=15068 read=30975, temp written=43092
          ->  Seq Scan on public.workitem_ids  (cost=0.00..59780.19
rows=1373719 width=237) (actual time=0.026..1912.312 rows=1373737 loops=1)
                Output: workitem_ids.workitem_id, workitem_ids.workitem_n
                Buffers: shared hit=15068 read=30975
  Settings: effective_cache_size = '500MB', enable_memoize = 'off',
hash_mem_multiplier = '1', max_parallel_workers_per_gather = '1', work_mem
= '1MB'
  Planning:
    Buffers: shared hit=2 read=6
  Planning Time: 0.568 ms
  Execution Time: 361106.876 ms
(20 rows)


# EXPLAIN (ANALYZE,VERBOSE,BUFFERS,SETTINGS) SELECT * FROM
tasks_mm_workitems NATURAL JOIN workitem_ids;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=609453.49..759407.78 rows=3702994 width=241) (actual time=5062.513..10866.313 rows=3702994 loops=1)
    Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n, workitem_ids.workitem_id
    Merge Cond: (workitem_ids.workitem_n = tasks_mm_workitems.workitem_n)
    Buffers: shared hit=5343 read=66053, temp read=32621 written=32894
    ->  Index Scan using workitem_ids_pkey on public.workitem_ids
(cost=0.43..81815.86 rows=1373719 width=237) (actual time=0.111..1218.363 rows=1373737 loops=1)
          Output: workitem_ids.workitem_n, workitem_ids.workitem_id
          Buffers: shared hit=5310 read=49627
    ->  Materialize  (cost=609372.91..627887.88 rows=3702994 width=8) (actual time=5062.389..7392.640 rows=3702994
loops=1)
          Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
          Buffers: shared hit=33 read=16426, temp read=32621 written=32894
          ->  Sort  (cost=609372.91..618630.40 rows=3702994 width=8) (actual time=5062.378..6068.703 rows=3702994
loops=1)
                Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
                Sort Key: tasks_mm_workitems.workitem_n
                Sort Method: external merge  Disk: 65256kB
                Buffers: shared hit=33 read=16426, temp read=32621 written=32894
                ->  Seq Scan on public.tasks_mm_workitems
(cost=0.00..53488.94 rows=3702994 width=8) (actual time=0.045..1177.202 rows=3702994 loops=1)
                      Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
                      Buffers: shared hit=33 read=16426
  Settings: effective_cache_size = '500MB', enable_hashjoin = 'off', enable_memoize = 'off', hash_mem_multiplier = '1',
max_parallel_workers_per_gather= '1', work_mem = '1MB' 
  Planning:
    Buffers: shared hit=8
  Planning Time: 0.677 ms
  Execution Time: 13364.545 ms
(23 rows)


Thank you in advance,
Dimitris




В списке pgsql-general по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Changing displayed time zone in RAISE NOTICE output?
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: Changing displayed time zone in RAISE NOTICE output?