Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Дата
Msg-id CAApHDvpBk5QYd_Zbh76tvomOf79hHdqV1HKVdGzs9eeKP-TJUw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Список pgsql-hackers
Thanks a lot for testing this patch. It's good to see it run through a
benchmark that exercises quite a few join problems.

On Fri, 11 Dec 2020 at 05:44, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> For most queries performance is the same, some queries are executed
> faster but
> one query is 150 times slower:
>
>
> explain analyze SELECT MIN(chn.name) AS character,
...
>   Execution Time: 523002.608 ms

> I attach file with times of query execution.

I noticed the time reported in results.csv is exactly the same as the
one in the EXPLAIN ANALYZE above.  One thing to note there that it
would be a bit fairer if the benchmark was testing the execution time
of the query instead of the time to EXPLAIN ANALYZE.

One of the reasons that the patch may look less favourable here is
that the timing overhead on EXPLAIN ANALYZE increases with additional
nodes.

If I just put this to the test by using the tables and query from [1].

#  explain (analyze, costs off) select count(*) from hundredk hk inner
# join lookup l on hk.thousand = l.a;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Aggregate (actual time=1891.262..1891.263 rows=1 loops=1)
   ->  Nested Loop (actual time=0.312..1318.087 rows=9990000 loops=1)
         ->  Seq Scan on hundredk hk (actual time=0.299..15.753
rows=100000 loops=1)
         ->  Result Cache (actual time=0.000..0.004 rows=100 loops=100000)
               Cache Key: hk.thousand
               Hits: 99000  Misses: 1000  Evictions: 0  Overflows: 0
Memory Usage: 3579kB
               ->  Index Only Scan using lookup_a_idx on lookup l
(actual time=0.003..0.012 rows=100 loops=1000)
                     Index Cond: (a = hk.thousand)
                     Heap Fetches: 0
 Planning Time: 3.471 ms
 Execution Time: 1891.612 ms
(11 rows)

You can see here the query took 1.891 seconds to execute.

Same query without EXPLAIN ANALYZE.

postgres=# \timing
Timing is on.
postgres=# select count(*) from hundredk hk inner
postgres-# join lookup l on hk.thousand = l.a;
  count
---------
 9990000
(1 row)

Time: 539.449 ms

Or is it more accurate to say it took just 0.539 seconds?

Going through the same query after disabling; enable_resultcache,
enable_mergejoin, enable_nestloop, I can generate the following table
which compares the EXPLAIN ANALYZE time to the \timing on time.

postgres=# select type,ea_time,timing_time, round(ea_time::numeric /
timing_time::numeric,3) as ea_overhead from results order by
timing_time;
      type      | ea_time  | timing_time | ea_overhead
----------------+----------+-------------+-------------
 Nest loop + RC | 1891.612 |     539.449 |       3.507
 Merge join     | 2411.632 |    1008.991 |       2.390
 Nest loop      |  2484.82 |     1049.63 |       2.367
 Hash join      | 4969.284 |    3272.424 |       1.519

Result Cache will be hit a bit harder by this problem due to it having
additional nodes in the plan. The Hash Join query seems to suffer much
less from this problem.

However, saying that. It's certainly not the entire problem here:

Hits: 5  Misses: 156294 Evictions: 0  Overflows: 0  Memory Usage: 9769kB

The planner must have thought there'd be more hits than that or it
wouldn't have thought Result Caching would be a good plan.  Estimating
the cache hit ratio using n_distinct becomes much less reliable when
there are joins and filters. A.K.A the real world.

David

[1] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: extended statistics - functional dependencies vs. MCV lists
Следующее
От: "k.jamison@fujitsu.com"
Дата:
Сообщение: RE: [Patch] Optimize dropping of relation buffers using dlist