Re: MergeJoin beats HashJoin in the case of multiple hash clauses

From: Bruce Momjian
Subject: Re: MergeJoin beats HashJoin in the case of multiple hash clauses
Date:
Msg-id: ZPoRuHWPZeu3qC/a@momjian.us
In reply to: Re: MergeJoin beats HashJoin in the case of multiple hash clauses  (Alena Rybakina <lena.ribackina@yandex.ru>)
List: pgsql-hackers
Does anyone else have an opinion on this patch?  It looks promising.

---------------------------------------------------------------------------

On Wed, Jun 28, 2023 at 04:53:06PM +0300, Alena Rybakina wrote:
> Hi!
> 
> On 15.06.2023 11:30, Andrey Lepikhov wrote:
> 
>     Hi, all.
> 
>     Some of my clients use JOINs with three or four clauses. Quite
>     frequently, I see complaints about an unreasonable switch of the JOIN
>     algorithm to Merge Join instead of Hash Join. Quick research has shown
>     one weak spot - the estimation of the average bucket size in
>     final_cost_hashjoin (see q2.sql in the attachment), which uses a very
>     conservative strategy.
>     Unlike the estimation of groups, here we keep only the smallest bucket
>     size across all clauses (i.e., the clause with the largest ndistinct)
>     instead of multiplying them (or attempting multivariate analysis).
>     That works fine for the case of one clause. But if we have many
>     clauses, each with a high ndistinct value, we will overestimate the
>     average size of a bucket and, as a result, prefer Merge Join. As the
>     example in the attachment shows, this leads to a worse plan than
>     possible, sometimes drastically worse.
>     I assume this is done out of fear of functional dependencies between
>     the hash clause components. But in my opinion, we should go the same
>     way here as the estimation of groups does.
>     The attached patch shows a sketch of the solution.
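> 
>     To make the difference concrete, here is a minimal standalone sketch
>     of the two strategies (not the actual PostgreSQL code; it assumes a
>     per-clause bucket fraction of roughly 1/ndistinct, and the values 90,
>     49 and 100 are taken from table b in the examples below):
> 
>     #include <stdio.h>
> 
>     int main(void)
>     {
>         double frac[] = {1.0 / 90, 1.0 / 49, 1.0 / 100};
>         double minfrac = 1.0;   /* current strategy: most selective clause */
>         double prodfrac = 1.0;  /* patched strategy: multiply, as for groups */
>         double inner_rows = 100000.0;
> 
>         for (int i = 0; i < 3; i++)
>         {
>             if (frac[i] < minfrac)
>                 minfrac = frac[i];
>             prodfrac *= frac[i];
>         }
>         /* min gives ~1000 rows per bucket; the product gives ~0.23 rows */
>         printf("min: %.0f rows/bucket, product: %.2f rows/bucket\n",
>                inner_rows * minfrac, inner_rows * prodfrac);
>         return 0;
>     }
> 
>     The minimum never lets additional clauses shrink the estimate below
>     the most selective single clause; multiplying treats the clauses as
>     independent, which is the same assumption the group-estimation code
>     already makes.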
> 
> 
> This problem is very important.
> 
> Honestly, I'm still studying your code and looking for cases where your
> patch changes things for the worse or for the better. But I have already
> found something that seemed interesting to me: several cases where your
> patch helps to choose a more correct plan, with a focus on memory
> consumption.
> 
> To make it easier to evaluate, I added a GUC, enable_cost_size, to your
> patch that switches between your way and the original way of estimating
> bucket sizes (diff_estimate.diff).
> 
> Here are other cases where your fix improves the query plan.
> 
> 
> First of all, I changed the way the tables are created, to look at the
> behavior of the query plan in terms of planning and execution time:
> 
> DROP TABLE IF EXISTS a,b CASCADE;
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,1e5) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> ANALYZE a,b;
> 
> SET enable_cost_size = 'on';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
> SET enable_cost_size = 'off';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
> 
>                                 QUERY PLAN                                 
> ---------------------------------------------------------------------------
>  Hash Join (actual time=200.872..200.879 rows=0 loops=1)
>    Hash Cond: ((b.x = a.x) AND (b.y = a.y) AND (b.z = a.z))
>    ->  Seq Scan on b (actual time=0.029..15.946 rows=100000 loops=1)
>    ->  Hash (actual time=97.645..97.649 rows=100000 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 5612kB
>          ->  Seq Scan on a (actual time=0.024..17.153 rows=100000 loops=1)
>  Planning Time: 2.910 ms
>  Execution Time: 201.949 ms
> (8 rows)
> 
> SET
>                                 QUERY PLAN                                 
> ---------------------------------------------------------------------------
>  Merge Join (actual time=687.415..687.416 rows=0 loops=1)
>    Merge Cond: ((b.y = a.y) AND (b.x = a.x) AND (b.z = a.z))
>    ->  Sort (actual time=462.022..536.716 rows=100000 loops=1)
>          Sort Key: b.y, b.x, b.z
>          Sort Method: external merge  Disk: 3328kB
>          ->  Seq Scan on b (actual time=0.017..12.326 rows=100000 loops=1)
>    ->  Sort (actual time=111.295..113.196 rows=16001 loops=1)
>          Sort Key: a.y, a.x, a.z
>          Sort Method: external sort  Disk: 2840kB
>          ->  Seq Scan on a (actual time=0.020..10.129 rows=100000 loops=1)
>  Planning Time: 0.752 ms
>  Execution Time: 688.829 ms
> (12 rows)
> 
> Secondly, I found another case, one not about the planner preferring
> merge join over hash join, where we can see that the plan has become
> better due to lower memory consumption, and planning also takes less
> time.
> 
> Here, with the same query, the planning time was reduced roughly
> fivefold, and the number of buckets 128-fold (1024 vs. 131072), so
> memory consumption also decreased:
> 
> DROP TABLE IF EXISTS a,b CASCADE;
> 
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,600) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> ANALYZE a,b;
> 
> SET enable_cost_size = 'on';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
> SET enable_cost_size = 'off';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=20.50..3157.58 rows=8 width=32) (actual time=95.648..95.651 rows=0 loops=1)
>    Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND (b.z = (a.z)::numeric))
>    ->  Seq Scan on b  (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.027..17.980 rows=100000 loops=1)
>    ->  Hash  (cost=10.00..10.00 rows=600 width=12) (actual time=2.046..2.047 rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 34kB
>          ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) (actual time=0.022..0.315 rows=600 loops=1)
>  Planning Time: 0.631 ms
>  Execution Time: 95.730 ms
> (8 rows)
> 
> SET
>                                                        QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=3387.00..8621.58 rows=8 width=32) (actual time=102.873..102.877 rows=0 loops=1)
>    Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND ((a.z)::numeric = b.z))
>    ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) (actual time=0.014..0.131 rows=600 loops=1)
>    ->  Hash  (cost=1637.00..1637.00 rows=100000 width=20) (actual time=101.920..101.921 rows=100000 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 6474kB
>          ->  Seq Scan on b  (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.013..16.349 rows=100000 loops=1)
>  Planning Time: 0.153 ms
>  Execution Time: 103.518 ms
> (8 rows)
> 
> I also observed an improvement for a left outer (or, equivalently here, a
> right outer) join; with the patch, the smaller table a is hashed, so the
> join is executed as a Hash Left Join with b on the outer side:
> 
> DROP TABLE IF EXISTS a,b CASCADE;
> 
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,600) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> ANALYZE a,b;
> 
> 
> SET enable_cost_size = 'on';
> 
> EXPLAIN ANALYZE
> SELECT * FROM a right join b
> on a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
> SET enable_cost_size = 'off';
> EXPLAIN ANALYZE
> SELECT * FROM a right join b
> on a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=20.50..3157.58 rows=100000 width=32) (actual time=1.846..102.264 rows=100000 loops=1)
>    Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND (b.z = (a.z)::numeric))
>    ->  Seq Scan on b  (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.041..15.328 rows=100000 loops=1)
>    ->  Hash  (cost=10.00..10.00 rows=600 width=12) (actual time=1.780..1.781 rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 34kB
>          ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) (actual time=0.031..0.252 rows=600 loops=1)
>  Planning Time: 0.492 ms
>  Execution Time: 107.609 ms
> (8 rows)
> 
> SET
>                                                        QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=3387.00..8500.08 rows=100000 width=32) (actual time=80.919..101.613 rows=100000 loops=1)
>    Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND ((a.z)::numeric = b.z))
>    ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) (actual time=0.017..0.084 rows=600 loops=1)
>    ->  Hash  (cost=1637.00..1637.00 rows=100000 width=20) (actual time=80.122..80.123 rows=100000 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 6474kB
>          ->  Seq Scan on b  (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.015..11.819 rows=100000 loops=1)
>  Planning Time: 0.194 ms
>  Execution Time: 104.662 ms
> (8 rows)
> 
> --
> Regards,
> Alena Rybakina
> Postgres Professional
> 

> diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
> index ef475d95a18..31771dfba46 100644
> --- a/src/backend/optimizer/path/costsize.c
> +++ b/src/backend/optimizer/path/costsize.c
> @@ -153,6 +153,7 @@ bool        enable_parallel_hash = true;
>  bool        enable_partition_pruning = true;
>  bool        enable_presorted_aggregate = true;
>  bool        enable_async_append = true;
> +bool         enable_cost_size = true;
>  
>  typedef struct
>  {
> @@ -4033,11 +4034,22 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
>                  thismcvfreq = restrictinfo->left_mcvfreq;
>              }
>  
> +            if (enable_cost_size)
> +            {
> +                innerbucketsize *= thisbucketsize;
> +                innermcvfreq *= thismcvfreq;
> +            }
> +            else
> +            {
>              if (innerbucketsize > thisbucketsize)
>                  innerbucketsize = thisbucketsize;
>              if (innermcvfreq > thismcvfreq)
>                  innermcvfreq = thismcvfreq;
> +            }
>          }
> +
> +        if (enable_cost_size && innerbucketsize < 1.0 / virtualbuckets)
> +            innerbucketsize = 1.0 / virtualbuckets;
>      }
>  
>      /*
> diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
> index 71e27f8eb05..ded9ba3b7a9 100644
> --- a/src/backend/utils/misc/guc_tables.c
> +++ b/src/backend/utils/misc/guc_tables.c
> @@ -1007,6 +1007,19 @@ struct config_bool ConfigureNamesBool[] =
>          true,
>          NULL, NULL, NULL
>      },
> +    {
> +        {"enable_cost_size", PGC_USERSET, QUERY_TUNING_OTHER,
> +            gettext_noop("set the optimizer coefficient"
> +                         "so that custom or generic plan is selected more often. "
> +                         "by default, the value is set to 1, which means that "
> +                         "the choice of using both depends on the calculated cost"),
> +            NULL,
> +            GUC_EXPLAIN
> +        },
> +        &enable_cost_size,
> +        true,
> +        NULL, NULL, NULL
> +    },
>      {
>          {"enable_async_append", PGC_USERSET, QUERY_TUNING_METHOD,
>              gettext_noop("Enables the planner's use of async append plans."),
> diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
> index 6cf49705d3a..c79ec12e6d5 100644
> --- a/src/include/optimizer/cost.h
> +++ b/src/include/optimizer/cost.h
> @@ -71,6 +71,7 @@ extern PGDLLIMPORT bool enable_partition_pruning;
>  extern PGDLLIMPORT bool enable_presorted_aggregate;
>  extern PGDLLIMPORT bool enable_async_append;
>  extern PGDLLIMPORT int constraint_exclusion;
> +extern PGDLLIMPORT bool enable_cost_size;
>  
>  extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
>                                    double index_pages, PlannerInfo *root);
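> 
> To see why the clamp at the end of the costsize.c hunk is needed: with
> several clauses, the multiplied fraction can drop below the theoretical
> floor of one even share per bucket. A minimal standalone sketch (not the
> actual PostgreSQL code; the numbers are hypothetical, taken from the
> examples above):
> 
> #include <stdio.h>
> 
> int main(void)
> {
>     double virtualbuckets = 131072.0;   /* nbuckets * nbatches */
>     /* product of the three per-clause fractions, as in the patch */
>     double innerbucketsize = (1.0 / 90) * (1.0 / 49) * (1.0 / 100);
> 
>     /* ~2.3e-6 is below the even-share floor of ~7.6e-6, so clamp it */
>     if (innerbucketsize < 1.0 / virtualbuckets)
>         innerbucketsize = 1.0 / virtualbuckets;
> 
>     printf("bucket fraction: %g\n", innerbucketsize);
>     return 0;
> }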


-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.


