Poor performance on 9.4.4 when matching multiple columns from left side of LATERAL join

Поиск
Список
Период
Сортировка
От Steven Grimm
Тема Poor performance on 9.4.4 when matching multiple columns from left side of LATERAL join
Дата
Msg-id 5646D3F2.4060404@thesegovia.com
обсуждение исходный текст
Ответы Re: Poor performance on 9.4.4 when matching multiple columns from left side of LATERAL join  ("David G. Johnston" <david.g.johnston@gmail.com>)
Re: Poor performance on 9.4.4 when matching multiple columns from left side of LATERAL join  (David Rowley <david.rowley@2ndquadrant.com>)
Re: Poor performance on 9.4.4 when matching multiple columns from left side of LATERAL join  (Thomas Kellerer <spam_eater@gmx.net>)
Список pgsql-general
We have a table, call it "multi_id", that contains columns with IDs of
various kinds of objects in my system, and another table that's a
generic owner/key/value store for object attributes (think configuration
settings, and I'll refer to this table as "settings"). To wit:

---------------------------------------------
CREATE TABLE multi_id (
   id1 INTEGER PRIMARY KEY,
   id2 INTEGER,
   id3 INTEGER
);
CREATE TABLE settings (
   owner_id INTEGER,
   setting_id INTEGER,
   setting_value TEXT,
   PRIMARY KEY (owner_id, setting_id)
);
CREATE UNIQUE INDEX multi_id_idx_id1 ON multi_id (id1, id2);
CREATE UNIQUE INDEX multi_id_idx_id2 ON multi_id (id2, id1);
CREATE INDEX settings_idx_setting_id ON settings (setting_id,
setting_value);
---------------------------------------------

We want to find all the rows from multi_id where any of the IDs
(including its primary key) have a certain setting with a certain value.

LATERAL seemed like the tool for the job, so we tried the following:

---------------------------------------------
SELECT mid.id1
FROM multi_id AS mid,
LATERAL (
     SELECT 1
     FROM settings
     WHERE setting_id = 1
     AND setting_value = 'common_1'
     AND owner_id IN (mid.id1, mid.id2, mid.id3)
) AS setting_matcher;
---------------------------------------------

When we're searching for a common value, this query takes a LONG time.
It turns out the culprit is the IN clause in the subquery. If I change
"owner_id IN (mid.id1, mid.id2, mid.id3)" to "owner_id = mid.id1", the
query executes in about 1/900 the time. It remains that fast if I change
mid.id1 to mid.id2 or mid.id3, meaning if I do a UNION of those three
queries to get the same result set as the query above, the whole thing
is roughly 300x faster.

Execution plan for the IN version followed by the = version (for just
one of the IDs):

---------------------------------------------
Nested Loop  (cost=5.39..8107.18 rows=285 width=4) (actual
time=1.230..6456.567 rows=4499 loops=1)
    Join Filter: (settings.owner_id = ANY (ARRAY[mid.id1, mid.id2,
mid.id3]))
    Rows Removed by Join Filter: 22495501
    ->  Seq Scan on multi_id mid  (cost=0.00..78.00 rows=5000 width=12)
(actual time=0.010..1.385 rows=5000 loops=1)
    ->  Materialize  (cost=5.39..310.66 rows=95 width=4) (actual
time=0.000..0.263 rows=4500 loops=5000)
          ->  Bitmap Heap Scan on settings  (cost=5.39..310.19 rows=95
width=4) (actual time=1.207..3.210 rows=4500 loops=1)
                Recheck Cond: ((setting_id = 1) AND (setting_value =
'common_1'::text))
                Heap Blocks: exact=1405
                ->  Bitmap Index Scan on settings_idx_setting_id
(cost=0.00..5.37 rows=95 width=0) (actual time=0.930..0.930 rows=4500
loops=1)
                      Index Cond: ((setting_id = 1) AND (setting_value =
'common_1'::text))
  Planning time: 0.178 ms
  Execution time: 6456.897 ms


  Hash Join  (cost=145.98..472.93 rows=103 width=4) (actual
time=2.677..6.890 rows=4500 loops=1)
    Hash Cond: (settings.owner_id = mid.id1)
    ->  Bitmap Heap Scan on settings  (cost=5.48..330.50 rows=103
width=4) (actual time=1.194..3.477 rows=4500 loops=1)
          Recheck Cond: ((setting_id = 1) AND (setting_value =
'common_1'::text))
          Heap Blocks: exact=1405
          ->  Bitmap Index Scan on settings_idx_setting_id
(cost=0.00..5.45 rows=103 width=0) (actual time=0.854..0.854 rows=4500
loops=1)
                Index Cond: ((setting_id = 1) AND (setting_value =
'common_1'::text))
    ->  Hash  (cost=78.00..78.00 rows=5000 width=4) (actual
time=1.463..1.463 rows=5000 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 176kB
          ->  Seq Scan on multi_id mid  (cost=0.00..78.00 rows=5000
width=4) (actual time=0.007..0.717 rows=5000 loops=1)
  Planning time: 0.311 ms
  Execution time: 7.166 ms
---------------------------------------------

What am I doing wrong in the IN version of the query, if anything?

I wrote a script to populate a test database with a simplified version
of our real data model and demonstrate the behavior I'm seeing:
https://gist.github.com/sgrimm-sg/2722068ef844d3e02129

Thanks!



В списке pgsql-general по дате отправления:

Предыдущее
От: "Pradhan, Sabin"
Дата:
Сообщение: fast refresh materialized view
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Poor performance on 9.4.4 when matching multiple columns from left side of LATERAL join