Re: Subquery in a JOIN not getting restricted?

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: Subquery in a JOIN not getting restricted?
Дата
Msg-id 4EB7FEFD0200002500042B1C@gw.wicourts.gov
обсуждение исходный текст
Ответ на Subquery in a JOIN not getting restricted?  (Jay Levitt <jay.levitt@gmail.com>)
Ответы Re: Subquery in a JOIN not getting restricted?
Список pgsql-performance
Jay Levitt <jay.levitt@gmail.com> wrote:
> When I run the following query:
>
> select questions.id
> from questions
> join (
>      select u.id as user_id
>      from users as u
>      left join scores as s
>      on s.user_id = u.id
> ) as subquery
> on subquery.user_id = questions.user_id;
>
> the subquery is scanning my entire user table, even though it's
> restricted by the outer query.  (My real subquery is much more
> complicated, of course, but this is the minimal fail case.)

It's not a fail case -- it's choosing the plan it thinks is cheapest
based on your costing parameters and the statistics gathered by the
latest ANALYZE of the data.

> Is this just not a thing the optimizer can do?  Are there ways to
> rewrite this, still as a subquery, that will be smart enough to
> only produce the one row of subquery that matches
> questions.user_id?

Well, it can certainly produce the plan you seem to want, if it
looks less expensive.  It kinda did with the following script:

create table questions
  (id int not null primary key, user_id int not null);

insert into questions
  select generate_series(1,100), (random()*1000000)::int;

create table users (id int not null primary key);

insert into users select generate_series(1, 1000000);

create table scores
  (id int not null primary key, user_id int not null);

insert into scores select n, n
  from (select generate_series(1,1000000)) x(n);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
     select u.id as user_id
     from users as u
     left join scores as s
     on s.user_id = u.id
) as subquery
on subquery.user_id = questions.user_id;

Here's the plan I got, which scans the questions and then uses the
index to join to the users.  It's throwing the result of that into a
hash table which is then checked from a sequential scan of the
scores table.  If I had made the scores table wider, it might have
gone from the user table to scores on the index.

 Hash Right Join
     (cost=438.23..18614.23 rows=100 width=4)
     (actual time=2.776..161.237 rows=100 loops=1)
   Hash Cond: (s.user_id = u.id)
   ->  Seq Scan on scores s
           (cost=0.00..14425.00 rows=1000000 width=4)
           (actual time=0.025..77.876 rows=1000000 loops=1)
   ->  Hash
           (cost=436.98..436.98 rows=100 width=8)
           (actual time=0.752..0.752 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 4kB
         ->  Nested Loop
                 (cost=0.00..436.98 rows=100 width=8)
                 (actual time=0.032..0.675 rows=100 loops=1)
               ->  Seq Scan on questions
                       (cost=0.00..2.00 rows=100 width=8)
                       (actual time=0.010..0.042 rows=100 loops=1)
               ->  Index Only Scan using users_pkey on users u
                       (cost=0.00..4.34 rows=1 width=4)
                       (actual time=0.005..0.005 rows=1 loops=100)
                     Index Cond: (id = questions.user_id)
 Total runtime: 168.585 ms

If you want help figuring out whether it is choosing the fastest
plan, and how to get it do better if it is not, please read this
page and post the relevant information:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Subquery in a JOIN not getting restricted?
Следующее
От: Richard Yen
Дата:
Сообщение: WAL partition filling up after high WAL activity