Re: limit + order by is slow if no rows in result set

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: limit + order by is slow if no rows in result set
Дата
Msg-id 21314.1171337802@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: limit + order by is slow if no rows in result set  (Brian Cox <brian.cox@ca.com>)
Список pgsql-performance
Brian Cox <brian.cox@ca.com> writes:
>> Please run EXPLAIN ANALYZE on both queries, and send back the results.
> [ results... ]

The reason the hash plan is fairly fast is that the hash join code has a
special hack: if it reads the inner relation and finds it contains no
rows, it knows there can be no join result rows, so it can fall out
without reading the outer relation at all.  This saves it from scanning
the large ts_defects table.  (If you look close you'll see that it
actually reads just the first row from ts_defects; this is because the
inner relation isn't read until after we know the outer is nonempty,
so as to try to win for the other case of empty outer and nonempty
inner.)

The reason the nestloop/limit plan is not fast is that it has to scan
the inner relation (ts_biz_events) for each row of ts_defects, and there
are lots of them.  Even though each inner scan is just a fast index
probe, it adds up.

The reason the planner goes for the nestloop/limit plan is that it's
expecting that about 5% (98762/1932688) of the ts_defects rows will
have a match in ts_biz_events, and so it figures it'll only have to
probe ts_biz_events about 20 times before producing an output row,
and the Limit only wants one row.  So this looks a lot cheaper than
the hash plan --- especially since the latter is being costed without
any assumption that the zero-inner-rows situation applies.

The bottom line is that the plans are being chosen on "typical" rather
than corner-case assumptions, and zero matching rows is a corner case
that happens to work real well for the hash plan and not well at all for
the nestloop plan.  I'm not sure what we can do about that without
making the performance worse for the case of not-quite-zero matching
rows.

You might be able to get a better result if you increased the statistics
target for ts_status --- it looks like the planner thinks there are many
more ts_status = 3 rows than there really are.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Brian Cox
Дата:
Сообщение: Re: limit + order by is slow if no rows in result set
Следующее
От: ismo.tuononen@solenovo.fi
Дата:
Сообщение: many instances or many databases or many users?