Cost estimation problem on seq scan in a loop

From Jeff Janes
Subject Cost estimation problem on seq scan in a loop
Date
Msg-id CAMkU=1zHSXESAZhqH5m5oabqV8Je2iGFPAX=SK2zbGv5ZOsvBA@mail.gmail.com
Responses Re: Cost estimation problem on seq scan in a loop
List pgsql-hackers
While doing ad hoc queries I've seen several different problems that all seem to be variations on a theme.

The plan comes out looking like this:

 Nested Loop  (cost=826867.95..877038.04 rows=1 width=125)
   Join Filter: (foo.bar = smallish_table.bar)
   ->  Something Complicated  (cost=826867.95..876800.28 rows=1 width=81)
        .....
   ->  Seq Scan on smallish_table  (cost=0.00..142.89 rows=7389 width=44)


The estimate of rows=1 for Something Complicated is wrong; you really get 1,000 or 100,000 rows. That means the seq scan on smallish_table gets iterated a lot, and the time really adds up.

It would be great if Something Complicated had the correct row estimate, but since I've seen this situation arise with a lot of different Something Complicateds that don't have much to do with each other (although usually an antijoin of some kind is involved), there is little reason to think we can squash every one of them.
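
For concreteness, here is one hypothetical shape that can produce a plan like the one above. Everything except foo, smallish_table, and bar is made up for illustration; the real cases each had a different Something Complicated:

    EXPLAIN
    SELECT f.*
    FROM foo f
    JOIN smallish_table s ON s.bar = f.bar          -- becomes the Join Filter
    WHERE NOT EXISTS (                              -- the antijoin whose row
        SELECT 1 FROM big_table b WHERE b.id = f.id -- estimate collapses to 1
    );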

Is there some principled way to go about teaching the planner that hashing smallish_table on the join filter key is a cheap insurance policy against underestimating the row count of the outer loop?
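
For comparison, that insurance can be bought by hand today with the stock enable_nestloop GUC, at the price of doing it per query. A sketch against the hypothetical query above (SET LOCAL keeps the change scoped to one transaction):

    BEGIN;
    SET LOCAL enable_nestloop = off;  -- steer the planner away from nested loops
    EXPLAIN
    SELECT f.*
    FROM foo f
    JOIN smallish_table s ON s.bar = f.bar
    WHERE NOT EXISTS (SELECT 1 FROM big_table b WHERE b.id = f.id);
    COMMIT;

That builds the hash on smallish_table once, no matter how many rows actually come out of the outer side; the question is whether the planner can be taught to buy that insurance on its own when the outer estimate is suspect.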

Cheers,

Jeff
