Re: Abysmal hash join

From: Tom Lane
Subject: Re: Abysmal hash join
Msg-id: 3204.1157989198@sss.pgh.pa.us
In reply to: Re: Abysmal hash join  (Florian Weimer <fweimer@bfk.de>)
Responses: Re: Abysmal hash join
List: pgsql-performance
Florian Weimer <fweimer@bfk.de> writes:
>> Maybe larger stats targets would help.

> I've set default_statistics_target to 100 and rerun ANALYZE on that
> table.  The estimate went down to 43108 (and the hash join is still
> the preferred plan).  ANALYZE with default_statistics_target = 200
> (which seems pretty large to me) is down to 26050 and the bitmap scan
> plan is chosen.

> PostgreSQL seems to think that there are only very few distinct values
> for that column (with default_statistics_target = 100 and 200):

Yeah, n_distinct estimation from a sample is inherently hard :-(.  Given
that you have such a long tail on the distribution, it might be worth
your while to crank the stats target for that column all the way to the
maximum (1000).  Also you need to experiment with extending the stats
for the smaller table.
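
The per-column stats target can be raised without touching default_statistics_target globally. A minimal sketch, assuming placeholder table and column names (big_table, small_table, ref_id) that are not from this thread:

```sql
-- Raise the statistics target for the join column only
-- (1000 was the maximum at the time of this thread):
ALTER TABLE big_table ALTER COLUMN ref_id SET STATISTICS 1000;
-- Extend the stats for the smaller table's joining column too:
ALTER TABLE small_table ALTER COLUMN ref_id SET STATISTICS 1000;
-- Re-collect statistics so the planner sees the larger sample:
ANALYZE big_table;
ANALYZE small_table;
```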

I believe what's happening here is that the smaller table joins only to
less-frequent entries in the big table (correct?).  The hash join would
be appropriate if there were many rows joining to the very-frequent
entries, and the problem for the planner is to determine that that's not
so.  Given enough stats on the two joining columns, it should be able to
determine that.
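
To check what the planner currently believes about the joining columns, the standard pg_stats view can be queried; a sketch with the same placeholder names:

```sql
-- n_distinct and the most-common-value lists drive the join
-- selectivity estimate (table/column names are placeholders):
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'big_table' AND attname = 'ref_id';
```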

Of course, large stats targets will slow down planning to some extent,
so you should also keep an eye on how long it takes to plan the query.
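
One simple way to watch that cost from psql: EXPLAIN plans the query without executing it, so its elapsed time is dominated by planning. A sketch (the query text is a placeholder):

```sql
-- \timing makes psql report elapsed time per statement.
\timing on
EXPLAIN SELECT ... ;  -- the join query under test
```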

            regards, tom lane
