Re: Potential Join Performance Issue

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Potential Join Performance Issue
Дата
Msg-id 29556.1221151380@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Potential Join Performance Issue  ("Lawrence, Ramon" <ramon.lawrence@ubc.ca>)
Ответы Re: Potential Join Performance Issue
Список pgsql-hackers
"Lawrence, Ramon" <ramon.lawrence@ubc.ca> writes:
> To keep the changes simple, the update simply calls
> ExecChooseHashTableSize() in create_hashjoin_plan() to re-calculate the
> expected number of batches.  This is more efficient and results in less
> code changes than modifying the HashPath struct to store the number of
> batches and updating that variable when costing (as cost_hashjoin() will
> be called many times during costing).

I was intending to do it the other way, actually.  An extra field in
HashPath hardly costs anything.  The other reason for it is that there
are other possible uses for knowing whether a hash will be multi-batch.
(For example, if we were prepared to tell the executor that it *must*
keep the hash to one batch, we could assume that the sort order of the
left input is preserved.  I haven't looked into the risks/benefits of
that too much, but it's been in the back of the mind for a long time.)

> An ideal solution would detect at execution time if the inner relation
> remained in memory (one batch) and decide to disable/enable the
> physical-tlist optimization on the outer relation accordingly.  At this
> time, we are uncertain if this would be desirable or possible.

That seems pretty infeasible really.  Aside from changing plan node
output tuple types on-the-fly, it would mean renumbering Vars in the
join node to reference the outer relation's new output columns.  The
overhead of supporting that would be paid across-the-board in the
executor whether or not anyone got any real benefit from it.

I'd be more inclined to deal with the issue by trying to establish a
"safety margin" in the estimate of whether the hash will go multi-batch.
IOW we should disuse_physical_tlist if the hash is estimated to be close
to but still within one batch.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Markus Wanner
Дата:
Сообщение: Re: Synchronous Log Shipping Replication
Следующее
От: Gregory Stark
Дата:
Сообщение: Re: Transaction Snapshots and Hot Standby