Re: nested loop semijoin estimates

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: nested loop semijoin estimates
Дата
Msg-id 30658.1433015423@sss.pgh.pa.us
обсуждение исходный текст
Ответ на nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: nested loop semijoin estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I wonder whether the
>          run_cost += inner_run_cost;
> is actually correct, because this pretty much means we assume scanning 
> the whole inner relation (once). Wouldn't something like this be more 
> appropriate?
>          run_cost += inner_run_cost * inner_scan_frac;

Well, not entirely.  The reason why it's like that is explained (evidently
not well enough) in the comment just above:
        * A complicating factor is that rescans may be cheaper than first        * scans.  If we never scan all the way
tothe end of the inner rel,        * it might be (depending on the plan type) that we'd never pay the        * whole
innerfirst-scan run cost.  However it is difficult to        * estimate whether that will happen, so be conservative
andalways        * charge the whole first-scan cost once.
 

The case this is concerned about is something like a Materialize node
above a scan node.  The Materialize node makes rescans much cheaper than
the original scan, but only once the data has been fetched to begin with.
So, while the first successful probe should indeed have cost something
like inner_run_cost * inner_scan_frac, once we make this change it's
no longer legitimate for final_cost_nestloop to suppose that the cost
of an unsuccessful probe is just inner_rescan_run_cost.  Any unsuccessful
probe will mean we must incur all of inner_run_cost in order to complete
the underlying scan and fully load the Materialize node.

However, in the case where has_indexed_join_quals() is true, I think
your change is correct, because then all the scans will either stop
on the first tuple or quickly determine that no tuples satisfy the
indexquals.  (Also, in the cases where has_indexed_join_quals() can
succeed, rescan cost is the same as initial scan cost anyhow.)

So what this seems to mean is that for SEMI/ANTI join cases, we have to
postpone all of the inner scan cost determination to final_cost_nestloop,
so that we can do this differently depending on whether
has_indexed_join_quals() is true.  That's a little bit annoying because it
will mean we take the shortcut exit less often; but since SEMI/ANTI joins
aren't that common, it's probably not going to be a big planning time hit.

Not sure yet about your other point about the indexscan getting rejected
too soon.  That doesn't seem to be happening for me, at least not in HEAD.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: [CORE] postpone next week's release
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [CORE] postpone next week's release