Re: nested loop semijoin estimates

From: Tomas Vondra
Subject: Re: nested loop semijoin estimates
Date:
Msg-id: 556B9AA3.2020001@2ndquadrant.com
In reply to: Re: nested loop semijoin estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: nested loop semijoin estimates
List: pgsql-hackers
On 06/01/15 00:08, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 05/30/15 23:16, Tom Lane wrote:
>>> Attached is a draft patch for that.  It fixes the problem for me:
>
>> Seems to be working OK, but I still do get a Bitmap Heap Scan there (but
>> more about that later).
>
> Attached is an incremental patch (on top of the previous one) to
> allow startup cost of parameterized paths to be considered when the
> relation is the RHS of a semi or anti join. It seems reasonably clean
> except for one thing: logically, we perhaps should remove the checks
> on path->param_info from the last half of
> compare_path_costs_fuzzily(), so as to increase the symmetry between
> parameterized paths and unparameterized ones. However, when I did
> that, I got changes in some of the regression test plans, and they
> didn't seem to be for the better. So I left that alone. As-is, this
> patch doesn't seem to affect the results for any existing regression
> tests.

Seems to be working fine. I've tried a bunch of queries, modifying the 
test case in various ways, and all of them get planned well. I've been 
unable to come up with a query that gets planned badly.
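
For reference, the general shape of query I was varying is a semijoin like the one below (a sketch only; the table and column names are made up, not the exact test case from the thread). The point of the patch is that the planner may now consider a parameterized path, including its startup cost, for the EXISTS side:

```sql
-- Hypothetical semijoin shape: the EXISTS subquery becomes the RHS of a
-- semijoin, where a parameterized index path on other_table(id) can now
-- compete on startup cost.
EXPLAIN
SELECT *
FROM big_table t
WHERE EXISTS (
    SELECT 1 FROM other_table o WHERE o.id = t.id
);
```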

Regarding the remaining checks in compare_path_costs_fuzzily(), isn't 
that simply caused by very small data sets? For example the first 
"failing" plan in join.sql looks like this:

 Nested Loop Left Join  (cost=0.29..22.80 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Join Filter: ("*VALUES*".column1 = i1.f1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
   ->  Materialize  (cost=0.29..22.64 rows=5 width=8)
         Output: i1.f1, (666)
         ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
               Output: i1.f1, 666
               ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
                     Output: i1.f1
               ->  Index Only Scan using tenk1_unique2 on public....
                     Output: i2.unique2
                     Index Cond: (i2.unique2 = i1.f1)

while with the changes it'd look like this:
 Hash Right Join  (cost=0.34..22.70 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Hash Cond: (i1.f1 = "*VALUES*".column1)
   ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
         Output: i1.f1, 666
         ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
               Output: i1.f1
         ->  Index Only Scan using tenk1_unique2 on public.tenk1 ...
               Output: i2.unique2
               Index Cond: (i2.unique2 = i1.f1)
   ->  Hash  (cost=0.03..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 ...
               Output: "*VALUES*".column1
(14 rows)

So the planner actually believes the second plan to be cheaper, although 
only by a tiny margin. And I see pretty much no difference in 
planning/execution time (but I'm on a machine with power management and 
VMs, so there's a lot of random noise).

But once the int4_tbl gets bigger (say, 160k rows instead of the 5), the 
hash join clearly wins even with the current code. Actually, it switches 
away from the current plan way sooner than that (at 500 rows it's 
already using the hash join, and I see about the same timings).

I don't really see why you consider those plan changes bad. And even if 
they are, isn't that simply a matter of tuning the cost parameters?
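
The scaling experiment can be sketched roughly as follows. The query shape is inferred from the two plans above; the table name int4_big and the exact setup are my assumptions, not the script I actually ran:

```sql
-- Sketch: replace the 5-row int4_tbl with a progressively larger table
-- and watch the planner switch to the hash join on its own.
CREATE TABLE int4_big AS
    SELECT i::int4 AS f1 FROM generate_series(1, 160000) AS s(i);
ANALYZE int4_big;

EXPLAIN ANALYZE
SELECT * FROM (VALUES (1), (2)) AS v(column1)
LEFT JOIN (
    SELECT i1.f1, 666 AS c FROM int4_big i1
    LEFT JOIN tenk1 i2 ON i2.unique2 = i1.f1
) AS sub ON v.column1 = sub.f1;
```

Repeating this with intermediate sizes (500 rows, a few thousand, 160k) shows where the crossover happens for a given set of cost parameters.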


>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>> behavior change, so no back-patching, right?
>
> Mumble. It's definitely a planner bug fix, and I believe that the
> effects are narrowly constrained to cases where significantly better
> plans are possible. So personally I'd be willing to back-patch it.
> But others might see that differently, especially since it's been
> like this for a long time and we only have one field complaint.

+1 to back-patching from me. It's true we only have one field complaint, 
but I believe there are more users impacted by this. They just didn't 
notice - we only really got this complaint because of the plan 
discrepancy between queries that are almost exactly the same.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


