Re: nested loop semijoin estimates

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: nested loop semijoin estimates
Дата
Msg-id 5569F09C.7090204@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On 05/30/15 03:52, Tomas Vondra wrote:
>
>
> On 05/30/15 01:20, Tomas Vondra wrote:
>>
>> Notice the cost - it's way lover than the previous plan (9.2 vs
>> ~111k), yet this plan was not chosen. So either the change broke
>> something (e.g. by violating some optimizer assumption), or maybe
>> there's a bug somewhere else ...
>
> After a bit more investigation, what I think is happening here is
> add_path() does not realize this is a SEMI join and a single tuple is
> enough, and discards the simple indexscan path in favor of the bitmap
> index scan as that seems cheaper when scanning everything.
>
> So not really a bug, but maybe 'consider_startup' would help?

A bit more info - the reason why the IndexPath gets discarded (in favor
of BitmapHeapScan) seems to be twofold:

(a) Semijoin does no imply 'consider_startup' for the inner path, so it
     gets discarded by the BitmapHeapScan, as it has lower total cost.

     I'm not sure where exactly the consider_startup should be set, so
     I've forced that at the beginning on add_path() to see what happens.
     Sadly, this alone is not sufficient, because of the next point.

(b) To actually use startup costs in compare_path_costs_fuzzily, there
     is an additional condition about handling parameterized paths, as
     explained in this comment:

     * This function also includes special hacks to support a policy
     * enforced by its sole caller, add_path(): paths that have any
     * parameterization cannot win comparisons on the grounds of having
     * cheaper startup cost, since we deem only total cost to be of
     * interest for a parameterized path. (Unparameterized paths are
     * more common, so we check for this case last.)

     so the startup_cost comparisons look like this:

     if (consider_startup &&
         path2->startup_cost > path1->startup_cost * fuzz_factor &&
         path1->param_info == NULL)

     Of course, path1 (IndexPath) actually has parameters, which makes
     the whole condition false, and the IndexPath gets discarded :-(

     ISTM the reported test case seems like a counter example to the
     policy, so maybe this should really be relaxed somehow.

To see what happens, I tweaked both (a) and (b) by forcing

     consider_startup = true

at the beginning of add_path(), and removing the (param_info==NULL)
conditions (patch attached). It also includes the run_cost tweak
mentioned in the first message.

I'm not claiming this is the correct fix, the only goal was to see what
happens. And it really does produce the 'right' plan:

                              QUERY PLAN
-----------------------------------------------------------------------
  Nested Loop Semi Join  (cost=0.99..9.20 rows=1 width=74)
    ->  Index Scan using term_facttablename_columnname_idx on term t
          (cost=0.55..8.57 rows=1 width=74)
          Index Cond: (((facttablename)::text =
               'facttable_stat_fta4'::text) AND ((columnname)::text =
               'berechnungsart'::text))
    ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
        facttable_stat_fta4 f (cost=0.43..310525.02 rows=4870496 width=2)
          Index Cond: (berechnungsart = (t.term)::text)
(5 rows)


If I get the comments in pathnode.c about the startup_cost policy, it's
"only" an optimization to reduce the number of parameterized paths, so I
believe this does not break anything. Also, it's no fun if the planning
is a tad faster, but the query runtime is 1000x higher.

Ideas how to relax the policy without significant impact on optimizer?

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: [Proposal] More Vacuum Statistics
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [CORE] postpone next week's release