Re: BUG #16324: bad cost estimates for generic query plans

Поиск
Список
Период
Сортировка
От Todd A. Cook
Тема Re: BUG #16324: bad cost estimates for generic query plans
Дата
Msg-id A8C6C425-D2AE-4695-B4E9-FDE1E999E728@synopsys.com
обсуждение исходный текст
Ответ на Re: BUG #16324: bad cost estimates for generic query plans  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
On 3/27/20, 12:36 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:   
    It's not "ignoring" the cost.  What it is doing, since this is an EXISTS
    subplan, is assuming that it will fetch the first tuple and stop, the same
    as if there'd been a LIMIT 1 in the subquery.  Since the estimated number
    of rows is (wrongly) very high, that results in a low estimated cost to
    obtain the EXISTS result, specifically 6796877.67 / 14532272 or about
    0.47.  Then in reality there are *no* tuples in the result, so that the
    seqscan has to run to completion to find that out.  Ooops.
    
    So in general this is just an instance of the well-known difficulty of
    estimating costs with small LIMIT accurately.  On the other hand, since
    we know the context is EXISTS, maybe we could do better?  There's an
    argument that the user wouldn't be bothering to test EXISTS if there
    weren't a chance of a false result, hence we ought to assume that the
    subquery might need to run to completion; which would lead to taking the
    cost as being the full run cost not the estimated-time-to-first-tuple.
    On the other hand that seems like it would discourage use of fast-start
    plans for this purpose, which is probably a net loss.  On the third hand,
    it looks like we have already settled on the subplan's plan by this
    point so maybe that objection is bogus.  If you want to experiment you
    could try changing this bit in cost_subplan:
    
            if (subplan->subLinkType == EXISTS_SUBLINK)
            {
                /* we only need to fetch 1 tuple; clamp to avoid zero divide */
                sp_cost.per_tuple += plan_run_cost / clamp_row_est(plan->plan_rows);
            }

Thanks for looking at this.  I will experiment with that and report back.

As you predicted, changing the query to
    SELECT 1 FROM audit_event WHERE id > $1 AND event_name IN ($2,$3,$4,$5,$6,$7,$8,$9,$10) limit 1
results in the same behavior.

-- todd
    
    The comment above that suggests that this logic needs to match
    make_subplan, but I'm thinking we would want to intentionally make them
    not match, since the other code is what drives picking a fast-start plan
    for the subplan.
    
    You could make an argument for charging full run cost, or maybe just
    half of that as we do for ALL/ANY cases, depending on whether you
    think we should be taking worst-case estimates or not.
    
                regards, tom lane
    


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #16324: bad cost estimates for generic query plans
Следующее
От: legrand legrand
Дата:
Сообщение: Re: pg_stat_statements: rows not updated for CREATE TABLE AS SELECTstatements