Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id CAEZATCW9AXzf+1T44B0=rBd8a0AD+pWO_PpkhApqLBE9JS1Cyg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On 18 March 2018 at 23:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Attached is an updated version of the patch series, addressing issues
> pointed out by Alvaro.

I've just been reading the new code in
statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and
I'm having a hard time convincing myself that it's correct.

This code in statext_clauselist_selectivity() looks a bit odd:

    /*
     * Evaluate the MCV selectivity. See if we got a full match and the
     * minimal selectivity.
     */
    if (stat->kind == STATS_EXT_MCV)
        s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid,
                                        jointype, sjinfo, rel,
                                        &fullmatch, &mcv_lowsel);

    /*
     * If we got a full equality match on the MCV list, we're done (and the
     * estimate is likely pretty good).
     */
    if (fullmatch && (s1 > 0.0))
        return s1;

    /*
     * If it's a full match (equalities on all columns) but we haven't found
     * it in the MCV, then we limit the selectivity by frequency of the last
     * MCV item. Otherwise reset it to 1.0.
     */
    if (!fullmatch)
        mcv_lowsel = 1.0;

    return Min(s1, mcv_lowsel);

So if fullmatch is true and s1 is greater than 0, it will return s1.
If fullmatch is true and s1 equals 0, it will return Min(s1,
mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel
will be set to 1 and it will return Min(s1, mcv_lowsel) which will
also be s1. So it always just returns s1, no? Maybe there's no point
in computing fullmatch.

Also, wouldn't mcv_lowsel potentially be a significant overestimate
anyway? Perhaps 1 minus the sum of the MCV frequencies might be
closer, but even that ought to take into account the number of
distinct values remaining, although that information may not always be
available.

Also, just above that, in statext_clauselist_selectivity(), it
computes the list stat_clauses, then doesn't appear to use it
anywhere. I think that would have been the appropriate thing to pass
to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses
into mcv_clauselist_selectivity() will cause it to fail to find any
matches and then underestimate.

I've also come across a few incorrect/out-of-date comments:

/*
 * mcv_clauselist_selectivity
 *      Return the estimated selectivity of the given clauses using MCV list
 *      statistics, or 1.0 if no useful MCV list statistic exists.
 */

-- I can't see any code path that returns 1.0 if there are no MCV
stats. The last part of that comment is probably more appropriate to
statext_clauselist_selectivity()


/*
 * mcv_update_match_bitmap
 * [snip]
 * The function returns the number of items currently marked as 'match', and
 * ...

-- it doesn't seem to return the number of items marked as 'match'.

Then inside that function, this comment is wrong (copied from the
preceding comment):

                /* AND clauses assume nothing matches, initially */
                memset(bool_matches, STATS_MATCH_FULL, sizeof(char) *
mcvlist->nitems);

Still reading...

Regards,
Dean


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Damir Simunic
Дата:
Сообщение: Re: Proposal: http2 wire format
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Proposal: http2 wire format