Re: [sqlsmith] Crash in mcv_get_match_bitmap

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [sqlsmith] Crash in mcv_get_match_bitmap
Дата
Msg-id 19208.1563032395@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [sqlsmith] Crash in mcv_get_match_bitmap  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [sqlsmith] Crash in mcv_get_match_bitmap  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>> The right way to determine operator semantics is to look to see
>>> whether they are in a btree opclass.  That's what the rest of the
>>> planner does, and there is no good reason for the mcv code to
>>> do it some other way.

>> Hmmm, but that will mean we're unable to estimate operators that are not
>> part of a btree opclass. Which is a bit annoying, because that would also
>> affect equalities (and thus functional dependencies), in which case the
>> type may easily have just hash opclass or something.

After thinking about this more, I may have been analogizing to the wrong
code.  It's necessary to use opclass properties when we're reasoning about
operators in a way that *must* be correct, for instance to conclude that
a partition can be pruned from a query.  But this code is just doing
selectivity estimation, so the correctness standards are a lot lower.
In particular I see that the long-established range-query-detection
code in clauselist_selectivity is looking for operators that have
F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
lifted parts of the mcv code from that, cause it looks pretty similar).
So if we've gotten away with that so far over there, there's probably
no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.

> Here are WIP patches addressing two of the issues:

> 1) determining operator semantics by matching them to btree opclasses

Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.

> 2) deconstructing OpExpr into Var/Const nodes

deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

    leftop = linitial(expr->args);
    while (IsA(leftop, RelabelType))
        leftop = ((RelabelType *) leftop)->arg;
    // and similarly for rightop
    if (IsA(leftop, Var) && IsA(rightop, Const))
        // return appropriate results
    else if (IsA(leftop, Const) && IsA(rightop, Var))
        // return appropriate results
    else
        // fail

Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing.  It'd be okay as a static function name
perhaps, but not if you're going to expose it globally.

> a) I don't think unary-argument OpExpr are an issue, because this is
> checked when determining which clauses are compatible (and the code only
> allows the case with 2 arguments).

OK.

> b) Const with constisnull=true - I'm not yet sure what to do about this.
> The easiest fix would be to deem those clauses incompatible, but that
> seems a bit too harsh. The right thing is probably passing the NULL to
> the operator proc (but that means we can't use FunctionCall).

No, because most of the functions in question are strict and will just
crash on null inputs.  Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.

> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
> calling the operator is the right thing. We're using type->typcollation
> when building the stats, so maybe we should do the same thing here.

Yeah, I was wondering that too.  But really you should be using the
column's collation not the type's default collation.  See commit
5e0928005.

            regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: refactoring - share str2*int64 functions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: POC: converting Lists into arrays