Re: pgsql: Extended statistics on expressions

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: pgsql: Extended statistics on expressions
Дата
Msg-id c61cfe32-251f-e9cb-1be1-0e15218cda17@enterprisedb.com
обсуждение исходный текст
Ответ на Re: pgsql: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: pgsql: Extended statistics on expressions  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-committers

On 3/31/21 7:54 PM, Tomas Vondra wrote:
> On 3/31/21 7:08 PM, Alvaro Herrera wrote:
>> On 2021-Mar-31, Tomas Vondra wrote:
>>
>>> Thanks for the report, I'll take a look. You're right this seems like an
>>> out-of-bounds access, but mcv_match_expression is only expected to be
>>> run on expressions we know are in the statistics (because we pick the
>>> statistics like that). Clearly, that does not happen here, not sure why.
>>>
>>> It's quite weird that we end up running textlike(), when the statistics
>>> is on (double precision, boolean) columns ...
>>
>> Uninitialized values somewhere?  Maybe valgrind would help.
>>
> 
> Unlikely, I've ran it through valgrind repeatedly, including right
> before commit (both on x86_64 and arm).
> 
> FWIW I'm unable to reproduce it, so not sure what's going on. David,
> what configure option are you using? Anything special?
> 
> 
> It's a bit strange, because statext_mcv_clauselist_selectivity should
> only estimate "matching" clauses on the statistics. So how come this
> estimates such a complex expression using textlike(), when neither of
> those columns is text?
> 
> It'd be interesting to know what's happening in the code after
> 
>   stat = choose_best_statistics(...);
> 
> i.e. what clauses it considers "compatible" with the statistics and why.
> In fact, I wouldn't have expected the statistics to be used at all.
> 

OK, I managed to reproduce/trigger the issue. The simplest query that
triggers the issue for me is this:

    SELECT t1.c0 FROM ONLY t1 WHERE
    (
      upper('x') LIKE ('x'||('[0,1]'::int4range))
      AND
      (t1.c0 IN (0, 1) OR t1.c1)
    )

I think the code matching clauses to the statistics gets a bit confused
when processing the AND clause. It extracts 2 attnums for the OR part,
but the first part should be "incompatible" with the statistics. But
after picking the statistics to apply, it gets confused and includes the
first expression (the whole LIKE clause) as compatible too.

The attached patch fixes this for me. David, can you check if this
resolves the issue for you?

I don't feel like I want to push a fix at midnight, and I'd like to
think about maybe making this part of the code a bit clearer tomorrow.
It's not very comprehensible, I'm afraid.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-committers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: pgsql: Disable force_parallel_mode in libpq_pipeline
Следующее
От: Alvaro Herrera
Дата:
Сообщение: pgsql: Initialize conn->Pfdebug to NULL when creating a connection