Обсуждение: Expr. extended stats are skipped with equality operator

Поиск
Список
Период
Сортировка

Expr. extended stats are skipped with equality operator

От
Danny Shemesh
Дата:
Hey all !

I'm on a quest to help the planner (on pg14) use the best of several partial, expressional indices we have on some large tables (few TBs in size, billions of records).

As we know, stats for expressions in partial indices aren't gathered by default - so I'm tinkering with expressional extended stats to cover for those.

I've tackled two interesting points there:
1. Seems like expressional stats involving the equality operator are skipped or mismatched (fiddle)
Let's take the following naive example:
create table t1 (x integer[]);
insert into t1 select array[1]::integer[] from generate_series(1, 100000, 1);
create statistics s1 on (x[1] = 1) from t1;
analyze t1;
explain analyze select * from t1 where x[1] = 1;
> Seq Scan on t1 (cost=0.00..1986.00 rows=500 width=29) (actual time=0.009..36.035 rows=100000 loops=1)

Now, of course one can just create the stat on x[1] directly in this case, but I have a more complex use case where an equality operator is beneficial; 
should the above case be supported ? feels like I'm just missing something fundamental.

2. Less important, just a minor note - feel free to ignore - although the eq. operator above seems to be skipped when matching the ext. stats, I can work around this by using a CASE expression (fiddle);
Building on the above example, we can:
create statistics s2 on (case x[1] when 1 then true else false end) from t1;
explain analyze select * from t1 where (case x[1] when 1 then true else false end
>  Seq Scan on t1 (cost=0.00..1986.00 rows=100000 width=25) (actual time=0.011..33.721 rows=100000 loops=1)

What's a bit problematic here, though, is that if we mix other dependent columns to the extended stat, and specifically if we create an mcv, 
queries involving the CASE expression throw with `error: unknown clause type 130`, where clause type == T_CaseExpr.

The second point for me would be that I've found it a bit non intuitive that creating an extended statistic can fail queries at query time; it makes sense that the mcv wouldn't work for case expressions, but it might've been a bit clearer to:

a. Fail this at statistic creation time, potentially, or 
b. Convert the type numeric in the above error to its text representation, if we can extract it out at runtime somehow - 
I couldn't find a mapping of clause type numerics to their names, and as the node tags are generated at compile time, it could be build-dependent and a bit hard to track down if one doesn't control the build flags


Thanks a ton for your help - appreciate your time,
Danny

Re: Expr. extended stats are skipped with equality operator

От
Tomas Vondra
Дата:

On 8/5/22 11:50, Danny Shemesh wrote:
> Hey all !
> 
> I'm on a quest to help the planner (on pg14) use the best of several
> partial, expressional indices we have on some large tables (few TBs in
> size, billions of records).
> 
> As we know, stats for expressions in partial indices aren't gathered by
> default - so I'm tinkering with expressional extended stats to cover for
> those.
> 
> I've tackled two interesting points there:
> 1. Seems like expressional stats involving the equality operator are
> skipped or mismatched (fiddle
> <https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/5379>)
> Let's take the following naive example:
> /create table t1 (x integer[]);
> insert into t1 select array[1]::integer[] from generate_series(1,
> 100000, 1);
> create statistics s1 on (x[1] = 1) from t1;
> analyze t1;
> /
> /explain analyze select * from t1 where x[1] = 1;/
> /> Seq Scan on t1 (cost=0.00..1986.00 rows=500 width=29) (actual
> time=0.009..36.035 rows=100000 loops=1)/
> /
> /
> Now, of course one can just create the stat on x[1] directly in this
> case, but I have a more complex use case where an equality operator is
> beneficial; 
> should the above case be supported ? feels like I'm just missing
> something fundamental.
> 

Hmmm. The problem here is that the expression may be interpreted either
as an operator clause (A=B), or as a boolean expression ((A=B)=TRUE). In
principle we might check which option matches extended statistics, but
that's not for free :-(

So what the current code does is trying to match the more specific cases
first, leaving the "bool expression" as a last resort. That seems like a
reasonable trade off to me, but I'll think about this a bit more.

There are probably other ways to match expressions, and we can't
possibly explore all of them. For example you may create statistics on
(x=1 AND y=2) and I doubt we'll match that, because we'll try matching
individual clauses not some arbitrary combinations of clauses. (Maybe we
shouldn't even allow creating such statistics ...)

> 2. Less important, just a minor note - feel free to ignore - although
> the eq. operator above seems to be skipped when matching the ext. stats,
> I can work around this by using a CASE expression (fiddle
> <https://www.db-fiddle.com/f/wJZNH1rNwJSo3D5aByQiWX/1>);
> Building on the above example, we can:
> /create statistics s2 on (case x[1] when 1 then true else false end)
> from t1;/
> /explain analyze select * from t1 where (case x[1] when 1 then true else
> false end/
> />  Seq Scan on t1 (cost=0.00..1986.00 rows=100000 width=25) (actual
> time=0.011..33.721 rows=100000 loops=1)/
> /

Yes, because this end ups not being matches as opclause, and therefore
goes all the way to the boolvarsel() in clause_selectivity_ext.

> /
> What's a bit problematic here, though, is that if we mix other dependent
> columns to the extended stat, and specifically if we create an mcv, 
> queries involving the CASE expression throw with `error: unknown clause
> type 130`, where clause type == T_CaseExpr.
> 
> The second point for me would be that I've found it a bit non intuitive
> that creating an extended statistic can fail queries at query time; it
> makes sense that the mcv wouldn't work for case expressions, but it
> might've been a bit clearer to:
> 
> a. Fail this at statistic creation time, potentially, or 
> b. Convert the type numeric in the above error to its text
> representation, if we can extract it out at runtime somehow - 
> I couldn't find a mapping of clause type numerics to their names, and as
> the node tags are generated at compile time, it could be build-dependent
> and a bit hard to track down if one doesn't control the build flags
> 

Yeah, this seems like a clear bug - we should not fail queries like
this. It's a sign statext_is_compatible_clause() and the MCV code
disagrees which clauses are compatible.

Can you share an example triggering this?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Expr. extended stats are skipped with equality operator

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> Yeah, this seems like a clear bug - we should not fail queries like
> this. It's a sign statext_is_compatible_clause() and the MCV code
> disagrees which clauses are compatible.

Indeed.  I attempted to fix that at e33ae53dd.

            regards, tom lane



Re: Expr. extended stats are skipped with equality operator

От
Tomas Vondra
Дата:
On 8/7/22 19:28, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> Yeah, this seems like a clear bug - we should not fail queries like
>> this. It's a sign statext_is_compatible_clause() and the MCV code
>> disagrees which clauses are compatible.
> 
> Indeed.  I attempted to fix that at e33ae53dd.
> 

Thanks! That probably explains why I've been unable to reproduce that, I
haven't realized there's a fix already.



-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company