[HACKERS] user-defined numeric data types triggering ERROR: unsupported type

From: Tomas Vondra
Subject: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Msg-id: b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com
Replies: Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi,

While testing a custom data type FIXEDDECIMAL [1], implementing a
numeric-like data type with limited range, I ran into several issues
that I suspect may not be entirely intentional / expected behavior.

[1] https://github.com/2ndQuadrant/fixeddecimal

Attached is a minimal subset of the extension SQL definition, which may
be more convenient when looking into the issue.

The most important issue is that when planning a simple query, the
estimation of range queries on a column with the custom data type fails
like this:

test=# create table t (a fixeddecimal);
CREATE TABLE
test=# insert into t select random() from generate_series(1,100000);
INSERT 0 100000
test=# analyze t;
ANALYZE
test=# select * from t where a > 0.9;
ERROR:  unsupported type: 16385

The error message here comes from convert_numeric_to_scalar, which gets
called from ineq_histogram_selectivity while approximating the position
of the constant within the histogram. convert_to_scalar does this:

switch (valuetypid)
{
  ...
  case NUMERICOID:
  ...
    *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
    *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
    *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
    return true;

  ...
}

The first call works fine, as the constant really is numeric
(valuetypid = 1700). But the histogram boundaries use the custom data
type, causing the error, because convert_numeric_to_scalar only handles
a bunch of hard-coded data types. So it's pretty much guaranteed to fail
with any user-defined data type.

This seems a bit unfortunate :-(

One solution would be to implement a custom estimation function,
replacing scalarltsel/scalargtsel. But that seems rather unnecessary,
especially considering there is an implicit cast from fixeddecimal to
numeric. Another thing is that when there's just an MCV list (no
histogram), the estimation works just fine.
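For context, the cast in question is declared roughly like this (a
sketch from memory; the exact function name lives in the fixeddecimal
repository and may differ):

```sql
-- sketch: implicit cast from fixeddecimal to numeric
CREATE CAST (fixeddecimal AS numeric)
    WITH FUNCTION fixeddecimal_numeric(fixeddecimal)
    AS IMPLICIT;
```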

So I see two basic ways to fix this:

* Make convert_numeric_to_scalar smarter, so that it checks whether
there is an implicit cast to numeric, and fails only if it does not
find one.

* Make convert_to_scalar smarter, so that it returns false for
unexpected data types, making ineq_histogram_selectivity fall back to
the default estimate of 0.5 (for that one bucket).
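The second option would amount to roughly this shape in
convert_to_scalar (just a sketch, not a tested patch; is_numeric_like()
is a hypothetical helper accepting the same hard-coded OIDs, not an
existing function):

```
case NUMERICOID:
  ...
  /* bail out instead of erroring on types we cannot convert,
   * so the caller falls back to the default bucket estimate */
  if (!is_numeric_like(valuetypid) ||
      !is_numeric_like(boundstypid))
      return false;
  *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
  *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
  *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
  return true;
```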

Both options seem more favorable than what's happening currently. Is
there anything I missed, making those fixes unacceptable?

If anything, the fact that MCV estimates work while histogram estimates
do not makes this somewhat unpredictable - a change in the data
distribution (or perhaps even just a different sample in ANALYZE) may
result in sudden failures.


I ran into one additional strange thing while investigating this. The
attached SQL script defines two operator classes - fixeddecimal_ops and
fixeddecimal_numeric_ops - providing (fixeddecimal,fixeddecimal) and
(fixeddecimal,numeric) operators. Dropping one of those operator classes
changes the estimates in somewhat suspicious ways.
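For reference, the cross-type operator class is declared along these
lines (a trimmed sketch, assuming operator and support-function names
matching the extension; the attached script has the full definitions):

```sql
-- sketch: btree operator class with (fixeddecimal, numeric) operators
CREATE OPERATOR CLASS fixeddecimal_numeric_ops
    FOR TYPE fixeddecimal USING btree AS
        OPERATOR 1 <  (fixeddecimal, numeric),
        OPERATOR 2 <= (fixeddecimal, numeric),
        OPERATOR 3 =  (fixeddecimal, numeric),
        OPERATOR 4 >= (fixeddecimal, numeric),
        OPERATOR 5 >  (fixeddecimal, numeric),
        FUNCTION 1 fixeddecimal_numeric_cmp(fixeddecimal, numeric);
```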

When only fixeddecimal_ops is defined, we get this:

test=# explain select * from t where a > 0.1;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..1943.00 rows=33333 width=8)
   Filter: ((a)::numeric > 0.1)
(2 rows)

That is, we get the default estimate for inequality clauses, 33%. But
when only fixeddecimal_numeric_ops is defined, we get this:

test=# explain select * from t where a > 0.1;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..1693.00 rows=50000 width=8)
   Filter: (a > 0.1)
(2 rows)

That is, we get a 50% estimate, because that's what scalarineqsel uses
when ineq_histogram_selectivity can't compute the selectivity from a
histogram for some reason.

Wouldn't it make more sense to use the default 33% estimate here?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
