Обсуждение: Odd estimation issue with user-defined type

Поиск
Список
Период
Сортировка

Odd estimation issue with user-defined type

От
Florian Weimer
Дата:
I've created a few user-defined types quite similar to uuid which we
use to store various hashes in the database.  (The types use binary
encoding internally, but only expose hexadecimal strings externally.)

The hashes are roughly equidistributed, so when I do a range query
which is essentially based on a hash prefix(*), I expect the result to
contain N * 2**(-k) results, where N is the table size and k the
number of bits in the range.  Actual query results show that this is
the case.  The odd thing is that the planner thinks that the range
query will return about one quarter of the table, independently of the
range specified.  Of course, the row estimates are quite far off as a
result, leading to suboptimal plans.

Any idea what could cause this?  Do I need to provide some estimator
function somewhere?

(*) I don't use LIKE, because its optimization is hard-coded to a few
    types, but explicit BETWEEN ... AND queries.
--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: Odd estimation issue with user-defined type

От
Tom Lane
Дата:
Florian Weimer <fweimer@bfk.de> writes:
> I've created a few user-defined types quite similar to uuid which we
> use to store various hashes in the database.  (The types use binary
> encoding internally, but only expose hexadecimal strings externally.)

> The hashes are roughly equidistributed, so when I do a range query
> which is essentially based on a hash prefix(*), I expect the result to
> contain N * 2**(-k) results, where N is the table size and k the
> number of bits in the range.  Actual query results show that this is
> the case.  The odd thing is that the planner thinks that the range
> query will return about one quarter of the table, independently of the
> range specified.  Of course, the row estimates are quite far off as a
> result, leading to suboptimal plans.

> Any idea what could cause this?  Do I need to provide some estimator
> function somewhere?

If you haven't, then how would you expect the planner to know that?

Less flippantly, you really need to tell us exactly what planner support
you did provide, before you can expect any intelligent comment.  Has the
type got a default btree opclass?  What selectivity estimators did you
attach to the comparison operators?  Do you get MCV and/or histogram
entries in pg_stats when you ANALYZE one of these columns, and if so
do they look sane?

            regards, tom lane

Re: Odd estimation issue with user-defined type

От
Florian Weimer
Дата:
* Tom Lane:

>> Any idea what could cause this?  Do I need to provide some estimator
>> function somewhere?
>
> If you haven't, then how would you expect the planner to know that?

Perhaps it's psychic, or there is some trick I don't know about? 8-)

> Less flippantly, you really need to tell us exactly what planner support
> you did provide, before you can expect any intelligent comment.  Has the
> type got a default btree opclass?

Yes, I think so (because of CREATE OPERATOR CLASS ... USING btree).

> What selectivity estimators did you attach to the comparison
> operators?

Ah, I see, I probably need to provide a RESTRICT clause in the
operator definition.  That should do the trick, and should be fairly
easy to implement in this case.

Sorry, I just missed this piece of information in the documentation, I
should have read it more carefully.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99