Thread: BUG #16586: deduplicate_items=true can be configured for numeric indexes


BUG #16586: deduplicate_items=true can be configured for numeric indexes

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16586
Logged by:          Matthias van de Meent
Email address:      matthias.vandemeent@cofano.nl
PostgreSQL version: 13beta3
Operating system:   Debian Stretch (9.13)
Description:

> CREATE INDEX numerical_index ON table USING btree ((num::numeric)) WITH (deduplicate_items=true);
CREATE INDEX
> \d+ numerical_index
                    Index "public.numerical_index"
 Column |  Type   | Key? | Definition | Storage | Stats target 
--------+---------+------+------------+---------+--------------
 num    | numeric | yes  | num        | main    | 
btree, for table "public.table"
Options: deduplicate_items=true

There is no error for specifying the "deduplicate_items" -flag. As
deduplication is not supported for indexes with numeric type, I expected the
index creation statement to error.
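
A minimal reproduction sketch of the report above. The table name dedup_demo and its integer column are assumptions made for illustration: the report does not show the table definition, and an unquoted "table" is a reserved word.

```sql
-- Hypothetical table; only the index definition mirrors the report.
CREATE TABLE dedup_demo (num integer);

-- Per the report, this succeeds on 13beta3 even though the numeric
-- operator class does not support deduplication.
CREATE INDEX numerical_index ON dedup_demo
    USING btree ((num::numeric))
    WITH (deduplicate_items = on);

-- Show the storage parameters recorded for the index in the catalog.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'numerical_index';
```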


Re: BUG #16586: deduplicate_items=true can be configured for numeric indexes

From
Peter Geoghegan
Date:
On Thu, Aug 20, 2020 at 4:52 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> There is no error for specifying the "deduplicate_items" -flag. As
> deduplication is not supported for indexes with numeric type, I expected the
> index creation statement to error.

I don't think that there should be an error. While the
"equalimage"-ness of an operator class (such as btree/numeric_ops) is
in theory static, in practice it could change in either direction. For
example, it's possible (though very unlikely) that somebody will make
the mistake of marking an operator class as equalimage/dedup safe when
they shouldn't have. If this actually happens, a REINDEX shouldn't
raise errors with the same spelling of REINDEX that worked the first
time (e.g. when restoring a dump).

The deduplicate_items storage parameter is kind of an advisory thing.
Deduplication is always applied selectively in unique indexes, even
though it might be slightly better to do so consistently with some
workloads. Also, it's possible that we'll find a way to make some of
the operator classes (though not btree/numeric_ops) deduplication safe
in the future. For example, we could teach container types to report
their "equalimage"-ness by invoking the underlying support function of
contained types. So you could use deduplication with a composite type,
provided it didn't contain unsafe scalar types like numeric.

In general I don't expect that users will consciously think about
deduplication very often -- it's supposed to have very little overhead
in cases that don't benefit, so it will probably fade into the
background even in installations where it provides a lot of benefit. I
don't expect many users will want to make sure that it's enabled in
one index but definitely not enabled in another.

With all of that said, it would be nice if I could raise a NOTICE or
even a WARNING here if and only if the user spelled out
"deduplicate_items = on". Hard to see how to do that with the current
design of reloptions, though, unless it's okay to show it even when
"deduplicate_items = on" was not specifically provided (I don't think
that it's okay). An index access method (such as nbtree) can tell
whether or not all storage params should come from the defaults by
checking if the rel's rd_options is NULL or not, but that's not the
same thing -- it'll be set when fillfactor was explicitly set, for
example.

--
Peter Geoghegan
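
One way to check what was actually decided for a given index, sketched under the assumptions that the pageinspect extension can be installed and that the index from the report (numerical_index) exists. The catalog query relies on B-tree support function number 4 being the "equalimage" function.

```sql
-- Does the operator class provide an equalimage support function (number 4)?
-- A NULL equalimage_proc means the opclass is not marked deduplication safe.
SELECT opc.opcname, p.amproc AS equalimage_proc
FROM pg_opclass AS opc
JOIN pg_am AS am ON am.oid = opc.opcmethod
LEFT JOIN pg_amproc AS p
       ON p.amprocfamily = opc.opcfamily
      AND p.amproclefttype = opc.opcintype
      AND p.amprocrighttype = opc.opcintype
      AND p.amprocnum = 4
WHERE am.amname = 'btree'
  AND opc.opcname IN ('numeric_ops', 'int4_ops');

-- Was deduplication judged safe for this particular index when it was built?
-- The answer is stored in the B-tree metapage.
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT allequalimage FROM bt_metap('numerical_index');
```

If allequalimage comes back false, the deduplicate_items setting should have no effect on that index, which matches the "advisory" behaviour described above.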



Re: BUG #16586: deduplicate_items=true can be configured for numeric indexes

From
Matthias van de Meent
Date:
On Sat, 22 Aug 2020 at 00:49, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Aug 20, 2020 at 4:52 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > There is no error for specifying the "deduplicate_items" -flag. As
> > deduplication is not supported for indexes with numeric type, I expected the
> > index creation statement to error.
>
> I don't think that there should be an error. While the
> "equalimage"-ness of an operator class (such as btree/numeric_ops) is
> in theory static, in practice it could change in either direction. For
> example, it's possible (though very unlikely) that somebody will make
> the mistake of marking an operator class as equalimage/dedup safe when
> they shouldn't have. If this actually happens, a REINDEX shouldn't
> raise errors with the same spelling of REINDEX that worked the first
> time (e.g. when restoring a dump).
>
> The deduplicate_items storage parameter is kind of an advisory thing.

The current documentation is quite unclear about that, as the flag
itself is documented as "Controls usage of the B-tree deduplication
technique described in Section 63.4.2.". A note "Even when configured,
the feature will not be used if it does not pass the limitations as
described in section 63.4.2" would help in preventing confusion.

> Deduplication is always applied selectively in unique indexes, even
> though it might be slightly better to do so consistently with some
> workloads. Also, it's possible that we'll find a way to make some of
> the operator classes (though not btree/numeric_ops) deduplication safe
> in the future. For example, we could teach container types to report
> their "equalimage"-ness by invoking the underlying support function of
> contained types. So you could use deduplication with a composite type,
> provided it didn't contain unsafe scalar types like numeric.
>
> In general I don't expect that users will consciously think about
> deduplication very often -- it's supposed to have very little overhead
> in cases that don't benefit, so it will probably fade into the
> background even in installations where it provides a lot of benefit. I
> don't expect many users will want to make sure that it's enabled in
> one index but definitely not enabled in another.
>
> With all of that said, it would be nice if I could raise a NOTICE or
> even a WARNING here if and only if the user spelled out
> "deduplicate_items = on". Hard to see how to do that with the current
> design of reloptions, though, unless it's okay to show it even when
> "deduplicate_items = on" was not specifically provided (I don't think
> that it's okay). An index access method (such as nbtree) can tell
> whether or not all storage params should come from the defaults by
> checking if the rel's rd_options is NULL or not, but that's not the
> same thing -- it'll be set when fillfactor was explicitly set, for
> example.

Thanks for the reply, it was very insightful.

- Matthias


> --
> Peter Geoghegan