Обсуждение: REINDEX filtering in the backend

Поиск
Список
Период
Сортировка

REINDEX filtering in the backend

От
Julien Rouhaud
Дата:
Hi,

I had some spare time tonight so I started a prototype to allow
filtering the indexes that are processed using the REINDEX command, as
Peter suggested in the parallel reindexdb thread [1].

I didn't want to spend too much time enjoying bison and adding new
unreserved keywords, so for now I just implemented this syntax to
start a discussion for this feature in the next commitfest:

REINDEX ( FILTER = COLLATION ) [...]

The FILTER clause can be used multiple times, each one is OR-ed with
the ReindexStmt's option, so we could easily add a LIBC, ICU and other
filters, also making COLLATION (or more realistically a better new
keyword) an alias for (LIBC | ICU) for instance.

The filtering is done at table level (with and without the
concurrently option), so SCHEMA, DATABASE and SYSTEM automatically
benefit from it.  If this clause is used with a REINDEX INDEX, the
statement errors out, as I don't see a valid use case for providing a
single index name and asking to possibly filter it at the same time.

Under the hood, the filtering is for now done in a single function by
appending elements, not removing them.  An empty oid list is created,
all indexes belonging to the underlying relation are processed by the
specific filter(s), and any index that fails to be discarded by at
least one filter, even partially, is added to the final list.

I also added some minimal documentation and regression tests.  I'll
add this patch to the next commitfest.

[1] https://www.postgresql.org/message-id/7140716c-679e-a0b9-a273-b201329d8891%402ndquadrant.com

Вложения

Re: REINDEX filtering in the backend

От
Michael Paquier
Дата:
On Thu, Jul 11, 2019 at 11:14:20PM +0200, Julien Rouhaud wrote:
> I didn't want to spend too much time enjoying bison and adding new
> unreserved keywords, so for now I just implemented this syntax to
> start a discussion for this feature in the next commitfest:
>
> REINDEX ( FILTER = COLLATION ) [...]
>
> The FILTER clause can be used multiple times, each one is OR-ed with
> the ReindexStmt's option, so we could easily add a LIBC, ICU and other
> filters, also making COLLATION (or more realistically a better new
> keyword) an alias for (LIBC | ICU) for instance.

I would prefer keeping the interface simple with only COLLATION, so as
only collation sensitive indexes should be updated, including icu and
libc ones.  Or would it be useful to have the filtering for both as
libicu can break things similarly to glibc in an update still a
breakage on one or the other would not happen at the same time?  I
don't know enough of libicu regarding that, eager to learn.  In which
case, wouldn't it be better to support that from the start?

> The filtering is done at table level (with and without the
> concurrently option), so SCHEMA, DATABASE and SYSTEM automatically
> benefit from it.  If this clause is used with a REINDEX INDEX, the
> statement errors out, as I don't see a valid use case for providing a
> single index name and asking to possibly filter it at the same time.

Supporting that case would not be that much amount of work, no?

> I also added some minimal documentation and regression tests.  I'll
> add this patch to the next commitfest.
>
> [1] https://www.postgresql.org/message-id/7140716c-679e-a0b9-a273-b201329d8891%402ndquadrant.com

+    if ((stmt->options & REINDEXOPT_ALL_FILTERS) != 0)
+        elog(ERROR, "FILTER clause is not compatible with REINDEX INDEX");
[...]
+      discard indexes whose ordering does not depend on a collation. Note that
+      the FILTER option is not compatible with <command>REINDEX
+      SCHEMA</command>.

Why do you have both limitations?  I think that it would be nice to be
able to do both, generating an error for REINDEX INDEX if the index
specified is not compatible, and a LOG if the index is not filtered
out when a list is processed.  Please note that elog() cannot be used
for user-facing failures, only for internal ones.

+REINDEX (VERBOSE, FILTER = COLLATION) TABLE reindex_verbose;
+-- One column, not depending on a collation
In order to make sure that a reindex has been done for a given entry
with the filtering, an idea is to save the relfilenode before the
REINDEX and check it afterwards.  That would be nice to make sure that
only the wanted indexes are processed, but it is not possible to be
sure of that based on your tests, and some tests should be done on
relations which have collation-sensitive as well as
collation-insensitive indexes.

+       index = index_open(indexOid, AccessShareLock);
+       numAtts = index->rd_index->indnatts;
+       index_close(index, AccessShareLock);
Wouldn't it be better to close that after doing the scan?

Nit: I am pretty sure that this should be indented.

Could you add tests with REINDEX CONCURRENTLY?
--
Michael

Вложения

Re: REINDEX filtering in the backend

От
Michael Paquier
Дата:
On Wed, Aug 28, 2019 at 02:02:08PM +0900, Michael Paquier wrote:
> +       index = index_open(indexOid, AccessShareLock);
> +       numAtts = index->rd_index->indnatts;
> +       index_close(index, AccessShareLock);
> Wouldn't it be better to close that after doing the scan?
>
> Nit: I am pretty sure that this should be indented.
>
> Could you add tests with REINDEX CONCURRENTLY?

Bonus: support for reindexdb should be added.  Let's not forget about
it.
--
Michael

Вложения

Re: REINDEX filtering in the backend

От
Julien Rouhaud
Дата:
On Wed, Aug 28, 2019 at 7:03 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 11, 2019 at 11:14:20PM +0200, Julien Rouhaud wrote:
> > I didn't want to spend too much time enjoying bison and adding new
> > unreserved keywords, so for now I just implemented this syntax to
> > start a discussion for this feature in the next commitfest:
> >
> > REINDEX ( FILTER = COLLATION ) [...]
> >
> > The FILTER clause can be used multiple times, each one is OR-ed with
> > the ReindexStmt's option, so we could easily add a LIBC, ICU and other
> > filters, also making COLLATION (or more realistically a better new
> > keyword) an alias for (LIBC | ICU) for instance.
>
> I would prefer keeping the interface simple with only COLLATION, so as
> only collation sensitive indexes should be updated, including icu and
> libc ones.  Or would it be useful to have the filtering for both as
> libicu can break things similarly to glibc in an update still a
> breakage on one or the other would not happen at the same time?  I
> don't know enough of libicu regarding that, eager to learn.  In which
> case, wouldn't it be better to support that from the start?

I'm not sure either.  Another thing would be to add extra syntax to be
able to discard even more indexes.  For instance we could store the
version of the underlying lib used when the index is (re)created, and
do something like
REINDEX (FILTER = LIBC!=2.28) or REINDEX (FILTER = LIBC==2.27) or
something similar.

> > The filtering is done at table level (with and without the
> > concurrently option), so SCHEMA, DATABASE and SYSTEM automatically
> > benefit from it.  If this clause is used with a REINDEX INDEX, the
> > statement errors out, as I don't see a valid use case for providing a
> > single index name and asking to possibly filter it at the same time.
>
> Supporting that case would not be that much amount of work, no?

Probably not, but I'm dubious about the use case.  Adding the lib
version in the catalog would be more useful for people who want to
write their own rules to reindex specific set of indexes.

> > I also added some minimal documentation and regression tests.  I'll
> > add this patch to the next commitfest.
> >
> > [1] https://www.postgresql.org/message-id/7140716c-679e-a0b9-a273-b201329d8891%402ndquadrant.com
>
> +    if ((stmt->options & REINDEXOPT_ALL_FILTERS) != 0)
> +        elog(ERROR, "FILTER clause is not compatible with REINDEX INDEX");
> [...]
> +      discard indexes whose ordering does not depend on a collation. Note that
> +      the FILTER option is not compatible with <command>REINDEX
> +      SCHEMA</command>.
>
> Why do you have both limitations?

That's actually a typo, the documentation should have specified that
it's not compatible with REINDEX INDEX, not REINDEX SCHEMA, I'll fix.

>  I think that it would be nice to be
> able to do both, generating an error for REINDEX INDEX if the index
> specified is not compatible, and a LOG if the index is not filtered
> out when a list is processed.

Do you mean having an error if the index does not contain any
collation based type?  Also, REINDEX only accept a single name, so
there shouldn't be any list processing for REINDEX INDEX?  I'm not
really in favour of adding extra code the filtering when user asks for
a specific index name to be reindexed.

>  Please note that elog() cannot be used
> for user-facing failures, only for internal ones.

Indeed, I'll change with an ereport and ERRCODE_FEATURE_NOT_SUPPORTED.

>
> +REINDEX (VERBOSE, FILTER = COLLATION) TABLE reindex_verbose;
> +-- One column, not depending on a collation
> In order to make sure that a reindex has been done for a given entry
> with the filtering, an idea is to save the relfilenode before the
> REINDEX and check it afterwards.  That would be nice to make sure that
> only the wanted indexes are processed, but it is not possible to be
> sure of that based on your tests, and some tests should be done on
> relations which have collation-sensitive as well as
> collation-insensitive indexes.

That's what I did when I first submitted the feature in reindexdb.  I
didn't use them because it means switching to TAP tests.  I can drop
the simple regression test (especially since I now realize than one is
quite broken) and use the TAP one if, or should I keep both?

>
> +       index = index_open(indexOid, AccessShareLock);
> +       numAtts = index->rd_index->indnatts;
> +       index_close(index, AccessShareLock);
> Wouldn't it be better to close that after doing the scan?

Yes indeed.

> Could you add tests with REINDEX CONCURRENTLY?

Sure!

> Bonus: support for reindexdb should be added.  Let's not forget about
> it.

Yep.  That was a first prototype to see if this approach is ok.  I'll
add more tests, run pgindent and reindexdb support if this approach is
sensible.



Re: REINDEX filtering in the backend

От
Michael Paquier
Дата:
On Wed, Aug 28, 2019 at 10:22:07AM +0200, Julien Rouhaud wrote:
>>> The filtering is done at table level (with and without the
>>> concurrently option), so SCHEMA, DATABASE and SYSTEM automatically
>>> benefit from it.  If this clause is used with a REINDEX INDEX, the
>>> statement errors out, as I don't see a valid use case for providing a
>>> single index name and asking to possibly filter it at the same time.
>>
>> Supporting that case would not be that much amount of work, no?
>
> Probably not, but I'm dubious about the use case.  Adding the lib
> version in the catalog would be more useful for people who want to
> write their own rules to reindex specific set of indexes.

Hearing from others here would be helpful.  My take is that having a
simple option doing the filtering, without the need to store anything
in the catalogs, would be helpful enough for users mixing both index
types on a single table.  Others may not agree.

>>  I think that it would be nice to be
>> able to do both, generating an error for REINDEX INDEX if the index
>> specified is not compatible, and a LOG if the index is not filtered
>> out when a list is processed.
>
> Do you mean having an error if the index does not contain any
> collation based type?  Also, REINDEX only accept a single name, so
> there shouldn't be any list processing for REINDEX INDEX?  I'm not
> really in favour of adding extra code the filtering when user asks for
> a specific index name to be reindexed.

I was referring to adding an error if trying to reindex an index with
the filtering enabled.  That's useful to inform the user that what he
intends to do is not compatible with the options provided.

> That's what I did when I first submitted the feature in reindexdb.  I
> didn't use them because it means switching to TAP tests.  I can drop
> the simple regression test (especially since I now realize than one is
> quite broken) and use the TAP one if, or should I keep both?

There is no need for TAP I think.  You could for example store the
relid and its relfilenode in a temporary table before running the
reindex, run the REINDEX and then compare with what pg_class stores.
And that's much cheaper than setting a new instance for a TAP test.
--
Michael

Вложения

Re: REINDEX filtering in the backend

От
Julien Rouhaud
Дата:
On Thu, Aug 29, 2019 at 2:09 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Aug 28, 2019 at 10:22:07AM +0200, Julien Rouhaud wrote:
> >>> The filtering is done at table level (with and without the
> >>> concurrently option), so SCHEMA, DATABASE and SYSTEM automatically
> >>> benefit from it.  If this clause is used with a REINDEX INDEX, the
> >>> statement errors out, as I don't see a valid use case for providing a
> >>> single index name and asking to possibly filter it at the same time.
> >>
> >> Supporting that case would not be that much amount of work, no?
> >
> > Probably not, but I'm dubious about the use case.  Adding the lib
> > version in the catalog would be more useful for people who want to
> > write their own rules to reindex specific set of indexes.
>
> Hearing from others here would be helpful.  My take is that having a
> simple option doing the filtering, without the need to store anything
> in the catalogs, would be helpful enough for users mixing both index
> types on a single table.  Others may not agree.

That was already suggested by Thomas and seconded by Peter E., see
https://www.postgresql.org/message-id/2b1504ac-3d6c-11ec-e1ce-3daf132b3d37%402ndquadrant.com.

I personally think that it's a sensible approach, and I'll be happy to
propose a patch for that too if no one worked on this yet.

> >>  I think that it would be nice to be
> >> able to do both, generating an error for REINDEX INDEX if the index
> >> specified is not compatible, and a LOG if the index is not filtered
> >> out when a list is processed.
> >
> > Do you mean having an error if the index does not contain any
> > collation based type?  Also, REINDEX only accept a single name, so
> > there shouldn't be any list processing for REINDEX INDEX?  I'm not
> > really in favour of adding extra code the filtering when user asks for
> > a specific index name to be reindexed.
>
> I was referring to adding an error if trying to reindex an index with
> the filtering enabled.  That's useful to inform the user that what he
> intends to do is not compatible with the options provided.

Ok, I can add it if needed.

> > That's what I did when I first submitted the feature in reindexdb.  I
> > didn't use them because it means switching to TAP tests.  I can drop
> > the simple regression test (especially since I now realize than one is
> > quite broken) and use the TAP one if, or should I keep both?
>
> There is no need for TAP I think.  You could for example store the
> relid and its relfilenode in a temporary table before running the
> reindex, run the REINDEX and then compare with what pg_class stores.
> And that's much cheaper than setting a new instance for a TAP test.

Oh indeed, good point!  I'll work on better tests using this approach then.



Re: REINDEX filtering in the backend

От
Michael Paquier
Дата:
On Thu, Aug 29, 2019 at 10:52:55AM +0200, Julien Rouhaud wrote:
> That was already suggested by Thomas and seconded by Peter E., see
> https://www.postgresql.org/message-id/2b1504ac-3d6c-11ec-e1ce-3daf132b3d37%402ndquadrant.com.
>
> I personally think that it's a sensible approach, and I'll be happy to
> propose a patch for that too if no one worked on this yet.

That may be interesting to sort out first then because we'd likely
want to know what is first in the catalogs before designing the
filtering processing looking at it, no?
--
Michael

Вложения

Re: REINDEX filtering in the backend

От
Peter Eisentraut
Дата:
On 2019-08-30 02:10, Michael Paquier wrote:
> On Thu, Aug 29, 2019 at 10:52:55AM +0200, Julien Rouhaud wrote:
>> That was already suggested by Thomas and seconded by Peter E., see
>> https://www.postgresql.org/message-id/2b1504ac-3d6c-11ec-e1ce-3daf132b3d37%402ndquadrant.com.
>>
>> I personally think that it's a sensible approach, and I'll be happy to
>> propose a patch for that too if no one worked on this yet.
> 
> That may be interesting to sort out first then because we'd likely
> want to know what is first in the catalogs before designing the
> filtering processing looking at it, no?

Right.  We should aim to get per-object collation version tracking done.
 And then we might want to have a REINDEX variant that processes exactly
those indexes with an out-of-date version number -- and then updates
that version number once the reindexing is done.  I think that project
is achievable for PG13.

I think we can keep this present patch in our back pocket for, say, the
last commit fest if we don't make sufficient progress on those other
things.  Right now, it's hard to review because the target is moving,
and it's unclear what guidance to give users.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: REINDEX filtering in the backend

От
Michael Paquier
Дата:
On Wed, Sep 04, 2019 at 01:54:53PM +0200, Peter Eisentraut wrote:
> Right.  We should aim to get per-object collation version tracking done.
>  And then we might want to have a REINDEX variant that processes exactly
> those indexes with an out-of-date version number -- and then updates
> that version number once the reindexing is done.  I think that project
> is achievable for PG13.
>
> I think we can keep this present patch in our back pocket for, say, the
> last commit fest if we don't make sufficient progress on those other
> things.  Right now, it's hard to review because the target is moving,
> and it's unclear what guidance to give users.

Okay, agreed.  I have marked the patch as returned with feedback
then.
--
Michael

Вложения