Обсуждение: Handling of opckeytype / CREATE OPERATOR CLASS (bug?)
Hi, while working on the new BRIN opclasses in [1], I stumbled on something I think is actually a bug in how CREATE OPERATOR CLASS handles the storage type. If you look at built-in opclasses, there's a bunch of cases where (opcintype == opckeytype), for example the BRIN opclasses for inet data type: test=# select oid, opcname, opcfamily, opcintype, opckeytype from pg_opclass where opcintype = 869 order by oid; oid | opcname | opcfamily | opcintype | opckeytype -------+-----------------------+-----------+-----------+------------ 10105 | inet_minmax_ops | 4075 | 869 | 869 10106 | inet_inclusion_ops | 4102 | 869 | 869 The fact that opckeytype is set is important, because this allows the opclass to handle data with cidr data type - there's no opclass for cidr, but we can do this: create table t (a cidr); insert into t values ('127.0.0.1'); insert into t values ('127.0.0.2'); insert into t values ('127.0.0.3'); create index on t using brin (a inet_minmax_ops); This seems to work fine. Now, if you manually update the opckeytype to 0, you'll get this: test=# update pg_opclass set opckeytype = 0 where oid = 10105; UPDATE 1 test=# create index on t using brin (a inet_minmax_ops); ERROR: missing operator 1(650,650) in opfamily 4075 Unfortunately, it turns out it's impossible to re-create this opclass using CREATE OPERATOR CLASS, because the opclasscmds.c does this: /* Just drop the spec if same as column datatype */ if (storageoid == typeoid && false) storageoid = InvalidOid; So the storageoid is reset to 0. This only works for the built-in opclasses because those are injected directly into the catalogs. Either the CREATE OPERATOR CLASS is not considering something before resetting the storageoid, or maybe it should be reset for all opclasses (including the built-in ones) and the code that's using it should restore it in those cases (AFAICS opckeytype=0 means it's the same as opcintkey). Attached is a PoC patch doing the first thing - this does the trick for me, but I'm not 100% sure it's the right fix. [1] https://www.postgresql.org/message-id/54b47e66-bd8a-d44a-2090-fd4b2f49abe6%40enterprisedb.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > while working on the new BRIN opclasses in [1], I stumbled on something > I think is actually a bug in how CREATE OPERATOR CLASS handles the > storage type. Hm. Both catalogs.sgml and pg_opclass.h say specifically that opckeytype should be zero if it's to be the same as the input column type. I don't think just dropping the enforcement of that is the right answer. I don't recall for sure what-all might depend on that. I suspect that it's mostly for the benefit of polymorphic opclasses, so maybe the right thing is to say that the opckeytype can be polymorphic if opcintype is, and then we resolve it as per the usual polymorphism rules. In any case, it's fairly suspicious that the only opclasses violating the existing rule are johnny-come-lately BRIN opclasses. regards, tom lane
On 3/23/21 3:13 AM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> while working on the new BRIN opclasses in [1], I stumbled on something >> I think is actually a bug in how CREATE OPERATOR CLASS handles the >> storage type. > > Hm. Both catalogs.sgml and pg_opclass.h say specifically that > opckeytype should be zero if it's to be the same as the input > column type. I don't think just dropping the enforcement of that > is the right answer. > Yeah, that's possible. I was mostly just demonstrating the difference in behavior. Maybe the right fix is to fix the catalog contents and then tweak the AM code, or something. > I don't recall for sure what-all might depend on that. I suspect > that it's mostly for the benefit of polymorphic opclasses, so > maybe the right thing is to say that the opckeytype can be > polymorphic if opcintype is, and then we resolve it as per > the usual polymorphism rules. > I did an experiment - fixed all the opclasses violating the rule by removing the opckeytype, and ran make checke. The only cases causing issues were cidr and int4range. Not that it proves anything. > In any case, it's fairly suspicious that the only opclasses > violating the existing rule are johnny-come-lately BRIN opclasses. > Right, that seems suspicious. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 3/23/21 3:13 AM, Tom Lane wrote: >> Hm. Both catalogs.sgml and pg_opclass.h say specifically that >> opckeytype should be zero if it's to be the same as the input >> column type. I don't think just dropping the enforcement of that >> is the right answer. > Yeah, that's possible. I was mostly just demonstrating the difference in > behavior. Maybe the right fix is to fix the catalog contents and then > tweak the AM code, or something. Digging in our git history, the rule about zero opckeytype dates to 2001 (f933766ba), which precedes our invention of polymorphic types in 2003 (somewhere around 730840c9b). So I'm pretty sure that that was a poor man's substitute for polymorphic opclasses, which we failed to clean up entirely after we got real polymorphic opclasses. Now, I'd be in favor of cleaning that up and just using standard polymorphism rules throughout. But (without having studied the code) it looks like the immediate issue is that something in the BRIN code is unfamiliar with the rule for zero opckeytype. It might be a noticeably smaller patch to fix that than to get rid of the convention about zero. regards, tom lane
On 3/23/21 6:15 AM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> On 3/23/21 3:13 AM, Tom Lane wrote: >>> Hm. Both catalogs.sgml and pg_opclass.h say specifically that >>> opckeytype should be zero if it's to be the same as the input >>> column type. I don't think just dropping the enforcement of that >>> is the right answer. > >> Yeah, that's possible. I was mostly just demonstrating the difference in >> behavior. Maybe the right fix is to fix the catalog contents and then >> tweak the AM code, or something. > > Digging in our git history, the rule about zero opckeytype dates to > 2001 (f933766ba), which precedes our invention of polymorphic types > in 2003 (somewhere around 730840c9b). So I'm pretty sure that that > was a poor man's substitute for polymorphic opclasses, which we > failed to clean up entirely after we got real polymorphic opclasses. > > Now, I'd be in favor of cleaning that up and just using standard > polymorphism rules throughout. But (without having studied the code) > it looks like the immediate issue is that something in the BRIN code is > unfamiliar with the rule for zero opckeytype. It might be a noticeably > smaller patch to fix that than to get rid of the convention about zero. > That's possible. I'm not familiar with how we deal with polymorphic opclasses etc. but I tried to look for places dealing with opckeytype, so that I can compare BRIN vs. the other AMs, but the only references seem to be in amvalidate() functions. So either the difference is not very obvious, or maybe the other AMs don't trigger this for some reason. For example btree has a separate opclass for cidr, so it does not have to use "inet_ops" as polymorphic. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 3/23/21 6:15 AM, Tom Lane wrote: >> Digging in our git history, the rule about zero opckeytype dates to >> 2001 (f933766ba), which precedes our invention of polymorphic types >> in 2003 (somewhere around 730840c9b). So I'm pretty sure that that >> was a poor man's substitute for polymorphic opclasses, which we >> failed to clean up entirely after we got real polymorphic opclasses. > That's possible. I'm not familiar with how we deal with polymorphic > opclasses etc. but I tried to look for places dealing with opckeytype, > so that I can compare BRIN vs. the other AMs, but the only references > seem to be in amvalidate() functions. > So either the difference is not very obvious, or maybe the other AMs > don't trigger this for some reason. For example btree has a separate > opclass for cidr, so it does not have to use "inet_ops" as polymorphic. I think the difference is that brin is trying to look up opclass members based on the recorded type of the index's column (not the underlying table column). I don't recall that anyplace else does that. btree for instance does some lookups based on opcintype, but I don't think it looks at the index column type anywhere. After poking at it a bit more, the convention for zero does allow us to do some things that regular polymorphism won't. As an example: test=# create table vc (id varchar(9) primary key); CREATE TABLE test=# \d+ vc_pkey Index "public.vc_pkey" Column | Type | Key? | Definition | Storage | Stats target --------+----------------------+------+------------+----------+-------------- id | character varying(9) | yes | id | extended | primary key, btree, for table "public.vc" If btree text_ops had opckeytype = 'text' then this index column would show as just "text", which while not fatal seems like a loss of information. So I'm coming around to the idea that opckeytype = opcintype and opckeytype = 0 are valid but distinct situations, and CREATE OPCLASS indeed ought not smash one to the other. But we'd better poke around at the documentation, pg_dump, etc and make sure everything plays nice with that. regards, tom lane