Bogus dependency calculation for expressions involving casts

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Bogus dependency calculation for expressions involving casts
Дата
Msg-id 17384.1529017626@sss.pgh.pa.us
обсуждение исходный текст
Список pgsql-hackers
I've identified the underlying cause of the misbehavior reported in
https://www.postgresql.org/message-id/7cb957dd12774a52ac8a680b73910c5b@imaginesoftware.com
and it's a bit embarrassing: we're doing dependency logging for casts
all wrong.  A cast expression produces a FuncExpr parse node, which
find_expr_references_walker happily logs as a dependency on the function
implementing the cast, rather than the cast per se.  Therefore pg_dump
has no idea that it needs to dump the CREATE CAST before it can dump
expressions relying on the availability of the cast notation.

It's a bit astonishing that this flaw hasn't been identified before now.
I think probably the explanation is that pg_dump's default dump order
(cf. pg_dump_sort.c) puts out casts before most object types that could
involve an expression depending on a cast, so that we accidentally emit
things in a safe order anyway.  But types --- in particular, default
expressions for domain types --- are an exception to that rule, and in
any case dump-order rearrangements made to satisfy other dependencies
could result in a failure.

I'm afraid that this might not be easy to fix.  The straightforward
approach would be to expand FuncExpr (and also RelabelType, CoerceViaIO,
ArrayCoerceExpr, RowExpr, and ConvertRowtypeExpr -- everything with a
CoercionForm field) to include the OID of the cast if the node came from
cast syntax, and then teach find_expr_references_walker to report the
dependency as being a dependency on the cast not the cast's implementation
function(s).  However, quite aside from the invasiveness and
non-back-patch-ability of this approach, I'm not even sure it totally
fixes the problem:

1. There are cases where we intentionally omit a RelabelType so that
the expression will be seen to have its original type.  It's possible
that this is all right because all such cases correspond to, essentially,
hard-wired polymorphism behavior rather than droppable casts.  But I'm not
quite sure that that's true.

2. As noted in the comments for enum CoercionForm, we'd really prefer that
equal() treat semantically-equivalent nodes as equal even if they have
different display output; this would imply ignoring the cast OID fields
in equal().  But I'm unsure that we can get away with that for expressions
that actually have different dependency lists ... it's not hard to imagine
ALTER operations screwing up and leaving an object with a dependency list
that doesn't match the current contents of its expression.

Another approach we could take is to not change the parse representation,
but instead teach find_expr_references_walker to look into pg_cast when
it sees a node that's marked as "display as coercion", and if it finds a
match then report the cast OID.  This'd make find_expr_references_walker
noticeably slower for such cases, and I'm not sure that it could expect
to find the correct match in every case because of cheating in binary
compatibility situations.  Also, if problem #1 or #2 above really is a
problem then it affects this approach just as much.  On the plus side,
this way would lead to an at-least-theoretically back-patchable solution
... not that it would fix existing misreported dependencies, but at least
we could record new ones correctly.

Another idea, which is not a bulletproof fix but might take care of nearly
all practical cases, is to adjust pg_dump's default object ordering rules
so that domain types come out after casts.

Seeing that this has been wrong since we invented dependencies and just
now came to light, I don't feel a need to panic about it; in particular
I think we probably ought not try to squeeze a fix into v11, unless we
go with the relatively non-invasive way of hacking pg_dump's ordering.
But we should think about finding a fix going forward.

BTW, the existence of ALTER DOMAIN SET DEFAULT means that it's probably
possible to set up situations with circular dependencies, by adding a
domain default after-the-fact that has an expression that somehow depends
on the domain's existence.  In principle pg_dump could be taught to break
such a circularity by using SET DEFAULT to dump the default expression
separately --- but it has no such logic now, nor does pg_depend contain
sufficient information to determine where to place the SET DEFAULT,
because we record any dependencies of the expression as direct
dependencies of the domain type.  So fixing that would be quite a bit of
work, which I'm disinclined to do because I can't think of any real-world
use case that would justify it.

Also ... the current situation means that it's possible to store an
expression "foo::mytype" and then drop the cast from foo's type to mytype,
as long as you keep the cast's implementation function.  The expression
will continue to print in that form, but it'll fail to dump-and-reload.
So this is an independent reason why just hacking pg_dump's ordering rules
isn't a full solution.

            regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "R, Siva"
Дата:
Сообщение: Re: Duplicate Item Pointers in Gin index
Следующее
От: Amit Langote
Дата:
Сообщение: Re: Partitioning with temp tables is broken