Re: [HACKERS] 7.3 gotchas for applications and client libraries
От | Bruce Momjian |
---|---|
Тема | Re: [HACKERS] 7.3 gotchas for applications and client libraries |
Дата | |
Msg-id | 200209180518.g8I5I7r01678@candle.pha.pa.us обсуждение исходный текст |
Ответ на | 7.3 gotchas for applications and client libraries (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-interfaces |
I have copied Tom's fine email to: http://www.ca.postgresql.org/docs/momjian/upgrade_7.3 and have added a mention of it in the HISTORY file: A dump/restore using "pg_dump" is required for those wishing to migrate data from any previous release. A summary ofchanges needed in client applications is at http://www.ca.postgresql.org/docs/momjian/upgrade_7.3. --------------------------------------------------------------------------- Tom Lane wrote: > Bruce suggested that we need a porting guide to help people look for > application and client-library code that will be broken by the changes > in PG 7.3. Here is a first cut at documenting the issues. > Comments welcome --- in particular, what have I missed? > > regards, tom lane > > > Revising client-side code for PG 7.3 system catalogs > > > Here are some notes about things to look out for in updating client-side > code for PG 7.3. Almost anything that looks at the system catalogs is > probably going to need work, if you want it to behave reasonably when you > start using 7.3's new features such as schemas and DROP COLUMN. > > As an example, consider the task of listing the names and datatypes for > a table named "foo". In the past you may have done this with a query like > > SELECT a.attname, format_type(a.atttypid, a.atttypmod) > FROM pg_class c, pg_attribute a > WHERE c.relname = 'foo' > AND a.attnum > 0 AND a.attrelid = c.oid > ORDER BY a.attnum > > (this in fact is exactly what 7.2 psql uses to implement "\d foo"). > This query will work perfectly well in 7.2 or 7.1, but it's broken in half > a dozen ways for 7.3. > > The biggest problem is that with the addition of schemas, there might be > several tables named "foo" listed in pg_class. The old query will produce > a list of all of their attributes mixed together. For example, after > create schema a; > create schema b; > create table a.foo (f1 int, f2 text); > create table b.foo (f1 text, f2 numeric(10,1)); > we'd get: > > attname | format_type > ---------+--------------- > f1 | text > f1 | integer > f2 | text > f2 | numeric(10,1) > (4 rows) > > Not good. We need to decide exactly which foo we want, and restrict the > query to find only that row in pg_class. There are a couple of ways to > do this, depending on how fancy you want to get. > > If you just want to handle an unqualified table name "foo", and find the > same foo that would be found if you said "select * from foo", then one way > to do it is to restrict the query to "visible" rows of pg_class: > > SELECT ... > FROM ... > WHERE c.relname = 'foo' AND pg_table_is_visible(c.oid) > AND ... > > pg_table_is_visible() will only return true for pg_class rows that are in > your current search path and are not hidden by similarly-named tables that > are in earlier schemas of the search path. > > An alternative way is to eliminate the explicit join to pg_class, and > instead use the new datatype "regclass" to look up the correct pg_class > OID: > > SELECT ... > FROM pg_attribute a > WHERE a.attrelid = 'foo'::regclass > AND a.attnum > 0 > ORDER BY a.attnum > > The regclass input converter looks up the given string as a table name > (obeying schema visibility rules) and produces an OID constant that you > can compare directly to attrelid. This is more efficient than doing > the join, but there are a couple of things to note about it. One is > that if there isn't any "foo" table, you'll get an ERROR message from > the regclass input converter, whereas with the old query you got zero > rows out and no error message. You might or might not prefer the old > behavior. Another limitation is that there isn't any way to adapt > this approach to search for a partially-specified table name; > whereas in the original query you could use a LIKE or regex pattern to > match the table name, not only a simple equality test. > > Now, what if you'd like to be able to specify a qualified table name > --- that is, show the attributes of "a.foo" or "b.foo" on demand? > It will not work to say > WHERE c.relname = 'a.foo' > so this is another way in which the original query fails for 7.3. > > It turns out that the regclass method will work for this: if you say > WHERE a.attrelid = 'a.foo'::regclass > then the right things happen. > > If you don't want to use regclass then you're going to have to do an > explicit join against pg_namespace to find out which foo you want: > > SELECT a.attname, format_type(a.atttypid, a.atttypmod) > FROM pg_namespace n, pg_class c, pg_attribute a > WHERE n.nspname = 'a' AND c.relname = 'foo' > AND c.relnamespace = n.oid > AND a.attnum > 0 AND a.attrelid = c.oid > ORDER BY a.attnum > > This is somewhat tedious because you have to be prepared to split the > qualified name into its components on the client side. An advantage > is that once you've done that, you can again consider using LIKE or > regex patterns instead of simple name equality. This is essentially > what 7.3 psql does to support wildcard patterns like "\dt a*.f*". > > Okay, I think we've about beaten the issue of "which foo do you want" > to death. But what other ways are there for the schema feature to cause > trouble in this apparently now well-fixed-up query? > > One way is that the system catalogs themselves live in a schema, and > if that schema isn't frontmost in your search path then your references > to pg_class and so forth might find the wrong tables. (It's legal now > for ordinary users to create tables named like "pg_xxx", so long as they > don't try to put 'em in the system's schema.) This is probably not a > big issue for standalone applications, which can assume they know what the > search path is. But in client-side libraries, psql, and similar code > that has to be able to deal with someone else's choice of search path, > we really ought to make the references to system catalogs be fully > qualified: > > SELECT ... > FROM pg_catalog.pg_namespace n, pg_catalog.pg_class c, > pg_catalog.pg_attribute a > WHERE ... > > (If you weren't using table aliases in your queries before, here is a good > place to start...) > > In fact it's worse than that: function names, type names, etc also live in > schemas. So you really ought to qualify references to built-in functions > and types too: > > SELECT ..., pg_catalog.format_type(...) ... > > WHERE a.attrelid = 'foo'::pg_catalog.regclass ... > > The truly paranoid might want to qualify their operator names too, > though I draw the line at this because of the horribly ugly syntax needed: > > WHERE a.attrelid OPERATOR(pg_catalog.=) 'foo'::pg_catalog.regclass > > There's another, non-schema-related, gotcha in this apparently simple task > of showing attribute names: in 7.3 you need to exclude dropped attributes > from your display. ALTER TABLE DROP COLUMN doesn't remove the > pg_attribute entry for the dropped column, it only changes it to have > attisdropped = true. So you will typically want to add > > WHERE NOT attisdropped > > when looking at pg_attribute. > > Note however that excluding dropped columns like this means there may be > gaps in the series of attnum values you see. That doesn't bother this > particular query, but it could easily confuse applications that expect the > attributes to have consecutive attnums 1 to N. For example, pg_dump makes > an array of attributes and wants to index into the array with attnums. > It proved easier to make pg_dump include dropped attributes in its array > (and filter them out later) than to change the indexing logic. > > If you have client-side code that looks in pg_proc, pg_type, or > pg_operator then exactly the same sorts of schema-related issues appear: > the name alone is no longer unique, you have to think about identifying > the function, type, or operator within the schema you want. > > That's about all the mileage I can get out of the "show a table's > attributes" example, but there are still more schema-related trouble > items to check for. > > One problem to look for is code that scans pg_class (or another system > table, but most commonly pg_class) to make a list of things to operate > on. For example, various people have built scripts to automatically > reindex every table in a database. Such code will fail as soon as you > start using schemas, because it will find tables that aren't in your > current schema search path and try to operate on them. Depending on what > you want to do, you could change the code to emit fully qualified names > of tables it wants to operate on (so it will work no matter which schema > they are in), or you could restrict the pg_class search to find only > visible tables. > > If you want to use qualified names, the straightforward way to do it > is to join against pg_namespace to get the schema name: > > SELECT nspname, relname FROM pg_class c, pg_namespace n > WHERE relnamespace = n.oid AND relname LIKE 'foo%' AND ... > > A less obvious way is to use the regclass output converter: > > SELECT c.oid::regclass FROM pg_class c > WHERE relname LIKE 'foo%' AND ... > > This will give you back a table name that is qualified only if it needs to > be (i.e., the table is not visible in your search path), so you can use > it directly in the command you want to give next. Another interesting > property of the regclass converter is that it will double-quote the name > correctly if necessary --- for example, you'll get "TEST" (with the > quotes) if the table is named TEST. So you can splice the name directly > into a SQL command without any special pushups and be confident that it > will produce the right results. > > BTW, there are similar output converters for type, function, and operator > names, if you need them. > > Another thing to look for is code that tries to exclude system tables by > excluding tablenames beginning with "pg_"; typical code is like > WHERE relname NOT LIKE 'pg\\_%' > or > WHERE relname !~ '^pg_' > This is not the preferred method anymore: the right way to do this is to > join against pg_namespace and exclude tables that live in schemas whose > names begin with "pg_". > > A related point is that temporary tables no longer have names (in the > catalogs) of the form "pg_temp_NNN"; rather they have exactly the name > that the creating user gave them. They are kept separate from other > tables by placing them in schemas named like "pg_temp_NNN" (where now > NNN identifies an active backend, not a single table). So if you wanted > your scan to exclude temp tables then you'd definitely better change to > excluding on the basis of schema name not table name. On the upside, > if you do want your scan to show temp tables then it's much easier than > before. (BTW, the pg_table_is_visible function is the best way of > distinguishing your own session's temp tables from other people's. Yours > will be visible, other people's won't.) > > Other things that are less likely to concern most applications, but could > break some: > > Aggregate functions now have entries in pg_proc; pg_aggregate has lost > most of its columns and now is just an extension of a pg_proc entry. > If you have code that knows the difference between a plain function and > an aggregate function then it will surely need work. > > pg_class.relkind has a new possible value, 'c' for a composite type. > > pg_type.typtype has two new possible values, 'd' for a domain and 'p' for > a pseudo-type. > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
В списке pgsql-interfaces по дате отправления:
Следующее
От: g.hintermayer@inode.at (Gerhard Hintermayer)Дата:
Сообщение: pg_exec not returning after 16 calls/ libpgtcl