Обсуждение: Functions on tables
In the object-relational context, the definition of a "relation" is much the same as the idea of a "class"; the columns in a table are analogous to the attributes of a class. The names of Postgres' system catalogs reflect this correlation (pg_class, pg_attribute). Likewise, each tuple within a relation is like an instance of the class. So I was thinking, why is it we have such a direct representation of class attributes (in the form of columns), but no equivalent representation of class methods? Say you have the following table: CREATE TABLE person ( id serial PRIMARY KEY, firstname text NOT NULL, lastname text NOT NULL ); Then you define a function: CREATE FUNCTION person_name(firstname text, lastname text) RETURNS text AS $$ SELECT $1 || ' ' || $2; $$ LANGUAGE SQL IMMUTABLE; So now you can do SELECT id, person_name(firstname, lastname) AS name FROM person ORDER BY name; That works fine, but wouldn't it be far more elegant if you could do this instead: CREATE TABLE person ( id SERIAL PRIMARY KEY, firstname TEXT NOT NULL, lastname TEXT NOT NULL, FUNCTION name() RETURNS text AS $$ SELECT firstname || ' ' || lastname; $$ LANGUAGE SQL IMMUTABLE ); Now the function name() belongs to the "person" table: it is, in effect, a method of the "person" class. Which means we can do this: SELECT id, name() FROM person ORDER BY name(); Just as with methods in an OO programming language, when you call name() on a tuple of the "person" relation, it has access to the attributes of that tuple (here firstname and lastname). There is no need to pass arguments to the function, nor any need to actually know which attributes of "person" go into making up the return value of name(). You could later decide to add an attribute for a person's preferred name, or middle initial, and then factor that into the logic of name() without the query author needing to know anything about it. Of course there would be implementation challenges, and admittedly I haven't considered those, but on the surface this feels like a good idea. It taps into some of the real usefulness of OOP, and it uses a feature we already have: user-defined functions. I look forward to your comments. Regards, BJ
"Brendan Jurd" <direvus@gmail.com> writes: > That works fine, but wouldn't it be far more elegant if you could do > this instead: > CREATE TABLE person ( > id SERIAL PRIMARY KEY, > firstname TEXT NOT NULL, > lastname TEXT NOT NULL, > FUNCTION name() RETURNS text AS $$ SELECT firstname || ' ' || > lastname; $$ LANGUAGE SQL IMMUTABLE > ); 90% of the value this would have is already available with views, I think, without going outside bog-standard SQL: create view ... firstname || ' ' || lastname as name, ... Also, there's already a Berkeley-era syntax hack in PG that gets much of the rest: if x is of composite type, the notations x.y and y(x) are interchangeable. Thus: regression=# create function name(person) returns text as $$ regression$# select $1.firstname || ' ' || $1.lastname regression$# $$ language sql immutable; CREATE FUNCTION regression=# select person.name from person; name ---------- joe blow (1 row) > Now the function name() belongs to the "person" table: it is, in > effect, a method of the "person" class. Which means we can do this: > SELECT id, name() FROM person ORDER BY name(); [ itch... ] That seems to risk breaking a whole lot of existing code by introducing name collisions --- the entire namespace of ordinary functions is at risk as soon as you have any of these per-table functions, if they can be called like that. But having said all that, I think there are bits of SQL2003 that do some of what you're after. I don't think anyone has looked hard at what would be involved in merging those new SQL features with historical Postgres behaviors. regards, tom lane
On 12/17/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > 90% of the value this would have is already available with views, > I think, without going outside bog-standard SQL: Views also work fine, but one of the big advantages of having table methods is that all the things your table can do are contained within the table definition. Your table definition becomes like the API of a class. When you use separate views and/or functions to provide methods for a table, your schema becomes pretty scattered (I know mine is). It can be non-trivial to find, say, all functions that relate to a person tuple. > Also, there's already a Berkeley-era syntax hack in PG that gets much of > the rest: if x is of composite type, the notations x.y and y(x) are > interchangeable. Thus: I didn't know about that, and you're right, this actually delivers a lot of the notational convenience that I'm looking for. But again, you don't win the ability to look at your table definition and immediately understand what methods the table exposes. It looks like you can't use this hack to conveniently handle methods that have arguments. For example, say you had a method to return a person's birthday in a given year, defined as "birthday(person, int) returns date". You wouldn't be able to write person.birthday(2007) in a query and get the expected result. > [ itch... ] That seems to risk breaking a whole lot of existing code by > introducing name collisions --- the entire namespace of ordinary > functions is at risk as soon as you have any of these per-table > functions, if they can be called like that. > What if we used scope resolution? That is, Postgres first looks for functions which are local to the table, and if it doesn't find a match, then looks for functions in the normal namespace, perhaps raising a notice to warn the user of the ambiguity. Say there was a function name() in the public namespace. You can then easily resolve the ambiguity by specifying either person.name() or public.name(). Most front-end queries have more than one table in them, so in practice I think you'll almost always be using table aliases and identifying your methods explicitly anyway, e.g.: SELECT p.name(), a.label() as address FROM person p INNER JOIN address a ON p.postal_address = a.id; ... so I'm not passionately attached to the idea of being able to call the method without prefixing the table name.
I wholeheartedly support the approach BJ is advocating. The notion that methods (functions) and variables (tables) can be defined together is a very useful OO approach. I too find it difficult to recall which functions "belong" to which tables. Of course, some of my functions are very generic and wouldn't appropriately "belong" to any one table, but many are meant solely to operate on data in one type of object (table). I've taken to using schemas to collect together functions and tables that "belong" together. This requires the use of the schema name, as you say BJ, > ... so I'm not passionately attached to the idea of being able to call > the method without prefixing the table name. In my approach, the schema name becomes the object name and the functions "belong" to the schema. Most OO approaches only allow one definition of variables (tables), and I can easily allow each schema to have only one table. But I can also use multiple tables. The extra tables can be instances, much like BJ's use of rows as instances. Using separate tables allows me to have groups of instances that are grouped together for some reason. I can also have tables that are sub-classes of the original table. TJ http://www.gnova.com/
On 12/17/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > But having said all that, I think there are bits of SQL2003 that do some > of what you're after. I don't think anyone has looked hard at what > would be involved in merging those new SQL features with historical > Postgres behaviors. I've been looking into SQL2003, and there are indeed some features there that correspond to what I want. ISO/IEC 9075-2:2003 - Foundation (SQL/Foundation) talks about "structured types" (like a user-defined composite type in Postgres, and somewhat like a "struct" in C). It applies many OO concepts to these structured types: inheritance, encapsulation, overloading. It goes so far as to say that every structured type has an implied constructor method, and for every attribute within the type, one implied "observer" method and one implied "mutator" method. The standard provides for adding user-defined methods to these types, which have an implied first parameter "SELF", which is exactly the sort of thing I am looking for. The standard doesn't specifically mention doing this with tables, or row types. However, the conceptual distinction between a user-defined composite type and a table is not vast, and AIUI Postgres already implies a composite type for every table, with each tuple in the table being an object of that composite type. Frankly I don't have much experience reading SQL standards, and the language they use is a bit abstruse. But as far as I can tell, my suggestion is quite nicely compliant with the behaviour the standard recommends ... indeed the standard takes the OO idea much further than I initially hoped to. Alternatively, the standard also specifies "generated columns" within a table, which would allow you to achieve a similar effect to my person.name() method like so: CREATE TABLE person ( id serial PRIMARY KEY, firstname text NOT NULL, lastname text NOT NULL, name GENERATED ALWAYS AS (firstname || ' ' || lastname) ); I think the generated column idea has some usefulness, but isn't quite what I'm after. You can't use it with additional parameters (c.f. my earlier birthday(int) example), and I like the idea of keeping attributes and methods totally separate. A generated column is made to appear like an attribute, and it is referenced like an attribute, but in actual fact it is not an attribute, it is a derived value.
On Mon, Dec 18, 2006 at 12:15:34AM +1100, Brendan Jurd wrote: > On 12/17/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >But having said all that, I think there are bits of SQL2003 that do some > >of what you're after. I don't think anyone has looked hard at what > >would be involved in merging those new SQL features with historical > >Postgres behaviors. > > I've been looking into SQL2003, and there are indeed some features > there that correspond to what I want. > > ISO/IEC 9075-2:2003 - Foundation (SQL/Foundation) talks about > "structured types" (like a user-defined composite type in Postgres, > and somewhat like a "struct" in C). It applies many OO concepts to > these structured types: inheritance, encapsulation, overloading. It > goes so far as to say that every structured type has an implied > constructor method, and for every attribute within the type, one > implied "observer" method and one implied "mutator" method. > > The standard provides for adding user-defined methods to these types, > which have an implied first parameter "SELF", which is exactly the > sort of thing I am looking for. The standard doesn't specifically > mention doing this with tables, or row types. However, the conceptual > distinction between a user-defined composite type and a table is not > vast, and AIUI Postgres already implies a composite type for every > table, with each tuple in the table being an object of that composite > type. > > Frankly I don't have much experience reading SQL standards, and the > language they use is a bit abstruse. But as far as I can tell, my > suggestion is quite nicely compliant with the behaviour the standard > recommends ... indeed the standard takes the OO idea much further than > I initially hoped to. > > Alternatively, the standard also specifies "generated columns" within > a table, which would allow you to achieve a similar effect to my > person.name() method like so: > > CREATE TABLE person ( > id serial PRIMARY KEY, > firstname text NOT NULL, > lastname text NOT NULL, > name GENERATED ALWAYS AS (firstname || ' ' || lastname) > ); Illustra had the equivalent of columns defined as expressions. I don't remember the syntax, but it was simpler than "GENERATED". But better go with the spec. Late on the thread, --elein > > I think the generated column idea has some usefulness, but isn't quite > what I'm after. You can't use it with additional parameters (c.f. my > earlier birthday(int) example), and I like the idea of keeping > attributes and methods totally separate. A generated column is made > to appear like an attribute, and it is referenced like an attribute, > but in actual fact it is not an attribute, it is a derived value. > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >