Discussion: Event Triggers: adding information
Hi,

Please find attached v3 of the patch to add information to event triggers, including the command string. This work has mostly been part of the original goal, and its design is the result of much reviewing here, in particular from Robert and Tom (rewriting a command before it is done is called magic, not code, and that's why we have ddl_command_trace now). Many thanks. I hope that Robert will be in a position to continue reviewing and taking care of that patch series. He's clearly interested but really busy these days, so if another committer wants to see for himself, please don't be shy!

The current patch provides the following features:

- New events: ddl_command_end and ddl_command_trace

- New information in the TG_VARIABLE: object name, schema name, object id, object kind, operation (CREATE|ALTER|DROP), context (GENERATED|TOPLEVEL|…), normalized command string

- New event filtering on CONTEXT (see after sig for an example)

  + create event trigger regress_event_trigger_end on ddl_command_end
  +          when context in ('toplevel', 'generated', 'query', 'subcommand')
  +       execute procedure test_event_trigger();

- Documentation of the listed features

- pg_dump support of the new features (context support)

The command string normalization is a necessary feature and takes up most of this patch's footprint. The main use case of this feature is in-core logical replication support; the main non-core users will be the usual replication suspects (Slony, Londiste, Bucardo) and audit systems. We might also be able to support some advanced extension capabilities if we get "command diverting" support in the next commit fest, and we could already deprecate the extension whitelisting code I had to do for 9.1 and 9.2.

As we need to support all of the CREATE, ALTER and DROP cases, I can't see another way than to work from a very loose notion of the parse tree.
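For readers unfamiliar with event triggers, a minimal consumer of this machinery might look like the sketch below. TG_EVENT and TG_TAG are the variables PL/pgSQL ended up exposing to event trigger functions; the additional pieces of information this patch proposes (object name, schema, operation, context, command string) are not shown because their exact spelling is an assumption at this point.

```sql
-- Minimal "snitch" event trigger function, a sketch only.
CREATE OR REPLACE FUNCTION test_event_trigger()
 RETURNS event_trigger
 LANGUAGE plpgsql AS $$
BEGIN
    -- TG_EVENT is e.g. 'ddl_command_end', TG_TAG e.g. 'CREATE TABLE'
    RAISE NOTICE 'snitch event: %, tag: %', TG_EVENT, TG_TAG;
END;
$$;

CREATE EVENT TRIGGER regress_event_trigger_end ON ddl_command_end
   EXECUTE PROCEDURE test_event_trigger();
```

With that in place, any completed DDL command raises a NOTICE naming the event and the command tag, which is the skeleton the richer per-object information described above would hang off.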
Up to now I've seen 3 cases worth reporting:

- simple commands, where you can rely on the parse tree as is, and that need no transform stage (create extension, drop object, alter sequence, text search *, most commands really)

- medium complex commands, which transform the parse tree in some ways, but for which it still is easy to use it directly if you take care of only using the transformed parse tree (create table, create domain, create view — oh, the work was already done)

- complex commands where you need to hack them to return the actual parse-tree-like structure they prepared out of the transformed parse tree, so that you can get to know what they've just done; and that's only ALTER TABLE really, up to now

So my guess is that it will get much easier from now on, and I will add some more deparsing support with stats to have some ideas about that. I've been told that the maintenance cost is going to be so great as to limit our abilities to continue adding features to existing DDLs and coming up with new ones. I have two answers: first, I don't suppose we're going to include logical replication without DDL support in core, and I don't see another realistic way to implement it; second, let's have some data to think about the case at hand.
For example, I added CREATE SCHEMA objectid and command normalisation, and here's the footprint:

git diff --stat
 src/backend/commands/schemacmds.c   |  6 ++++--
 src/backend/tcop/utility.c          |  4 ++--
 src/backend/utils/adt/ddl_rewrite.c | 31 +++++++++++++++++++++++++++++++
 src/include/commands/schemacmds.h   |  2 +-
 4 files changed, 38 insertions(+), 5 deletions(-)

For the whole DefineStmt command set (CREATE AGGREGATE, OPERATOR, TYPE, TEXT SEARCH *, COLLATION), we're talking about:

 11 files changed, 186 insertions(+), 70 deletions(-)
 95 lines of those in ddl_rewrite.c

For the normalisation of the CREATE CONVERSION command string, this time it's

 4 files changed, 40 insertions(+), 5 deletions(-)

And for CREATE DOMAIN, which is more involved as you need to take care of rewriting DEFAULT and CHECK constraints, we have

 4 files changed, 163 insertions(+), 3 deletions(-)

The other commands I need to be working on, if we want more data, are the ALTER kind of commands, and I've already worked on the most complex one of those, namely ALTER TABLE. I don't expect most DROP commands to offer any difficulty here, as I already have support for DropStmt.

The question for DropStmt is what to do about object-specific information when a single command can have as many targets as you want. The current patch fills in the information for the "main" target, which would be the first object in the sequence, but that's a very weak position, and I would easily agree to just leave the fields NULL in that case.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

A mildly interesting example that you can have now:

create schema baz authorization dim
  create table distributors (
    did serial primary key,
    name varchar(40),
    f2 text check (upper(f2) = f2),
    unique(name) with (fillfactor=70)
  ) with (fillfactor=70);

I added the blank lines to help the reader in that console paste:

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: CREATE SEQUENCE, operation: CREATE, type: SEQUENCE
NOTICE: oid: 41633, schema: baz, name: distributors_did_seq
NOTICE: command: CREATE SEQUENCE baz.distributors_did_seq;

NOTICE: snitch event: ddl_command_end, context: SUBCOMMAND, tag: CREATE TABLE, operation: CREATE, type: TABLE
NOTICE: oid: 41635, schema: baz, name: distributors
NOTICE: command: CREATE TABLE baz.distributors (did integer, name pg_catalog.varchar, f2 text, CHECK ((upper(f2) = f2))) WITH (fillfactor=70);

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: CREATE INDEX, operation: CREATE, type: INDEX
NOTICE: oid: 41643, schema: baz, name: distributors_pkey
NOTICE: command: CREATE UNIQUE INDEX distributors_pkey ON baz.distributors USING btree (did);

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: CREATE INDEX, operation: CREATE, type: INDEX
NOTICE: oid: 41645, schema: baz, name: distributors_name_key
NOTICE: command: CREATE UNIQUE INDEX distributors_name_key ON baz.distributors USING btree (name) WITH (fillfactor=70);

NOTICE: snitch event: ddl_command_end, context: GENERATED, tag: ALTER SEQUENCE, operation: ALTER, type: SEQUENCE
NOTICE: oid: 41633, schema: baz, name: distributors_did_seq
NOTICE: command: ALTER SEQUENCE baz.distributors_did_seq OWNED BY baz.distributors.did;

NOTICE: snitch event: ddl_command_end, context: TOPLEVEL, tag: CREATE SCHEMA, operation: CREATE, type: SCHEMA
NOTICE: oid: 41632, schema: <NULL>, name: baz
NOTICE: command: CREATE SCHEMA baz AUTHORIZATION dim;
CREATE SCHEMA
Attachments
Hi again,

The previously attached patch already needs a rebase, since Tom fixed the single-user mode where you're not supposed to access potentially corrupted system indexes. Please find attached a new version of the patch that should apply cleanly to master (I came to trust git).

Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> The current patch provides the following features:
>
> - New events: ddl_command_end and ddl_command_trace
>
> - New information in the TG_VARIABLE:
>   object name, schema name, object id, object kind,
>   operation (CREATE|ALTER|DROP), context (GENERATED|TOPLEVEL|…),
>   normalized command string
>
> - New event filtering on CONTEXT (see after sig for an example)
>
>   + create event trigger regress_event_trigger_end on ddl_command_end
>   +          when context in ('toplevel', 'generated', 'query', 'subcommand')
>   +       execute procedure test_event_trigger();
>
> - Documentation of the listed features
>
> - pg_dump support of the new features (context support)

Still the same, plus some more command rewriting support, so that I have some more data points to offer about the code maintenance burden we all want to limit on that feature.
> For example, I added CREATE SCHEMA objectid and command normalisation
> and here's the footprint:
>  4 files changed, 38 insertions(+), 5 deletions(-)
>
> For the whole DefineStmt command set (CREATE AGGREGATE, OPERATOR, TYPE,
> TEXT SEARCH *, COLLATION), we're talking about:
>  11 files changed, 186 insertions(+), 70 deletions(-)
>  95 lines of those in ddl_rewrite.c
>
> For the normalisation of the CREATE CONVERSION command string, this time
>  4 files changed, 40 insertions(+), 5 deletions(-)
>
> And CREATE DOMAIN, which is more involved as you need to take care of
> rewriting DEFAULT and CHECK constraints, we have
>  4 files changed, 163 insertions(+), 3 deletions(-)

The whole set of RenameStmt commands:

 32 files changed, 382 insertions(+), 119 deletions(-)

Then after that, sharing a lot of the previous code, support for both AlterOwnerStmt and AlterObjectSchemaStmt rewriting:

 1 file changed, 102 insertions(+), 17 deletions(-)

As you can guess, I separated away the EventTriggerTargetOid tracking:

 20 files changed, 96 insertions(+), 57 deletions(-)

Again, those data points are not commented on here; they are only meant to give a better idea of what we are talking about as the cost of maintaining the new command normalisation feature that's brewing in this patch.

I would like us to commit to that idea and the current implementation of it, and include the set of commands I already had time to prepare in the current submission, knowing that I will continue adding command support up to the next commitfest with the goal of having full coverage. I will also be working on more features (complete PL support, some kind of command diverting, a "table_rewrite" event, some ready-to-use event trigger functions).

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachments
On Wed, Dec 12, 2012 at 4:47 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> The previously attached patch already needs a rebase, since Tom fixed the
> single-user mode where you're not supposed to access potentially
> corrupted system indexes. Please find attached a new version of the
> patch that should apply cleanly to master (I came to trust git).

Sorry for the delay - I'm looking at this now.

My first observation (a.k.a. gripe) is that this patch touches an awful lot of code all over the backend. We just did a big old refactoring to try to get all the rename and alter-owner commands to go through the same code path, but if this patch is any indication, it has not done us a whole lot of good. The whole idea of all that work is that when people want to make systematic changes to those operations (like involving sepgsql, or returning the OID) there would not be 47 places to change. Apparently, we aren't there yet. Maybe we need some more refactoring of that stuff before tackling this?

Does anyone object to the idea of making every command that creates a new SQL object return the OID of that object, and similarly for RENAME / ALTER TO? If so, maybe we ought to go through and do those things first, as separate patches, and then rebase this over those changes for simplicity. I can probably do that real soon now, based on this patch, if there are no objections, but I don't want to do the work and then have someone say, ack, what have you done?

I'm basically implacably opposed to the ddl_rewrite.c part of this patch. I think that the burden of maintaining reverse-parsing code for all the DDL we support is an unacceptable level of maintenance overhead for this feature. It essentially means that every future change to gram.y that affects any of the supported commands will require a compensating change to ddl_rewrite.c. That is a whole lot of work, and it is almost guaranteed that future patch authors will sometimes fail to do it correctly.
At an absolute bare minimum, I think we would need some kind of comprehensive test suite for this functionality, as opposed to the less than 60 lines of new test cases this patch adds, which completely fail to test this code in any way at all - but really I think we should just not have it. It WILL get broken (if it isn't already) and it WILL slow down future development of SQL features. It also WILL have edge cases where it doesn't work properly, such as where the actual index name to use is only decided at execution time and cannot be inferred from the parse tree. I know that you feel that all of these are tolerable costs, but I 100% disagree. I predict that if we commit this, it will become a never-ending and irrecoverable font of trouble.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

Robert Haas <robertmhaas@gmail.com> writes:
> Sorry for the delay - I'm looking at this now.

Thanks very much!

> My first observation (a.k.a. gripe) is this patch touches an awful lot
> of code all over the backend. We just did a big old refactoring to
> try to get all the rename and alter owner commands to go through the
> same code path, but if this patch is any indication it has not done us
> a whole lot of good. The whole idea of all that work is that when
> people wanted to make systematic changes to those operations (like
> involving sepgsql, or returning the OID) there would not be 47 places
> to change. Apparently, we aren't there yet. Maybe we need some more
> refactoring of that stuff before tackling this?

It's hard to devise exactly what kind of refactoring we need to do before trying to write a patch that benefits from it, in my experience. In some cases the refactoring will make things more complex, not less.

> Does anyone object to the idea of making every command that creates a
> new SQL object return the OID of an object, and similarly for RENAME /
> ALTER TO? If so, maybe we ought to go through and do those things
> first, as separate patches, and then rebase this over those changes
> for simplicity. I can probably do that real soon now, based on this
> patch, if there are no objections, but I don't want to do the work and
> then have someone say, ack, what have you done?

That's a good idea. I only started working on the ObjectID piece of information quite late in this patch, which is why I hadn't split it out before. We might want to commit the other parts of the new infrastructure and information first, then get back to the ObjectID and command string; what do you think? Do you want me to prepare a patch that way, or would you rather hack your way around the ObjectID APIs and have me rebase the current patch on top?
Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP statements, so my current patch is still some bricks shy of a load… (I added the ObjectID only in the commands I added rewrite support for, apart from DROP.)

> I'm basically implacably opposed to the ddl_rewrite.c part of this
> patch. I think that the burden of maintaining reverse-parsing code
> for all the DDL we support is an unacceptable level of maintenance
> overhead for this feature. It essentially means that every future

I hear you. That feature is still required for any automatic consumption of the command string, because you want to resolve schema names, index and constraint names, type name lookups, and some more. I also think that this feature is indispensable if in-core logical replication is to support DDL at all. What alternative do you propose?

> change to gram.y that affects any of the supported commands will
> require a compensating change to ddl_rewrite.c. That is a whole lot
> of work and it is almost guaranteed that future patch authors will
> sometimes fail to do it correctly. At an absolute bare minimum I
> think we would need some kind of comprehensive test suite for this
> functionality, as opposed to the less than 60 lines of new test cases
> this patch adds which completely fail to test this code in any way at
> all, but really I think we should just not have it. It WILL get

I had a more complete test suite in earlier rounds, and you opposed it for good reasons. I'm OK to revisit that now, and I think the test case should look like this:

- install an auditing command trigger that will insert any DDL command string into a table with a sequence ordering
- select * from audit
- \d… of all objects created
- drop schema cascade
- DO $$ loop for sql in select command from audit do execute sql …
- \d… of all objects created again

Then any whacking of the grammar should alter the output here, and any case that's not supported will fail too.
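Sketched as a pg_regress-style script, that outline might look something like the following. The log_ddl_command() function is hypothetical; it stands in for a trigger function that reads the normalized command string this patch exposes and records it:

```sql
-- audit table preserving command order
CREATE TABLE audit (id serial PRIMARY KEY, command text);

-- assumed: log_ddl_command() inserts the normalized command
-- string of each completed DDL statement into the audit table
CREATE EVENT TRIGGER audit_ddl ON ddl_command_end
   EXECUTE PROCEDURE log_ddl_command();

-- ... run the DDL under test, \d the objects, drop the schema ...

-- replay everything that was captured, in order
DO $$
DECLARE
    sql text;
BEGIN
    FOR sql IN SELECT command FROM audit ORDER BY id LOOP
        EXECUTE sql;
    END LOOP;
END;
$$;

-- ... \d the objects again and compare with the first listing
```

If the normalized strings are faithful, the second \d listing matches the first; any grammar change that breaks the rewrite shows up as a diff in the expected output.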
We might be able to have a better way of testing the feature here, but I tried to stay within the realm of what I know pg_regress to be capable of.

> broken (if it isn't already) and it WILL slow down future development
> of SQL features. It also WILL have edge cases where it doesn't work
> properly, such as where the decision of the actual index name to use
> is only decided at execution time and cannot be inferred from the
> parse tree. I know that you feel that all of these are tolerable

The way to solve that, I think, as Tom said, is to only rewrite the command string when the objects exist in the catalogs. That's why we now have ddl_command_trace, which is an alias to either start or end depending on whether we're doing a DROP or a CREATE or ALTER operation.

> costs, but I 100% disagree. I predict that if we commit this, it will
> become a never-ending and irrecoverable font of trouble.

I'm not saying the costs of doing things that way are easy to swallow; I'm saying that I don't see any other way to get that feature reliably enough for in-core logical replication to just work with DDLs. Do you?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi,

On 2012-12-21 09:48:22 -0500, Robert Haas wrote:
> On Wed, Dec 12, 2012 at 4:47 PM, Dimitri Fontaine
> <dimitri@2ndquadrant.fr> wrote:
> > The previously attached patch already needs a rebase, since Tom fixed the
> > single-user mode where you're not supposed to access potentially
> > corrupted system indexes. Please find attached a new version of the
> > patch that should apply cleanly to master (I came to trust git).
>
> Sorry for the delay - I'm looking at this now.
>
> My first observation (a.k.a. gripe) is this patch touches an awful lot
> of code all over the backend. We just did a big old refactoring to
> try to get all the rename and alter owner commands to go through the
> same code path, but if this patch is any indication it has not done us
> a whole lot of good. The whole idea of all that work is that when
> people wanted to make systematic changes to those operations (like
> involving sepgsql, or returning the OID) there would not be 47 places
> to change. Apparently, we aren't there yet. Maybe we need some more
> refactoring of that stuff before tackling this?

It's hard to do such refactorings without very concrete use cases, so I think it's not unlikely that this patch points out some things that can be committed independently. ISTM that a good part of the "return oid;" changes are actually such a part.

> Does anyone object to the idea of making every command that creates a
> new SQL object return the OID of an object, and similarly for RENAME /
> ALTER TO? If so, maybe we ought to go through and do those things
> first, as separate patches, and then rebase this over those changes
> for simplicity. I can probably do that real soon now, based on this
> patch, if there are no objections, but I don't want to do the work and
> then have someone say, ack, what have you done?

FWIW I am all for it.

> I'm basically implacably opposed to the ddl_rewrite.c part of this
> patch.
> I think that the burden of maintaining reverse-parsing code
> for all the DDL we support is an unacceptable level of maintenance
> overhead for this feature. It essentially means that every future
> change to gram.y that affects any of the supported commands will
> require a compensating change to ddl_rewrite.c. That is a whole lot
> of work and it is almost guaranteed that future patch authors will
> sometimes fail to do it correctly.

That's - unsurprisingly - where I have a different opinion. ruleutils.c does that for normal SQL, and while DDL is a bit bigger in the grammar than the DQL support, I would say the data structures of the latter are far more complex than for DDL. If you look at what changes were made to ruleutils.c in the last few years, the rate of bugs isn't, as far as I can see, higher than in other areas of the code. The ruleutils.c precedent also means that patch authors *have* to be aware of this kind of thing already. Not too many features get introduced that only change DDL.

> At an absolute bare minimum I
> think we would need some kind of comprehensive test suite for this
> functionality, as opposed to the less than 60 lines of new test cases
> this patch adds which completely fail to test this code in any way at
> all, but really I think we should just not have it.

Absolutely agreed.

> It WILL get
> broken (if it isn't already) and it WILL slow down future development
> of SQL features. It also WILL have edge cases where it doesn't work
> properly, such as where the decision of the actual index name to use
> is only decided at execution time and cannot be inferred from the
> parse tree.

That obviously needs to be solved, but I think the idea of invoking this command trigger whenever it's appropriate (which I think was actually your idea?) should take care of most of the problems around this.

> I know that you feel that all of these are tolerable costs, but I 100%
> disagree.
I think that missing DDL support is one of the major weaknesses of all potential "logical" replication solutions, be it Slony, Londiste, BDR, or even something that may some day be in core. I just don't see any more realistic way to fix that besides supporting something like this.

> I predict that if we commit this, it will
> become a never-ending and irrecoverable font of trouble.

Why? It's really not that different from what's done currently.

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Dec 21, 2012 at 11:35 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> It's hard to devise exactly what kind of refactoring we need to do
> before trying to write a patch that benefits from it, in my experience.
> In some cases the refactoring will make things more complex, not less.

Perhaps. I have a feeling there's some more simplification that can be done here, somehow.

> That's a good idea. I only started quite late in that patch to work on
> the ObjectID piece of information, that's why I didn't split it before
> doing so.

That's fine. I've committed that part. I think the remainder of the patch is really several different features that ought to be split apart and considered independently. We may want some but not others, or some may be ready to go in but not others.

> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
> statements, so my current patch is still some bricks shy of a load… (I
> added ObjectID only in the commands I added rewrite support for, apart
> from DROP).

I shall rely on you to provide those bricks which are still missing.

>> change to gram.y that affects any of the supported commands will
>> require a compensating change to ddl_rewrite.c. That is a whole lot
>> of work and it is almost guaranteed that future patch authors will
>> sometimes fail to do it correctly. At an absolute bare minimum I
>> think we would need some kind of comprehensive test suite for this
>> functionality, as opposed to the less than 60 lines of new test cases
>> this patch adds which completely fail to test this code in any way at
>> all, but really I think we should just not have it. It WILL get
>
> I had a more complete test suite in earlier rounds, and you opposed it
> for good reasons.
> I'm OK to revisit that now, and I think the test case
> should look like this:
>
> - install an auditing command trigger that will insert any DDL command
>   string into a table with a sequence ordering
>
> - select * from audit
>
> - \d… of all objects created
>
> - drop schema cascade
>
> - DO $$ loop for sql in select command from audit do execute sql …
>
> - \d… of all objects created again
>
> Then any whacking of the grammar should alter the output here and any
> case that's not supported will fail too.
>
> We might be able to have a better way of testing the feature here, but I
> tried to stay within the realm of what I know pg_regress to be capable of.

I was thinking that we might need to go beyond what pg_regress can do in this case. For example, one idea would be to install an audit trigger that records all the DDL that happens during the regression tests. Then, afterwards, replay that DDL into a new database. Then do a schema-only dump of the old and new databases and diff the dump files. That's a little wacky by current community standards, but FWIW EDB has a bunch of internal tests that we run to check our proprietary stuff; some of them work along these lines and it's pretty effective at shaking out bugs. In this particular case, it would significantly reduce the maintenance burden of putting a proper test framework in place, because we can hope that the regression tests create (and leave around) at least one object of any given type, and that any new DDL features will be accompanied by similar regression tests. We already rely on the regression database for pg_upgrade testing, so if it's not complete, it impacts pg_upgrade test coverage as well.

>> broken (if it isn't already) and it WILL slow down future development
>> of SQL features. It also WILL have edge cases where it doesn't work
>> properly, such as where the decision of the actual index name to use
>> is only decided at execution time and cannot be inferred from the
>> parse tree.
>> I know that you feel that all of these are tolerable
>
> The way to solve that, I think, as Tom said, is to only rewrite the
> command string when the objects exist in the catalogs. That's why we now
> have ddl_command_trace, which is an alias to either start or end
> depending on whether we're doing a DROP or a CREATE or ALTER operation.

Hmm. I have to study that more, maybe. I certainly agree that if you can look at the catalogs, you should be able to reliably reconstruct what happened - which isn't possible just looking at the parse tree. However, it feels weird to me to do something that's partly based on the parse tree and partly based on the catalog contents. Moreover, the current pre-create hook isn't the right place to gather information from the catalogs anyway, as it happens before locks have been taken.

Now, there's another thing that is bothering me here, too. How complete is the support you've implemented in this patch? Does it cover ALL CREATE/ALTER/DROP commands, including all options to and all variants of those commands? Because while I'm not very sanguine about doing this at all, it somehow feels like doing it only halfway would be even worse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
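The dump-and-diff testing idea discussed above could be sketched roughly as follows. The database names and file names here are assumptions, and since it needs a running cluster plus a file of captured DDL, this is an outline rather than a runnable script:

```shell
# replay the DDL captured by the audit trigger into a fresh database
createdb replayed
psql -d replayed -f captured_ddl.sql

# schema-only dumps of the original and the replayed database
pg_dump --schema-only regression > old_schema.sql
pg_dump --schema-only replayed   > new_schema.sql

# an empty diff means the rewritten command strings round-tripped
diff -u old_schema.sql new_schema.sql
```

Any grammar change that ddl_rewrite.c failed to track would surface here as a schema difference (or as an error while replaying the captured DDL).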
Robert Haas <robertmhaas@gmail.com> writes:
>> That's a good idea. I only started quite late in that patch to work on
>> the ObjectID piece of information, that's why I didn't split it before
>> doing so.
>
> That's fine. I've committed that part. I think the remainder of the
> patch is really several different features that ought to be split
> apart and considered independently. We may want some but not others,
> or some may be ready to go in but not others.

Thank you for this partial commit, and thanks to Simon and Andres for filling in the gaps. I should mention that the missing header parts were all in my patch, and that header hacking is proving surprisingly tricky.

I've been having several rounds of problems with the gcc tool chain when modifying headers. The worst case has been a successful compile followed by random errors. Then gdb was not giving any clue (botched memory when returned from a function, palloc'ed memory not freed yet). After spending more time on it than I care to admit, I did a `make maintainer-clean`, built again, and the problem disappeared all by itself. The gcc tool chain is not my favorite developer environment.

>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
>> statements, so my current patch is still some bricks shy of a load… (I
>> added ObjectID only in the commands I added rewrite support for, apart
>> from DROP).
>
> I shall rely on you to provide those bricks which are still missing.

Cool, I will prepare a separate patch implementing that API now, and base the following patches on top of it.
My thinking is to do as follows:

- API patch to get the Oid from all CREATE/ALTER commands
- API patch for getting the Oid(s) in (before) DROP commands
- Event Trigger Infos patch, depending on the previous ones
- Command String Rewrite and its publishing in Event Triggers

Of course, for the DROP parts we'd better have the Oid and use it before the command runs to completion, or it will not be usable from the event trigger (the target is ddl_command_start in that case).

>> We might be able to have a better way of testing the feature here, but I
>> tried to stay within the realm of what I know pg_regress to be capable of.
>
> I was thinking that we might need to go beyond what pg_regress can do
> in this case. For example, one idea would be to install an audit
> trigger that records all the DDL that happens during the regression
> tests. Then, afterwards, replay that DDL into a new database. Then
> do a schema-only dump of the old and new databases and diff the dump
> files. That's a little wacky by current community standards but FWIW
> EDB has a bunch of internal tests that we run to check our proprietary
> stuff; some of them work along these lines and it's pretty effective
> at shaking out bugs.

Are you in a position to contribute those parts to the community? Implementing them again and then having to support two different variants of them does not look like the best use of both our times.

> Hmm. I have to study that more, maybe. I certainly agree that if you
> can look at the catalogs, you should be able to reliably reconstruct
> what happened - which isn't possible just looking at the parse tree.

Well, in fact we're looking at the parse tree once it has been edited to host the exact data that goes into the catalogs. In most cases that data is easily available for further consumption, or it's possible to use the transform API without touching the catalogs again.
In some places I'm doing the same processing twice, which I don't like very much, but it only happens if an event trigger is going to fire, so I thought we could optimize that later. The aim here has been to limit the changes to review; it looks like we have enough of them already. The concerned parts are those dealing with raw and cooked expressions for CHECK and DEFAULT and WHERE clauses (create table, create index, alter table, create constraint, create domain).

> However, it feels weird to me to do something that's partly based on
> the parse tree and partly based on the catalog contents. Moreover,
> the current pre-create hook isn't the right place to gather
> information from the catalogs anyway as it happens before locks have
> been taken.

So all the information is derived from the parse tree. The trick is that this information is only valid once it has hit the catalogs (think of a CHECK constraint in a CREATE TABLE statement referencing some column attribute, same with a DEFAULT expression depending on another column).

The case where we should certainly prefer looking at the catalog caches is when we want to know the actual schema where the object has been created. The current patch derives that information mostly from the parse tree, using the first entry of the search_path when the schema is not given in the command. It's OK because we are still in the same transaction and no command has been able to run in between the user command and our lookup, but I could easily be convinced to look up the catalogs instead, even more so once we have the OID of the object easily available in all places.

> Now, there's another thing that is bothering me here, too. How
> complete is the support you've implemented in this patch? Does it
> cover ALL CREATE/ALTER/DROP commands, including all options to and all
> variants of those commands? Because while I'm not very sanguine about
> doing this at all, it somehow feels like doing it only halfway would
> be even worse.
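As a small illustration of that search_path lookup (this is standard PostgreSQL behavior, not something the patch adds): the schema an unqualified CREATE resolves to is the first existing entry in search_path, which is also what current_schema() reports.

```sql
CREATE SCHEMA baz;
SET search_path = baz, public;

SELECT current_schema();    -- baz

CREATE TABLE t (i int);     -- created as baz.t, not public.t
SELECT 'baz.t'::regclass;   -- resolves, confirming the schema
```

Deriving the schema from the parse tree plus search_path thus matches what the catalogs will say, as long as nothing changes search_path or the catalogs between the command and the lookup.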
This patch is implementing complete support for all commands for most of the information, and partial support for some information that comes from the parse tree analysis. The goal for this commit fest is to agree on how to do it, then finish the whole command set for the next one. As decided at the hackers meeting, now is the time to address anything controversial; once that's done we can "return with feedback". So that's my goal for this commit fest, concerning the Normalized Command String.

Given your review, I think the way forward is pretty clear now, and my plan is as follows:

- commit Oid related API changes in this commit fest
- commit Event Trigger Information changes in this commit fest
- agree on how to tackle exposing the normalized command string

Merry Christmas!
--
Dimitri Fontaine
http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes:
>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
>> statements, so my current patch is still some bricks shy of a load… (I
>> added ObjectID only in the commands I added rewrite support for, apart
>> from DROP).
>
> I shall rely on you to provide those bricks which are still missing.

Please find attached a patch to change most functions called from standard_ProcessUtility() to return an Oid (passes `make maintainer-clean; configure; make install check`). Most of them only, because it only makes sense for functions touching an object that exists in the catalogs and has a distinct Oid.

That completes ALTER and CREATE ObjectID support; I did nothing about the DROP case in the attached. The way I intend to solve that problem is using get_object_address() and doing an extra lookup from within the Event Trigger code path. Yes, that's an extra lookup; the only alternative that I see would be another giant refactoring so that each and every DROP support function is split into a lookup-and-lock part first, then the actual DROP. I don't see that as a particular win either, as the advantage of doing a separate (extra) lookup in the ddl_command_start Event Trigger is that the actual DROP code then will check that the target still exists.

The attached patch is known to miss support for ExecCreateTableAs because I wanted to keep the ball rolling and couldn't figure out where to find the target relation id in time. If you feel like filling that gap that would be awesome; if not, I will take care of it later, either in a v1 of that patch, or in a follow-up patch, or embedded into the next patch to come, the cleaned up Event Trigger Info patch without Normalized Command String support. As you see fit.

Merry Christmas All,
--
Dimitri Fontaine
http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Attachments
Hi,

Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Cool, will prepare a separate patch implementing that API now, and base
> the following patches on top of that. My thinking is to do as following:
>
> - API patch to get Oid from all CREATE/ALTER commands
> - API patch for getting the Oid(s) in (before) DROP commands
> - Event Trigger Infos patch, depending on the previous ones
> - Command String Rewrite and its publishing in Event Triggers
>
> Of course for the DROP parts, we'd better have the Oid and use it before
> the command runs through completion or it will not be usable from the
> event trigger (the target is ddl_command_start in that case).

Please find attached two versions of the same patch implementing the added information for Event Triggers, without any support for command string normalizing.

- event_trigger_infos.5.patch.gz applies on top of the current master branch
- event_trigger_infos_partial.5.patch.gz applies on top of oid-api.0.patch.gz applied to the current master branch

I guess you're going to use the partial one, but if you want to run a quick test the aggregated patch might make sense.

The way I've been adding schema and object names is to do extra lookups in the system caches (or the catalogs when we don't have caches), so that the information is known to be correct in all cases. That means I had to add a CommandCounterIncrement(), and that the information is only available when the object actually exists (use ddl_command_trace).

Also, in the DROP case, the extra lookup that happens before the DROP command is done with a ShareUpdateExclusiveLock, so that the object doesn't disappear from concurrent activity while we run the User Defined Trigger. Should we be concerned about deadlock possibilities here, given that later the transaction will need to upgrade its lock?
Given lock upgrade concerns, I guess there's not so much we can do apart from taking an AccessExclusiveLock from the get-go in the case of a DROP Event Trigger at ddl_command_start, right?

> Are you in a position to contribute those parts to the community?
> Implementing them again then having to support two different variants of
> them does not look like the best use of both our times.

I didn't include new testing in that patch version, waiting for your answer.

> As decided at the hackers meeting, now is the time to address anything
> controversial, once done we can "return with feedback". So that's my
> goal for this commit fest, concerning the Normalized Command String.
>
> Given your review, I think the way forward is pretty clear now, and my
> plan is as following:
>
> - commit Oid related API changes in this commit fest
> - commit Event Trigger Information changes in this commit fest
> - agree on how to tackle exposing the normalized command string

Including this email, the first two items are now getting addressed (pending review then commit; at least that's the goal here). Let's not forget the third point: we need to reach an agreement on how to address the Command String Normalisation, too.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Attachments
On 21 December 2012 14:48, Robert Haas <robertmhaas@gmail.com> wrote:
> I predict that if we commit this, it will
> become a never-ending and irrecoverable font of trouble.

Event triggers are a new feature with many powerful uses: replication, online upgrade, data transport. New features require extra code. To decide how big a maintenance burden this is, we need to look at whether we want the feature and, if we have it, how much code it would take to maintain.

pg_dump needs to dump what is there, so dumping "current state" information is required. That can be accessed from the catalog, but requires significant extra code to bring it all back together into DDL text. pg_dump is useful for schema backups and pg_upgrade. This code already exists in core.

Replication needs to see "delta" information, so we can track changes as they happen. Trying to deduce exact deltas by diffing two "current state" dumps is close to impossible. The only way is a "delta" stream. This would be used by replication and on-line upgrade. This code can be added to core by the patches under discussion.

The code is broadly comparable between "current state" and "delta". It's basically just assembling the text of DDL statements from the available information: in one case from the catalog, in the other case from the command parse/execution trees. It's not especially hard to code, nor will it be fiddly or difficult to understand; it is just a matter of developing normalised text versions of the change in progress. Normalisation is important and useful; we spent time in 9.2 doing things the right way for pg_stat_statements.

In both cases, the code really should live in core, so we can tie the code directly to each release. Realistically, doing things outside core means there are often holes or problems that cannot be easily solved. That seems almost certain to be the case here.

pg_dump maintenance is required but it's not a huge burden. pg_dump hasn't slowed down the addition of new DDL features.
It just requires a little extra code, mostly very simple. The PostgreSQL project needs rewrite support for DDL, so it will come to exist either inside or outside core. Knowing it is there, and being able to rely on it, means we'll make faster progress towards the new features we are building.

The approach proposed here has been discussed over many years and minuted in detail at cluster hackers meetings: "DDL Triggers". My thoughts are that the approach here has been well thought through and is exactly what we need. It *is* extra code, but not complex code, and it is code designed to bring us closer to our goals, not to hamper the development of new DDL statements. With the right tests, the changes required would be easy to spot and add code for, and it looks to me that we already have a good suggestion on exactly how to do that (from Robert).

So overall, it certainly is worth thinking hard about whether we should include the rewrite code. Balancing the costs and the benefits, I think we should include it.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 25, 2012 at 10:34 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Thank you for this partial commit, and Simon and Andres to fill in the
> gaps. I should mention that the missing header parts were all in my
> patch, and that headers hacking is proving surprisingly uneasy.

Apologies to all about the missed bits. I believe that I had them in my version of the patch as well, but I think the changes ended up unstaged in my repository rather than folded into the patch. I'm usually smarter than that, but, obviously, not this time.

>> I was thinking that we might need to go beyond what pg_regress can do
>> in this case. For example, one idea would be to install an audit
>> trigger that records all the DDL that happens during the regression
>> tests. Then, afterwards, replay that DDL into a new database. Then
>> do a schema-only dump of the old and new databases and diff the dump
>> files. That's a little wacky by current community standards but FWIW
>> EDB has a bunch of internal tests that we run to check our proprietary
>> stuff; some of them work along these lines and it's pretty effective
>> at shaking out bugs.
>
> Are you in a position to contribute those parts to the community?
> Implementing them again then having to support two different variants of
> them does not look like the best use of both our times.

If I thought there were some useful code, I would try to see if we could contribute it, but I'm pretty sure there isn't. We have a bunch of pg_dump tests, but their intended purpose is to verify that our proprietary stuff dumps and reloads properly, which is not what we need here. The reason I mentioned it was just to make the point that it is possible to have a large suite of tests that goes beyond what is possible with pg_regress, and that this can be a useful way of finding out whether you've broken things. In our case, we actually cheat pretty hard by using things like \!
pg_dump in the .sql files, so it actually ends up going through pg_regress after all, technically anyway.

I'm not quite sure what to think about that approach from a community point of view. It certainly ends up reducing the number of new things that you need to invent in order to do new kinds of testing, but it's still not as flexible as I would like; for example, I don't see a way to do the exact kind of testing we need here in a cross-platform way using that infrastructure.

I'm tempted to propose that we define a Perl-based testing API for "additional regression tests" that can't be handled using pg_regress. Like: we put a bunch of Perl modules in a certain directory, and then write a script that goes through and does

    require $module;
    $module->regress;

...on each one, based on some schedule file. If a test passes or fails, it calls some API that we provide to report the result. That way, whatever tests we write for this effort can potentially become the core of a larger test suite, and by basing it on Perl we add no new tool dependencies and can (hopefully) run on both Linux and Windows. I'm open to other ideas, of course.

> The case where we should certainly prefer looking at the catalog caches
> are when we want to know the actual schema where the object has been
> created. The current patch is deriving that information mostly from the
> parse tree, using the first entry of the search_path when the schema is
> not given in the command. It's ok because we are still in the same
> transaction and no command has been able to run in between the user
> command and our lookup, but I could easily get convinced to look up the
> catalogs instead, even more so once we have the OID of the object easily
> available in all places.

I haven't looked at the code, but it seems to me that there have got to be edge cases where that will fail.
For one thing, we only do pretty minimal locking on namespaces, so I'm not at all sure that someone couldn't (for example) rename the old namespace out of the way, thus causing one search_path lookup to find the first schema in the search_path and the other lookup to find the second schema in the search_path. Frankly, I think we have some hazards of that type in the DDL code as it stands, so it's not like nobody's ever made that mistake before, but it would be nice not to propagate it.

Really, the only safe way to do these things is to have any given name lookup done in one and only one place. We're some distance from there, and I don't know that this patch should be required to carry the burden of nailing down all the loose ends in that area, but I think it's reasonable to insist that it shouldn't do anything which makes it impossible for someone else to nail down those loose ends, which is still fairly high on my personal to-do list.

The other danger here is: what exactly do you mean by "no command has been able to run between the user command and our lookup"? Because you can do stupid things with functions like set_config(). This would only be safe if no user-provided expressions can possibly get evaluated between point A and point B, and that strikes me as the sort of thing that could easily be false unless this is all done VERY close to the start of processing.

A belated Merry Christmas to you as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 25, 2012 at 1:42 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
>>> statements, so my current patch is still some bricks shy of a load… (I
>>> added ObjectID only in the commands I added rewrite support for, apart
>>> from DROP).
>>
>> I shall rely on you to provide those bricks which are still missing.
>
> Please find attached a patch to change most functions called from
> standard_ProcessUtility() to return an Oid (passes `make
> maintainer-clean; configure; make install check`). Most of them only,
> because it only make sense for functions touching an object that exists
> in the catalogs and have a distinct Oid.

OK, I committed this.

> That completes ALTER and CREATE ObjectID support, I did nothing about
> the DROP case in the attached. The way I intend to solve that problem is
> using get_object_address() and do an extra lookup from within the Event
> Trigger code path.

An extra lookup that occurs always, or only when event triggers are in use? A re-resolution of the name, or some other kind of lookup?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> If I thought there were some useful code, I would try to see if we
> could contribute it, but I'm pretty sure there isn't. We have a bunch

Oh. Too bad.

[... description of the tool ...]

> I don't see a way to do the exact kind of testing we need here in a
> cross-platform way using that infrastructure.

Well, do we have \! in psql on Windows at all? Does it work in a similar way to the Unix one? I just don't know.

> I'm tempted to propose that we define a Perl-based testing API for
> "additional regression tests" that can't be handled using pg_regress.

I know that our build requirements are pretty stable for very good reasons, and I appreciate that. Now, I don't do Perl. I'll happily use whatever you come up with, though.

>> command and our lookup, but I could easily get convinced to look up the
>> catalogs instead, even more so once we have the OID of the object easily
>> available in all places.
>
> I haven't looked at the code, but it seems to me that there have got
> to be edge cases where that will fail. For one thing, we only do

In between that email and the latest patch I've removed any processing of the parse tree, because without the ddl_rewrite module it didn't make much sense anyway, and I still want to be sure we have ObjectID, Name and Schema as a base information set.

> Really, the only safe way to do these things is to have any given name
> lookup done in one and only one place. We're some distance from there
> and I don't know that this patch should be required to carry
> the burden of nailing down all the loose ends in that area, but I
> think it's reasonable to insist that it shouldn't do anything which
> makes it impossible for someone else to nail down those loose ends,
> which is still fairly high on my personal to-do list.

The current version of the patch now does its own lookup, and only does so at a time when the object exists in the catalogs.
So I think we're now good on that front in the current Event Triggers patch, even if it's not refactoring each and every DDL command to be able to add hooks in between lookups and locks, for example.

Also, several cases appear to need an extra lookup anyway, such as ALTER OBJECT … RENAME TO … or ALTER OBJECT … SET SCHEMA … when you want to feed information to a ddl_command_end Event Trigger…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes:
> OK, I committed this.

Thanks. Please find attached a rebased patch, v6. I think it's looking more like what you would expect now:

    COLUMNS=70 git diff --stat postgres/master..
     doc/src/sgml/event-trigger.sgml             | 196 +++-
     doc/src/sgml/plpgsql.sgml                   |  95 +-
     doc/src/sgml/ref/create_event_trigger.sgml  |  20 +-
     src/backend/catalog/objectaddress.c         |  22 +-
     src/backend/commands/event_trigger.c        | 870 ++++++++++++++++-
     src/backend/commands/trigger.c              |  39 +
     src/backend/commands/typecmds.c             |   2 +-
     src/backend/tcop/utility.c                  | 385 +++++---
     src/backend/utils/cache/evtcache.c          |  15 +
     src/bin/pg_dump/pg_dump.c                   |  21 +-
     src/bin/pg_dump/pg_dump.h                   |   1 +
     src/bin/psql/describe.c                     |   9 +-
     src/include/catalog/objectaddress.h         |  21 +
     src/include/catalog/pg_event_trigger.h      |   2 +
     src/include/commands/event_trigger.h        |  47 +-
     src/include/commands/trigger.h              |   1 +
     src/include/utils/evtcache.h                |   6 +-
     src/pl/plpgsql/src/pl_comp.c                |  48 +
     src/pl/plpgsql/src/pl_exec.c                |  57 +-
     src/pl/plpgsql/src/plpgsql.h                |   6 +
     src/test/regress/expected/event_trigger.out |  53 +-
     src/test/regress/sql/event_trigger.sql      |  51 +-
     22 files changed, 1700 insertions(+), 267 deletions(-)

>> That completes ALTER and CREATE ObjectID support, I did nothing about
>> the DROP case in the attached. The way I intend to solve that problem is
>> using get_object_address() and do an extra lookup from within the Event
>> Trigger code path.
>
> An extra lookup that occurs always, or only when event triggers are in
> use? A re-resolution of the name, or some other kind of lookup?

Only when event triggers are in use, and when the object is known to exist in the catalogs (ddl_command_start for a DROP operation, or ddl_command_end for a CREATE or ALTER operation). And it's only a name resolution using the catcache when it exists, or a catalog index scan when we have to. The key we have is the OID.
The OID is filled in from utility.c in all the CREATE and ALTER operations that we are interested in, and in the case of a DROP operation we do either a RangeVarGetRelid() for relations or a get_object_address() for the other object kinds. Given the OID, we use the ObjectProperty[] structure to know which cache or catalog to scan in order to get the name and namespace of the object.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Attachments
On 12/29/2012 08:41 AM, Dimitri Fontaine wrote:
> Well, do we have \! in psql on Windows at all? Does it work in a similar
> way to the Unix one? I just don't know.

Of course we do. Why ever would you think we don't? The command must be written in the Windows shell language, but the behaviour is the same. If system() and popen() didn't work on Windows we'd be in serious trouble having a Windows port at all.

cheers

andrew
Robert Haas <robertmhaas@gmail.com> writes:
> The other danger here is - what exactly do you mean by "no command has
> been able to run between the user command and our lookup"? Because
> you can do stupid things with functions like set_config(). This would
> only be safe if no user-provided expressions can possibly get
> evaluated between point A and point B, and that strikes me as the sort
> of thing that could easily be false unless this is all done VERY close
> to the start of processing.

To me, the largest single risk of the whole event triggers feature is precisely that it will result in random user-provided code getting executed in fairly random places, thus breaking assumptions of this type that may be hard or impossible to fix. But given that allowing that is more or less exactly the point of the feature, I'm not sure why you're blaming the patch for it. It should have been rejected on day one if you're not willing to have that type of risk.

regards, tom lane
On Sat, 2012-12-29 at 08:11 -0500, Robert Haas wrote:
> On Tue, Dec 25, 2012 at 1:42 PM, Dimitri Fontaine
> <dimitri@2ndquadrant.fr> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >>> Also, keep in mind we want the ObjectID in all CREATE, ALTER and DROP
> >>> statements, so my current patch is still some bricks shy of a load… (I
> >>> added ObjectID only in the commands I added rewrite support for, apart
> >>> from DROP).
> >>
> >> I shall rely on you to provide those bricks which are still missing.
> >
> > Please find attached a patch to change most functions called from
> > standard_ProcessUtility() to return an Oid (passes `make
> > maintainer-clean; configure; make install check`). Most of them only,
> > because it only make sense for functions touching an object that exists
> > in the catalogs and have a distinct Oid.
>
> OK, I committed this.

There is a new compiler warning coming from this, I believe:

copy.c: In function ‘DoCopy’:
copy.c:835:2: error: ‘relid’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  return relid;
  ^
copy.c:753:6: note: ‘relid’ was declared here
  Oid relid;
  ^
Hi,

Peter Eisentraut <peter_e@gmx.net> writes:
> There is a new compiler warning coming from this, I believe:

I don't get this warning here, at least not with -O2, and I didn't forget to run make maintainer-clean then rebuild with -O0, just in case gcc is now complaining about something else. I wish this situation could be fixed.

Now, this warning needs to be fixed too, and here's a patch for that.

Thanks, regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Attachments
On Sun, 2012-12-30 at 13:18 +0100, Dimitri Fontaine wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > There is a new compiler warning coming from this, I believe:
>
> I don't get this warning here, at least not with -O2, and I didn't forget
> to run make maintainer-clean then rebuild with -O0, just in case gcc is
> now complaining about something else. I wish this situation could be fixed.

I get this warning in Debian squeeze and wheezy with fairly standard build options. I don't think -O0 will give you any more warnings.

> Now, this warning needs to be fixed, too, and here's a patch for that.

Thanks, I fixed it slightly differently.
Dimitri Fontaine escribió:
> Robert Haas <robertmhaas@gmail.com> writes:
> > OK, I committed this.
>
> Thanks. Please find attached a rebased patch, v6. I think it's looking
> more like what you would expect now:

I gave this patch a look. I'm not done with it; I mostly gave the utility.c changes some love so that the end result is not as cluttered. I did this by creating a couple of macros that call the Start() thing, then set the OID, then call the End() thing. This made pretty obvious the places that were missing to set the object OID; I changed the CREATE TABLE AS code to return it, for instance. The patch I attach applies on top of Dimitri's v6 patch. With these changes, it's much easier to review the big standard_ProcessUtility() switch and verify the cases that are being handled correctly. (A few places cannot use the macros I defined because they use more complex arrangements. This is fine, I think, because they are few enough that they can be verified manually.)

I also noticed that ALTER DEFAULT PRIVILEGES was trying to fire event triggers, even though we don't support GRANT and REVOKE; it seems to me that the three things ought to be supported together. I'm told David Fetter would call this "DCL support", and we're not supporting DCL at this stage. So DEFAULT PRIVILEGES ought not to be supported.

I also observed that the SECURITY LABEL command does not fire event triggers; I think it should. But then maybe this can be rightly called DCL as well, so perhaps it's all right to leave it out.

DROP OWNED is not firing event triggers. There is a comment therein that says "we don't fire them for shared objects", but this is wrong: this command is dropping local objects, not shared, so it seems necessary to me that triggers are fired.

I also noticed that some cases such as DROP <whatever> do not report the OIDs of all the objects that are being dropped, only one of them. This seems bogus to me; if you do "DROP TABLE foo,bar" then both tables should be reported.
> Given the OID, we use the ObjectProperty[] structure to know which cache
> or catalog to scan in order to get the name and namespace of the object.

The changes to make ObjectPropertyType visible to code outside objectaddress.c look bogus to me. I think we should make this new code use the interfaces we already have.

--
Álvaro Herrera
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera escribió:
> I gave this patch a look. I'm not done with it; I mostly gave the
> utility.c changes some love so that the end result is not as cluttered.
> I did this by creating a couple of macros that call the Start() thing,
> then set the OID, then call the End() thing. This made pretty obvious
> the places that were missing to set the object OID; I changed the CREATE
> TABLE AS code to return it, for instance. The patch I attach applies on
> top of Dimitri's v6 patch.

... and here's the attachment.

--
Álvaro Herrera
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I gave this patch a look. I'm not done with it; I mostly gave the
> utility.c changes some love so that the end result is not as cluttered.

Thanks Álvaro!

> I did this by creating a couple of macros that call the Start() thing,
> then set the OID, then call the End() thing. This made pretty obvious
> the places that were missing to set the object OID; I changed the CREATE
> TABLE AS code to return it, for instance. The patch I attach applies on
> top of Dimitri's v6 patch. With these changes, it's much easier to
> review the big standard_ProcessUtility() switch and verify cases that
> are being handled correctly. (A few places cannot use the macros I
> defined because they use more complex arrangements. This is fine, I
> think, because they are few enough that can be verified manually.)

That sounds great. I'm not at ease with creating advanced C macros and I'm very happy you did it :)

> I also noticed that ALTER DEFAULT PRIVILEGES was trying to fire event
> triggers, even though we don't support GRANT and REVOKE; it seems to me
> that the three things out to be supported together. I'm told David
> Fetter would call this "DCL support", and we're not supporting DCL at
> this stage. So DEFAULT PRIVILEGES ought to be not supported.

Agreed. Do you want me to edit my patch to remove support for that command and the other arrangements we are talking about, or do you want to continue having at it?

> I also observed that the SECURITY LABEL commands does not fire event
> triggers; I think it should. But then maybe this can be rightly called
> DCL as well, so perhaps it's all right to leave it out.

I would agree with calling that DCL.

> DROP OWNED is not firing event triggers. There is a comment therein
> that says "we don't fire them for shared objects", but this is wrong:
> this command is dropping local objects, not shared, so it seems
> necessary to me that triggers are fired.
I think the best way to address DROP OWNED is the same as the way to address DROP CASCADE: have the inner code (doDeletion, I think) prepare another DropStmt and give control back to ProcessUtility() with a context of PROCESS_UTILITY_CASCADE. The good news is that the unified DropStmt support allows us to think about that development, even if it still is a non-trivial refactoring. I would like to hear that we can consider the current patch first, then refactor the drop cascade operations so that we automatically have full support for DROP OWNED.

> I also noticed that some cases such as DROP <whatever> do not report the
> OIDs of all the objects that are being dropped, only one of them. This
> seems bogus to me; if you do "DROP TABLE foo,bar" then both tables
> should be reported.

When we have the DROP CASCADE refactoring, we could maybe implement a drop on a list of objects as getting back to ProcessUtility() also, with yet another context (SUBCOMMAND or maybe GENERATED would do here). The other way to address the information problem would be to expose it as a relation (resultset, setof record, cursor, you name it) variable to PLpgSQL. Again, I hope we can decide on the approach now, commit what we have, and expand on it in the next commit fest.

>> Given the OID, we use the ObjectProperty[] structure to know which cache
>> or catalog to scan in order to get the name and namespace of the object.
>
> The changes to make ObjectPropertyType visible to code outside
> objectaddress.c look bogus to me. I think we should make this new code
> use the interfaces we already have.

We need to move the new function get_object_name_and_namespace() into the objectaddress.c "module" then, and decide about returning yet another structure (because we have two values to return) or passing that function a couple of char **. Which devil do you prefer?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine escribió:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> Given the OID, we use the ObjectProperty[] structure to know which cache
> >> or catalog to scan in order to get the name and namespace of the object.
> >
> > The changes to make ObjectPropertyType visible to code outside
> > objectaddress.c look bogus to me. I think we should make this new code
> > use the interfaces we already have.
>
> We need to move the new function get_object_name_and_namespace() into
> the objectaddress.c "module" then, and decide about returning yet
> another structure (because we have two values to return) or to pass that
> function a couple of char **. Which devil do you prefer?

I made some changes to this, and I think the result (attached) is cleaner overall.

Now, this review is pretty much unfinished as far as I am concerned; mainly I've been trying to figure out how it all works, improving some stuff as I go, and I haven't looked at the complete patch yet. We might very well do some things quite differently; for example, I'm not really sure of the way we handle CommandTags, or the way we do object lookups at the event trigger stage, only to have them repeated later when the actual deletion takes place. In particular, since event_trigger.c does not know the lock strength that's going to be taken at deletion, it only grabs ShareUpdateExclusiveLock; so there is lock escalation here which could lead to deadlocks. This might need to be rethought. I added a comment somewhere, noting that perhaps it's ProcessUtility's job to set the object classId (or at least the ObjectType) at the same time the TargetOid is being set, rather than have event_trigger.c figure it out from the parse node. And so on. I haven't looked at plpgsql or pg_dump yet, either.

We've been discussing on IM the handling of DROP in general.
The current way it works is that the object OID is reported only if the toplevel command specifies to drop a single object (so no OID if you do DROP TABLE foo,bar); and this is all done by standard_ProcessUtility. This seems bogus to me, and it's also problematic for things like DROP SCHEMA, as well as DROP OWNED BY (which, as noted upthread, is not handled at all but should of course be). I think the way to do it is have performDeletion and performMultipleDeletions (dependency.c) call into the event trigger code, by doing something like this:

1. look up all objects to drop (i.e. resolve the list of objects specified by the user, and complete with objects dependent on those)

2. iterate over this list, firing DROP at-start event triggers

3. do the actual deletion of objects (i.e. deleteOneObject, I think)

4. iterate again over the list, firing DROP at-end event triggers

We would have one or more event triggers being called with a context of TOPLEVEL (for objects directly mentioned in the command), and then some more calls with a context of CASCADE. Exactly how DROP OWNED BY should be handled is not clear; perhaps we should raise one TOPLEVEL event for the objects directly owned by the role? Maybe we should just do CASCADE for all objects? This is not clear yet.

Thoughts?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
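To make the shape of the per-object callback in steps 1-4 concrete, here is a sketch of what such a trigger function might look like. This is entirely hypothetical: the tg_objectname, tg_objectkind, and tg_context variables follow the naming used in this patch series, none of this is committed syntax, and the dropped_objects audit table is assumed to have been created beforehand.

```sql
-- HYPOTHETICAL sketch of a per-object drop trigger under the design
-- above.  The function would run once per dropped object, with the
-- context distinguishing TOPLEVEL objects from those reached by CASCADE.
CREATE FUNCTION record_drop() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- dropped_objects is an audit table created separately by the DBA
  INSERT INTO dropped_objects (objname, objkind, drop_context)
  VALUES (tg_objectname, tg_objectkind, tg_context);
END;
$$;
```

The design choice being debated here is exactly whether such a function fires once per statement or once per object; the sketch assumes the latter, per Alvaro's proposal.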
Attachments
On Wed, Jan 9, 2013 at 11:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I made some changes to this, and I think the result (attached) is > cleaner overall. > > Now, this review is pretty much unfinished as far as I am concerned; > mainly I've been trying to figure out how it all works and improving > some stuff as I go, and I haven't looked at the complete patch yet. We > might very well doing some things quite differently; for example I'm not > really sure of the way we handle CommandTags, or the way we do object > lookups at the event trigger stage, only to have it repeated later when > the actual deletion takes place. In particular, since event_trigger.c > does not know the lock strength that's going to be taken at deletion, it > only grabs ShareUpdateExclusiveLock; so there is lock escalation here > which could lead to deadlocks. This might need to be rethought. I > added a comment somewhere, noting that perhaps it's ProcessUtility's job > to set the object classId (or at least the ObjectType) at the same time > the TargetOid is being set, rather than have event_trigger.c figure it > out from the parse node. And so on. I haven't looked at plpgsql or > pg_dump yet, either. I think this points to a couple of problems: this patch isn't well-enough thought out, and it's got several features jammed into a single patch. This should really be split up into several patches and each one submitted separately. I think that ddl_command_trace is an unnecessary frammish that should be ripped out entirely. It is easy enough to accomplish the same thing with ddl_command_start and ddl_command_end, and so we're just increasing the number of cases we have to support in the future for no real benefit. > We've been discussing on IM the handling of DROP in general. 
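To make Robert's equivalence claim concrete: as he explains later in the thread, ddl_command_trace amounts to calling the same function after a CREATE or ALTER and before a DROP. A sketch of the two-trigger spelling of that, using the syntax from the patch under discussion (ddl_command_end is not committed yet, and my_trace_func() is a made-up placeholder):

```sql
-- Sketch only: my_trace_func() is a placeholder function, and
-- ddl_command_end comes from the patch under review.  The single
-- ddl_command_trace trigger is equivalent to this pair:
CREATE EVENT TRIGGER trace_before_drop ON ddl_command_start
  WHEN tag IN ('DROP TABLE')
  EXECUTE PROCEDURE my_trace_func();

CREATE EVENT TRIGGER trace_after_create ON ddl_command_end
  WHEN tag IN ('CREATE TABLE', 'ALTER TABLE')
  EXECUTE PROCEDURE my_trace_func();
```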
The > current way it works is that the object OID is reported only if the > toplevel command specifies to drop a single object (so no OID if you do > DROP TABLE foo,bar); and this is all done by standard_ProcessUtility. > This seems bogus to me, and it's also problematic for things like DROP > SCHEMA, as well as DROP OWNED BY (which, as noted upthread, is not > handled at all but should of course be). I think the way to do it is > have performDeletion and performMultipleDeletions (dependency.c) call > into the event trigger code, by doing something like this: > > 1. look up all objects to drop (i.e. resolve the list of objects > specified by the user, and complete with objects dependent on those) > > 2. iterate over this list, firing DROP at-start event triggers > > 3. do the actual deletion of objects (i.e. deleteOneObject, I think) > > 4. iterate again over the list firing DROP at-end event triggers This gets right back to an argument Dimitri and I have been having since v1 of this patch, which is whether these are command triggers or event triggers. I think we ought to support both, but they are not the same thing. I think the design you are proposing is just fine for an event called sql_drop, but it makes no sense for an event called ddl_command_end, which needs to be called once per DDL statement - at the end. Not once per object dropped. > We would have one or more event triggers being called with a context of > TOPLEVEL (for objects directly mentioned in the command), and then some > more calls with a context of CASCADE. Exactly how should DROP OWNED BY > be handled is not clear; perhaps we should raise one TOPLEVEL event for > the objects directly owned by the role? Maybe we should just do CASCADE > for all objects? This is not clear yet. 
TOPLEVEL is supposed to correspond to a complete SQL statement entered by the user:

typedef enum
{
	PROCESS_UTILITY_TOPLEVEL,	/* toplevel interactive command */
	PROCESS_UTILITY_QUERY,		/* a complete query, but not toplevel */
	PROCESS_UTILITY_SUBCOMMAND,	/* a piece of a query */
	PROCESS_UTILITY_GENERATED	/* internally generated query node */
} ProcessUtilityContext;

IMHO, "DROP OWNED BY" probably shouldn't fire ddl_command_start/end. I excluded it from ddl_command_start in the initial commit because it's such a weird case, and it didn't seem to cleanly fit in the framework. Now, that could be changed, but surely that should be a separate patch if we're going to do it, rather than lumped in with a bunch of other dubious changes. More than that, it just underscores, again, the fact that we're pouring a lot of effort into fleshing out ddl_command_start/end instead of doing what we probably should be doing, which is adding events that actually target what we want to do, like sql_drop.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I think this points to a couple of problems: this patch isn't
> well-enough thought out, and it's got several features jammed into a
> single patch. This should really be split up into several patches and
> each one submitted separately.

Ok. Now I want to talk about our process a little. That's a two-paragraph diversion; after that it's getting back to technical matters.

There's a difference between "it's not the way I would have done it" and "the author didn't think about what he's doing". That's also the reason why it's very hard to justify sending a polished enough patch as a non-committer. And then this patch is the next one in a long series that has been running for about two years now; splitting it up is just more work for everybody, and then you take the risk that the next committer who looks at the patch prefers to see a complete enough view of the goal you're trying to reach.

> I think that ddl_command_trace is an unnecessary frammish that should
> be ripped out entirely. It is easy enough to accomplish the same
> thing with ddl_command_start and ddl_command_end, and so we're just
> increasing the number of cases we have to support in the future for no
> real benefit.

I think you might want to review the use case behind ddl_command_trace, which was asked for by users who wanted to see their use case supported in some easier way than just what you're talking about here.

> This gets right back to an argument Dimitri and I have been having
> since v1 of this patch, which is whether these are command triggers or
> event triggers. I think we ought to support both, but they are not
> the same thing. I think the design you are proposing is just fine for
> an event called sql_drop, but it makes no sense for an event called
> ddl_command_end, which needs to be called once per DDL statement - at
> the end. Not once per object dropped.
What I think you're missing here is the proposal flying around to have drop operations get back to ProcessUtility so that C hooks and event triggers both can have at it. It's been debated on the list earlier, way before we talked about event triggers, also as a way to clean things up, IIRC. I agree that if we keep the code as it is to implement cascading behavior, a sql_drop event makes more sense.

> TOPLEVEL is supposed to correspond to a complete SQL statement entered
> by the user:

Having the DROP command get back to ProcessUtility would allow us to report the nested events as GENERATED, but CASCADE would give users an easier way to match exactly those events, if that's what they are interested in.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Jan 16, 2013 at 4:16 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Ok. Now I want to talk about our process a little. That's a 2 paragraphs
> diversion, after that it's getting back to technical matters.
>
> There's a difference between "it's not the way I would have done it" and
> "the author didn't think about what he's doing". That's also the reason
> why it's very hard to justify sending a polished enough patch as a non
> commiter.

There's no rule that anyone's got to agree with my opinion on anything, but the idea that a patch should do one thing and do it well is not a novel concept on this mailing list, however much you may feel otherwise. This patch:

- adds ddl_command_trace and ddl_command_end events
- causes those events to be called not only for actual SQL commands but also for things that happen, at present, to go through the same code path
- adds additional magic variables to PL/pgsql to expose new information not previously exposed
- adds CONTEXT as an additional event-trigger-filter variable, along with TAG

In my mind, that's four or five separate patches. You don't have to agree, but that's what I think, and I'm not going to apologize for thinking it. Moreover, I think you will struggle to find examples of patches committed in the last three years that made as many different, only loosely-related changes as you're proposing for this one.

As for the patch not being well-thought-out, I'm sticking by that, too. Alvaro's comments about lock escalations should be enough to tickle your alarm bells, but there are plenty of other problems here, too.

> I think you might want to review the use case behind ddl_command_trace,
> which was asked for by users who wanted to see their use case supported
> in some easier way than just what you're talking about here.

That argument carries no water with me. You're asking every PostgreSQL user in the universe to carry the overhead of a feature that 90% of them will not use.
That is mighty expensive syntactic sugar. I am not talking about code-complexity, either. It is pretty obvious that there is run-time overhead of having this feature. I am not aware of any case where we have accepted run-time overhead for a feature not mandated by the SQL standard. Given the rest of what you have here, it is quite simple to arrange for the same function to be called after a create or alter and before a drop. Being able to do that in one command instead of two is not a sufficient reason to add another event type. >> This gets right back to an argument Dimitri and I have been having >> since v1 of this patch, which is whether these are command triggers or >> event triggers. I think we ought to support both, but they are not >> the same thing. I think the design you are proposing is just fine for >> an event called sql_drop, but it makes no sense for an event called >> ddl_command_end, which needs to be called once per DDL statement - at >> the end. Not once per object dropped. > > What I think you're missing here is the proposal flying around to have > drop operation get back to ProcessUtility so that C hooks and event > triggers both can have at it. It's been debated on the list earlier, way > before we talked about event triggers, also as a way to clean up things, > IIRC. I'm very much opposed to that proposal, as I am to your proposal to expose internal and generated events to users. Recursing into ProcessUtility is a nasty, messy hook that is responsible for subtle bugs and locking problems in our current DDL implementation. We should be trying to refactor that to clean it up, not exposing it as a user-visible detail. I do NOT want a future refactoring effort to clean up race conditions and duplicate name lookups in the DDL code to be blocked because someone is relying on the details of the existing implementation in their event trigger implementation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > There's a difference between "it's not the way I would have done it" and > "the author didn't think about what he's doing". That's also the reason > why it's very hard to justify sending a polished enough patch as a non > commiter. > And then this patch is like the next one in a long series that is > lasting for about 2 years now, and spliting it is just more work for > everybody, and then you take the risk that the next commiter who looks > at the patch prefers to see a complete enough view of the goal you're > trying to reach. What was discussed at the last dev meeting was assigning a committer to each large patch to start with, which would reduce the risk of the goalposts moving that way. It seems to me that Robert's at least unofficially taken that role for event triggers. You should be happy, because if I were reviewing it I'd likely bounce the whole thing. I'm not convinced this will *ever* be a stable feature that doesn't create more problems than it fixes. > What I think you're missing here is the proposal flying around to have > drop operation get back to ProcessUtility so that C hooks and event > triggers both can have at it. I've seen no such proposal, and it seems like a nonstarter just from the idea. dependency.c doesn't have a syntax tree to describe each object that it finds to drop; creating one, and then doing a lookup to re-find the object, is just going to make drops hugely slower and buggier. Not to mention the large amount of code that would have to be added and maintained. Not to mention that the objects dependency.c works with aren't necessarily all that interesting from the user's level --- for instance, do you really want to see each column default expression dropped individually? Not to mention that the permissions considerations are different from a standalone DROP. 
The bigger picture there is that it's taken us years, and multiple major iterations, to get cascaded drops to work properly and reliably. I'm disinclined to rip that code open and rewrite it completely; let alone add hooks that might inject random operations at any point in the process. regards, tom lane
On Wed, Jan 16, 2013 at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What was discussed at the last dev meeting was assigning a committer to
> each large patch to start with, which would reduce the risk of the
> goalposts moving that way. It seems to me that Robert's at least
> unofficially taken that role for event triggers. You should be happy,
> because if I were reviewing it I'd likely bounce the whole thing.
> I'm not convinced this will *ever* be a stable feature that doesn't
> create more problems than it fixes.

And speaking of the goalposts moving... I don't think that's the problem here. Rather, I think the problem is that the design is ardently refusing to move. It might be a slight overstatement to say that every review I've ever posted for this patch has complained about design decisions that expose implementation details to the user that we might want to change later, but not by much. And yet, two years on, we've got proposals on the table to artificially force *more* things through ProcessUtility(). There's no particular consistency to which things do and don't go through that function today, and no reason whatsoever to try to force everything to go through there.

I agree with everything you say in the portion of the email I didn't quote, and I'm pretty sure I've made similar points more than once in the past.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> - adds ddl_command_trace and ddl_command_end events
> - causes those events to be called not only for actual SQL commands
> but also for things that happen, at present, to go through the same
> code path
> - adds additional magic variables to PL/pgsql to expose new
> information not previously exposed
> - adds CONTEXT as an additional event-trigger-filter variable, along with TAG
>
> In my mind, that's four or five separate patches. You don't have to
> agree, but that's what I think, and I'm not going to apologize for
> thinking it. Moreover, I think you will struggle to find examples of
> patches committed in the last three years that made as many different,
> only loosely-related changes as you're proposing for this one.

So, say I do that. Now we have 5 patches to review. They are all against the same set of files, so there's a single possible ordering in which to apply them, and so in which to review them. When we reach an agreement out of order (and we will), it takes a significant amount of time to rebuild the patch set in a different order. I don't think we have the time to do that.

Now, if that's what it takes, I'll spend time on it. In which exact order do you want to be reviewing and applying that series of patches?

> As for the patch not being well-thought-out, I'm sticking by that,
> too. Alvaro's comments about lock escalations should be enough to
> tickle your alarm bells, but there are plenty of other problems here,
> too.
Here's the comment in the code where the lock problems are discussed:

/*
 * Fill in the EventTriggerTargetOid for a DROP command and a
 * "ddl_command_start" Event Trigger: the lookup didn't happen yet and we
 * still want to provide the user with that information when possible.
 *
 * We only do the lookup if the command contains a single element, which is
 * often forced to be the case as per the grammar (see the drop_type:
 * production for a list of cases where more than one object per command is
 * allowed).
 *
 * We take no exclusive lock here, the main command will have to do the
 * name lookup all over again and take appropriate locks down after the
 * User Defined Code runs anyway, and the commands executed by the User
 * Defined Code will take their own locks. We only lock the objects here so
 * that they don't disappear under us in another concurrent transaction,
 * hence using ShareUpdateExclusiveLock. XXX this seems prone to lock
 * escalation troubles.
 */

My thinking is that the user code can do anything, even run a DROP command against the object that we are preparing ourselves to be dropping. I don't think we can bypass doing the lookup twice. The only way not to have to do any lookup in our code is not to report the OID of the object being dropped. Then the event trigger code will certainly do that lookup as its first thing; the lookup will happen either way.

RAISE NOTICE 'drop table % (%)',
  tg_objectname::regclass, tg_objectname::regclass::oid;

That is, at least, the angle I considered when crafting this patch. I'm happy to change that angle if needed, and to adjust for things I missed once you tell me about them.

>> I think you might want to review the use case behind ddl_command_trace,
>> which was asked for by users who wanted to see their use case supported
>> in some easier way than just what you're talking about here.
>
> […]
> Being able to do
> that in one command instead of two is not a sufficient reason to add
> another event type.
That feature was proposed last October (with a patch) and received no comment, so I went on with it. The performance cost of this patch is another couple of cache lookups when doing a DDL command, which I saw as acceptable.

http://www.postgresql.org/message-id/m28vbgcc30.fsf@2ndQuadrant.fr

I'm not so attached to it that I will put the rest of the patch in danger with it; I'm fine with removing that part if we have a good reason to. And yes, you (politely) saying "I don't like it" is a good enough reason. Our process for a non-committer proposal seems to be that you have to send a patch showing what you're talking about for a thread to form and to get some comments on the idea, either rejecting it or reaching a consensus about it. So the opening of that thread was in October and the discussion happens now: that's fine, and I could be happy and thankful about this fact with some subtle changes in the form of the messages.

> I'm very much opposed to that proposal, as I am to your proposal to
> expose internal and generated events to users. Recursing into
> ProcessUtility is a nasty, messy hook that is responsible for subtle
> bugs and locking problems in our current DDL implementation. We

We are publishing two pieces of information tied to the implementation in the current patch:

- the parse tree, only available to C code, and without any API stability promises
- the fact that some commands run nested commands: a serial column does a create sequence, and a create table inside create schema issues a create table.

My understanding of your proposal not to call them ddl_command_* events is that you are exposing exactly the same set of implementation-dependent information to the users, just under another name.
What I mean is that if a user wants to log any sequence creation, he will have to either know that some of them happen within the CONTEXT named 'GENERATED', or know that some of them happen as ddl_command_* events and some of them happen as sequence_for_serial_* events, say. In other words, while I can follow your general approach to things, I don't see any practical impact that makes me think it's a much cleaner proposal in terms of not publishing the implementation details, because I don't think it is possible to both give users the information they need and hide those implementation details.

> should be trying to refactor that to clean it up, not exposing it as a
> user-visible detail. I do NOT want a future refactoring effort to
> clean up race conditions and duplicate name lookups in the DDL code to
> be blocked because someone is relying on the details of the existing
> implementation in their event trigger implementation.

That problem is going to happen in some way or another. It's not about avoiding the problem completely, but choosing a set of compromises. I have yet to understand what your set of compromises is in detailed enough practical implementation terms (not just "it's another set of events") and how it's a better compromise overall (I trust your judgement on that, I still want to understand). After all, the whole point of the Event Trigger patch is to expose some level of implementation details.

Tom Lane <tgl@sss.pgh.pa.us> writes:
> What was discussed at the last dev meeting was assigning a committer to
> each large patch to start with, which would reduce the risk of the
> goalposts moving that way. It seems to me that Robert's at least
> unofficially taken that role for event triggers. You should be happy,
> because if I were reviewing it I'd likely bounce the whole thing.

Like Robert, I don't think the problem is about moving goalposts here; it's more about how we want to spend our time.
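The sequence-logging scenario discussed above can be made concrete with the patch's context filtering. This is a hypothetical sketch: the WHEN context filter and the tg_objectkind/tg_context/tg_objectname variables come from the patch description in this thread, not from committed PostgreSQL syntax.

```sql
-- HYPOTHETICAL sketch using the patch's proposed syntax.  To catch
-- sequences created by serial columns as well as by direct CREATE
-- SEQUENCE commands, the trigger must opt in to the 'generated'
-- context explicitly.
CREATE FUNCTION log_sequence_creation() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF tg_objectkind = 'sequence' THEN
    RAISE NOTICE 'sequence % created (context: %)',
                 tg_objectname, tg_context;
  END IF;
END;
$$;

CREATE EVENT TRIGGER log_sequences ON ddl_command_end
  WHEN context IN ('toplevel', 'generated')
  EXECUTE PROCEDURE log_sequence_creation();
```

Either way, the user has to know that serial columns create sequences in a non-toplevel context, which is Dimitri's point about the implementation detail leaking under any naming scheme.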
I don't want to be refactoring five branches each time we reach a new consensus, only to realise that of course the inter-dependent set of patches just needs to be redone from scratch because we will be applying another one first.

As for the nested commands design, see above for why I'm not yet buying into his counter-proposal. Hint: I don't think it solves the problem Robert is complaining about. Details: I honestly don't understand what problem he is really complaining about, and I know I've tried to get that information before. It must be something obvious.

> I'm not convinced this will *ever* be a stable feature that doesn't
> create more problems than it fixes.

I'm still open to hearing about another way to implement:

- DDL support for logical replication
- extensions to DDL behaviour

I want to be able to install an extension that will change the way some DDL commands are run, and I know that what I want to do here will never get accepted for the core version of PostgreSQL. That's a very fine position that I understand well enough to work on a patch series whose only goal is that I can do my hacking outside of PostgreSQL core.

> I've seen no such proposal, and it seems like a nonstarter just from the
> idea. dependency.c doesn't have a syntax tree to describe each object
> that it finds to drop; creating one, and then doing a lookup to re-find
> the object, is just going to make drops hugely slower and buggier. Not

We wouldn't want to be doing it that way, of course. One idea was to create a new parse node (CascadeDropStmt, say) that gets the OID of the object to drop, and only does another schema/name lookup when there's an event trigger to fire. So the only obvious part of that proposal is that the problem is complex enough and the solution not ready yet. That's also why that problem is not addressed in any patch I've sent to this day. The status of that part of the problem is that we still have to find a good design together.
That part of the discussion really should live in a separate thread. > to mention the large amount of code that would have to be added and > maintained. Not to mention that the objects dependency.c works with > aren't necessarily all that interesting from the user's level --- for > instance, do you really want to see each column default expression > dropped individually? Not to mention that the permissions > considerations are different from a standalone DROP. No, we don't want to see all the details. Yes, we will need to come up with a list of the ones we want to see. As I'm currently working on the DDL part of the Event Triggers, my current thinking is to only include objects that we could have been DROPing with a TOPLEVEL command. > The bigger picture there is that it's taken us years, and multiple major > iterations, to get cascaded drops to work properly and reliably. I'm > disinclined to rip that code open and rewrite it completely; let alone > add hooks that might inject random operations at any point in the > process. Fair enough. If that's the consensus, there's nothing to change in my current patch, or maybe a paragraph or two in the docs to set users expectations. Maybe we should ask the logical replication guys their opinion about how to solve DROP CASCADE support for them, so that we all agree that not doing anything is a good enough answer. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Jan 17, 2013 at 5:18 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Now, if that's what it takes, I'll spend time on it. In which exact
> order do you want to be reviewing and applying that series of patches?

Let's agree on which things we even want to do first. Here's my take:

- adds ddl_command_trace and ddl_command_end events

I think ddl_command_end is OK and I'm willing to commit that if extracted as its own patch. I think ddl_command_trace is unnecessary syntactic sugar.

- causes those events to be called not only for actual SQL commands but also for things that happen, at present, to go through the same code path

I think this is a bad idea, not only because, as I said before, it exposes internal implementation details, but also because it's probably not safe. As Noah (I believe, or maybe Tom) observed in a previous post, we've gone to a lot of work to make sure that nothing you do inside a trigger can crash the server, corrupt the catalogs, trigger elog() messages, etc. That applies in spades to DDL. Despite Tom's skepticism, I'm pretty sure that it's safe for us to execute trigger function calls either completely before or completely after we execute the DDL command. It should be no different than executing them as separate commands. But doing something in the middle of a command is something else altogether. Consider, for example, that ALTER TABLE does a bunch of crap internally to itself, and then recurses into ProcessUtility to do some more stuff. This is a terrible design, but it's what we've got. Are you absolutely sure that the intermediate state that exists after that first bunch of stuff is done and before those other things are done is a safe place to execute arbitrary user-provided code? I'm not. And the more places we add that recurse into ProcessUtility, or indeed, add event triggers in general, the more places we're going to add new failure modes.
Fixing that is part of the price of adding a new event trigger and I'm OK with that. But I'm not OK with baking into the code the assumption that any current or future callers of ProcessUtility() should be prepared for arbitrary code execution. The code wasn't written with that in mind, and it will not end well.

- adds additional magic variables to PL/pgsql to expose new information not previously exposed

I'm not sure that I agree with the implementation of this, but I think I'm OK with the concept. I will review it when it is submitted in a form not intertwined with other things.

- adds CONTEXT as an additional event-trigger-filter variable, along with TAG

I'm OK with adding new event-trigger-filter variables, provided that they are done without additional run-time overhead for people who are not using event triggers; I think it's nice that we can support that. But I think this particular proposal is DOA because nothing except TOPLEVEL is actually safe.

>> As for the patch not being well-thought-out, I'm sticking by that,
>> too. Alvaro's comments about lock escalations should be enough to
>> tickle your alarm bells, but there are plenty of other problems here,
>> too.
>
> Here's the comment in the code where the lock problems are discussed:
>
> /*
> * Fill in the EventTriggerTargetOid for a DROP command and a
> * "ddl_command_start" Event Trigger: the lookup didn't happen yet and we
> * still want to provide the user with that information when possible.
> *
> * We only do the lookup if the command contains a single element, which is
> * often forced to be the case as per the grammar (see the drop_type:
> * production for a list of cases where more than one object per command is
> * allowed).
> *
> * We take no exclusive lock here, the main command will have to do the
> * name lookup all over again and take appropriate locks down after the
> * User Defined Code runs anyway, and the commands executed by the User
> * Defined Code will take their own locks.
> We only lock the objects here so
> * that they don't disapear under us in another concurrent transaction,
> * hence using ShareUpdateExclusiveLock. XXX this seems prone to lock
> * escalation troubles.
> */
>
> My thinking is that the user code can do anything, even run a DROP
> command against the object that we are preparing ourselves to be
> dropping. I don't think we can bypass doing the lookup twice.

Sure, but there's also the issue of getting the right answer. For example, suppose you are planning to drop a function. You do a name lookup on the name and call the event trigger. Meanwhile, someone else renames the function and creates a new one with the same name. After the event trigger returns, the actual drop code latches onto the one with the new name. Now, the event trigger ran with OID A and the actual object dropped was the one with OID B. It's possible that KaiGai's RENAME refactoring in 9.3 has strengthened the locking enough that this particular failure mode is no longer possible in that specific scenario, but I am fairly confident it would have been an issue in 9.2, and it may still be. In general, the only way to be sure that things like that can't happen is to only do the name lookup once. Our locking on non-table objects is still pretty flim-flammy, and there are a lot of concurrent scenarios where things that seem like they shouldn't change under you actually can.

There's another issue here, too, which is that this change reintroduces a failure mode that I spent a lot of time cleaning up during the 9.2 cycle - namely, taking locks on a table before permissions have been checked, thus providing new methods for unprivileged users to block operations on tables they don't have rights to.

>> I'm very much opposed to that proposal, as I am to your proposal to
>> expose internal and generated events to users. Recursing into
>> ProcessUtility is a nasty, messy hook that is responsible for subtle
>> bugs and locking problems in our current DDL implementation.
We > > We are publishing 2 pieces of information tied to the implementation in > the current patch: > > - the parse tree, only available to C code, and without any API > stability promises I'm OK with that. People who program in C are taking their life in their hands anyway as far as new versions are concerned. > - the fact that some commands are running nested commands: serial does > a create sequence, a create table nested in a create schema issues a create table. > > My understanding of your proposal not to call them ddl_command_* events > is that you are exposing exactly the same set of implementation > dependent information to the users, just under another name. What I'm proposing is that we have a separate set of events that get invoked every time an object is created, altered, or dropped, regardless of the cause. Yes, the name will be different, but the point is not to only change the name, but rather to also change the definition. There are two separate things you might want here: - to get a callback every time a command is executed (e.g. so you can log it, check permissions, or reject the entire statement) - to get a callback every time a certain kind of SQL object is created, altered, or destroyed (e.g. so you can replicate those operations on another server) For example, suppose a user executes the command "DROP OWNED BY fred". For a logging hook, you want the first thing: one callback, saying, hey, the user is trying to run DROP OWNED BY fred. You log it once, and go home. For a replication hook, you want the second thing: one callback per object, with details about each one, so you can drop them on the remote side. Both things are useful, but they are different. As a further example, suppose that in 9.4 (or 9.17) we add a command "DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". Well, the logging trigger still just works (because it's only writing the statement, without caring about its contents). 
And the replication trigger still just works too (because it will still get a callback for every table that gets dropped, even though it knows nothing about the new syntax). That's very powerful. Of course, there will still be cases where things have to change, but you can minimize it by clean definitions. It pains me that I've evidently failed to communicate this concept clearly despite a year or more of trying. Does that make sense? Is there some way I can make this more clear? The difference seems very clear-cut to me, but evidently I'm not conveying it well. > That problem is going to happen in some way or another. It's not about > avoiding the problem completely, but choosing a set of compromises. I'm > yet to understand what your set of compromises is in detailed enough > practical implementation terms (not just "it's another set of events") > and how it's a better compromise over all (I trust your judgement on > that, I still want to understand). > > After all, the whole point of the Event Trigger patch is to expose some > level of implementation details. I don't really agree with that. I think the point is to expose what the system is doing to the DBA. I'm OK with exposing the fact that creating a table with a serial column also creates a sequence. There is no problem with that - it's all good. What we DON'T want to do is set up a system where the fact that it creates a sequence is exposed because that happens to go through ProcessUtility() and the fact that it creates a constraint is not because that happens not to go through ProcessUtility(). That is not the sort of quality that our users have come to expect from PostgreSQL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
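The two kinds of callbacks Robert distinguishes above might be sketched in SQL along these lines. This is a hedged illustration only: the command-level trigger uses the ddl_command_start syntax from the patch under review, while the object-level event name (sql_drop) and the set-returning introspection function pg_event_trigger_dropped_objects() are assumptions drawn from the proposal being discussed, not committed syntax.

```sql
-- Command-level trigger: fires once per statement, sees only the tag.
CREATE FUNCTION log_command() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'command executed: %', tg_tag;
END $$;

CREATE EVENT TRIGGER log_ddl ON ddl_command_start
  EXECUTE PROCEDURE log_command();

-- Object-level trigger: fires per affected object, regardless of which
-- statement caused the drop.  Event name and introspection function are
-- hypothetical, taken from the proposal under discussion.
CREATE FUNCTION replicate_drop() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE obj record;
BEGIN
  FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
  LOOP
    RAISE NOTICE 'dropped % %', obj.object_type, obj.object_identity;
  END LOOP;
END $$;

CREATE EVENT TRIGGER replicate_drops ON sql_drop
  EXECUTE PROCEDURE replicate_drop();
```

On this sketch, "DROP OWNED BY fred" would fire log_ddl once but replicate_drops once per object owned by fred, which is exactly the command-versus-object distinction being argued.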
On 2013-01-17 11:15:24 -0500, Robert Haas wrote: > As a further example, suppose that in 9.4 (or 9.17) we add a command > "DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". Well, the > logging trigger still just works (because it's only writing the > statement, without caring about its contents). And the replication > trigger still just works too (because it will still get a callback for > every table that gets dropped, even though it knows nothing about the > new syntax). That's very powerful. Of course, there will still be > cases where things have to change, but you can minimize it by clean > definitions. > > It pains me that I've evidently failed to communicate this concept > clearly despite a year or more of trying. Does that make sense? Is > there some way I can make this more clear? The difference seems very > clear-cut to me, but evidently I'm not conveying it well. Well, I didn't manage to follow the topic with the level of detail I would have liked to/should have, so it's very well possible that I missed a proposal of how to implement that concept in a way you like? With some squinting you can actually see the proposed ddl_trace callsite as exactly that kind of replication trigger, but with a worse name... Without piggy-backing on the internal commands generated - which currently seems to be achieved by passing everything through ProcessUtility, but could work in a quite different way - and reconstructing SQL from that, I don't immediately see how you want to do this? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > - adds ddl_command_trace and ddl_command_end events > > I think ddl_command_end is OK and I'm willing to commit that if > extracted as its own patch. I think ddl_command_trace is unnecessary > syntactic sugar. Ok. Will prepare a non-controversial patch for ddl_command_end. After having played with the patch, I happen to quite like ddl_command_trace, but I won't try to protect it from its death any more than that. > - causes those events to be called not only for actual SQL commands > but also for things that happen, at present, to go through the same > code path > > I think this is a bad idea, not only because, as I said before, it > exposes internal implementation details, but also because it's > probably not safe. As Noah (I believe, or maybe Tom), observed in a > previous post, we've gone to a lot of work to make sure that nothing > you do inside a trigger can crash the server, corrupt the catalogs, > trigger elog() messages, etc. That applies in spades to DDL. I naively thought that doing CommandCounterIncrement() before running the event trigger would go a long enough way towards making the operation safe when the current code is calling back into ProcessUtility(). > Despite Tom's skepticism, I'm pretty sure that it's safe for us to > execute trigger function calls either completely before or completely > after we execute the DDL command. It should be no different than > executing them as separate commands. But doing something in the > middle of a command is something else altogether. Consider, for > example, that ALTER TABLE does a bunch of crap internally to itself, > and then recurses into ProcessUtility to do some more stuff. This is > a terrible design, but it's what we've got. Are you absolutely sure > that the intermediate state that exists after that first bunch of > stuff is done and before those other things are done is a safe place > to execute arbitrary user-provided code? I'm not. 
And the more > places we add that recurse into ProcessUtility, or indeed, add event > triggers in general, the more places we're going to add new failure > modes. While I hear you, I completely fail to understand which magic is going to make your other idea safer than this one. Namely, calling an event trigger named sql_drop rather than ddl_command_start from the same place in the code is certainly not helping at all. If your answer to that is how to implement it (e.g. keeping a queue of events for which to fire Event Triggers once we reach a safe point in the code), then why couldn't we do exactly the same thing under the other name? > Fixing that is part of the price of adding a new event trigger and I'm > OK with that. But I'm not OK with baking into the code the assumption > that any current or future callers of ProcessUtility() should be > prepared for arbitrary code execution. The code wasn't written with > that in mind, and it will not end well. Again, while I agree with the general principle, it happens that the calls to the arbitrary code are explicit in ProcessUtility(), so that if the place is known to be unsafe it's easy enough to refrain from calling the user code. We do that already for CREATE INDEX CONCURRENTLY at ddl_command_end. > - adds additional magic variables to PL/pgsql to expose new > information not previously exposed > > I'm not sure that I agree with the implementation of this, but I think > I'm OK with the concept. I will review it when it is submitted in a > form not intertwined with other things. Fair enough. > - adds CONTEXT as an additional event-trigger-filter variable, along with TAG > > I'm OK with adding new event-trigger-filter variables, provided that > they are done without additional run-time overhead for people who are > not using event triggers; I think it's nice that we can support that. > But I think this particular proposal is DOA because nothing except > TOPLEVEL is actually safe. Yeah, let's first see about that. 
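For reference, the CONTEXT filtering being debated looks like this in the patch under review (syntax copied from its regression tests, as shown earlier in the thread; the final set of allowed context values is exactly what is in dispute here):

```sql
-- Fire at ddl_command_end, but only for the listed contexts.
-- 'toplevel' is the command the user actually typed; 'generated',
-- 'query', and 'subcommand' cover commands the system runs on the
-- user's behalf (e.g. the create sequence behind a serial column).
CREATE EVENT TRIGGER regress_event_trigger_end ON ddl_command_end
  WHEN context IN ('toplevel', 'generated', 'query', 'subcommand')
  EXECUTE PROCEDURE test_event_trigger();
```

Robert's objection amounts to saying that only the 'toplevel' value in this WHEN clause is known to be safe to fire on.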
>> My thinking is that the user code can do anything, even run a DROP >> command against the object that we are preparing ourselves to be >> dropping. I don't think we can bypass doing the lookup twice. > > Sure, but there's also the issue of getting the right answer. For > example, suppose you are planning to drop a function. You do a name > lookup on the name and call the event trigger. Meanwhile, someone > else renames the function and creates a new one with the same name. > After the event trigger returns, the actual drop code latches onto the > one with the new name. Now, the event trigger ran with OID A and the > actual object dropped was one with OID B. It's possible that KaiGai's Is your scenario even possible when we take a ShareUpdateExclusiveLock on the object before running the Event Trigger, in order to protect against exactly that scenario, as said in the comment? Maybe another lock should be taken. Which one? > There's another issue here, too, which is that this change > reintroduces a failure mode that I spent a lot of time cleaning up > during the 9.2 cycle - namely, taking locks on a table before > permissions have been checked, thus providing new methods for > unprivileged users to block operations on tables they don't have > rights to. Aren't Event Triggers superuser only? Does your concern mean that we should force Event Trigger Functions to be SECURITY DEFINER? >> - the parse tree, only available to C code, and without any API >> stability promises > > I'm OK with that. People who program in C are taking their life in > their hands anyway as far as new versions are concerned. Cool. > What I'm proposing is that we have a separate set of events that get > invoked every time an object is created, altered, or dropped, > regardless of the cause. Yes, the name will be different, but the > point is not to only change the name, but rather to also change the > definition. 
There are two separate things you might want here: In the rest of your email, I can't decide what kind of information you want those new events to give to the Event Trigger User Code. The other thing is, as I said before, that I can't see why those new events would suddenly make the call point safer. Other than that, I must say that your proposal is sound to me and that I like it too. I just fail to understand how it makes any difference other than in naming things, really. Why would a command that the system executes for you have to be considered such a different beast from the same command when you happen to have sent it explicitly? A GENERATED context sounds better to me than a "sql_create" event as a user interface, obviously, and implementing the latter has the *exact* same set of challenges to address, as far as I understand. > For example, suppose a user executes the command "DROP OWNED BY fred". > For a logging hook, you want the first thing: one callback, saying, > hey, the user is trying to run DROP OWNED BY fred. You log it once, > and go home. For a replication hook, you want the second thing: one Maybe it's only me, but I'm really annoyed that there's currently no way to get the list of objects actually dropped other than parsing the logs. So I know that I would want my audit trigger to see them all. > callback per object, with details about each one, so you can drop them > on the remote side. Both things are useful, but they are different. Sure. And you have exactly that feature in my patch, where by default CONTEXT is TOPLEVEL and you see only the main command, but you can also register against both TOPLEVEL and GENERATED, say CASCADE in the future, and have all the details. Or only register against CASCADE, too. > As a further example, suppose that in 9.4 (or 9.17) we add a command > "DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". 
Well, the > logging trigger still just works (because it's only writing the > statement, without caring about its contents). And the replication > trigger still just works too (because it will still get a callback for > every table that gets dropped, even though it knows nothing about the > new syntax). That's very powerful. Of course, there will still be > cases where things have to change, but you can minimize it by clean > definitions. Your use case here is exactly the reason why we proposed to have the main DROP command implemented via ProcessUtility, so that an Event Trigger called with a CASCADE context works the same way whatever the TOPLEVEL command looks like, and always against a single object. > It pains me that I've evidently failed to communicate this concept > clearly despite a year or more of trying. Does that make sense? Is > there some way I can make this more clear? The difference seems very > clear-cut to me, but evidently I'm not conveying it well. Communication is hard, and apparently even more so for computer-people, or they wouldn't prefer to spend their jobs talking to a computer rather than talking to people. I've heard that so many times that it almost could make it true, sometimes… >> After all, the whole point of the Event Trigger patch is to expose some >> level of implementation details. > > I don't really agree with that. I think the point is to expose what > the system is doing to the DBA. I'm OK with exposing the fact that > creating a table with a serial column also creates a sequence. There > is no problem with that - it's all good. What we DON'T want to do is > set up a system where the fact that it creates a sequence is exposed > because that happens to go through ProcessUtility() and the fact that > it creates a constraint is not because that happens not to go through > ProcessUtility(). That is not the sort of quality that our users have > come to expect from PostgreSQL. I see what you mean. 
I don't see any CREATE CONSTRAINT command in the following reference, though. http://www.postgresql.org/docs/9.2/static/sql-commands.html Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 17 January 2013 16:15, Robert Haas <robertmhaas@gmail.com> wrote: > As a further example, suppose that in 9.4 (or 9.17) we add a command > "DROP TABLES IN SCHEMA fred WHERE name LIKE 'bob%'". Well, the > logging trigger still just works (because it's only writing the > statement, without caring about its contents). And the replication > trigger still just works too (because it will still get a callback for > every table that gets dropped, even though it knows nothing about the > new syntax). That's very powerful. Of course, there will still be > cases where things have to change, but you can minimize it by clean > definitions. > > It pains me that I've evidently failed to communicate this concept > clearly despite a year or more of trying. Does that make sense? Is > there some way I can make this more clear? The difference seems very > clear-cut to me, but evidently I'm not conveying it well. > >> That problem is going to happen in some way or another. It's not about >> avoiding the problem completely, but choosing a set of compromises. I'm >> yet to understand what your set of compromises is in detailed enough >> practical implementation terms (not just "it's another set of events") >> and how it's a better compromise over all (I trust your judgement on >> that, I still want to understand). >> >> After all, the whole point of the Event Trigger patch is to expose some >> level of implementation details. > > I don't really agree with that. I think the point is to expose what > the system is doing to the DBA. I'm OK with exposing the fact that > creating a table with a serial column also creates a sequence. There > is no problem with that - it's all good. What we DON'T want to do is > set up a system where the fact that it creates a sequence is exposed > because that happens to go through ProcessUtility() and the fact that > it creates a constraint is not because that happens not to go through > ProcessUtility(). 
That is not the sort of quality that our users have > come to expect from PostgreSQL. "The point", i.e. the main use case, is to replicate the DDL in a useful form. Discussions and reasoning need to focus on the main use case, not on weird futures or qualitative points. Yes, the discussion has gone on for years and it really should come to a useful conclusion sometime soon. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 17 January 2013 16:15, Robert Haas <robertmhaas@gmail.com> wrote: >> It pains me that I've evidently failed to communicate this concept >> clearly despite a year or more of trying. Does that make sense? Is >> there some way I can make this more clear? The difference seems very >> clear-cut to me, but evidently I'm not conveying it well. > "The point", i.e. the main use case, is to replicate the DDL in a useful form. > Discussions and reasoning need to focus on the main use case, not on > weird futures or qualitative points. I have to completely disagree with that. If we don't want PostgreSQL to soon subside into an unfixable morass, as I think Brooks puts it, we must *not* simply patch things in a way that expediently provides an approximation of some desired feature. We have to consider, and put substantial weight on, having a clean and maintainable system design. This is particularly the case if we're talking about a feature that is going to expose backend internals to users to any degree. We're either going to be constrained to not change those internals any more, or expect to break users' applications when we do change them, and neither result is very appealing. Especially not when we're talking about the structure of Postgres' DDL code, which can most charitably be described as "it just grew", not as something that has any clear architecture to it. Now if we could quantify these things, that would be great, but AFAICS "qualitative points" is all we've got to go on. I think that we're not realistically going to be able to introduce event triggers in very many of the places we'd like to have them without first doing a lot of fundamental refactoring. And that is hard, dangerous, slow work that tends to break things in itself. As we've been repeatedly reminded in connection with KaiGai-san's refactoring patches. So my opinion is that most of what we'd like to have here is material for 9.4 or 9.5 or even further out. 
If we can get the event trigger catalog infrastructure and some *very* basic events, like end-of-toplevel-command, into place for 9.3, we'll have done well. regards, tom lane
On Thu, Jan 17, 2013 at 12:06 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Ok. Will prepare a non-controversial patch for ddl_command_end. Thanks. I will make a forceful effort to review that in a timely fashion when it's posted. >> I think this is a bad idea, not only because, as I said before, it >> exposes internal implementation details, but also because it's >> probably not safe. As Noah (I believe, or maybe Tom), observed in a >> previous post, we've gone to a lot of work to make sure that nothing >> you do inside a trigger can crash the server, corrupt the catalogs, >> trigger elog() messages, etc. That applies in spades to DDL. > > I naively thought that doing CommandCounterIncrement() before running the > event trigger would go a long enough way towards making the operation > safe when the current code is calling back into ProcessUtility(). It's something, but I'd be shocked if it covers all the bases. Let me try to give a concrete example of how I think another firing point could be made to work along the lines I'm suggesting. I'm not going to use DROP as an example because I think upon reflection that will be a particularly challenging case to make work correctly. Instead, let me use the example of ALTER. Goal: Every time an ALTER command is used on an object *that actually exists*, we will call some user-defined function and pass the object type, the OID of the object, and some details about what sort of alteration the user has requested. Where shall we put the code to do this? Clearly, ProcessUtility is too early, because at that point we have not done the name lookup, locked the object, or checked permissions. So the OID of the target object is not and cannot be known at that point. We cannot do an extra name lookup at that point without taking a lock, because the answer might change, or, due to non-MVCC catalog access, the lookup might fail to find an object altogether. And we can't take a lock without checking permissions first. 
And the permissions-checking logic can't be done inside ProcessUtility(), because it's object-type specific and we don't want to duplicate it. So, instead, what we need to do is go into each function that implements ALTER, and modify it so that after it does the dance where we check permissions, take locks, and look up names, we call the trigger function. That's a bit of a nuisance, because we probably have a bunch of call sites to go fix, but in theory it should be doable. The problem of course, is that it's not intrinsically safe to call user-defined code there. If it drops the object and creates a new one, say, hilarity will probably ensue. But that should be a fixable problem. The only things we've done at this point are check permissions, lock, and look up names. I think we can assume that if the event trigger changes permissions, it's safe for the event trigger to ignore that. The trigger cannot have released the lock, so we're OK there. The only thing it can really have done that's problematic is change the results of the name lookup, say by dropping or renaming the object. But that's quite simple to protect against: after firing the trigger, we do a CommandCounterIncrement() and repeat the name lookup, and if we get a different answer, then we bail out with an error and tell the user they're getting far too cute. Then we're safe. Now, there is a further problem: all of the information about the ALTER statement that we want to provide to the trigger may not be available at that point. Simple cases like ALTER .. RENAME won't require anything more than that, but things like ALTER TABLE .. ADD FOREIGN KEY get tricky, because while at this point we have a solid handle on the identity of the relation to which we're adding the constraint, we do NOT yet have knowledge of or a lock on the TARGET relation, whose OID could still change on us up to much later in the computation. 
To get reliable information about *that*, we'll need to refactor the sequence of events inside ALTER TABLE. Hopefully you can see what I'm driving toward here. In a design like this, we can pretty much prove - after returning from the event trigger - that we're in a state no different from what would have been created by executing a series of commands - lock table, then SELECT event_trigger_func(), then the actual ALTER in sequence. Maybe there's a flaw in that analysis - that's what patch review is for - but it sounds pretty darn tight to me. I could write more, but I have to take the kids to dance class, and this email may be too long already. Does that help at all to clarify the lines along which I am thinking? Note that the above is clearly not ddl_command_start but something else, and the different firing point and additional safety cross-checks are what enable us to pass additional information to the trigger without fear of breaking either the trigger or the world. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
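The firing sequence Robert outlines could be sketched as backend-style C pseudocode. This is only a sketch of the pattern he describes; every helper name below is a hypothetical placeholder except CommandCounterIncrement() and ereport(), which are real backend facilities:

```c
/*
 * Pseudocode sketch of "fire trigger, then re-verify" for ALTER.
 * lookup_and_lock_object(), fire_event_trigger(), and EVT_ALTER are
 * invented names for illustration, not actual backend APIs.
 */
static Oid
alter_thing(const char *name)
{
    /* 1. The usual dance: permissions, lock, name lookup. */
    Oid objid = lookup_and_lock_object(name);

    /* 2. Fire the event trigger; arbitrary user code runs here,
     *    but it cannot release the lock we already hold. */
    fire_event_trigger(EVT_ALTER, objid);

    /* 3. Make the trigger's effects visible, then redo the lookup;
     *    if the answer changed, the trigger got far too cute. */
    CommandCounterIncrement();
    if (lookup_and_lock_object(name) != objid)
        ereport(ERROR,
                (errmsg("object \"%s\" was changed by event trigger",
                        name)));

    /* 4. Now provably equivalent to: lock, SELECT trigger_func(),
     *    then the actual ALTER, executed as separate commands. */
    return objid;
}
```

The point of step 3 is that the recheck turns "user code ran in the middle of a command" back into something indistinguishable from three commands run in sequence, which is the safety property Robert is after.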
Robert Haas <robertmhaas@gmail.com> writes: > Goal: Every time an ALTER command is used on an object *that actually > exists*, we will call some user-defined function and pass the object > type, the OID of the object, and some details about what sort of > alteration the user has requested. Ok, in current terms of the proposed patch, it's a ddl_command_end event trigger, and you can choose when to fire it in utility.c. The current place is chosen to be just after the AlterTable() call, because that sounded logical if you believe in CommandCounterIncrement. > So, instead, what we need to do is go into each function that > implements ALTER, and modify it so that after it does the dance where > we check permissions, take locks, and look up names, we call the > trigger function. That's a bit of a nuisance, because we probably > have a bunch of call sites to go fix, but in theory it should be > doable. The problem of course, is that it's not intrinsically safe to > call user-defined code there. If it drops the object and creates a > new one, say, hilarity will probably ensue. You're trying to find a dangerous point at which to fire the trigger here, rather than trying to solve the problem you're describing. For the problem you're describing, either you want the Object Specific Information and the trigger fires at ddl_command_end, or you don't and you can register your trigger at ddl_command_start. If you want something else, well, it won't ship in 9.3. > Now, there is a further problem: all of the information about the > ALTER statement that we want to provide to the trigger may not be > available at that point. Simple cases like ALTER .. RENAME won't > require anything more than that, but things like ALTER TABLE .. 
ADD > FOREIGN KEY get tricky, because while at this point we have a solid > handle on the identity of the relation to which we're adding the > constraint, we do NOT yet have knowledge of or a lock on the TARGET > relation, whose OID could still change on us up to much later in the > computation. To get reliable information about *that*, we'll need to > refactor the sequence of events inside ALTER TABLE. The only valid answer here has already been given by Tom. You can only provide the information if it's available in the catalogs. So again, it's a ddl_command_end event. It cannot happen before. > Hopefully you can see what I'm driving toward here. In a design like > this, we can pretty much prove - after returning from the event > trigger - that we're in a state no different from what would have been > created by executing a series of commands - lock table, then SELECT > event_trigger_func(), then the actual ALTER in sequence. Maybe > there's a flaw in that analysis - that's what patch review is for - > but it sounds pretty darn tight to me. The only case when we need to do a lookup BEFORE actually running the command is when that command is a DROP, because that's the only timing when the information we want is still in the catalogs. So that's the only case where we do a double object lookup, and we take a ShareUpdateExclusiveLock on the object when doing so, so that we can't lose it from another concurrent activity. > Note that the above is clearly > not ddl_command_start but something else, and the different firing > point and additional safety cross-checks are what enable us to pass > additional information to the trigger without fear of breaking either > the trigger or the world. That's the reason why we are only going to have ddl_command_start and ddl_command_end in 9.3, and also the reason why the extra information is only provided when this very information still exists in the catalogs at the moment when we want to expose it. 
And that's the reason why ddl_command_trace is attractive too: you won't ever get Object OID and Name and Schema from the event where the object does not exist, and maybe you want an easy way to tell the system you're interested in an event *with* the information, and be done, don't think. Again, I don't care much about keeping ddl_command_trace because it's not of much interest for implementing DDL support in logical replication and my other use cases, but still, I thought I would explain it now that we're talking about it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Tom Lane <tgl@sss.pgh.pa.us> writes: > I have to completely disagree with that. If we don't want PostgreSQL > to soon subside into an unfixable morass, as I think Brooks puts it, > we must *not* simply patch things in a way that expediently provides > an approximation of some desired feature. We have to consider, and put > substantial weight on, having a clean and maintainable system design. When in Ottawa next May, I will have to buy you a glass of your favorite wine and explain to you my main use case and vision. You will then realize that all this time that I've been spending on Event Triggers is a sideways path to building something else entirely, because I know that it's the only way to get the feature I want in core PostgreSQL. So yes, I know about the clean and maintainable system design constraint. I've a lot to learn still, and I appreciate your help very much. Rest assured that the path you describe is the one I'm taking. > I think that we're not realistically going to be able to introduce > event triggers in very many of the places we'd like to have them > without first doing a lot of fundamental refactoring. We're only talking about ddl_command_start and ddl_command_end now. The table_rewrite event is still on the horizon, but is not realistic to get in 9.3 anymore, I think :( So we're talking about adding some call points only in utility.c, only before or after a DDL command is run, including nested sub-commands. > So my opinion is that most of what we'd like to have here is material > for 9.4 or 9.5 or even further out. If we can get the event trigger > catalog infrastructure and some *very* basic events, like > end-of-toplevel-command, into place for 9.3, we'll have done well. My feeling is that I'm sending patches that only implement the *very* basic events here, and nothing more. Most of the heat of this thread came from a discussion about a feature that's very hard to design and that is not in my patch series to review. 
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 17 January 2013 20:24, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 17 January 2013 16:15, Robert Haas <robertmhaas@gmail.com> wrote: >>> It pains me that I've evidently failed to communicate this concept >>> clearly despite a year or more of trying. Does that make sense? Is >>> there some way I can make this more clear? The difference seems very >>> clear-cut to me, but evidently I'm not conveying it well. > >> "The point", i.e. the main use case, is to replicate the DDL in a useful form. > >> Discussions and reasoning need to focus on the main use case, not on >> weird futures or qualitative points. > > I have to completely disagree with that. If we don't want PostgreSQL > to soon subside into an unfixable morass, as I think Brooks puts it, > we must *not* simply patch things in a way that expediently provides > an approximation of some desired feature. We have to consider, and put > substantial weight on, having a clean and maintainable system design. So why does focusing on a use case make this turn into an unfixable morass? The two things are completely unrelated. If the patch is anywhere close to an unfixable morass, let's just reject it now. But Robert's arguments were much more tenuous than that. My comments were in response to this > I don't really agree with that. I think the point is to expose what > the system is doing to the DBA. I'm OK with exposing the fact that > creating a table with a serial column also creates a sequence. There > is no problem with that - it's all good. What we DON'T want to do is > set up a system where the fact that it creates a sequence is exposed > because that happens to go through ProcessUtility() and the fact that > it creates a constraint is not because that happens not to go through > ProcessUtility(). That is not the sort of quality that our users have > come to expect from PostgreSQL. The above functionality is sufficient to allow DDL replication. 
What else do we want to do that requires some other capability? Discussing things in terms of what we will actually use a feature for is how we know we've got it right. And if it does that, we don't need anything else. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > Let me try to give a concrete example of how I think another firing > point could be made to work along the lines I'm suggesting. > [ snip description of how an event trigger might safely be fired just > after identification and locking of the target object for an ALTER ] Reading that and reflecting on my gripe about the lack of any clear architecture for our DDL code, I had the beginnings of an idea. Perhaps it would improve matters if we refactored DDL processing so that there were separate "parse analysis" and "execution" phases, where parse analysis is (perhaps among other responsibilities) responsible for identifying and locking all objects to be used in the command. I note that locking the referenced tables is the responsibility of the parse analysis step in DML processing, so there's solid precedent for this. Also, we have some of this approach already for certain commands such as CREATE TABLE, cf parse_utilcmd.c. If we did that, then it'd be feasible to fire event triggers after the parse analysis step, and the rechecking that Robert describes could be encapsulated as "redo the parse analysis and see if the result changed". It's not clear to me just how this ought to extend to the cascaded-DROP or inherited-table-ALTER cases, but hey, it's only the beginnings of an idea. regards, tom lane
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> I think that we're not realistically going to be able to introduce >> event triggers in very many of the places we'd like to have them >> without first doing a lot of fundamental refactoring. > We're only talking about ddl_command_start and ddl_command_end now. The > table_rewrite event is still on the horizon, but is not realistic to get > in 9.3 anymore, I think :( > So we're talking about adding some call points only in utility.c, only > before or after a ddl command is run, including nested sub-commands. Well, that's already a problem, because as Robert keeps saying, what goes through utility.c and what doesn't is pretty random right at the moment, and we shouldn't expose that behavior to users for fear of not being able to change it later. I think we could possibly ship event triggers defined as start and end of a *top level* command and have them working reliably in 9.3. If you want users to be looking at subcommands, there is a lot more work to do there than we have any chance of getting done for 9.3 (if it is to ship in 2013, that is). Alternatively, if you want to get something into 9.3 that has not necessarily got a long-term-stable API, I'd be inclined to suggest that we forget about a SQL-level event trigger facility for now, and just put in some hook call points. It's pretty well established that we're willing to change the API seen by hook functions across major releases. TBH this might be the best thing anyway if your long-term goals have to do with replication, because it'd be a lot cheaper to get to the point where you could write the replication code and see if it all actually works. I'm fairly concerned that we might spend many man-months getting event triggers with definitions A,B,C into the system, only to find out later that what is really needed for logical replication is definitions D,E,F. regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes: > On 17 January 2013 20:24, Tom Lane <tgl@sss.pgh.pa.us> wrote: > My comments were in response to this >> I don't really agree with that. I think the point is to expose what >> the system is doing to the DBA. I'm OK with exposing the fact that >> creating a table with a serial column also creates a sequence. There >> is no problem with that - it's all good. What we DON'T want to do is >> set up a system where the fact that it creates a sequence is exposed >> because that happens to go through ProcessUtility() and the fact that >> it creates a constraint is not because that happens not to go through >> ProcessUtility(). That is not the sort of quality that our users have >> come to expect from PostgreSQL. > The above functionality is sufficient to allow DDL replication. What > else do we want to do that requires some other capability? I think you and Robert are talking past each other. Whether it is or is not sufficient for DDL replication is not what he is concerned about; what he's worried about is whether we can refactor the backend in future without causing a user-visible change in the events that event triggers see. I think that that is an entirely valid concern, and it's not going to go away on the strength of "but this is enough to let us do replication". The other point I'd make here is what I just said to Dimitri: it's far from proven, AFAIK, that this actually *is* sufficient to allow DDL replication. I'd be a lot happier if we had a working proof of that before we lock down a SQL-level facility that we're going to have a hard time changing the details of. Maybe we should just introduce some C-level hooks and let you guys go off and code replication using the hooks, before we put in a lot of expensive infrastructure that might turn out to be Not Quite The Right Thing. 
After we *know* the hooks are in the right places and supply the right information, we can look at replacing them with some more formalized facility. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Well, that's already a problem, because as Robert keeps saying, what > goes through utility.c and what doesn't is pretty random right at the > moment, and we shouldn't expose that behavior to users for fear of not > being able to change it later. It didn't feel that random to me. In most cases, the rule is that if the system wants to execute a command that you could have typed as a user at the prompt, then it hacks together a parse node and calls ProcessUtility. The main exception to that rule is the CASCADE implementation, for reasons that you detailed in that thread earlier. > I think we could possibly ship event triggers defined as start and end > of a *top level* command and have them working reliably in 9.3. If you > want users to be looking at subcommands, there is a lot more work to do > there than we have any chance of getting done for 9.3 (if it is to ship > in 2013, that is). By default, you only get top level in what I've been cooking. I think the other contexts are needed for logical replication, and maybe it would be good enough if they were restricted to either internal code only or event triggers coded in C. What do you think? > Alternatively, if you want to get something into 9.3 that has not > necessarily got a long-term-stable API, I'd be inclined to suggest that > we forget about a SQL-level event trigger facility for now, and just put > in some hook call points. It's pretty well established that we're > willing to change the API seen by hook functions across major releases. Or just a hook. That would want about the same amount of information as the event trigger anyway, so I'm thinking we could maybe do that the same way as the parse tree exposure? Another idea would be yet another GUC, à la allow_system_table_mods, so that only intrepid users can have at it. 
Well, as long as the logical replication use case is covered, I'm done here, so I want to hear from Simon and Andres on that (and maybe the Slony and Londiste guys, etc), and from you to triage what is possible and what is crazy. > TBH this might be the best thing anyway if your long-term goals have to > do with replication, because it'd be a lot cheaper to get to the point > where you could write the replication code and see if it all actually > works. I'm fairly concerned that we might spend many man-months getting > event triggers with definitions A,B,C into the system, only to find out > later that what is really needed for logical replication is definitions > D,E,F. I'm worried about that too, and that's why I'm trying to remain general and quite transparent about the way the system actually works. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
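[Editor's note] To make the proposal above concrete, here is roughly what a ddl_command_end trigger restricted to top-level commands would look like under the patch being discussed. This is a sketch only: the TG_* variable names (TG_OPERATION, TG_SCHEMANAME, TG_OBJECTNAME, TG_COMMAND) follow the patch description upthread and are not a committed API.

```sql
-- Sketch of the proposed interface; the TG_* names are taken from the
-- patch description upthread and may change before any commit.
CREATE OR REPLACE FUNCTION log_ddl_end() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- Only toplevel commands reach this trigger, per the WHEN clause below.
  RAISE NOTICE 'ddl_command_end: % on %.%: %',
    TG_OPERATION, TG_SCHEMANAME, TG_OBJECTNAME, TG_COMMAND;
END;
$$;

CREATE EVENT TRIGGER log_ddl ON ddl_command_end
  WHEN context IN ('toplevel')
  EXECUTE PROCEDURE log_ddl_end();
```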
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Well, that's already a problem, because as Robert keeps saying, what >> goes through utility.c and what doesn't is pretty random right at the >> moment, and we shouldn't expose that behavior to users for fear of not >> being able to change it later. > It didn't feel that random to me. In most cases, the rule is that if the > system wants to execute a command that you could have typed as a user at > the prompt, then it hacks together a parse node and calls ProcessUtility. Uh, no, there's no such "rule". In some places it was convenient to do that so people did it --- but often it's easier to just call the appropriate function directly, especially if you have any special locking or permissions requirements. I don't think there's any great consistency to it. To be a little more concrete, I looked through backend/catalog and backend/commands, which ought to pretty much cover all the places where commands do things that could be considered subcommands. I find three uses of ProcessUtility:

extension.c: uses ProcessUtility to handle non-DML commands read from an extension script.

schemacmds.c: CreateSchemaCommand uses ProcessUtility for any subcommand of a CREATE SCHEMA statement. This is really just direct execution of something the user typed; it's not the system "creating" a subcommand.

trigger.c: ConvertTriggerToFK uses ProcessUtility to execute a consed-up ALTER TABLE ADD FOREIGN KEY command in place of a series of legacy CREATE CONSTRAINT TRIGGER commands. Now this is the very definition of ugly, and shouldn't drive our design decisions --- but I think that any user event trigger would be darn surprised to find this being emitted, especially when the two preceding CREATE CONSTRAINT TRIGGERs hadn't done any such thing, and indeed hadn't created any constraint trigger.

AFAICT, everything else in those directories is using direct calls and not going through utility.c. 
So the only other cases where a trigger in ProcessUtility would trap subcommands are where those subcommands are things that parse_utilcmd.c generated during expansion of a CREATE TABLE or such. And that whole area is something that I feel strongly is implementation detail we don't particularly want to expose to users. For instance, the fact that a UNIQUE clause in a CREATE TABLE works that way and not at some lower level is nothing but implementation artifact. [ pokes around a bit more... ] Having now actually looked at every call point of ProcessUtility in the current code, I find myself agreeing very much with Robert: we do *not* want to expose that exact set of events to users. Except for the original top-level commands, it's almost entirely implementation artifacts that can and will change from release to release. regards, tom lane
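[Editor's note] The CREATE SCHEMA case mentioned above is easy to see from plain SQL: the schema-element form embeds subcommands that CreateSchemaCommand re-dispatches through ProcessUtility, so a trigger keyed to ProcessUtility would observe the inner CREATE TABLE as a separate event even though the user typed it as part of one statement:

```sql
-- The inner CREATE TABLE below is user-written text, but it is executed
-- by CreateSchemaCommand via a second trip through ProcessUtility.
CREATE SCHEMA app
  CREATE TABLE items (id integer PRIMARY KEY, name text);
```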
On Thu, Jan 17, 2013 at 4:43 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Goal: Every time an ALTER command is used on an object *that actually >> exists*, we will call some user-defined function and pass the object >> type, the OID of the object, and some details about what sort of >> alteration the user has requested. > > Ok, in current terms of the proposed patch, it's a ddl_command_end event > trigger, and you can choose when to fire it in utility.c. The current > place is chosen to be just after the AlterTable() call, /me scratches my head. Depends. What I described would fire before the command was executed, not after, so it's not quite the same thing, but it might be just as good for your purposes. It wasn't so much intended as "we should do this exact thing" as much as "this is the kind of thought process that you need to go through if you want to safely run an event trigger in the middle of another command". Now, maybe we don't actually need to do that to serve your purposes. As already discussed, it seems like passing information to ddl_command_end is for most purposes sufficient for what you need - and we can do that without any special gymnastics, because that way it runs completely after the command, not in the middle. If I understand correctly, and I may not, there are basically two problems with that approach: (1) It's not exactly clear how to pass the information about the statement through to the ddl_command_end trigger. I know you've proposed a couple of things, but none of them seem completely satisfying. At least not to me. (2) If the method of transmission is "the OIDs of the affected objects", which I believe is your current proposal, the ddl_command_end trigger can't go look those objects up in the catalog, because the catalog entries are already gone by that point. It's possible we could solve both of those problems without running event triggers in the middle of commands at all. 
In which case, my whole example is moot for now. But I think it would be smart to get ddl_command_end (without any additional information) committed first, and then argue about these details, so that we at least make some definable progress here. > because that > sounded logical if you believe in CommandCounterIncrement. I'm somewhat bemused by this comment. I don't think CommandCounterIncrement() is an article of faith. If you execute a command to completion, and do a CommandCounterIncrement(), then whatever you do next will look just like a new command, so it should be safe to run user-provided code there. But if you stop in the middle of a command, do a CommandCounterIncrement(), run some user-provided code, and then continue on, the CommandCounterIncrement() by itself protects you from nothing. If the code is not expecting arbitrary operations to be executed at that point, and you execute them, you get to keep both pieces. Indeed, there are some places in the code where inserting a CommandCounterIncrement() *by itself* could be unsafe. I don't believe that's a risk in ProcessUtility(), but that doesn't mean there aren't any risks in ProcessUtility(). >> So, instead, what we need to do is go into each function that >> implements ALTER, and modify it so that after it does the dance where >> we check permissions, take locks, and look up names, we call the >> trigger function. That's a bit of a nuisance, because we probably >> have a bunch of call sites to go fix, but in theory it should be >> doable. The problem of course, is that it's not intrinsically safe to >> call user-defined code there. If it drops the object and creates a >> new one, say, hilarity will probably ensue. > > You're trying to find a dangerous point when to fire the trigger here, Yes, I am. I'm not asking you to implement what I proposed there - I'm just giving an example of how to make a dangerous place safe. You've ALSO found a dangerous point when to fire the trigger. 
The difference is that my example dangerous point is probably fixable to be safe, whereas your actual dangerous point is probably unfixable, because there are many current paths of execution that go through that function in an extremely ad-hoc fashion, and there may be more in the future. ddl_command_start and ddl_command_end are safe enough *when restricted to toplevel commands*, but that's about as far as it goes. Perhaps there are other special cases that are also safe, but just throwing everything into the pot with no analysis and no future-proofing certainly isn't. >> Now, there is a further problem: all of the information about the >> ALTER statement that we want to provide to the trigger may not be >> available at that point. Simple cases like ALTER .. RENAME won't >> require anything more than that, but things like ALTER TABLE .. ADD >> FOREIGN KEY get tricky, because while at this point we have a solid >> handle on the identity of the relation to which we're adding the >> constraint, we do NOT yet have knowledge of or a lock on the TARGET >> relation, whose OID could still change on us up to much later in the >> computation. To get reliable information about *that*, we'll need to >> refactor the sequence of events inside ALTER TABLE. > > The only valid answer here has already been given by Tom. You can only > provide the information if it's available in the catalogs. So again, > it's a ddl_command_end event. It can not happen before. I don't see why you say that. It's perfectly possible to have that information available at the right time; the work just hasn't been done yet. >> Hopefully you can see what I'm driving toward here. In a design like >> this, we can pretty much prove - after returning from the event >> trigger - that we're in a state no different from what would have been >> created by executing a series of commands - lock table, then SELECT >> event_trigger_func(), then the actual ALTER in sequence. 
Maybe >> there's a flaw in that analysis - that's what patch review is for - >> but it sounds pretty darn tight to me. > > The only case when we need to do a lookup BEFORE actually running the > command is when that command is a DROP, because that's the only timing > when the information we want is still in the catalogs. > > So that's the only case where we do a double object lookup, and we take > a ShareUpdateExclusiveLock lock on the object when doing so, so that we > can't lose it from another concurrent activity. As noted, I don't think that's sufficient to really guarantee no shenanigans. One idea would be to save off all the names of the objects actually dropped at the time the drops are done, and then pass it to the event trigger afterward. But frankly I think that's way beyond the scope of what we should try to get done for 9.3. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
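[Editor's note] Robert's closing idea, saving off the names of the dropped objects at the time the drops are done and passing them to a trigger that fires afterward, might surface to trigger authors as something like the following sketch. The set-returning function pg_dropped_objects() and its column names are hypothetical, invented here for illustration; nothing by that name exists in the patch:

```sql
-- Hypothetical sketch: pg_dropped_objects() does not exist; it stands
-- in for whatever mechanism replays the names saved during the drops.
CREATE OR REPLACE FUNCTION report_drops() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
  obj record;
BEGIN
  FOR obj IN SELECT * FROM pg_dropped_objects()
  LOOP
    RAISE NOTICE 'dropped % %.%',
      obj.object_type, obj.schema_name, obj.object_name;
  END LOOP;
END;
$$;
```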
On Thu, Jan 17, 2013 at 5:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Perhaps it would improve matters if we refactored DDL processing so that > there were separate "parse analysis" and "execution" phases, where parse > analysis is (perhaps among other responsibilities) responsible for > identifying and locking all objects to be used in the command. I note > that locking the referenced tables is the responsibility of the parse > analysis step in DML processing, so there's solid precedent for this. > Also, we have some of this approach already for certain commands such > as CREATE TABLE, cf parse_utilcmd.c. > > If we did that, then it'd be feasible to fire event triggers after the > parse analysis step, and the rechecking that Robert describes could be > encapsulated as "redo the parse analysis and see if the result changed". > > It's not clear to me just how this ought to extend to the cascaded-DROP > or inherited-table-ALTER cases, but hey, it's only the beginnings of > an idea. This is pretty much where I think we should be headed, long term. I believe it would help with the previously-stated desire to have EXPLAIN ALTER TABLE and similar, too. I am not sure it's necessary to go this far to get some of what Dimitri wants, but it may be a necessary prerequisite to getting all of it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-01-17 23:48:23 +0100, Dimitri Fontaine wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > Alternatively, if you want to get something into 9.3 that has not > > necessarily got a long-term-stable API, I'd be inclined to suggest that > > we forget about a SQL-level event trigger facility for now, and just put > > in some hook call points. It's pretty well established that we're > > willing to change the API seen by hook functions across major releases. > > Or just a hook. That would want to have about the exact same amount of > information as the Event Trigger anyway, so I'm thinking we could maybe > do that the same way as the parsetree exposure? > > Another idea would be yet another GUC, à la allow_system_table_mods, so > that only intrepid users can have at it. Well, as long as the logical > replication use case is covered, I'm done here, so I want to hear from > Simon and Andres on that (and maybe the Slony and Londiste guys, etc), > and from you to triage what is possible and what is crazy. > > TBH this might be the best thing anyway if your long-term goals have to > > do with replication, because it'd be a lot cheaper to get to the point > > where you could write the replication code and see if it all actually > > works. I'm fairly concerned that we might spend many man-months getting > > event triggers with definitions A,B,C into the system, only to find out > > later that what is really needed for logical replication is definitions > > D,E,F. I have no problem requiring C code to use the event data, be it via hooks or via C functions called from event triggers. The problem I have with putting in some hooks is that I doubt that you can find sensible spots with enough information to actually recreate the DDL for a remote system without doing most of the work for command triggers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/18/2013 04:24 AM, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 17 January 2013 16:15, Robert Haas <robertmhaas@gmail.com> wrote: >>> It pains me that I've evidently failed to communicate this concept >>> clearly despite a year or more of trying. Does that make sense? Is >>> there some way I can make this more clear? The difference seems very >>> clear-cut to me, but evidently I'm not conveying it well. >> "The point", i.e. the main use case, is to replicate the DDL in a useful form. >> Discussions and reasoning need to focus on the main use case, not on >> weird futures or qualitative points. > I have to completely disagree with that. If we don't want PostgreSQL > to soon subside into an unfixable morass, as I think Brooks puts it, > we must *not* simply patch things in a way that expediently provides > an approximation of some desired feature. We have to consider, and put > substantial weight on, having a clean and maintainable system design. > > This is particularly the case if we're talking about a feature that > is going to expose backend internals to users to any degree. We're > either going to be constrained to not change those internals any more, > or expect to break users' applications when we do change them, and > neither result is very appealing. I actually wonder if - for now - breaking applications that use certain limited well defined features is an acceptable outcome in exchange for getting a usable first version in place. I'm approaching this from more of an end user PoV: meeting user needs with the least possible pain to users and to the Pg team. A non-forward compatible feature wouldn't be OK to rely on in end user applications, but it'd potentially be usable (if not ideal) for things like replication tools, where practically anything is better than the current situation with DDL. They could special case their integration code by server version if they had to, adding version-specific triggers and handling code. 
The user application wouldn't care about the changes, they'd just need to make sure they upgraded Slony (or whatever) whenever they upgraded to a new Pg major version. I'm not suggesting that as a long term measure, but as a possible stop-gap that'd: - Allow replication to start using these features now, so they'd be ready and more mature when the final version is available; and - Permit prototyping of applications to use the new replication features What I'm thinking about is packaging the client interface in an extension that's explicitly documented as not cross-version compatible and that must be explicitly loaded by named version in order to get the feature. This would require a function-like interface for setup; if that's not possible, an alternative might be a GUC that must be explicitly set in postgresql.conf (only, not via SET) like "enable_non_forward_compatible_features" or "enable_experimental_features". I suspect that if this feature goes in, even in an ugly form, it'll help guide the development of a final version concurrently with the cleanups and refactoring that appears to be necessary. Note that non-forward-compatible changes have been made repeatedly in the system catalogs, despite the fact that end user applications are known to use them via JDBC drivers and libpq, as well as directly. I wonder if we can give this the same kind of compatibility requirements, where it must not break within a given major version but may break between them. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/18/2013 09:33 AM, Andres Freund wrote: > I have no problem requiring C code to use the event data, be it via > hooks or via C functions called from event triggers. The problem I > have with putting in some hooks is that I doubt that you can find > sensible spots with enough information to actually recreate the DDL > for a remote system without doing most of the work for command triggers. Is that so much of a problem if it's OK to break it between major versions? Maybe require compilation with -DPG_NON_FORWARD_COMPATIBLE to make the hooks visible in the headers, so nobody can claim they didn't know? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-01-18 10:31:28 +0800, Craig Ringer wrote: > On 01/18/2013 09:33 AM, Andres Freund wrote: > > I have no problem requiring C code to use the event data, be it via > > hooks or via C functions called from event triggers. The problem I > > have with putting in some hooks is that I doubt that you can find > > sensible spots with enough information to actually recreate the DDL > > for a remote system without doing most of the work for command triggers. > > Is that so much of a problem if it's OK to break it between major > versions? Maybe require compilation with -DPG_NON_FORWARD_COMPATIBLE to > make the hooks visible in the headers, so nobody can claim they didn't know? I can't follow the connection between what you quote from my mail and what you are saying ;) The problem I see is providing enough information to the hooks, not cross version compatibility. If we go the hook route I don't have the slightest problem with breaking them in later releases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 17, 2013 at 4:43 PM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: >> because that >> sounded logical if you believe in CommandCounterIncrement. > I'm somewhat bemused by this comment. I don't think > CommandCounterIncrement() is an article of faith. If you execute a > command to completion, and do a CommandCounterIncrement(), then > whatever you do next will look just like a new command, so it should > be safe to run user-provided code there. But if you stop in the > middle of a command, do a CommandCounterIncrement(), run some > user-provided code, and then continue on, the > CommandCounterIncrement() by itself protects you from nothing. If the > code is not expecting arbitrary operations to be executed at that > point, and you execute them, you get to keep both pieces. Indeed, > there are some places in the code where inserting a > CommandCounterIncrement() *by itself* could be unsafe. I think it's quite likely that we should *not* expect that we can safely do CommandCounterIncrement at any random place. That isn't just "x++". It causes relcache and catcache flushes for anything affected by the command-so-far, since it's making the command's results visible --- and the code may not be expecting the results to become visible yet, and almost certainly isn't expecting cache entries to disappear under it. So a CCI could break things whether or not the event trigger itself did anything at all. But this is just one part of the general problem that injecting random code mid-command is risky as hell, and isn't likely to become less so. Frankly I don't really wish to see that happen, and don't immediately see good end-user reasons (as opposed to structure-of-the-current-code reasons) to need it. 
I'd really like to get to a point where we can define things as happening like this:

* collect information needed to interpret the DDL command (lookup and lock objects, etc)
* fire "before" event triggers, if any (and if so, recheck info)
* do the command
* fire "after" event triggers, if any

This might require that the command execution save aside information that would help us fire appropriate event triggers later. But I'd much rather pay that overhead than fire triggers mid-command just because that's where the info is currently available. BTW, how badly do we need the "before" case at all? If we were to restrict our ambitions to "after" triggers, it would perhaps be relatively simple for DDL execution to create data structures noting what it does as it goes along, and then use those to fire "after" triggers at the end (and provide desired info to them too). Still a lot of new code of course, but not very much risk involved. [ reads further... ] Robert seems to have pretty much that same idea down at the end: > One idea would be to save off all the names of the objects actually > dropped at the time the drops are done, and then pass it to the event > trigger afterward. regards, tom lane
On 01/18/2013 10:34 AM, Andres Freund wrote: > On 2013-01-18 10:31:28 +0800, Craig Ringer wrote: >> On 01/18/2013 09:33 AM, Andres Freund wrote: >>> I have no problem requiring C code to use the even data, be it via >>> hooks or via C functions called from event triggers. The problem I >>> have with putting in some hooks is that I doubt that you can find >>> sensible spots with enough information to actually recreate the DDL >>> for a remote system without doing most of the work for command triggers. >> Is that so much of a problem if it's OK to break it between major >> versions? Maybe require compilation with -DPG_NON_FORWARD_COMPATIBLE to >> make the hooks visible in the headers, so nobody can claim they didn't know? > I can't follow the connection between what you quote from my mail and > what you are saying ;) ... in which case I probably misunderstood part of the discussion. I'm trying to skim things fairly quickly in order to keep up and continue working through the CF backlog. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > I have no problem requiring C code to use the event data, be it via hooks > or via C functions called from event triggers. The problem I have with > putting in some hooks is that I doubt that you can find sensible spots > with enough information to actually recreate the DDL for a remote system > without doing most of the work for command triggers. "most of the work"? No, I don't think so. Here's the thing that's bothering me about that: if the use-case that everybody is worried about is replication, then triggers of any sort are the Wrong Thing, IMO. The difference between a replication hook and a trigger is that a replication hook has no business changing any state of the local system, whereas a trigger *has to be expected to change the state of the local system* because if it has no side-effects you might as well not bother firing it. And the fear of having to cope with arbitrary side-effects occurring in the midst of DDL is about 80% of my angst with this whole concept. If we're only interested in replication, let's put in some hooks whose contract does not allow for side-effects on the local catalogs, and be done. Otherwise we'll be putting in man-years of unnecessary (or at least unnecessary for this use-case) work. regards, tom lane
On 2013-01-17 21:48:01 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > I have no problem requiring C code to use the even data, be it via hooks > > or via C functions called from event triggers. The problem I have with > > putting in some hooks is that I doubt that you can find sensible spots > > with enough information to actually recreate the DDL for a remote system > > without doing most of the work for command triggers. > "most of the work"? No, I don't think so. Here's the thing that's > bothering me about that: if the use-case that everybody is worried about > is replication, then triggers of any sort are the Wrong Thing, IMO. > The difference between a replication hook and a trigger is that a > replication hook has no business changing any state of the local system, > whereas a trigger *has to be expected to change the state of the local > system* because if it has no side-effects you might as well not bother > firing it. And the fear of having to cope with arbitrary side-effects > occuring in the midst of DDL is about 80% of my angst with this whole > concept. I *personally* would be *very* happy to get the capability to do two things:
a) get enough information to replicate DDL to another system
b) raise an error that aborts the current action
IMO b) fits well enough with not allowing arbitrary side effects. > If we're only interested in replication, let's put in some hooks whose > contract does not allow for side-effects on the local catalogs, and be > done. Otherwise we'll be putting in man-years of unnecessary (or at > least unnecessary for this use-case) work. It's a thing of perspective I guess. I can't imagine a hook-ey solution, without quite a bit of work, that gets enough information to regenerate SQL that performs the same action on another system. ISTM that the refactoring to make that consistently "easy" is the hard part, but I hope I am wrong. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-17 21:48:01 -0500, Tom Lane wrote: >> If we're only interested in replication, let's put in some hooks whose >> contract does not allow for side-effects on the local catalogs, and be >> done. Otherwise we'll be putting in man-years of unnecessary (or at >> least unnecessary for this use-case) work. > Its a thing of perspective I guess. I can't imagine a hook-ey solution, > without quite a bit of work, that gets enough information to regenerate > SQL that performs the same action on another system. ISTM that the > refactoring to make that consistently "easy" is the hard part, but I > hope I am wrong. The problem of how to get the information needed is real and large, I agree. But that's not any easier for a trigger --- if anything, it's probably harder, because then you not only need to *get* the information but you have to figure out how to provide it in a way that makes sense at SQL level. regards, tom lane
On Thu, Jan 17, 2013 at 8:33 PM, Andres Freund <andres@2ndquadrant.com> wrote: > I have no problem requiring C code to use the even data, be it via hooks > or via C functions called from event triggers. The problem I have with > putting in some hooks is that I doubt that you can find sensible spots > with enough information to actually recreate the DDL for a remote system > without doing most of the work for command triggers. It should be noted that the point of KaiGai's work over the last three years has been to solve exactly this problem. Well, KaiGai wants to check security rather than do replication, but he wants to be able to grovel through the entire node tree and make security decisions based on stuff core PG doesn't care about, so in effect the requirements are identical. Calling the facility "event triggers" rather than "object access hooks" doesn't make the underlying problem any easier to solve. I actually believe that the object access hook stuff is getting pretty close to a usable solution if you don't mind coding in C, but I've had trouble convincing anyone else of that. I find it a shame that it hasn't been taken more seriously, because it really does solve the same problem. sepgsql, for example, has no trouble at all checking permissions for dropped objects. You can't call procedural code from the spot where we've got that hook, but you sure can call C code, with the usual contract that if it breaks you get to keep both pieces. The CREATE stuff works fine too. Support for ALTER is not all there yet, but that's because it's a hard problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
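To make the "coding in C" route concrete for readers following the thread: an extension attaches to the object access hook roughly as below. This is only a sketch against the backend's catalog/objectaccess.h interface as it stood around this time (the hook signature with the arg parameter was being settled in this era); it compiles only against PostgreSQL server headers, and every name other than the hook API itself is made up.

```c
#include "postgres.h"
#include "fmgr.h"
#include "catalog/objectaccess.h"

PG_MODULE_MAGIC;

/* Chain to any previously installed hook, as PostgreSQL hook users must. */
static object_access_hook_type prev_object_access_hook = NULL;

static void
my_object_access(ObjectAccessType access, Oid classId,
                 Oid objectId, int subId, void *arg)
{
    if (prev_object_access_hook)
        prev_object_access_hook(access, classId, objectId, subId, arg);

    /* OAT_POST_CREATE fires once per object created, whatever its type. */
    if (access == OAT_POST_CREATE)
        elog(LOG, "created object: class %u, oid %u, subId %d",
             classId, objectId, subId);
}

void
_PG_init(void)
{
    prev_object_access_hook = object_access_hook;
    object_access_hook = my_object_access;
}
```

Note what the hook hands you: OIDs, not a parse tree - which is the gap the rest of the thread argues about.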
Tom Lane <tgl@sss.pgh.pa.us> writes: > Robert Haas <robertmhaas@gmail.com> writes: >> I'm somewhat bemused by this comment. I don't think >> CommandCounterIncrement() is an article of faith. If you execute a >> command to completion, and do a CommandCounterIncrement(), then >> whatever you do next will look just like a new command, so it should >> be safe to run user-provided code there. But if you stop in the Exactly my thinking. >> middle of a command, do a CommandCounterIncrement(), run some >> user-provided code, and then continue on, the >> CommandCounterIncrement() by itself protects you from nothing. If the Yes. That's why I've been careful not to add CommandCounterIncrement() when adding the Event Trigger facilities. The only one we added is so that the next trigger sees what the previous one just did. That's all. So the patch is not adding CCI calls to pretend that an existing place in the code is now safe, it's relying on CCI as an existing protocol in the implementation to denote existing places in the code where calling user code should be safe, because a logical block of a command has just been completed. >> code is not expecting arbitrary operations to be executed at that >> point, and you execute them, you get to keep both pieces. Indeed, >> there are some places in the code where inserting a >> CommandCounterIncrement() *by itself* could be unsafe. I agree, it's pretty obvious. And that's why I'm not adding any. > I think it's quite likely that we should *not* expect that we can safely > do CommandCounterIncrement at any random place. That isn't just "x++". Agreed. That's what the patch is doing, relying on the ones that are already in the code, and being very careful not to add any new one. 
> It causes relcache and catcache flushes for anything affected by the > command-so-far, since it's making the command's results visible --- and > the code may not be expecting the results to become visible yet, and > almost certainly isn't expecting cache entries to disappear under it. > So a CCI could break things whether or not the event trigger itself did > anything at all. Yes. > But this is just one part of the general problem that injecting random > code mid-command is risky as hell, and isn't likely to become less so. Yes. That's why there's nothing in the proposed patch that does that. The user code is run either before the command starts, the event is called ddl_command_start, and in the case of a DROP command it has to do an extra lookup only to guarantee that we're not trying to inject user code at a random point in the execution of the existing code. Or the user code is run after the command is done, when the command has itself called CommandCounterIncrement() and is ready to return to its caller, and that event is called ddl_command_end. > Frankly I don't really wish to see that happen, and don't immediately > see good end-user reasons (as opposed to structure-of-the-current-code > reasons) to need it. I'd really like to get to a point where we can > define things as happening like this: > > * collect information needed to interpret the DDL command > (lookup and lock objects, etc) > * fire "before" event triggers, if any (and if so, recheck info) > * do the command > * fire "after" event triggers, if any I think that if you do the lookup and lock of objects before running the command, then you also want to do it again after the event trigger has run, because you just don't know what it did. 
That's why I implemented it that way:
* no information is given to ddl_command_start event triggers
* information is collected while the code runs normally
* the ddl_command_end event has the information and only uses it if the object still exists
Now, the extra complexity is that we also want the information at ddl_command_start for a DROP command, so there's an explicit extra lookup here in *only that case* and *only when some event triggers are registered* (which is checked from a cache lookup), and only when there's a single object targeted by the command. And when doing that extra lookup we take a ShareUpdateExclusiveLock on the object so that someone else won't drop it before we get to run the Event Trigger User Function. > This might require that the command execution save aside information > that would help us fire appropriate event triggers later. But I'd much > rather pay that overhead than fire triggers mid-command just because > that's where the info is currently available. And that's what is implemented in my patch. The information that's kept aside is the OID of the object being the target of the command. Last year, in the patch that didn't make it in 9.2, I used to keep the whole EventTriggerData structure around and fill it in (it had another name in the 9.2 era patch) from all the commands/* implementations. Robert didn't like that, at all, because of the code footprint IIRC. So I've been very careful in that version of the patch not to implement things exactly as you're saying, because what you're saying looks very much like my first approach from last year, the one that crashed and burnt. For full disclosure though, IIRC, the patch from that era also did try to call Event Trigger User Code at dangerous places in the middle of the commands implementation. That was removed before the end of the commit fests, though, but we then had to say no for 9.2. > BTW, how badly do we need the "before" case at all? 
> If we were to restrict our ambitions to "after" triggers, it would perhaps be > relatively simple for DDL execution to create data structures noting > what it does as it goes along, and then use those to fire "after" > triggers at the end (and provide desired info to them too). Still > a lot of new code of course, but not very much risk involved. I personally need the ddl_command_start case for implementing some CREATE EXTENSION features in user code. And I would also need to have the extension's name from the parsetree at that point. In the current state of the patch, that will mean that I need a little new function coded in C and loaded in the backend to be able to get at it. > [ reads further... ] Robert seems to have pretty much that same idea > down at the end: > >> One idea would be to save off all the names of the objects actually >> dropped at the time the drops are done, and then pass it to the event >> trigger afterward. We can go even further in simplicity. Document that you only get information about objects that exist at the time the event trigger user function is run. Then you don't have any specific information for a DROP command at ddl_command_end, and you don't need any code to make it happen. You also don't have the information at ddl_command_start for a CREATE operation. That happens to be what my patch implements, too. I think that the refactoring of the DDL commands into several explicit steps of parsing, analysis then execution is the way forward, and I didn't see any way to provide that in 1 cycle (that was my first target, remember), and even if I had then known that it would take 2 cycles for the limited version of the feature, I don't think I would have called that path realistic. As you said, this refactoring looks more like a 3-cycle effort in itself, let alone building anything on top of it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
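Dimitri's "information only for objects that still exist" rule can be illustrated with the syntax from the posted patch. The ddl_command_end event and the WHEN CONTEXT clause are from the proposal upthread, not released PostgreSQL, and the function and trigger names here are made up:

```sql
create or replace function snitch() returns event_trigger
language plpgsql as $$
begin
  -- At ddl_command_end the target of a CREATE or ALTER still exists,
  -- so its information is available; after a DROP it is not.
  raise notice 'event: %, tag: %', tg_event, tg_tag;
end;
$$;

create event trigger snitch_end on ddl_command_end
  when context in ('toplevel')
  execute procedure snitch();
```

The same asymmetry holds in reverse at ddl_command_start, where a CREATE target does not exist yet.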
On 2013-01-17 22:39:18 -0500, Robert Haas wrote: > On Thu, Jan 17, 2013 at 8:33 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > I have no problem requiring C code to use the even data, be it via hooks > > or via C functions called from event triggers. The problem I have with > > putting in some hooks is that I doubt that you can find sensible spots > > with enough information to actually recreate the DDL for a remote system > > without doing most of the work for command triggers. > > It should be noted that the point of KaiGai's work over the last three > years has been to solve exactly this problem. Well, KaiGai wants to > check security rather than do replication, but he wants to be able > grovel through the entire node tree and make security decisions based > on stuff core PG doesn't care about, so in effect the requirements are > identical. Calling the facility "event triggers" rather than "object > access hooks" doesn't make the underlying problem any easier to solve. > I actually believe that the object access hook stuff is getting > pretty close to a usable solution if you don't mind coding in C, but > I've had trouble convincing anyone else of that. > > I find it a shame that it hasn't been taken more seriously, because it > really does solve the same problem. sepgsql, for example, has no > trouble at all checking permissions for dropped objects. You can't > call procedural code from the spot where we've got that hook, but you > sure can call C code, with the usual contract that if it breaks you > get to keep both pieces. The CREATE stuff works fine too. Support > for ALTER is not all there yet, but that's because it's a hard > problem. I don't have a problem reusing the object access infrastructure at all. I just don't think its providing even remotely enough. You have (co-)written that stuff, so you probably know more than I do, but could you explain to me how it could be reused to replicate a CREATE TABLE? 
Problems I see:
- afaics for CREATE TABLE the only hook is in ATExecAddColumn
- no access to the CreateStmt, making it impossible to decipher whether e.g. a sequence was created as part of this or not
- no way to regenerate the table definition for execution on the remote system without creating libpqdump.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 18, 2013 at 9:07 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-01-17 22:39:18 -0500, Robert Haas wrote: >> On Thu, Jan 17, 2013 at 8:33 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > I have no problem requiring C code to use the even data, be it via hooks >> > or via C functions called from event triggers. The problem I have with >> > putting in some hooks is that I doubt that you can find sensible spots >> > with enough information to actually recreate the DDL for a remote system >> > without doing most of the work for command triggers. >> >> It should be noted that the point of KaiGai's work over the last three >> years has been to solve exactly this problem. Well, KaiGai wants to >> check security rather than do replication, but he wants to be able >> grovel through the entire node tree and make security decisions based >> on stuff core PG doesn't care about, so in effect the requirements are >> identical. Calling the facility "event triggers" rather than "object >> access hooks" doesn't make the underlying problem any easier to solve. >> I actually believe that the object access hook stuff is getting >> pretty close to a usable solution if you don't mind coding in C, but >> I've had trouble convincing anyone else of that. >> >> I find it a shame that it hasn't been taken more seriously, because it >> really does solve the same problem. sepgsql, for example, has no >> trouble at all checking permissions for dropped objects. You can't >> call procedural code from the spot where we've got that hook, but you >> sure can call C code, with the usual contract that if it breaks you >> get to keep both pieces. The CREATE stuff works fine too. Support >> for ALTER is not all there yet, but that's because it's a hard >> problem. > > I don't have a problem reusing the object access infrastructure at all. I just > don't think its providing even remotely enough. 
> You have (co-)written that stuff, so you probably know more than I do, but could you explain to me how it could be reused to replicate a CREATE TABLE? > Problems I see: > - afaics for CREATE TABLE the only hook is in ATExecAddColumn No, there's one also in heap_create_with_catalog. Took me a minute to find it, as it does not use InvokeObjectAccessHook. The idea is that OAT_POST_CREATE fires once per object creation, regardless of the object type - table, column, whatever. > - no access to the CreateStm, making it impossible to decipher whether e.g. a sequence was created as part of this or not Yep, that's a problem. We could of course add additional hook sites with relevant context information - that's what this infrastructure is supposed to allow for. > - No way to regenerate the table definition for execution on the remote system without creating libpqdump. IMHO, that is one of the really ugly problems that we haven't come up with a good solution for yet. If you want to replicate DDL, you have basically three choices:
1. copy over the statement text that was used on the origin server and hope none of the corner cases bite you
2. come up with some way of reconstituting a DDL statement based on (a) the parse tree or (b) what the server actually decided to do
3. reconstitute the state of the object from the catalogs after the command has run
(2a) differs from (2b) for things like CREATE INDEX, where the index name might be left for the server to determine, but when replicating you'd like to get the same name out. (3) is workable for CREATE but not ALTER or DROP. The basic problem here is that (1) and (3) are not very reliable/complete and (2) is a lot of work and introduces a huge code maintenance burden. But it's unfair to pin that on the object-access hook mechanism - any reverse-parsing or catalog-deconstruction solution for DDL is going to have that problem. 
The decision we have to make as a community is whether we're prepared to support and maintain that code for the indefinite future. Although I think it's easy to say "yes, because DDL replication would be really cool" - and I sure agree with that - I think it needs to be thought through a bit more deeply than that. I have been involved in PostgreSQL development for about 4.5 years now. This is less time than many people here, but it's still long enough to have seen a whole lot of people ask for some variant of this idea, and yet I have yet to see anybody produce a complete, working version of this functionality and maintain it outside of the PostgreSQL tree for one release cycle (did I miss something?). If we pull that functionality into core, that's just a way of taking work that nobody's been willing to do and forcing the responsibility to be spread across the whole development group. Now, sometimes it is OK to do that, but sometimes it means that we're just bolting more things onto the already-long list of reasons for which patches can be rejected. Anybody who is now feeling uncomfortable at the prospect of not having that facility in core ought to think about how they'll feel about, say, one additional patch per year not getting committed in every future release because of this requirement. Does that feel OK? What if the number is two, or three? What's the tipping point where you'd say the cost is too high? (Note: If anyone reading this is tempted to answer that there is no such tipping point, then you have got a bad case of patch myopia.) There's a totally legitimate debate to be had here, but my feeling about what you're calling libpqdump is:
- It's completely separate from event triggers.
- It's completely separate from object access hooks.
- The fact that no one to my knowledge has maintained such a thing outside of core, and that it seems like a hard project with lots of required maintenance, makes me very wary about pushing it into core. 
[ BTW: Sorry our IM session got cut off. It started erroring out every time I messaged you. But no problem, anyway. ] -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 18 January 2013 02:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@2ndquadrant.com> writes: >> I have no problem requiring C code to use the even data, be it via hooks >> or via C functions called from event triggers. The problem I have with >> putting in some hooks is that I doubt that you can find sensible spots >> with enough information to actually recreate the DDL for a remote system >> without doing most of the work for command triggers. > > "most of the work"? No, I don't think so. Here's the thing that's > bothering me about that: if the use-case that everybody is worried about > is replication, then triggers of any sort are the Wrong Thing, IMO. > > The difference between a replication hook and a trigger is that a > replication hook has no business changing any state of the local system, > whereas a trigger *has to be expected to change the state of the local > system* because if it has no side-effects you might as well not bother > firing it. And the fear of having to cope with arbitrary side-effects > occuring in the midst of DDL is about 80% of my angst with this whole > concept. > > If we're only interested in replication, let's put in some hooks whose > contract does not allow for side-effects on the local catalogs, and be > done. Otherwise we'll be putting in man-years of unnecessary (or at > least unnecessary for this use-case) work. I would be happy to see something called "replication triggers" or "replication hooks" committed. The things I want to be able to do are:
* Replicate DDL for use on other systems, in multiple ways selected by user, possibly >1 at same time
* Put in a block to prevent all DDL, for use during upgrades or just a general lockdown, then remove it again when complete.
* Perform auditing of DDL statements (which requires more info than simply the text submitted)
If there are questions about wider functionality, or risks with that, then reducing the functionality is OK, for me. 
The last thing we need is a security hole introduced by this, or weird behaviour from people interfering with things. (If you want that, write an executor plugin and cook your own spaghetti). We can do this by declaring clearly that there are limits on what can be achieved with event triggers, perhaps even testing to check bad things haven't been allowed to occur. I don't see that as translating into massive change in the patch, just focusing on the main cases, documenting that and perhaps introducing some restrictions on wider usage. We will likely slowly expand the functionality of event triggers over time, so keeping things general still works as a long range goal. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
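Simon's DDL-lockdown case needs nothing beyond an event trigger that raises an error, which also matches the "abort the current action" capability Andres asked for upthread. A minimal sketch, using the ddl_command_start event with illustrative names:

```sql
create or replace function forbid_ddl() returns event_trigger
language plpgsql as $$
begin
  raise exception 'DDL is locked down during the upgrade window';
end;
$$;

create event trigger ddl_lockdown on ddl_command_start
  execute procedure forbid_ddl();

-- When the upgrade window closes:
-- drop event trigger ddl_lockdown;
```

Because the function only raises an error, it sidesteps the side-effect concerns Tom raises above: it never changes local catalog state.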
On 2013-01-18 09:58:53 -0500, Robert Haas wrote: > On Fri, Jan 18, 2013 at 9:07 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > I don't have a problem reusing the object access infrastructure at all. I just > > don't think its providing even remotely enough. You have (co-)written that > > stuff, so you probably know more than I do, but could you explain to me how it > > could be reused to replicate a CREATE TABLE? > > > > Problems I see: > > - afaics for CREATE TABLE the only hook is in ATExecAddColumn > > No, there's one also in heap_create_with_catalog. Took me a minute to > find it, as it does not use InvokeObjectAccessHook. The idea is that > OAT_POST_CREATE fires once per object creation, regardless of the > object type - table, column, whatever. Ah. Chose the wrong term to grep for. I am tempted to suggest adding a comment explaining why we InvokeObjectAccessHook isn't used just for enhanced grepability. > > - No way to regenerate the table definition for execution on the remote system > > without creating libpqdump. > > IMHO, that is one of the really ugly problems that we haven't come up > with a good solution for yet. If you want to replicate DDL, you have > basically three choices: > > 1. copy over the statement text that was used on the origin server and > hope none of the corner cases bite you > 2. come up with some way of reconstituting a DDL statement based on > (a) the parse tree or (b) what the server actually decided to do > 3. reconstitute the state of the object from the catalogs after the > command has run > > (2a) differs from (2b) for things like CREATE INDEX, where the index > name might be left for the server to determine, but when replicating > you'd like to get the same name out. (3) is workable for CREATE but > not ALTER or DROP. > > The basic problem here is that (1) and (3) are not very > reliable/complete and (2) is a lot of work and introduces a huge code > maintenance burden. I agree with that analysis. 
> But it's unfair to pin that on the object-access > hook mechanism I agree. I don't think anybody has pinned it on it though. > - any reverse-parsing or catalog-deconstruction > solution for DDL is going to have that problem. The decision we have > to make as a community is whether we're prepared to support and > maintain that code for the indefinite future. Although I think it's > easy to say "yes, because DDL replication would be really cool" - and > I sure agree with that - I think it needs to be thought through a bit > more deeply than that. > > I have been involved in PostgreSQL development for about 4.5 years > now. This is less time than many people here, but it's still long > enough to say a whole lot of people ask for some variant of this idea, > and yet I have yet to see anybody produce a complete, working version > of this functionality and maintain it outside of the PostgreSQL tree > for one release cycle (did I miss something?). I don't really think that's a fair argument. For one, I didn't really ask for a libpgdump - I said that I don't see any way to generate re-executable SQL without it if we don't get the parse-tree but only the oid of the created object. Not to speak of the complexities of supporting ALTER that way (which is, as you note, basically not the way to do this). Also, even though I don't think it's the right way *for this*, I think pg_dump and pgadmin pretty much prove that it's possible? The latter is even out-of-core and has existed for multiple years. It's also not really fair to compare the out-of-tree effort of maintaining such a library to in-core support. pg_dump *needs* to be maintained, so the additional maintenance overhead once the initial development is done shouldn't really be higher than now. Lower if anything, due to getting rid of a good bit of ugly code ;). There's no such effect out of core. If you're also considering parsetree->SQL transformation under the libpgdump umbrella it's even less fair. 
The backend already has a *lot* of infrastructure to regenerate SQL from trees, even if it's mostly centered around DQL. A noticeable amount of that code is unavailable to extensions (i.e. many are static funcs). IMO it's completely unreasonable to expose that code to the outside and expect it to have a stable interface. We *will* need to rewrite parts of that regularly. For those reasons I think the only reasonable way to create textual DDL is in the backend; trying to allow outside code to do that will impose far greater limitations. What's the way you suggest to go about this? > There's a totally legitimate debate to be had here, but my feeling > about what you're calling libpqdump is: > - It's completely separate from event triggers. > - It's completely separate from object access hooks. Well, it is one of the major use-cases (or even *the* major use case) for event triggers that I heard of. So it seems a valid point in a discussion on how to design the whole mechanism, doesn't it? Some version of the event trigger patch contained partial support for regenerating the DDL so it seems like a justified point there. You complained that suggestions about reusing object access hooks were ignored, so mentioning that they currently don't provide *enough* (they *do* provide a part, but it doesn't seem to be the most critical one) also seems justifiable. > [ BTW: Sorry our IM session got cut off. It started erroring out > every time I messaged you. But no problem, anyway. ] NP, it's good to keep the technical stuff here anyway... Stupid technology. Which business are we in again? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 18, 2013 at 10:47 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> No, there's one also in heap_create_with_catalog. Took me a minute to >> find it, as it does not use InvokeObjectAccessHook. The idea is that >> OAT_POST_CREATE fires once per object creation, regardless of the >> object type - table, column, whatever. > > Ah. Chose the wrong term to grep for. I am tempted to suggest adding a > comment explaining why we InvokeObjectAccessHook isn't used just for > enhanced grepability. I cannot parse that sentence, but I agree that the ungreppability of it is suboptimal. I resorted to git log -SInvokeObjectAccessHook -p to find it. >> I have been involved in PostgreSQL development for about 4.5 years >> now. This is less time than many people here, but it's still long >> enough to say a whole lot of people ask for some variant of this idea, >> and yet I have yet to see anybody produce a complete, working version >> of this functionality and maintain it outside of the PostgreSQL tree >> for one release cycle (did I miss something?). > > I don't really think thats a fair argument. > > For one, I didn't really ask for a libpgdump - I said that I don't see > any way to generate re-executable SQL without it if we don't get the > parse-tree but only the oid of the created object. Not to speak of the > complexities of supporting ALTER that way (which is, as you note, > basically not the way to do this). Oh, OK. I see. Well, I've got no problem making the parse tree available via InvokeObjectAccessHook. Isn't that just a matter of adding a new hook type and a new call site? > Also, even though I don't think its the right way *for this*, I think > pg_dump and pgadmin pretty much prove that it's possible? The latter > even is out-of-core and has existed for multiple years. Fair point. > Its also not really fair to compare out-of-tree effort of maintaining > such a library to in-core support. 
pg_dump *needs* to be maintained, so > the additional maintenance overhead once the initial development is done > shouldn't really be higher than now. Lower if anything, due to getting > rid of a good bit of ugly code ;). There's no such effect out of core. This I don't follow. Nothing we might add to core to reverse-parse either the catalogs or the parse tree is going to make pg_dump go away. > If youre also considering parsetree->SQL transformation under the > libpgdump umbrella its even less fair. The backend already has a *lot* > of infrastructure to regenerate sql from trees, even if its mostly > centered arround around DQL. A noticeable amount of that code is > unavailable to extensions (i.e. many are static funcs). > Imo its completely unreasonable to expose that code to the outside and > expect it to have a stable interface. We *will* need to rewrite parts of > that regularly. > For those reasons I think the only reasonable way to create textual DDL > is in the backend trying to allow outside code to do that will impose > far greater limitations. I'm having trouble following this. Can you restate? I wasn't sure what you meant by libpqdump. I assumed you were speaking of a parsetree->DDL or catalog->DDL facility. > Well, it is one of the major use-cases (or even *the* major use case) > for event triggers that I heard of. So it seems a valid point in a > dicussion on how to design the whole mechanism, doesn't it? Yes, but is a separate problem from how we give control to the event trigger (or object access hook). > Some version of the event trigger patch contained partial support for > regenerating the DDL so it seems like a justified point there. Your > complained that suggestions about reusing object access hooks were > ignored, so mentioning that they currently don't provide *enough* (they > *do* provide a part, but it doesn't seem to be the most critical one) > also seems justifyable. 
Sure, but we could if we wanted decide to commit the partial support for regenerating the DDL and tell people to use it via object access hooks. Of course, that would thin out even further the number of people who would actually be using that code, which I fear will already be too small to achieve bug-free operation in a reasonable time. If we add some hooks now and someone maintains the DDL reverse-parsing code as an out-of-core replication solution for a few releases, and that doesn't turn out to be hideous, I'd be a lot more sanguine about incorporating it at that point. I basically don't think that there's any way we're going to commit something along those lines now and not have it turn out to be a far bigger maintenance headache than anyone wants. What that tends to end up meaning in practice is that Tom gets suckered into fixing it because nobody else can take the time, and that's neither fair nor good for the project. > NP, its good to keep the technical stuff here anyway... Stupid > technology. In which business are we in again? I don't know, maybe we can figure it out based on the objects in our immediate vicinity. I see twelve cans of paint, most of them opened, and something that looks sort of like a badly-made shed. Does that help at all? :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-01-18 11:42:47 -0500, Robert Haas wrote: > On Fri, Jan 18, 2013 at 10:47 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> No, there's one also in heap_create_with_catalog. Took me a minute to > >> find it, as it does not use InvokeObjectAccessHook. The idea is that > >> OAT_POST_CREATE fires once per object creation, regardless of the > >> object type - table, column, whatever. > > > > Ah. Chose the wrong term to grep for. I am tempted to suggest adding a > > comment explaining why we InvokeObjectAccessHook isn't used just for > > enhanced grepability. > > I cannot parse that sentence, but I agree that the ungreppability of > it is suboptimal. I resorted to git log -SInvokeObjectAccessHook -p > to find it. I was thinking of adding a comment like /* We don't use InvokeObjectAccessHook here because we need to ... */ not because the comment in itself is important but because it allows to grep ;) > >> I have been involved in PostgreSQL development for about 4.5 years > >> now. This is less time than many people here, but it's still long > >> enough to say a whole lot of people ask for some variant of this idea, > >> and yet I have yet to see anybody produce a complete, working version > >> of this functionality and maintain it outside of the PostgreSQL tree > >> for one release cycle (did I miss something?). > > > > I don't really think thats a fair argument. > > > > For one, I didn't really ask for a libpgdump - I said that I don't see > > any way to generate re-executable SQL without it if we don't get the > > parse-tree but only the oid of the created object. Not to speak of the > > complexities of supporting ALTER that way (which is, as you note, > > basically not the way to do this). > > Oh, OK. I see. Well, I've got no problem making the parse tree > available via InvokeObjectAccessHook. Isn't that just a matter of > adding a new hook type and a new call site? Sure. 
It's probably not easy to choose the correct callsites though; they need to be somewhat different for create & alter in comparison to drop.

create & alter:
* after the object is created/altered

drop:
* after the object is locked, but *before* it is dropped

As far as I understood, that's basically the idea behind the ddl_command_trace event trigger mentioned earlier. > > Also, even though I don't think its the right way *for this*, I think > > pg_dump and pgadmin pretty much prove that it's possible? The latter > > even is out-of-core and has existed for multiple years. > > Fair point. Imo its already a good justification for a libpgdump, not that I am volunteering, but ... > > Its also not really fair to compare out-of-tree effort of maintaining > > such a library to in-core support. pg_dump *needs* to be maintained, so > > the additional maintenance overhead once the initial development is done > > shouldn't really be higher than now. Lower if anything, due to getting > > rid of a good bit of ugly code ;). There's no such effect out of core. > > This I don't follow. Nothing we might add to core to reverse-parse > either the catalogs or the parse tree is going to make pg_dump go > away. I was still talking about the hypothetical libpgdump we don't really need for this ;) > > If youre also considering parsetree->SQL transformation under the > > libpgdump umbrella its even less fair. The backend already has a *lot* > > of infrastructure to regenerate sql from trees, even if its mostly > > centered arround around DQL. A noticeable amount of that code is > > unavailable to extensions (i.e. many are static funcs). > > Imo its completely unreasonable to expose that code to the outside and > > expect it to have a stable interface. We *will* need to rewrite parts of > > that regularly. > > For those reasons I think the only reasonable way to create textual DDL > > is in the backend trying to allow outside code to do that will impose > > far greater limitations.
> > I'm having trouble following this. Can you restate? I wasn't sure > what you meant by libpqdump. I assumed you were speaking of a > parsetree->DDL or catalog->DDL facility. Yea, that wasn't really clear, sorry for that. What I was trying to say is that I think doing parsetree->text conversion out of tree is noticeably more complex and essentially places a higher maintenance burden on the project because:

1) the core already has a lot of infrastructure for such conversions
2) any external project would either need to copy that infrastructure or make a large number of functions externally visible
3) any refactoring in that area would now break external tools, even if it would be trivial to adjust potential in-core support for such a conversion

> > Some version of the event trigger patch contained partial support for > > regenerating the DDL so it seems like a justified point there. Your > > complained that suggestions about reusing object access hooks were > > ignored, so mentioning that they currently don't provide *enough* (they > > *do* provide a part, but it doesn't seem to be the most critical one) > > also seems justifyable. > > Sure, but we could if we wanted decide to commit the partial support > for regenerating the DDL and tell people to use it via object access > hooks. Yes, that's a possibility. > Of course, that would thin out even further the number of > people who would actually be using that code, which I fear will > already be too small to achieve bug-free operation in a reasonable > time. Well, replication solutions seem to be the main potential consumer of such a capability, and while the C level stuff wouldn't be used by many, hopefully the stuff built on top will. > If we add some hooks now and someone maintains the DDL > reverse-parsing code as an out-of-core replication solution for a few > releases, and that doesn't turn out to be hideous, I'd be a lot more > sanguine about incorporating it at that point. 
As I said above, I think due to the available infrastructure doing this out of tree is *far* harder, and I fear that a good part of that code would turn out not to be applicable for inclusion into core. > I basically don't > think that there's any way we're going to commit something along those > lines now and not have it turn out to be a far bigger maintenance > headache than anyone wants. What that tends to end up meaning in > practice is that Tom gets suckered into fixing it because nobody else > can take the time, and that's neither fair nor good for the project. Now as in "at this point of the cycle", or now as in "without a several year old out-of-core project"? > > NP, its good to keep the technical stuff here anyway... Stupid > > technology. In which business are we in again? > > I don't know, maybe we can figure it out based on the objects in our > immediate vicinity. I see twelve cans of paint, most of them opened, > and something that looks sort of like a badly-made shed. Does that > help at all? :-) By the smell and looks from where I am sitting, I seem to be a barista making rather delicious-smelling coffee, and my boss seems to suck at decorating ;). Oh, and I seem to have a terrible taste in music. Regards, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-18 11:42:47 -0500, Robert Haas wrote: >> I'm having trouble following this. Can you restate? I wasn't sure >> what you meant by libpqdump. I assumed you were speaking of a >> parsetree->DDL or catalog->DDL facility. > Yea, that wasn't really clear, sorry for that. > What I was trying to say that I think doing parstree->text conversion > out of tree is noticeably more complex and essentially places a higher > maintenance burden on the project because Ah. That's not pg_dump's job though, so you're choosing a bad name for it. What I think you are really talking about is extending the deparsing logic in ruleutils.c to cover utility statements. Which is not a bad plan in principle; we've just never needed it before. It would be quite a lot of new code though. An issue that would have to be thought about is that there are assorted scenarios where we generate "parse trees" that contain options that cannot be generated from SQL, so there's no way to reverse-list them into working SQL. But often those hidden options are essential to know what the statement will really do, so I'm not sure ignoring them will be a workable answer for replication purposes. regards, tom lane
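Tom's point about the deparsing logic in ruleutils.c can already be seen from plain SQL: the backend exposes some of that reverse-listing machinery through long-standing catalog functions (nothing below is part of this patch; it only illustrates the existing infrastructure that would have to be extended to utility statements):

```sql
-- ruleutils.c already reverse-lists views and index definitions:
SELECT pg_get_viewdef('pg_tables'::regclass, true);
SELECT pg_get_indexdef('pg_class_oid_index'::regclass);
```

Extending that style of reverse-listing to CREATE/ALTER/DROP statements is the "quite a lot of new code" being discussed.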
On 2013-01-18 12:44:13 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-01-18 11:42:47 -0500, Robert Haas wrote: > >> I'm having trouble following this. Can you restate? I wasn't sure > >> what you meant by libpqdump. I assumed you were speaking of a > >> parsetree->DDL or catalog->DDL facility. > > > Yea, that wasn't really clear, sorry for that. > > > What I was trying to say that I think doing parstree->text conversion > > out of tree is noticeably more complex and essentially places a higher > > maintenance burden on the project because > > Ah. That's not pg_dump's job though, so you're choosing a bad name for > it. I really only mentioned libpgdump because the object access hooks at the moment only get passed the object oid, and because Robert doubted the need for deparsing the parsetrees in the past. Without the parsetree you obviously cannot deparse the parsetree ;) I do *not* think that using something that reassembles the statement solely based on the catalog context is a good idea. > What I think you are really talking about is extending the > deparsing logic in ruleutils.c to cover utility statements. Which is > not a bad plan in principle; we've just never needed it before. It > would be quite a lot of new code though. Yes, I think that is the only realistic approach to providing somewhat unambiguous SQL to replicate DDL. > An issue that would have to be thought about is that there are assorted > scenarios where we generate "parse trees" that contain options that > cannot be generated from SQL, so there's no way to reverse-list them > into working SQL. But often those hidden options are essential to know > what the statement will really do, so I'm not sure ignoring them will be > a workable answer for replication purposes. Dimitri's earlier patches had support for quite a few commands via deparsing, and while it's a noticeable amount of code, it seemed to work OK. 
The last revision I could dig out is https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c I think some things in there would have to change (e.g. I doubt that it's good that EventTriggerData is involved at that level), but I think it shows very roughly how it could look. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund escribió: > On 2013-01-18 12:44:13 -0500, Tom Lane wrote: > > An issue that would have to be thought about is that there are assorted > > scenarios where we generate "parse trees" that contain options that > > cannot be generated from SQL, so there's no way to reverse-list them > > into working SQL. But often those hidden options are essential to know > > what the statement will really do, so I'm not sure ignoring them will be > > a workable answer for replication purposes. > > Dimitri's earlier patches had support for quite some commands via > deparsing and while its a noticeable amount of code it seemed to work > ok. > The last revision I could dig out is > https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c > > I think some things in there would have to change (e.g. I doubt that its > good that EventTriggerData is involved at that level) but I think it > shows very roughly how it could look. I looked at that code back then and didn't like it very much. The problem I see (as Robert does, I think) is that it raises the burden when you change the grammar -- you now need to edit not only gram.y, but the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed. One idea I had was to add annotations on gram.y so that an external (Perl?) program could generate the ddl_rewrite.c equivalent, without forcing the developer to do it by hand. Another problem is what Tom mentions: there are internal options that cannot readily be turned back into SQL. We would have to expose these at the SQL level somehow; but to me that looks painful because we would be offering users the option to break their systems by calling commands that do things that should only be done internally. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2013-01-18 15:22:55 -0300, Alvaro Herrera wrote: > Andres Freund escribió: > > On 2013-01-18 12:44:13 -0500, Tom Lane wrote: > > > > An issue that would have to be thought about is that there are assorted > > > scenarios where we generate "parse trees" that contain options that > > > cannot be generated from SQL, so there's no way to reverse-list them > > > into working SQL. But often those hidden options are essential to know > > > what the statement will really do, so I'm not sure ignoring them will be > > > a workable answer for replication purposes. > > > > Dimitri's earlier patches had support for quite some commands via > > deparsing and while its a noticeable amount of code it seemed to work > > ok. > > The last revision I could dig out is > > https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c > > > > I think some things in there would have to change (e.g. I doubt that its > > good that EventTriggerData is involved at that level) but I think it > > shows very roughly how it could look. > > I looked at that code back then and didn't like it very much. The > problem I see (as Robert does, I think) is that it raises the burden > when you change the grammar -- you now need to edit not only gram.y, but > the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed. Yeah, but that's nothing new; you already need to do all that for normal SQL. Changes to the grammar that don't involve modifying ruleutils.c et al. are mostly trivial enough that this doesn't really place that much of a burden. And I have yet to hear any other proposal for how to do this. > Another problem is what Tom mentions: there are internal options that > cannot readily be turned back into SQL. 
We would have to expose these > at the SQL level somehow; but to me that looks painful because we would > be offering users the option to break their systems by calling commands > that do things that should only be done internally. Hm. Which options are you two thinking of right now? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Andres Freund escribió: >> Dimitri's earlier patches had support for quite some commands via >> deparsing and while its a noticeable amount of code it seemed to work >> ok. >> The last revision I could dig out is >> https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c > I looked at that code back then and didn't like it very much. The > problem I see (as Robert does, I think) is that it raises the burden > when you change the grammar -- you now need to edit not only gram.y, but > the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed. Well, that burden already exists for non-utility statements --- why should utility statements get a pass? Other than that we tend to invent new utility syntax freely, which might be a good thing to discourage anyhow. regards, tom lane
On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> Andres Freund escribió: >>> Dimitri's earlier patches had support for quite some commands via >>> deparsing and while its a noticeable amount of code it seemed to work >>> ok. >>> The last revision I could dig out is >>> https://github.com/dimitri/postgres/blob/d2996303c7bc6daa08cef23e3d5e07c3afb11191/src/backend/utils/adt/ddl_rewrite.c > >> I looked at that code back then and didn't like it very much. The >> problem I see (as Robert does, I think) is that it raises the burden >> when you change the grammar -- you now need to edit not only gram.y, but >> the ddl_rewrite.c stuff to ensure your new thing can be reverse-parsed. > > Well, that burden already exists for non-utility statements --- why > should utility statements get a pass? Other than that we tend to invent > new utility syntax freely, which might be a good thing to discourage > anyhow. My concerns are that (1) it will slow down the addition of new features to PostgreSQL by adding yet another barrier to commit and (2) it won't get enough use or regression test coverage to be, or remain, bug-free. There may be other concerns with respect to the patch-as-proposed, but those are my high-level concerns. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, that burden already exists for non-utility statements --- why >> should utility statements get a pass? Other than that we tend to invent >> new utility syntax freely, which might be a good thing to discourage >> anyhow. > My concerns are that (1) it will slow down the addition of new > features to PostgreSQL by adding yet another barrier to commit and (2) > it won't be get enough use or regression test coverage to be, or > remain, bug-free. Meh. The barriers to inventing new statements are already mighty tall. As for (2), I agree there's risk of bugs, but what alternative have you got that is likely to be less bug-prone? At least a reverse-list capability could be tested standalone without having to set up a logical replication configuration. This should not be interpreted as saying I'm gung-ho to do this, mind you. I'm just saying that if our intention is to support logical replication of all DDL operations, none of the alternatives look cheap. regards, tom lane
On Fri, Jan 18, 2013 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Well, that burden already exists for non-utility statements --- why >>> should utility statements get a pass? Other than that we tend to invent >>> new utility syntax freely, which might be a good thing to discourage >>> anyhow. > >> My concerns are that (1) it will slow down the addition of new >> features to PostgreSQL by adding yet another barrier to commit and (2) >> it won't be get enough use or regression test coverage to be, or >> remain, bug-free. > > Meh. The barriers to inventing new statements are already mighty tall. > As for (2), I agree there's risk of bugs, but what alternative have you > got that is likely to be less bug-prone? At least a reverse-list > capability could be tested standalone without having to set up a logical > replication configuration. > > This should not be interpreted as saying I'm gung-ho to do this, mind > you. I'm just saying that if our intention is to support logical > replication of all DDL operations, none of the alternatives look cheap. Well, we agree on that, anyway. :-) I honestly don't know what to do about this. I think you, Alvaro, and I all have roughly the same opinion of this, which is that it doesn't sound fun, but there's probably nothing better. So, what do we do when a really cool potential feature (logical replication of DDL) butts up against an expensive future maintenance requirement? I'm not even sure what the criteria should be for making a decision on whether it's "worth it". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> Ok. Will prepare a non controversial patch for ddl_command_end. > > Thanks. I will make a forceful effort to review that in a timely > fashion when it's posted. Please find it attached to this email.

COLUMNS=72 git diff --stat
 doc/src/sgml/event-trigger.sgml             |  93 ++++++-
 src/backend/commands/event_trigger.c        | 122 +++++++--
 src/backend/tcop/utility.c                  | 273 ++++++++++---------
 src/backend/utils/cache/evtcache.c          |   2 +
 src/include/commands/event_trigger.h        |   1 +
 src/include/utils/evtcache.h                |   3 +-
 src/test/regress/expected/event_trigger.out |   6 +-
 src/test/regress/sql/event_trigger.sql      |   4 +
 8 files changed, 359 insertions(+), 145 deletions(-)

What is the next part you want to see separated away? We have a choice of:

- Context exposing and filtering

My preference for that point is to include it and limit it to event triggers written in C, leaving the door open to add new events in the future, and allowing existing consumers of that framework to see through implementation details should they need it. It's better than just a ProcessUtility_hook because you can see it in \dy, decide to disable it with ALTER EVENT TRIGGER (e.g. in a transaction, to avoid recursive calling into itself), and it won't run in single user mode even when in shared_preload_libraries. Another idea is to protect it with a GUC. Yet another idea is to add a new property in the CREATE EVENT TRIGGER command so that the user can request explicitly to have that, and know the feature will break at the next release and all the ones after that:

CREATE EVENT TRIGGER foo ON ddl_command_start
  WHEN show_internals IN ('Yes I know what I''m doing', 'Thanks')
  EXECUTE PROCEDURE …();

That particular phrasing and option choice might be revisited before commit, though; I'm not wedded to it. 
- Additional Information to PL/pgSQL

We're talking about:
- operation (CREATE, ALTER, DROP)
- object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
- object OID (when there's a single object target)
- object schema name (when there's a single object target)
- object name (when there's a single object target)

- Additional Information to PL/*

It's separated away because you opted out of reviewing that part, so I guess we will continue to make that a separate patch, and it will still need to happen. I've been told that I shouldn't be on the hook to provide for that, and I wonder, but anyway I did already implement it in a previous version, so I can do it again as soon as we know what we expose.

- Command String Normalisation

I removed it already from the main patch because we needed to talk about it more, and it's kind of a big part in itself. Meanwhile it seems to me that an agreement has been reached, and that I will be able to follow the consensus and resume working on that part.

- ddl_event_trace

Now is the time to kill it (for 9.3) or allow it in. It means another cache lookup at DDL Event time (so 2 cache lookups per DDL in 9.3), and allows users who just want to have the information not to have to care about when it is available.

- Non DDL related Events

Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best, unless someone else wants to have at it, maybe?

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
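As a hedged sketch of what a PL/pgSQL consumer of the "Additional Information" item above could look like: TG_EVENT and TG_TAG are what the committed ddl_command_start/ddl_command_end support provides; the per-object variables (operation, object type, OID, schema name, object name) are still only proposed, so they are deliberately not used here.

```sql
-- Log every DDL command at ddl_command_end, using only the committed
-- TG_EVENT and TG_TAG variables.
CREATE OR REPLACE FUNCTION audit_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'ddl audit: event=%, tag=%', TG_EVENT, TG_TAG;
END;
$$;

CREATE EVENT TRIGGER audit_ddl_end ON ddl_command_end
  EXECUTE PROCEDURE audit_ddl();
```

Once the additional variables exist, the RAISE line is where they would be consumed.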
Attachments
On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Please find it attached to this email. Nice clean patch, thanks! Committed, after tinkering with the CommandCounterIncrement() stuff a bit. I will respond to the rest of your email later. Reading through this patch left me with a slight concern regarding both ddl_command_start and ddl_command_end: what happens if there's more than one event trigger scheduled to fire, and one of them does something like drop (with cascade) the function that a later one uses? Admittedly, that seems like an unlikely case, but we probably want to check that nothing too awful happens (e.g. crashing the server) and maybe add a regression test to cover this scenario. Another thing is that we might want to document that if a command errors out, ddl_command_end will never be reached; and perhaps also that if ddl_command_start errors out, the command itself will never be reached. Perhaps this is so obvious as to not bear mentioning, I don't know, but the thought crossed my mind that someone might fail to realize it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
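The drop-with-cascade scenario could be reproduced with something like the following (hypothetical names; assuming triggers on the same event fire in name order, so aa_dropper runs first and removes both bb_victim's function and, via CASCADE, the bb_victim trigger itself before bb_victim gets to fire):

```sql
CREATE FUNCTION victim_fn() RETURNS event_trigger
LANGUAGE plpgsql AS $$ BEGIN RAISE NOTICE 'second trigger ran'; END $$;

CREATE FUNCTION dropper_fn() RETURNS event_trigger
LANGUAGE plpgsql AS $$ BEGIN DROP FUNCTION victim_fn() CASCADE; END $$;

CREATE EVENT TRIGGER aa_dropper ON ddl_command_start
  EXECUTE PROCEDURE dropper_fn();
CREATE EVENT TRIGGER bb_victim ON ddl_command_start
  EXECUTE PROCEDURE victim_fn();

-- Any subsequent DDL fires aa_dropper first; what happens when the
-- already-scheduled bb_victim is reached is exactly what a regression
-- test would need to pin down.
CREATE TABLE provoke_it (id int);
```

Whether the second invocation errors out cleanly or misbehaves is the open question the proposed regression test should answer.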
2013/1/21 Robert Haas <robertmhaas@gmail.com>: > Another thing is that we might want to document that if a command > errors out, ddl_command_end will never be reached; and perhaps also > that if ddl_command_start errors out, the command itself will never be > reached. Perhaps this is so obvious as to not bear mentioning, I > don't know, but the thought crossed my mind that someone might fail to > realize it. I think there should be a mention of that in the docs; someone could expect ddl_command_end to be reached even when ddl_command_start errors out, and try to use it in some way. Regards, -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://github.com/guedes - http://guedesoft.net http://www.postgresql.org.br
Robert Haas <robertmhaas@gmail.com> writes: > Nice clean patch, thanks! > > Committed, after tinkering with the CommandCounterIncrement() stuff a bit. Cool, thanks! > I will respond to the rest of your email later. Reading through this > patch left me with a slight concern regarding both ddl_command_start > and ddl_command_end: what happens if there's more than one event > trigger scheduled to fire, and one of them does something like drop > (with cascade) the function that a later one uses? Admittedly, that > seems like an unlikely case, but we probably want to check that > nothing too awful happens (e.g. crashing the server) and maybe add a > regression test to cover this scenario. I added some in the attached patch.

 doc/src/sgml/event-trigger.sgml             |  10 ++
 src/backend/commands/event_trigger.c        |   6 +-
 src/test/regress/expected/event_trigger.out | 106 +++++++++++++++++++
 src/test/regress/sql/event_trigger.sql      |  47 ++++++++
 4 files changed, 167 insertions(+), 2 deletions(-)

I had to silence part of the regression tests because of the ERROR message containing an OID, where I think it's the right error message. I don't think we want to add complexity to the code to be able to report the name of the procedure that we depend on in the event cache and that the user just deleted while we were in the middle of actually running the event trigger list we already got from the cache, right? "Dickson S. Guedes" <listas@guedesoft.net> writes: > 2013/1/21 Robert Haas <robertmhaas@gmail.com>: >> Another thing is that we might want to document that if a command >> errors out, ddl_command_end will never be reached; and perhaps also >> that if ddl_command_start errors out, the command itself will never be >> reached. Perhaps this is so obvious as to not bear mentioning, I >> don't know, but the thought crossed my mind that someone might fail to >> realize it. 
> > I think that should be a mention about that in docs, someone could > expect that ddl_command_end be reached even when > ddl_command_start erros out, and try to use it in some way. Yeah, I share both your views. I've been playing with event triggers for too long to think about such documentation it seems, thanks for reminding me about it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachments
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > I added some in the attached patch.
>
>  doc/src/sgml/event-trigger.sgml             |  10 ++
>  src/backend/commands/event_trigger.c        |   6 +-
>  src/test/regress/expected/event_trigger.out | 106 +++++++++++++++++++
>  src/test/regress/sql/event_trigger.sql      |  47 ++++++++
>  4 files changed, 167 insertions(+), 2 deletions(-)

And I did drop a comment line I didn't mean to when trying things out, so here's the updated copy. There's a bug fix in there too, in both versions of the patch, that the new regression tests exercise. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachments
On 22 January 2013 12:02, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > > "Dickson S. Guedes" <listas@guedesoft.net> writes: > > 2013/1/21 Robert Haas <robertmhaas@gmail.com>: > >> Another thing is that we might want to document that if a command > >> errors out, ddl_command_end will never be reached; and perhaps also > >> that if ddl_command_start errors out, the command itself will never be > >> reached. Perhaps this is so obvious as to not bear mentioning, I > >> don't know, but the thought crossed my mind that someone might fail to > >> realize it. > > > > I think that should be a mention about that in docs, someone could > > expect that ddl_command_end be reached even when > > ddl_command_start erros out, and try to use it in some way. > > Yeah, I share both your views. I've been playing with event triggers for > too long to think about such documentation it seems, thanks for > reminding me about it. I haven't followed the long discussions this patch has undergone, so forgive me if what I'm about to ask has already been discussed and discarded. Would it be desirable to have ddl_command_success and ddl_command_failed events? These would effectively be subsets of ddl_command_end. Or should this be something handled by the event trigger function? This would mean that you could have a command end trigger that wouldn't necessarily fire, depending on the outcome of the command. Alternatively, and more elegantly, there could be a filter_variable of (...thinks...) ERROR (?) which can have a value of true or false. (previous paragraph seems like an awful idea now) CREATE EVENT TRIGGER log_bad_event ON ddl_command_end WHEN ERROR IN (true) EXECUTE PROCEDURE abort_any_command(); This, unfortunately, introduces awkwardness with the WHEN clause restriction, which doesn't accommodate simple equality. And looking at the IN part of the syntax, it looks awful: WHEN TAG IN ('DROP SEQUENCE' AND 'CREATE TABLE'). 
This looks like a confusing hybrid of a standard WHERE clause and an IN list, but is neither. You'd expect an IN to evaluate a list of expressions, but instead it's saying what TAGs are allowed. And this is where it starts to read like a WHERE clause, except such logic in a WHERE clause wouldn't make any sense as AND might suggest it must equal both. Please, please let it not stay like this. :) So note that I'm coming into this syntax kinda fresh, so just giving first impressions of the current implementation. -- Thom
Thom Brown <thom@linux.com> writes: > Would it be desirable to have ddl_command_success and > ddl_command_failed events. These would effectively be subsets to No, because you can't run any SQL in a failed transaction. > This, unfortunately, introducing awkwardness with the WHEN clause > restriction which doesn't accommodate simple equality. And looking at > the IN part of the syntax, it looks awful: WHEN TAG IN ('DROP > SEQUENCE' AND 'CREATE TABLE'). The syntax is using a comma, not an "AND", as seen in the tests: create event trigger regress_event_trigger2 on ddl_command_start when tag in ('create table', 'CREATE FUNCTION') execute procedure test_event_trigger(); > So note that I'm coming into this syntax kinda fresh, so just giving > first impressions of the current implementation. Thanks for that, I'm all for getting something better at the end! Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 22 January 2013 13:28, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Thom Brown <thom@linux.com> writes: >> Would it be desirable to have ddl_command_success and >> ddl_command_failed events. These would effectively be subsets to > > No, because you can't run any SQL in a failed transaction. Okay, I had misunderstood something someone wrote previously, and that makes sense now. >> This, unfortunately, introducing awkwardness with the WHEN clause >> restriction which doesn't accommodate simple equality. And looking at >> the IN part of the syntax, it looks awful: WHEN TAG IN ('DROP >> SEQUENCE' AND 'CREATE TABLE'). > > The syntax is using a comma, not an "AND", as seen in the tests: > > create event trigger regress_event_trigger2 on ddl_command_start > when tag in ('create table', 'CREATE FUNCTION') > execute procedure test_event_trigger(); Ah, in that case, the docs are wrong: http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html -- Thom
Thom Brown <thom@linux.com> writes: > Ah, in that case, the docs are wrong: > http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html Oh. It's missing the comma and applying the AND at the wrong level, here's a fix: diff --git a/doc/src/sgml/ref/create_event_trigger.sgml b/doc/src/sgml/ref/create_event_trigger.sgml index 040df11..3088ffa 100644 --- a/doc/src/sgml/ref/create_event_trigger.sgml +++ b/doc/src/sgml/ref/create_event_trigger.sgml @@ -23,7 +23,7 @@ PostgreSQL documentation <synopsis>CREATE EVENT TRIGGER <replaceable class="PARAMETER">name</replaceable> ON <replaceable class="PARAMETER">event</replaceable> - [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [ AND ... ] ) ] + [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [, ... ]) [ AND ... ] ] EXECUTE PROCEDURE <replaceable class="PARAMETER">function_name</replaceable>()</synopsis> </refsynopsisdiv> Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 22 January 2013 14:45, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Thom Brown <thom@linux.com> writes: >> Ah, in that case, the docs are wrong: >> http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html > > Oh. It's missing the comma and applying the AND at the wrong level, > here's a fix: > > diff --git a/doc/src/sgml/ref/create_event_trigger.sgml b/doc/src/sgml/ref/create_event_trigger.sgml > index 040df11..3088ffa 100644 > --- a/doc/src/sgml/ref/create_event_trigger.sgml > +++ b/doc/src/sgml/ref/create_event_trigger.sgml > @@ -23,7 +23,7 @@ PostgreSQL documentation > <synopsis> > CREATE EVENT TRIGGER <replaceable class="PARAMETER">name</replaceable> > ON <replaceable class="PARAMETER">event</replaceable> > - [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [ AND ... ] ) ] > + [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [, ... ]) [ AND ... ] ] > EXECUTE PROCEDURE <replaceable class="PARAMETER">function_name</replaceable>() > </synopsis> > </refsynopsisdiv> Okay, that makes sense now. :) -- Thom
On 22 January 2013 14:47, Thom Brown <thom@linux.com> wrote: > On 22 January 2013 14:45, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> Thom Brown <thom@linux.com> writes: >>> Ah, in that case, the docs are wrong: >>> http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html >> >> Oh. It's missing the comma and applying the AND at the wrong level, >> here's a fix: >> >> diff --git a/doc/src/sgml/ref/create_event_trigger.sgml b/doc/src/sgml/ref/create_event_trigger.sgml >> index 040df11..3088ffa 100644 >> --- a/doc/src/sgml/ref/create_event_trigger.sgml >> +++ b/doc/src/sgml/ref/create_event_trigger.sgml >> @@ -23,7 +23,7 @@ PostgreSQL documentation >> <synopsis> >> CREATE EVENT TRIGGER <replaceable class="PARAMETER">name</replaceable> >> ON <replaceable class="PARAMETER">event</replaceable> >> - [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [ AND ... ] ) ] >> + [ WHEN <replaceable class="PARAMETER">filter_variable</replaceable> IN (filter_value [, ... ]) [ AND ... ] ] >> EXECUTE PROCEDURE <replaceable class="PARAMETER">function_name</replaceable>() >> </synopsis> >> </refsynopsisdiv> > > Okay, that makes sense now. :) ALTER INDEX is missing from the event trigger matrix. When renaming a column on a foreign table, tg_tag reports 'ALTER TABLE' instead of 'ALTER FOREIGN TABLE'. It doesn't do this for any other ALTER FOREIGN TABLE operation, including altering, adding or dropping a column. DROP OWNED BY doesn't fire any type of event trigger. Is this right? 
And also I can't get ddl_command_end to do anything: test=# CREATE OR REPLACE FUNCTION public.cmd_trg_info() test-# RETURNS event_trigger test-# LANGUAGE plpgsql test-# AS $function$ test$# BEGIN test$# RAISE NOTICE E'Command trigger: tg_event=\'%\' tg_tag=\'%\'', tg_event, tg_tag; test$# END; $function$; CREATE FUNCTION test=# CREATE EVENT TRIGGER cmd_trg_before_create_table ON ddl_command_end WHEN TAG IN ('CREATE TABLE') EXECUTE PROCEDURE cmd_trg_info(); CREATE EVENT TRIGGER test=# CREATE TABLE test4 AS SELECT 1::int id, ''::text test; SELECT 1 test=# Tried this with every possible command and the corresponding trigger never fires. Was this actually tested or am I missing something? :S Also, I'm assuming the 'ANY COMMAND' special tag is supposed to be removed? postgres=# CREATE EVENT TRIGGER cmd_trg_before_any_command ON ddl_command_start WHEN TAG IN ('ANY COMMAND') EXECUTE PROCEDURE cmd_trg_info_any(); ERROR: filter value "ANY COMMAND" not recognized for filter variable "tag" ... but it lets me use it for ddl_command_end triggers: postgres=# CREATE EVENT TRIGGER cmd_trg_after_any_command ON ddl_command_end WHEN TAG IN ('ANY COMMAND') EXECUTE PROCEDURE cmd_trg_info(); CREATE EVENT TRIGGER This doesn't actually fire anyway due to the issue I mentioned above where ddl_command_end doesn't appear to be functional. However, I *can* get ddl_command_end to run on any command by omitting the WHEN clause, but this is the only scenario where I can get a trigger firing when using ddl_command_end: postgres=# CREATE EVENT TRIGGER cmd_trg_after_any_command_all ON ddl_command_end EXECUTE PROCEDURE cmd_trg_info(); CREATE EVENT TRIGGER postgres=# CREATE TABLE moooo2(id serial); NOTICE: Command trigger: tg_event='ddl_command_end' tg_tag='CREATE TABLE' CREATE TABLE -- Thom
On 22 January 2013 16:28, Thom Brown <thom@linux.com> wrote: > On 22 January 2013 14:47, Thom Brown <thom@linux.com> wrote: > > postgres=# CREATE EVENT TRIGGER cmd_trg_after_any_command ON > ddl_command_end WHEN TAG IN ('ANY COMMAND') EXECUTE PROCEDURE > cmd_trg_info(); > CREATE EVENT TRIGGER I can see why this works now. The ddl_command_end option stops the TAG being evaluated: test2=# CREATE EVENT TRIGGER cmd_trg_before_any_command ON ddl_command_end WHEN TAG IN ('MEOW LIKE A CAT') EXECUTE PROCEDURE cmd_trg_info_any(); CREATE EVENT TRIGGER ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT command coming in 9.3. ;) -- Thom
Thom Brown <thom@linux.com> writes: > ALTER INDEX is missing from the event trigger matrix. > > When renaming a column on a foreign table, tg_tag reports 'ALTER > TABLE' instead of 'ALTER FOREIGN TABLE'. It doesn't do this for any > other ALTER FOREIGN TABLE operation, including altering, adding or > dropping a column. Will see about that later, thanks for reporting. > DROP OWNED BY doesn't fire any type of event trigger. Is this right? It's currently being discussed. > And also I can't get ddl_command_end to do anything: Due to a bug in the form of a stray & in the patch that got committed, and that I fixed this morning: http://www.postgresql.org/message-id/m2vcapl7tl.fsf@2ndQuadrant.fr > Also, I'm assuming the 'ANY COMMAND' special tag is supposed to be > removed? Yes, just omit the WHEN clause for the TAG. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Thom Brown <thom@linux.com> writes: > test2=# CREATE EVENT TRIGGER cmd_trg_before_any_command ON > ddl_command_end WHEN TAG IN ('MEOW LIKE A CAT') EXECUTE PROCEDURE > cmd_trg_info_any(); > CREATE EVENT TRIGGER > > ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT > command coming in 9.3. ;) Oe Noe, my carefully planted Easter Egg! Seriously it's covered in the regression tests and shouldn't work, I'm surprised here. Will see about that tomorrow with the other findings of yours, thanks again. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 2013-01-18 17:20:51 -0500, Robert Haas wrote: > On Fri, Jan 18, 2013 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> On Fri, Jan 18, 2013 at 1:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Well, that burden already exists for non-utility statements --- why > >>> should utility statements get a pass? Other than that we tend to invent > >>> new utility syntax freely, which might be a good thing to discourage > >>> anyhow. > > > >> My concerns are that (1) it will slow down the addition of new > >> features to PostgreSQL by adding yet another barrier to commit and (2) > >> it won't be get enough use or regression test coverage to be, or > >> remain, bug-free. > > > > Meh. The barriers to inventing new statements are already mighty tall. > > As for (2), I agree there's risk of bugs, but what alternative have you > > got that is likely to be less bug-prone? At least a reverse-list > > capability could be tested standalone without having to set up a logical > > replication configuration. > > > > This should not be interpreted as saying I'm gung-ho to do this, mind > > you. I'm just saying that if our intention is to support logical > > replication of all DDL operations, none of the alternatives look cheap. > > Well, we agree on that, anyway. :-) > > I honestly don't know what to do about this. I think you, Alvaro, and > I all have roughly the same opinion of this, which is that it doesn't > sound fun, but there's probably nothing better. Count me in. > So, what do we do > when a really cool potential feature (logical replication of DDL) > butts up against an expensive future maintenance requirement? I just tried to estimate the effort of that by looking at: git log REL9_1_0..REL9_2_0 src/backend/parser/gram.y src/include/nodes/ which seems like it would catch most such patches. First thing I have to say is there are more patches potentially affecting such a potential feature than I previously had thought. 
On the other hand, just about all of those patches are so big that the amount of changes to the ruleutils.c equivalent for DDL look like a minor pain in comparison to the rest. The patches that seemed relevant during 9.1=>9.2 seem to be the following when counting ones that directly would add new deparsing code: * NOT VALID * SECURITY LABEL * per column fdw options * range datatypes * security barrier * LEAKPROOF * privileges on types * ALTER FDW * rename domain constraints Then there were a bunch of patches adding new features that required restructuring of existing code that would have affected possible deparsing code: * plancache restructuring * parametrized indexscans * DROP ... consolidation In all of those but "ALTER FDW" and "rename domain constraints" possible deparsing code looks like it would be an absolutely minor part of the code. And in those it's only a relevant part because they are really small. There were some bugfixes that would have required touching it as well, but those seemed to be cases that are basically s/a/b/. Not sure what the consequence of that is, but I wanted to share it anyway after spending the time ;). > I'm not > even sure what the criteria should be for making a decision on whether > it's "worth it". In my experience it is the weakest point in the non-WAL based postgres replication tools. So I personally think it is worth some overhead when adding new features. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 22, 2013 at 7:29 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: >> I added some in the attached patch. >> >> doc/src/sgml/event-trigger.sgml | 10 ++ >> src/backend/commands/event_trigger.c | 6 +- >> src/test/regress/expected/event_trigger.out | 106 +++++++++++++++++++ >> src/test/regress/sql/event_trigger.sql | 47 ++++++++ >> 4 files changed, 167 insertions(+), 2 deletions(-) > > And I did drop a comment line I didn't mean to when trying things out, > so here's the update copy. There's a bug fix in there too, in both the > versions of the patch, that the new regression tests exercise. I think these new regression tests are no good, because I doubt that the number of recursive calls that can fit into any given amount of stack space is guaranteed to be the same on all platforms. I have committed the bug fixes themselves, however. I wasn't entirely happy with your proposed documentation so I'm attaching a counter-proposal. My specific complaints are (1) telling people that event triggers are run in a savepoint seems a little too abstract; I have attempted to make the consequences more concrete; (2) RAISE EXCEPTION is PL/pgsql specific and not the only possible reason for an error; I have attempted to be more generic; and (3) in the process of fiddling with this, I noticed that the ddl_command_end documentation can, I believe, be made both shorter and more clear by turning it into a rider on the previous paragraph. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
On Tue, Jan 22, 2013 at 7:02 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > I had to make quiet a part of the regression tests because of the ERROR > message containing an OID, where I think it's the right error message. I have a feeling that's the sort of error message that we don't want people to get. Can we arrange for something cleaner? Not sure exactly what. Maybe something like this would do it: ERROR: queued event trigger function not found Or maybe we should just silently ignore failures to look up the event trigger. That might be better, because the DBA could always do: DROP FUNCTION myeventtrgfn() CASCADE; ...and it would be undesirable for other sessions to error out in that case due to SnapshotNow effects. > I don't think we want to add complexity to the code to be able to report > the procedure name that we depend on in the event cache and that the > user just deleted while in the middle of actually running the event > trigger list we already got from the cache, right? Yeah ... that seems like more code than the situation justifies. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 22, 2013 at 11:38 AM, Thom Brown <thom@linux.com> wrote: > ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT > command coming in 9.3. ;) +1 for adding that. I've been wanting it for years. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23/01/13 16:02, Robert Haas wrote: > On Tue, Jan 22, 2013 at 11:38 AM, Thom Brown <thom@linux.com> wrote: >> ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT >> command coming in 9.3. ;) > +1 for adding that. I've been wanting it for years. Can we have it back ported to 9.2, please Please, PLEASE!!! (Or perhaps, it is scheduled for the day after Thursday Feb 28th?) It is obviously far more important than any other change to pg! :-) Cheers, Gavin
On Wed, Jan 23, 2013 at 12:23 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
> On 23/01/13 16:02, Robert Haas wrote: >> On Tue, Jan 22, 2013 at 11:38 AM, Thom Brown <thom@linux.com> wrote: >>> ...unless I've coincidentally stumbled upon the new MEOW LIKE A CAT command coming in 9.3. ;) >> +1 for adding that. I've been wanting it for years. > Can we have it back ported to 9.2, please Please, PLEASE!!!
New features are not back ported to maintenance branches ;)
But... It might be possible to make an exception?
Michael Paquier
http://michael.otacoo.com
On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > - Context exposing and filtering > > My preference for that point is to include it and limit it to event > triggers written in C, leaving the door open to add new events in > the future, and allowing existing consumers of that framework see > through implementation details should they need it. > > It's better than just a ProcessUtility_hook because you can see it > in \dy, decide to disable it with ALTER EVENT TRIGGER (e.g. in a > transcation to avoid recursive calling into itself), and it won't > run in single user mode even when in shared_preload_libraries. > > Another idea is to protect it with a GUC. > > Yet another idea is to add a new property in the CREATE EVENT > TRIGGER command so that user can request explicitely to have that > and know the feature will break at next release and all the ones > after that: > > CREATE EVENT TRIGGER foo ON ddl_command_start > WHEN show_internals in ('Yes I know what I''m doing', > 'Thanks') > EXECUTE PROCEDURE …(); > > That particular phrasing and option choice might be revisited before > commit, though, I'm not wedded to it. I'm not feeling very sanguine about any of this. I feel strongly that we should try to do something that's more principled than just throwing random junk into ProcessUtility. > - Additional Information to PL/pgSQL > > We're talking about : > > - operation (CREATE, ALTER, DROP) > - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …) > - object OID (when there's a single object target) > - object shema name (when there's a single object target) > - object name (when there's a single object target) In ddl_command_start, I believe you can only expose the information that is available in the parse tree. That is, if the user typed CREATE TABLE foo, you can tell the event trigger that the user didn't specify a schema and that they mentioned the name foo. 
You cannot predict what schema foo will actually end up in at that point, and we shouldn't try. I don't have a problem with exposing the information we have at this point with the documented caveat that if you rely on it to mean something it doesn't you get to keep both pieces. But I wonder: wouldn't it be better to just expose the raw string the user typed? I mean, in all the cases we really care about, the information we can *reliably* expose at this point is going to be pretty nearly the same as extracting the third space-delimited word out of the command text and splitting it on a period. And you can do that much easily even in PL/pgsql. You could also extract a whole lot of OTHER potentially useful information from the command text that you couldn't get if we just exposed the given schema name and object name. For example, this would allow you to write an event trigger that lets you enforce a policy that all column names must use Hungarian notation. Note that this is a terrible idea and you shouldn't do it, but you could if you wanted to. And, the trigger would be relatively long and complex and you might have bugs, but that's still better than the status quo where you are simply out of luck. Now, in a ddl_command_end trigger, there's a lot more information that you can usefully expose. In theory, if the command records "what it did" somewhere, you can extract that information with as much precision as it cared to record. However, I'm pretty skeptical of the idea of exposing a single OID. I mean, if the "main" use case here is replication, there are holes big enough to drive a truck through there. It's not really better for other use cases, either: if you're auditing or logging or whatever, you can drive the same truck through the same hole. It seems to me that to get anything really useful here, you need some more complex way of passing data, like oh say an array of OIDs. 
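[Editor's sketch: the "third space-delimited word" extraction described above, written as a PL/pgSQL event trigger function. This is purely hypothetical — the patch under discussion does not expose a tg_command_text variable, so the variable name here is invented for illustration; only the string-splitting logic is the point.]

```sql
-- Hypothetical: assumes a tg_command_text variable carrying the raw
-- command string, which the current patch does NOT provide.
CREATE OR REPLACE FUNCTION report_object_name() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    word  text;
    parts text[];
BEGIN
    -- third space-delimited word of e.g. 'CREATE TABLE public.foo (a text)'
    word  := split_part(tg_command_text, ' ', 3);
    -- split the (possibly schema-qualified) name on the period
    parts := string_to_array(word, '.');
    IF array_length(parts, 1) = 2 THEN
        RAISE NOTICE 'schema=%, name=%', parts[1], parts[2];
    ELSE
        RAISE NOTICE 'no schema given, name=%', parts[1];
    END IF;
END;
$$;
```

As the mail notes, this only reflects what the user typed: for a bare CREATE TABLE foo it cannot tell you which schema foo will actually land in.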
Really, you're going to need something quite a bit more complicated than that to describe something like ALTER TABLE, but even for pretty simple cases a single OID doesn't cut it. > - Additional Information to PL/* > > It's separated away because you did opt-out from reviewing that > part, so I guess we will continue to make that a separate patch, and > it will still need to happen. I've been said that I shouldn't be on > the hook to provide for that, and I wonder, but anyway I did already > implement it in a previous version so I can do it again as soon as > we know what we expose. I'm not sure what you mean that you shouldn't be on the hook to provide that. I don't think that you're obliged to submit patches to make this work for other PLs - indeed, I think it's too late for that for 9.3 - but I don't think anyone else is required to do it, either. It doesn't seem like it would be too hard to get the other PLs up to the same level that PL/pgsql is at; I just didn't want to try to familiarize myself with PL/perl, PL/python, and PL/tcl, none of which I know the guts of and most of which I don't compile regularly, at the same time I was trying to grok everything else in the patch. > - Command String Normalisation > > I removed it already from the main patch because we needed to talk > about it more, and it's kind of a big part in itself. Meanwhile it > seems to me that an agreement has been reached and that I will be > able to follow the consensus and resume working on that part. I'm not sure we've reached a consensus here (what is it?), but let me run an idea or two up the flagpole and see if anyone salutes. 
It seems to me that there's some consensus that reverse-parsing may be a reasonable thing to include in some form, but there are a lot of unsolved problems there, including (1) that the parse tree might not contain enough information for the reverse-parser to describe what the command actually did, as opposed to what the user asked for, and (2) there may be important decisions which were made at execution time and which are needed for correct replication but which actually can't be represented in the syntax at all (I am not sure what these would be: but Tom seemed to think that such cases exist, so they need to be understood and thought about). If we're going to do reverse parsing, I would argue that we need to put some effort into making that more flexible than the designs I've seen so far. For example, I believe your concept of command string normalization includes the idea that we'd fully schema-qualify the object name. Like... instead of saying CREATE TABLE foo (a text), we'd say CREATE TABLE public.foo (a text). That would guarantee that when we replay the command on another machine, it ends up in the same schema. But who is to say that's what the user wants? You can imagine, for example, that a user might want to replicate schema public on node A into schema node_a on node B. If we're back to parsing the SQL at that point we are still ahead - but not very far ahead. And, of course, the proper canonicalization is really CREATE TABLE public.foo (a pg_catalog.text), because there could be another type called text in some other schema on node B. That may seem like a borderline case when the name of the type is "text" but if you substitute "bar" for "text" it suddenly starts to seem totally plausible that there could be more than one bar. 
Of course if you substitute "hstore" for "text" but hstore is in a different schema in the source and target databases then you've got a problem if you DO schema-qualify things; and if the goal is just to pretty-print things for the user's benefit, all of this may be overkill. I recognize that it's probably impossible or unrealistic to get an exactly-perfect solution to all of these problems, but I'm working my way around to a proposal. Suppose we added a type pg_parse_tree, and a bunch of accessor functions that operate on it. You could have things like get_operation(pg_parse_tree) and get_object_type(pg_parse_tree) and get_array_of_object_names_and_schemas(pg_parse_tree), which would largely obviate the need to pass 27 magic variables for all of the things somebody might want - anything we decide we want to expose just becomes another accessor function. Deparsing is also a function, which means we don't incur the overhead of doing it unless the user asks for it. And you might even be able to have mutators, so that you can for example take a pg_parse_tree, make a new one by changing the schema name that's mentioned, and then deparse that. I'm sure Tom is scared now, if he's reading this, but maybe it's got some potential. I certainly don't think this solves every problem. In particular, we run the risk of exposing every internal detail of the parse tree, something Tom and I have both vigorously resisted in every discussion of event triggers since time immemorial. On the other hand, the discussion of this topic suggests that the main thing that people want out of event triggers is precisely to be able to get at information from the parse tree, so we're kind of damned if we do and damned if we don't. At least something like this provides a reasonable abstraction boundary that allows the information to be exposed in a controlled way. 
For example, if we add a function that tells you what object name the user typed, it doesn't need to expose the fact that the parse tree names that field differently in different DDL constructs, which certainly increases our odds of not going insane after a release or two of maintaining this code. The other question is what to do about the information that's gathered as the command is running that you afterwards wish to provide to the event trigger - for example, for CREATE TABLE foo (), you want to reveal the actually-selected creation schema. One idea is to have DDL commands annotate the parse tree with these sorts of details, but the disadvantage of this is that it might not be what everybody wants for all purposes; and you still have the question of what to do with information that can't be effectively represented in the parse tree at all. I'm inclined to think a better approach might be to pass around some kind of "container" data structure to which the DDL command can glue in whatever bits of information it wants to include, and then dump the result out as a 2-D text array or a JSON object or something. I'm waving my hands at this problem a bit because of course one important use case is to be able to do a reverse-parse that takes this information into account, but we could do that, I think, by allowing this information to be passed to the deparse API along with the pg_parse_tree object. I'm waving my hands here, but maybe not any harder than any other design thus far proposed, and a fuller explanation would make this interminably long email even longer, so I'll stop here for now. > - ddl_event_trace > > Now is the time to kill it (for 9.3) or allow it in. > > It means another cache lookup at DDL Event time (so 2 cache lookups > per DDL in 9.3); and allow users who just want to have the > information not to have to care about when it is available. 
Unless we get some of the other stuff this doesn't have much value anyway, and depending on how other parts of the design shake out this might seem less relevant by the time we get done. > - Non DDL related Events > > Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best, > unless someone else wants to have at it, maybe? I'd like to take a crack at adding support for some kind of really cool event for 9.4 - I suspect this one is not what I'd pick as a first target, just because it seems hard. But I'll mull it over. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
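[Editor's sketch: the pg_parse_tree accessor/mutator API floated in the mail above, shown from the user's side. Every name here — pg_parse_tree, tg_parse_tree, get_operation, get_object_type, set_schema_name, deparse — is hypothetical; none of it exists in any patch. It only illustrates the shape of the proposal, including the node-A-to-node-B schema-remapping use case.]

```sql
-- Purely hypothetical API sketch, not implemented anywhere.
CREATE OR REPLACE FUNCTION relocate_to_node_a() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    pt  pg_parse_tree;   -- would be supplied by the trigger machinery
    cmd text;
BEGIN
    pt := tg_parse_tree;
    IF get_operation(pt) = 'CREATE' AND get_object_type(pt) = 'TABLE' THEN
        -- mutate the tree (replicate schema public into schema node_a),
        -- and only pay the deparsing cost when we actually ask for it
        pt  := set_schema_name(pt, 'node_a');
        cmd := deparse(pt);
        RAISE NOTICE 'would replay on subscriber: %', cmd;
    END IF;
END;
$$;
```

The accessor functions are the abstraction boundary the mail argues for: the trigger never sees that different DDL constructs name the same field differently inside the real parse tree.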
Robert Haas <robertmhaas@gmail.com> writes: > I think these new regression tests are no good, because I doubt that > the number of recursive calls that can fit into any given amount of > stack space is guaranteed to be the same on all platforms. I have > committed the bug fixes themselves, however. Thanks for committing the fixes. About the regression tests, I think you're right, but then I can't see how to include such a test. Maybe you could add the other one, though? > I wasn't entirely happy with your proposed documentation so I'm > attaching a counter-proposal. My specific complaints are (1) telling > people that event triggers are run in a savepoint seems a little too > abstract; I have attempted to make the consequences more concrete; (2) > RAISE EXCEPTION is PL/pgsql specific and not the only possible reason > for an error; I have attempted to be more generic; and (3) in the > process of fiddling with this, I noticed that the ddl_command_end > documentation can, I believe, be made both shorter and more clear by > turning it into a rider on the previous paragraph. > > Comments? +1 for this version, thanks. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Thanks for commiting the fixes. About the regression tests, I think > you're right, but then I can't see how to include such a test. Maybe you > could add the other one, though? Can you point me specifically at what you have in mind so I can make sure I'm considering the right thing? > +1 for this version, thanks. OK, committed that also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jan 23, 2013 at 09:33:58AM -0500, Robert Haas wrote: > On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: > > Thanks for commiting the fixes. About the regression tests, I think > > you're right, but then I can't see how to include such a test. Maybe you > > could add the other one, though? > > Can you point me specifically at what you have in mind so I can make > sure I'm considering the right thing? > > > +1 for this version, thanks. > > OK, committed that also. Also, I assume we no longer want after triggers on system tables, so I removed that from the TODO list and added event triggers as a completed item. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Jan 23, 2013 at 2:36 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Wed, Jan 23, 2013 at 09:33:58AM -0500, Robert Haas wrote: >> On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine >> <dimitri@2ndquadrant.fr> wrote: >> > Thanks for commiting the fixes. About the regression tests, I think >> > you're right, but then I can't see how to include such a test. Maybe you >> > could add the other one, though? >> >> Can you point me specifically at what you have in mind so I can make >> sure I'm considering the right thing? >> >> > +1 for this version, thanks. >> >> OK, committed that also. > > Also, I assume we no longer want after triggers on system tables, so I > removed that from the TODO list and added event triggers as a completed > item. Seems reasonable. Event triggers are not completed in the sense that there is a lot more stuff we can do with this architecture, but we've got a basic implementation now and that's progress. And they do address the use case that triggers on system tables would have targeted, I think, but better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jan 23, 2013 at 03:02:24PM -0500, Robert Haas wrote: > On Wed, Jan 23, 2013 at 2:36 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Wed, Jan 23, 2013 at 09:33:58AM -0500, Robert Haas wrote: > >> On Wed, Jan 23, 2013 at 4:57 AM, Dimitri Fontaine > >> <dimitri@2ndquadrant.fr> wrote: > >> > Thanks for commiting the fixes. About the regression tests, I think > >> > you're right, but then I can't see how to include such a test. Maybe you > >> > could add the other one, though? > >> > >> Can you point me specifically at what you have in mind so I can make > >> sure I'm considering the right thing? > >> > >> > +1 for this version, thanks. > >> > >> OK, committed that also. > > > > Also, I assume we no longer want after triggers on system tables, so I > > removed that from the TODO list and added event triggers as a completed > > item. > > Seems reasonable. Event triggers are not completed in the sense that > there is a lot more stuff we can do with this architecture, but we've > got a basic implementation now and that's progress. And they do > address the use case that triggers on system tables would have > targeted, I think, but better. Right. Users would always be chasing implementation details if they tried to trigger on system tables. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Robert Haas <robertmhaas@gmail.com> writes: > Or maybe we should just silently ignore failures to look up the event > trigger. That might be better, because the DBA could always do: > > DROP FUNCTION myeventtrgfn() CASCADE; > > ...and it would be undesirable for other sessions to error out in that > case due to SnapshotNow effects. What about taking a lock on the functions we decide we will need to be running, maybe a ShareUpdateExclusiveLock, so that the function cannot disappear under us from concurrent activity? Note to self, most probably using: LockRelationOid(fnoid, ShareUpdateExclusiveLock); After all, we might be right not to optimize for DDL concurrency… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: >> - Context exposing and filtering > > I'm not feeling very sanguine about any of this. I feel strongly that > we should try to do something that's more principled than just > throwing random junk into ProcessUtility. The goal is to allow advanced users of that feature to get at the sequence, constraint and index names that have been picked automatically by PostgreSQL when doing some CREATE TABLE statement that involves them: CREATE TABLE foo (id serial PRIMARY KEY, f1 text); Andres and Steve, as you're the next ones to benefit directly from that whole feature and at different levels, do you have a strong opinion on whether you really need that to operate at the logical level? The trick is not implementing the feature (that's quite easy really), it's about being able to provide it without any API promises, because it's an opening on the internals. Maybe the answer is to provide the same level of information as you get in an Event Trigger to new hooks. >> - Additional Information to PL/pgSQL >> >> We're talking about : >> >> - operation (CREATE, ALTER, DROP) >> - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …) >> - object OID (when there's a single object target) >> - object schema name (when there's a single object target) >> - object name (when there's a single object target) I realise I omitted something important here. The current patch effort is only adding that information if the target object exists in the catalogs at the time of the information lookup. > In ddl_command_start, I believe you can only expose the information > that is available in the parse tree. That is, if the user typed > CREATE TABLE foo, you can tell the event trigger that the user didn't > specify a schema and that they mentioned the name foo. 
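[Editor's note: for concreteness, the automatically picked names discussed above can be seen with a plain catalog lookup after running that CREATE TABLE. This is a sketch; the generated names follow PostgreSQL's usual <table>_<column>_seq and <table>_pkey conventions.]

```sql
CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

-- PostgreSQL has silently created two subsidiary objects with
-- generated names: the sequence backing the serial column and the
-- index backing the primary key.
SELECT relname, relkind
  FROM pg_class
 WHERE relname IN ('foo_id_seq', 'foo_pkey');
-- => foo_id_seq (relkind 'S') and foo_pkey (relkind 'i')
```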
I used to do that, and that's the reason why I had the Command String Normalisation in the same patch: both are walking the parsetree to some extent, and I didn't want to have different routines doing different walks to get at the information. > You cannot predict what schema foo will actually end up in at that point, and we > shouldn't try. I don't have a problem with exposing the information > we have at this point with the documented caveat that if you rely on > it to mean something it doesn't you get to keep both pieces. Fair enough. I retracted from doing that to be able to separate away the parse tree walking code for later review, and I'm now wondering if that's really what we want to do. If auditing, you can log the information after the command has been run (in case of CREATE and ALTER commands) or before the object has been deleted (in case of a DROP command). If you want to cancel the whole operation depending on the name of the object, or the schema it ends up in, then you can do it at the end of the command's execution with a RAISE EXCEPTION. > But I wonder: wouldn't it be better to just expose the raw string the > user typed? I mean, in all the cases we really care about, the > information we can *reliably* expose at this point is going to be > pretty nearly the same as extracting the third space-delimited word > out of the command text and splitting it on a period. And you can do > that much easily even in PL/pgsql. You could also extract a whole lot Totally agreed. That's another reason why I want to provide users with the Normalized Command String; it will then be even easier for them to do what you just said. After having been working on that for some time, though, I'm leaning toward Tom's view that you shouldn't try to expose information about objects that don't exist in the catalogs, because "that's not code, that's magic". 
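[Editor's note: a minimal sketch of the auditing pattern described above. Note that tg_tag exists in the committed code, while tg_objectname and tg_schemaname are variables from the proposed patch series, not released PostgreSQL, so this is hypothetical as written.]

```sql
CREATE TABLE ddl_audit (
    logged_at  timestamptz DEFAULT now(),
    tag        text,   -- e.g. 'CREATE TABLE'
    objname    text,
    schemaname text
);

CREATE FUNCTION audit_ddl() RETURNS event_trigger AS $$
BEGIN
    -- tg_objectname and tg_schemaname are only proposed by this
    -- patch series; they are NULL for multi-object targets.
    INSERT INTO ddl_audit(tag, objname, schemaname)
    VALUES (tg_tag, tg_objectname, tg_schemaname);
END;
$$ LANGUAGE plpgsql;

-- Fires after the command has run, so the catalogs are populated:
CREATE EVENT TRIGGER audit_ddl_end ON ddl_command_end
    EXECUTE PROCEDURE audit_ddl();
```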
> of OTHER potentially useful information from the command text that you > couldn't get if we just exposed the given schema name and object name. > For example, this would allow you to write an event trigger that lets > you enforce a policy that all column names must use Hungarian > notation. Note that this is a terrible idea and you shouldn't do it, > but you could if you wanted to. And, the trigger would be relatively > long and complex and you might have bugs, but that's still better than > the status quo where you are simply out of luck. You seem to be talking about Systems Hungarian, not Apps Hungarian, but I don't think that's the point here. http://www.joelonsoftware.com/articles/Wrong.html > Now, in a ddl_command_end trigger, there's a lot more information that > you can usefully expose. In theory, if the command records "what it > did" somewhere, you can extract that information with as much > precision as it cared to record. However, I'm pretty skeptical of the > idea of exposing a single OID. I mean, if the "main" use case here is There's precedent in PostgreSQL: how do you get information about each row that was in the target from a FOR EACH STATEMENT trigger? > array of OIDs. Really, you're going to need something quite a bit > more complicated than that to describe something like ALTER TABLE, but > even for pretty simple cases a single OID doesn't cut it. If you want to solve that problem, let's talk about pseudo-relations that you can SELECT FROM, please. We can already expose a tuplestore in PL code by means of cursors and SRFs, but I'm not sure that's how we want to expose the "statement level information" here. 
The only commands that will target more than one object are: - DROP, either in the command or by means of CASCADE; - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage; At that, not all of the DropStmt variants allow more than one object; only these are concerned: TABLE, SEQUENCE, VIEW, INDEX, FOREIGN TABLE, EVENT TRIGGER, TYPE, DOMAIN, COLLATION, CONVERSION, SCHEMA, EXTENSION, TEXT SEARCH PARSER, TEXT SEARCH DICTIONARY, TEXT SEARCH TEMPLATE, TEXT SEARCH CONFIGURATION So my proposal was to address that by exposing non-TOPLEVEL commands to the Event Triggers users, so that the CREATE SCHEMA targets are to be found in their respective subcommands. As far as DROPs are concerned, my thinking was to process a DROP targeting several objects at once like as many DROP commands as objects given in the command line, producing parse tree nodes to feed as SUBCOMMANDs again. The CASCADE case is something else entirely, and we need to be very careful about its locking behavior. Also, in the CASCADE case I can perfectly agree that we're not talking about a ddl_something event any more, and that in 9.4 we will be able to provide a sql_drop generic event that will know about that. >> - Additional Information to PL/* > > I'm not sure what you mean that you shouldn't be on the hook to > provide that. I don't think that you're obliged to submit patches to > make this work for other PLs - indeed, I think it's too late for that > for 9.3 - but I don't think anyone else is required to do it, either. Mostly agreed. Do we want to ship with only PL/pgSQL support? I've not been following how those PL features usually reach the other PLs… > It doesn't seem like it would be too hard to get the other PLs up to > the same level that PL/pgsql is at; I just didn't want to try to > familiarize myself with PL/perl, PL/python, and PL/tcl, none of which > I know the guts of and most of which I don't compile regularly, at the > same time I was trying to grok everything else in the patch. Same here. 
But I felt like I had to do it anyway before submitting the patch, so the time-consuming part of the job is done already. >> - Command String Normalisation > > I'm not sure we've reached a consensus here (what is it?), but let me The consensus I feel we're reaching now is that it's an important feature to have and there's no other practical way to have it. > run an idea or two up the flagpole and see if anyone salutes. It > seems to me that there's some consensus that reverse-parsing may be a > reasonable thing to include in some form, but there are a lot of > unsolved problems there, including (1) that the parse tree might not > contain enough information for the reverse-parser to describe what the > command actually did, as opposed to what the user asked for, and (2) For having actually implemented the most complex of our commands, namely CREATE TABLE and ALTER TABLE, the trick is to complement the parse tree with some catalog (cache) lookups to have the exact information about what the command did, or to tweak the command's implementation to expose it, as I did for ALTER TABLE. > there may be important decisions which were made at execution time and > which are needed for correct replication but which actually can't be > represented in the syntax at all (I am not sure what these would be: > but Tom seemed to think that such cases exist, so they need to be > understood and thought about). Well, try and get the name of the exclusion constraint created when running that command, then rewrite it with the name injected in the command: CREATE TABLE xyz(i int, exclude (i WITH =) where (i > 10) deferrable); Another case is column DEFAULT values, but I don't think we care about the constraint name in that case. It's possible to inject the column and table level CHECK constraint names in the CREATE TABLE syntax, though. I don't remember having hit another blocker here. 
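[Editor's note: to see the problem with the exclusion-constraint example above, note that the generated constraint name only materializes at execution time and has to be fished out of the catalogs afterwards. A sketch; the generated name is typically xyz_i_excl, but that is an implementation detail, not a promise.]

```sql
CREATE TABLE xyz (i int, EXCLUDE (i WITH =) WHERE (i > 10) DEFERRABLE);

-- The constraint name was chosen during execution; nothing in the
-- statement text mentions it, so a command rewriter has to look it
-- up after the fact:
SELECT conname
  FROM pg_constraint
 WHERE conrelid = 'xyz'::regclass
   AND contype = 'x';   -- 'x' = exclusion constraint
```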
> If we're going to do reverse parsing, I would argue that we need to > put some effort into making that more flexible than the designs I've > seen so far. For example, I believe your concept of command string While that seems fair enough, I have to stress that it's also known as feature creep and a sure way to kill a patch idea. Sometimes it's exactly what needs to be done. Sometimes not. > normalization includes the idea that we'd fully schema-qualify the > object name. Like... instead of saying CREATE TABLE foo (a text), > we'd say CREATE TABLE public.foo (a text). That would guarantee that > when we replay the command on another machine, it ends up in the same > schema. But who is to say that's what the user wants? You can > imagine, for example, that a user might want to replicate schema > public on node A into schema node_a on node B. If we're back to You have to remember at this point that the patch we're talking about is not exposing *ONLY* the command string, but also the object name and the schema name in nicely separated variables. > parsing the SQL at that point we are still ahead - but not very far > ahead. And, of course, the proper canonicalization is really CREATE > TABLE public.foo (a pg_catalog.text), because there could be another > type called text in some other schema on node B. That may seem like a > borderline case when the name of the type is "text" but if you > substitute "bar" for "text" it suddenly starts to seem totally > plausible that there could be more than one bar. Of course if you That's taken care of in my DDL rewriting branch, yes. Even int and bigint could be rewritten with their real qualified names. See a previously posted example in: http://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr [… cut out description of …] > get_array_of_object_names_and_schemas(pg_parse_tree), which would > I certainly don't think this solves every problem. 
In particular, we > run the risk of exposing every internal detail of the parse tree, > something Tom and I have both vigorously resisted in every discussion > of event triggers since time immemorial. On the other hand, the And which is why we have very limited preprocessed variables exposed in the event triggers instead, including the normalized command string, rather than the parse tree itself (available only in C). > way. For example, if we add a function that tells you what object > name the user typed, it doesn't need to expose the fact that the parse > tree names that field differently in different DDL constructs, which > certainly increases our odds of not going insane after a release or > two of maintaining this code. That's called tg_objectname and tg_schemaname in my proposal. Can you come up with either a design proposal of an "external" stable representation of the parse tree, or a more complete list of information you want exposed? My proposal is that the external stable representation format is another couple of years of work just to reach some traction on its specifics, and that adding more preprocessed information is better done in 9.4. Meanwhile, you can always get at it from an extension coded in C. > some kind of "container" data structure to which the DDL command can > glue in whatever bits of information it wants to include, and then > dump the result out as a 2-D text array or a JSON object or something. Either you want to reach agreement on the container specs, and it's a 2-year project at best, or you want to weigh in so much that something gets into 9.3, and then you will make incompatible changes in 9.4, so that all this work is wasted: the original goal is to prevent changing that external format in incompatible ways in between major versions. Barring that goal, we would just be exposing the parse tree as it is. 
This line of thought comes from the fact that I've been sitting in different meetings with people wanting to solve that very problem for external tools (SQL syntax highlighters, indentation, you name it) for several years in a row now, and I'm yet to see such a data structure spec emerge. >> - Non DDL related Events >> >> Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best, >> unless someone else wants to have at it, maybe? > > I'd like to take a crack at adding support for some kind of really > cool event for 9.4 - I suspect this one is not what I'd pick as a > first target, just because it seems hard. But I'll mull it over. I'll hand you another one then: have INSERT INTO create the destination TABLE when it does not exist, for RAD users out there. You know, those schema-less NoSQL guys… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes: > Can you point me specifically at what you have in mind so I can make > sure I'm considering the right thing? Something like this part:

+ -- now try something crazy to ensure we don't crash the backend
+ create function test_event_trigger_drop_function()
+   returns event_trigger
+   as $$
+ BEGIN
+     drop function test_event_trigger2() cascade;
+ END
+ $$ language plpgsql;
+
+ create function test_event_trigger2() returns event_trigger as $$
+ BEGIN
+     RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+ END
+ $$ language plpgsql;
+
+ create event trigger drop_test_a on "ddl_command_start"
+    when tag in ('create table')
+    execute procedure test_event_trigger_drop_function();
+
+ create event trigger drop_test_b on "ddl_command_start"
+    execute procedure test_event_trigger2();
+
+ create table event_trigger_fire1 (a int);
+
+ -- then cleanup the crazy test
+ drop event trigger drop_test_a;
+ drop event trigger drop_test_b;
+ drop function test_event_trigger_drop_function();

The only problem for the regression tests being that there's an OID in the output, but with your proposed error message that would go away. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 13-01-24 05:43 AM, Dimitri Fontaine wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine >> <dimitri@2ndquadrant.fr> wrote: >>> - Context exposing and filtering >> >> I'm not feeling very sanguine about any of this. I feel strongly that >> we should try to do something that's more principled than just >> throwing random junk into ProcessUtility. > > The goal is to allow for advanced users of that feature to get at the > sequence, constraints and index names that have been picked > automatically by PostgreSQL when doing some CREATE TABLE statement that > involves them: > > CREATE TABLE foo (id serial PRIMARY KEY, f1 text); > > Andres and Steve, as you're the next ones to benefit directly from that > whole feature and at different levels, do you have a strong opinion on > whether you really need that to operate at the logical level? > I haven't been following this thread very closely and I'm not exactly sure what you're asking. If the question is what a replication system would need to replicate DDL commands, then I need a method of translating whatever the DDL trigger is passed into either the 'CREATE TABLE public.foo(id serial primary key, f1 text)' OR some set of equivalent commands that will allow me to create the same objects on the replica. I don't have any brilliant insight on how I would go about doing it. Honestly I haven't spent a lot of time thinking about it in the past 8 months. If you're asking what I would need for a trigger to automatically add a new table to replication, then I think I would only need to know (or be able to obtain) the fully qualified name of the table, and then in another trigger call be given the fully qualified name of the sequence. The triggers would need to be able to do DML to configuration tables. I don't need to know that a table and sequence are connected if all I want to do is replicate them.
[ changing subject line to keep threads of discussion separate ] On Thu, Jan 24, 2013 at 5:51 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Something like this part:
>
> + -- now try something crazy to ensure we don't crash the backend
> + create function test_event_trigger_drop_function()
> +   returns event_trigger
> +   as $$
> + BEGIN
> +     drop function test_event_trigger2() cascade;
> + END
> + $$ language plpgsql;
> +
> + create function test_event_trigger2() returns event_trigger as $$
> + BEGIN
> +     RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
> + END
> + $$ language plpgsql;
> +
> + create event trigger drop_test_a on "ddl_command_start"
> +    when tag in ('create table')
> +    execute procedure test_event_trigger_drop_function();
> +
> + create event trigger drop_test_b on "ddl_command_start"
> +    execute procedure test_event_trigger2();
> +
> + create table event_trigger_fire1 (a int);
> +
> + -- then cleanup the crazy test
> + drop event trigger drop_test_a;
> + drop event trigger drop_test_b;
> + drop function test_event_trigger_drop_function();
>
> The only problem for the regression tests being that there's an OID in > the output, but with your proposed error message that would go away. This seems reasonable, but looking into it a little further, fixing the error message is not quite as simple as might be hoped. It's coming from fmgr_info, which can't be dissuaded from erroring out if the function is gone. We could do the fmgr_info lookup earlier, but it doesn't really help: if the tuple disappears, then plpgsql_compile() will eventually be reached and will go splat anyway. As far as I can see, the only reasonable way to keep this from exploding is to take AccessShareLock on the function once we've looked it up, and then repeat the syscache lookup immediately thereafter to verify that the OID is still present. If it is, it can't subsequently go away. This seems like a pretty expensive solution, though. 
At present event triggers only fire during DDL commands so maybe this is OK, but if you imagine using it for NOTIFY or DML then it starts to sound like a cripplingly-high cost. So I'm not sure what to do. I'm not thrilled with the idea of leaving this the way it is, but it's not a big enough problem for me to argue for ripping event triggers back out in their entirety, either. So maybe we just have to live with it until somebody comes up with a better idea. Anybody got one? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jan 24, 2013 at 5:43 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> But I wonder: wouldn't it be better to just expose the raw string the >> user typed? I mean, in all the cases we really care about, the >> information we can *reliably* expose at this point is going to be >> pretty nearly the same as extracting the third space-delimited word >> out of the command text and splitting it on a period. And you can do >> that much easily even in PL/pgsql. You could also extract a whole lot > > Totally Agreed. That's another reason why I want to provide users with > the Normalized Command String, it will be then be even easier for them > to do what you just say. OK, but can we lay the issue of a *normalized* command string to the side for just one minute, and talk about exposing the *raw* command string? It seems to me that this would be (1) very easy to do, (2) reasonable to slip into 9.3, and (3) useful to some people. Arguably, the normalized command string would be useful to MORE people, but I think the raw command string is not without its uses, and it's at least one order of magnitude less work. >> Now, in a ddl_command_end trigger, there's a lot more information that >> you can usefully expose. In theory, if the command records "what it >> did" somewhere, you can extract that information with as much >> precision as it cared to record. However, I'm pretty skeptical of the >> idea of exposing a single OID. I mean, if the "main" use case here is > > There's precedent in PostgreSQL: how do you get information about each > row that was in the target from a FOR EACH STATEMENT trigger? You don't. But, the fact that you can't is an unpleasant limitation, not a model of emulation. We make trade-offs about the scope of features all the time. 
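[Editor's note: the each-statement limitation referenced above is easy to see with stock triggers: a FOR EACH STATEMENT trigger fires once and has no NEW/OLD row data at all. This is plain 9.x behavior, not part of the patch under discussion.]

```sql
CREATE TABLE t (a int);

CREATE FUNCTION note_stmt() RETURNS trigger AS $$
BEGIN
    -- Fires once per statement; NEW and OLD are not available here,
    -- regardless of how many rows the statement touched.
    RAISE NOTICE 'an INSERT ran on %', TG_TABLE_NAME;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_stmt AFTER INSERT ON t
    FOR EACH STATEMENT EXECUTE PROCEDURE note_stmt();

INSERT INTO t SELECT generate_series(1, 100);  -- one NOTICE, no row data
```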
Somebody evidently thought it was reasonable for each-row triggers to expose the tuple but to leave the corresponding feature for each-statement triggers unimplemented, and as much as I'd like to have that feature, I agree that was a reasonable trade-off. But it would be quite something else again if someone were to propose the idea of having NEW and OLD available for each-statement triggers, but only give the information about the first row affected by the statement, rather than all of them. I think that proposal would quite rightly be rejected, and I think it's morally equivalent to what you're proposing, or what I understood you to be proposing, at any rate. >> array of OIDs. Really, you're going to need something quite a bit >> more complicated than that to describe something like ALTER TABLE, but >> even for pretty simple cases a single OID doesn't cut it. > > If you want to solve that problem, let's talk about pseudo relations > that you can SELECT FROM, please. We can already expose a tuplestore in > PL code by means of cursors and SRF, but I'm not sure that's how we want > to expose the "statement level information" here. I'm not sure, either, but I think that exposing things as tables is a neat idea. I had imagined passing either JSON or arrays, but tables would be easier to work with, at least for PL/pgsql triggers. > The only commands that will target more than one object are: > > - DROP, either in the command or by means of CASCADE; > - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage; CREATE TABLE can also create subsidiary objects (indexes, constraints); and there could be more things that do this in the future. > The CASCADE case is something else entirely, and we need to be very > careful about its locking behavior. Also, in the CASCADE case I can > perfectly agree that we're not talking about a ddl_something event any > more, and that in 9.4 we will be able to provide a sql_drop generic > event that will know about that. 
I've felt from the beginning that we really need that to be able to do anything worthwhile with this feature, and I said so last year around this time. Meanwhile, absent that, if we put something in here that only handles a subset of cases, the result will be DDL replication that randomly breaks. Bill Gates used to have a reputation for being able to give demos of beta software where he carefully avoided doing things that would trigger the known crash bugs, and thus make it look like everything was working great. That's fine for a marketing presentation, but releasing in that kind of state gets you a bad reputation. > Mostly agreed. Do we want to ship with only PL/pgSQL support? I've not > been following how those PL features usually reach the other PLs… That doesn't bother me. If the facility is useful enough that people want it in other PLs, they can submit patches to add it. I don't believe there's any requirement that we must support this feature in every PL before we can think about releasing. > Well, try and get the name of the exclusion constraint created when > running that command, then rewrite it with the name injected in the > command: > > CREATE TABLE xyz(i int, exclude (i WITH =) where (i > 10) deferrable); Oh. Ouch. > Another case is column DEFAULT values, but I don't think we care about > the constraint name in that case. It's possible to inject the column and > table level CHECK constraints names in the CREATE TABLE syntax, though. I don't think column DEFAULTs create constraints, do they? >> way. For example, if we add a function that tells you what object >> name the user typed, it doesn't need to expose the fact that the parse >> tree names that field differently in different DDL constructs, which >> certainly increases our odds of not going insane after a release or >> two of maintaining this code. > > That's called tg_objectname and tg_schemaname in my proposal. 
Can you > come up with either a design proposal of an "external" stable > representation of the parse tree, or a more complete list of information > you want exposed? I'm not sure I can, but I think that the design I'm proposing gives a lot of flexibility. If we create a pg_parse_tree type with no input function and an output function that merely returns a dummy value, and we pass a pg_parse_tree object to PL/pgsql event triggers, then it's a small win even if it offers no accessors at all - because somebody could put whatever accessor functions they want in a contrib module without waiting for 9.4. If we do offer some accessors (which I assume we would), then it's even better. You can get at the information exposed by those accessors, and if you want more, you don't need to wait for a new PG release, you can add them via a home-grown contrib module, again, without waiting for 9.4. The point is - we wouldn't expose the whole parse tree. Rather, we'd expose a parse tree as an opaque object, and give the user functions that would expose specific bits of it. Those bits could grow over time without needing any changes to PLs or existing trigger functions, and they wouldn't change if the parse tree internals changed, and users could extend the amount of information exposed via contrib functions (at their own risk, of course). >>> Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best, >>> unless someone else wants to have at it, maybe? >> >> I'd like to take a crack at adding support for some kind of really >> cool event for 9.4 - I suspect this one is not what I'd pick as a >> first target, just because it seems hard. But I'll mull it over. > > I'll hand you another one then: have INSERT INTO create the destination > TABLE when it does not exists, for RAD users out there. You know, those > schema less NoSQL guys… Wow, you don't think small. I'm not sure how hard that would be, but I agree it would be cool. 
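[Editor's note: the opaque-type design described above might look roughly like the following. Every name here — pg_parse_tree, the accessor functions, tg_parsetree — is hypothetical; none of it exists in the tree.]

```sql
-- Hypothetical sketch only: a shell type whose C input/output
-- functions are elided, plus accessors that expose selected fields
-- without committing to the parse tree's internal layout.
CREATE TYPE pg_parse_tree;

-- Accessors (in core, or added later by a contrib module) could then
-- grow over releases without destabilizing the type itself:
--   pg_parse_tree_object_name(pg_parse_tree) RETURNS text
--   pg_parse_tree_schema_name(pg_parse_tree) RETURNS text

-- A PL/pgSQL event trigger would use whichever accessors exist:
--   tname := pg_parse_tree_object_name(tg_parsetree);
```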
I think the trigger might have to be written in C to get the behavior people would likely want without an unreasonable amount of screwing around. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > OK, but can we lay the issue of a *normalized* command string to the > side for just one minute, and talk about exposing the *raw* command > string? It seems to me that this would be (1) very easy to do, (2) > reasonable to slip into 9.3, and (3) useful to some people. Arguably, > the normalized command string would be useful to MORE people, but I > think the raw command string is not without its uses, and it's at > least one order of magnitude less work. My understanding is that if the command string we give to event triggers is ambiguous (sub-object names, schema qualifications, etc.), it becomes useless for logical replication use. I'll leave it to the consumers of that to speak up now. [… no access to tuples in per-statement triggers …] > I agree that was a reasonable trade-off. But it would be quite > something else again if someone were to propose the idea of having NEW > and OLD available for each-statement triggers, but only give the > information about the first row affected by the statement, rather than > all of them. I think that proposal would quite rightly be rejected, > and I think it's morally equivalent to what you're proposing, or what > I understood you to be proposing, at any rate. No it's not.

+ /*
+  * we only support filling-in information for DROP command if we only
+  * drop a single object at a time, in all other cases the ObjectID,
+  * Name and Schema will remain NULL.
+  */
+ if (list_length(stmt->objects) != 1)

The current patch implementation is to fill in the object id, name and schema with NULL when we have something else than a single object as the target. I did that when I realized we have a precedent with statement triggers and that we would maybe share the implementation of the "record set variable" facility for PLs here. > I'm not sure, either, but I think that exposing things as tables is a > neat idea. 
I had imagined passing either JSON or arrays, but tables > would be easier to work with, at least for PL/pgsql triggers. Any idea about a practical implementation that we can do in the 9.3 timeframe? Barring that, my proposal is to allow ourselves not to provide the information at all in that case in 9.3, and complete the feature in the 9.4 time frame. Possibly with OLD and NEW relations for per-statement triggers, if it turns out as I think that we can re-use the same piece of PLpgSQL-side framework in both cases. >> The only commands that will target more than one object are: >> >> - DROP, either in the command or by means of CASCADE; >> - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage; > > CREATE TABLE can also create subsidiary objects (indexes, > constraints); and there could be more things that do this in the > future. Subsidiary objects are not the same thing at all as a command that targets more than one object, and the difference is user-visible. >> The CASCADE case is something else entirely, and we need to be very >> careful about its locking behavior. Also, in the CASCADE case I can >> perfectly agree that we're not talking about a ddl_something event any >> more, and that in 9.4 we will be able to provide a sql_drop generic >> event that will know about that. > > I've felt from the beginning that we really need that to be able to do > anything worthwhile with this feature, and I said so last year around > this time. Meanwhile, absent that, if we put something in here that > only handles a subset of cases, the result will be DDL replication > that randomly breaks. 
So we have two proposals here:
- Have the cascading drop call back to process utility with a new context value of PROCESS_UTILITY_CASCADE and its parse node, wherein you only stuff the ObjectAddress, and provide event trigger support for this new cascade command;
- Implement a new event called "sql_drop" where you provide the same amount of information as in a ddl_command event (same lookups, etc.), but without any parse tree nor, I suppose, the command string that the user would have to type to drop just that object.
You objected to the first on modularity violation grounds, and to the second on development cycle timing grounds. And now you're saying that because we don't have a practical solution, I'm not sure, apparently it's dead, but what is? Please help me decipher your process of thoughts and conclusions. > That doesn't bother me. If the facility is useful enough that people > want it in other PLs, they can submit patches to add it. I don't > believe there's any requirement that we must support this feature in > every PL before we can think about releasing. Ok, I won't bother either then. > I'm not sure I can, but I think that the design I'm proposing gives a > lot of flexibility. If we create a pg_parse_tree type with no input > function and an output function that merely returns a dummy value, and > we pass a pg_parse_tree object to PL/pgsql event triggers, then it's a > small win even if it offers no accessors at all - because somebody > could put whatever accessor functions they want in a contrib module > without waiting for 9.4. That's not solving the problem we want to solve, though. You can already do what you say with the code that we have today, you just have to code the extension in C. > The point is - we wouldn't expose the whole parse tree. Rather, we'd > expose a parse-tree as an opaque object, and give the user functions > that would expose specific bits of it. 
Those bits could grow over > time without needing any changes to PLs or existing trigger functions, > and they wouldn't change if the parse tree internals changed, and > users could extend the amount of information exposed via contrib > functions (at their own risk, of course). We already have that, barring the extension that grovels through the parse tree. The benefit of maintaining that code as an extension is relying on a stable enough API from the backend so that you can provide the same behaviour to users across major versions. And, with pg_upgrade, the same storage format. Here I don't see how to reach those goals without anything new in core, given that the main requirement from core about the parse tree is to be able to whack it around in any release (even minor) without concerns about API or ABI stability. >> I'll hand you another one then: have INSERT INTO create the destination >> TABLE when it does not exist, for RAD users out there. You know, those >> schema-less NoSQL guys… > > Wow, you don't think small. I'm not sure how hard that would be, but > I agree it would be cool. I think the trigger might have to be > written in C to get the behavior people would likely want without an > unreasonable amount of screwing around. What I'm chasing here is being able to extend DDL in user land, without even having to code in C. We'll get there. A practical example that we saw on this list very recently is addressing the trusted bits for PLs: install an event trigger on CREATE FUNCTION and have it run a SECURITY DEFINER function implemented in PLpgSQL that will do some checks and then run the CREATE FUNCTION for you (taking care of disabling its own event trigger to avoid infinite recursion). Of course, to get there you need INSTEAD OF event triggers, and we had to remove them from my patch series last year because we were not in a position to assess that the call points in each command were safe enough. 
Once the current framework is in place, we will be able to get back to INSTEAD OF event triggers. We'll get there. And, well, yes, when you embark on a side project spanning two or three development cycles, you need to have something of a vision, I guess… Event Triggers are but a building block to progress towards achieving mine. That feature appeared to me to be the Right Way™ to add a generic facility to PostgreSQL that you can use for a number of different use cases, many birds with a single stone. Well, a menhir maybe. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
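[Editor's note: a minimal sketch of the trusted-bits idea described above, assuming the ddl_command_end event and TAG filtering from the patch series under discussion; the function name and the policy check are invented for illustration, and without INSTEAD OF triggers this can only observe, not replace, the command.]

```sql
-- Hypothetical sketch: intercept CREATE FUNCTION to apply a local policy.
CREATE FUNCTION check_function_policy() RETURNS event_trigger
  LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
  -- here a real policy would inspect the created function and
  -- reject, log, or adjust it; we only log the event
  RAISE NOTICE 'CREATE FUNCTION seen by policy trigger';
END;
$$;

CREATE EVENT TRIGGER enforce_function_policy
  ON ddl_command_end
  WHEN TAG IN ('CREATE FUNCTION')
  EXECUTE PROCEDURE check_function_policy();
```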
On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> OK, but can we lay the issue of a *normalized* command string to the >> side for just one minute, and talk about exposing the *raw* command >> string? It seems to me that this would be (1) very easy to do, (2) >> reasonable to slip into 9.3, and (3) useful to some people. Arguably, >> the normalized command string would be useful to MORE people, but I >> think the raw command string is not without its uses, and it's at >> least one order of magnitude less work. > > My understanding is that if the command string we give to event triggers > is ambiguous (sub-object names, schema qualifications, etc), it comes > useless for logical replication use. I'll leave it to the consumers of > that to speak up now. "Useless" is a strong word, but it certainly injures usefulness pretty substantially. If it isn't normalized, then either we accept that: a) If you fail to properly qualify your inputs, when generating DDL, we can offer that it's pretty likely it'll all smash on the floor when we try to replicate it, or b) We need to capture the active search_path from the environment at the instant of the DDL capture event, and carry it over along with the DDL. If we could assume that the GUC, search_path, was the correct value, that's possibly not super difficult, but I'm not certain that's the correct thing to capture. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
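[Editor's note: option (b) above — carrying the active search_path along with the captured DDL — might look roughly like the following sketch; the capture table and trigger names are invented, and it assumes the event trigger fires in the originating session so current_setting() reflects the environment at capture time.]

```sql
-- Sketch: record the session search_path next to each captured command
-- tag, so a replica can reproduce name resolution at replay time.
CREATE TABLE ddl_capture (
  captured_at  timestamptz DEFAULT now(),
  command_tag  text,
  search_path  text
);

CREATE FUNCTION capture_ddl() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO ddl_capture (command_tag, search_path)
  VALUES (TG_TAG, current_setting('search_path'));
END;
$$;

CREATE EVENT TRIGGER capture_ddl_env
  ON ddl_command_start
  EXECUTE PROCEDURE capture_ddl();
```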
On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > My understanding is that if the command string we give to event triggers > is ambiguous (sub-object names, schema qualifications, etc), it comes > useless for logical replication use. I'll leave it to the consumers of > that to speak up now. Yeah, that's probably true. I think it might be useful for other purposes, but I think we need a bunch of infrastructure we don't have yet to make logical replication of DDL a reality. > [… no access to tuples in per-statement triggers …] >> I agree that was a reasonable trade-off. But it would be quite >> something else again if someone were to propose the idea of having NEW >> and OLD available for each-statement triggers, but only give the >> information about the first row affected by the statement, rather than >> all of them. I think that proposal would quite rightly be rejected, >> and I think it's morally equivalent to what you're proposing, or what >> I understood you to be proposing, at any rate. > > No it's not. > > + /* > + * we only support filling-in information for DROP command if we only > + * drop a single object at a time, in all other cases the ObjectID, > + * Name and Schema will remain NULL. > + */ > + if (list_length(stmt->objects) != 1) > > The current patch implementation is to fill in the object id, name and > schema with NULL when we have something else than a single object as the > target. I did that when I realized we have a precedent with statement > triggers and that we would maybe share the implementation of the "record > set variable" facility for PLs here. But I don't see how *that's* going to be any good for logical replication either. If the raw query string isn't useful, getting the OID in some cases but not all surely can't be any better. >> I'm not sure, either, but I think that exposing things as tables is a >> neat idea. 
I had imagined passing either JSON or arrays, but tables >> would be easier to work with, at least for PL/pgsql triggers. > > Any idea about a practical implementation that we can do in the 9.3 > timeframe? Baring that my proposal is to allow ourselves not to provide > the information at all in that case in 9.3, and complete the feature by > 9.4 time frame. No, I don't have a brilliant idea. I think that we've probably about come to the end of what we can reasonably do for 9.3. > Possibly with OLD and NEW relations for per-statement triggers, if it > turns out as I think that we can re-use the same piece of PLpgSQL side > framework in both cases. That would be a GREAT thing to have. >>> The only commands that will target more than one object are: >>> >>> - DROP, either in the command or by means of CASCADE; >>> - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage; >> >> CREATE TABLE can also create subsidiary objects (indexes, >> constraints); and there could be more things that do this in the >> future. > > Subsidiary objects are not the same thing at all as a command that > targets more than one object, and the difference is user visible. Hmm. Really? I guess I'm not seeing why those cases are particularly different. >>> The CASCADE case is something else entirely, and we need to be very >>> careful about its locking behavior. Also, in the CASCADE case I can >>> perfectly agree that we're not talking about a ddl_something event any >>> more, and that in 9.4 we will be able to provide a sql_drop generic >>> event that will now about that. >> >> I've felt from the beginning that we really need that to be able to do >> anything worthwhile with this feature, and I said so last year around >> this time. Meanwhile, absent that, if we put something in here that >> only handles a subset of cases, the result will be DDL replication >> that randomly breaks. 
> > So we have two proposals here: > > - Have the cascading drop calls back to process utility with a new > context value of PROCESS_UTILITY_CASCADE and its parse node, wherein > you only stuff the ObjectAddress, and provide event triggers support > for this new cascade command; > > - Implement a new event called "sql_drop" where you provide the same > amount of information as in a ddl_command event (same lookups, > etc), but without any parsetree nor I suppose the command string > that the user would have to type to drop just that object. > > You objected to the first on modularity violation grounds, and on the > second on development cycle timing grounds. And now you're saying that > because we don't have a practical solution, I'm not sure, apparently > it's dead, but what is? > > Please help me decipher your process of thoughts and conclusions. OK, so I object to the first one on modularity grounds (as did Tom). I don't object to the second one at all except that, AFAIK, nobody's written the code yet. Maybe I'm misunderstanding something. Actually, I think we could probably use a similar set of infrastructure *either* to implement sql_drop OR to provide the information to ddl_command_end. What I'm thinking about is that we might have something sort of like the after-trigger queue that we use for ordinary triggers. Instead of calling a trigger as the drops happen, we queue up a bunch of events that say "xyz got dropped". And then, we can either fire something like ddl_command_end (and pass in the whole list as an array or a table) or we can call something like sql_drop (n times, passing details on one object per invocation). In either approach, the triggers fire *after* the drops rather than in the middle of them - it's just a question of whether you want one call for all the drops or one call for each drop. >> I'm not sure I can, but I think that the design I'm proposing gives a >> lot of flexibility. 
If we create a pg_parse_tree type with no input >> function and an output function that merely returns a dummy value, and >> we pass a pg_parse_tree object to PL/pgsql event triggers, then it's a >> small win even if it offers no accessors at all - because somebody >> could put whatever accessor functions they want in a contrib module >> without waiting for 9.4. > > That's not solving the problem we want to solve, though. You can already > do what you say with the code that we have today, you just have to code > the extension in C. I don't agree, but I recognize there might be more than one valid opinion here. >> The point is - we wouldn't expose the whole parse tree. Rather, we'd >> expose a parse-tree as an opaque object, and give the user functions >> that would expose specific bits of it. Those bits could grow over >> time without needing any changes to PLs or existing trigger functions, >> and they wouldn't change if the parse tree internals changed, and >> users could extend the amount of information exposed via contrib >> functions (at their own risk, of course). > > We already have that, baring the extension that grovels through the > parse tree. The benefit from maintaining that code as an extension is > relying on a stable enough API from the backend so that you can provide > the same behaviour to users accross major versions. And with pg_upgrade, > the same storage format. > > Here I don't see how to reach that goals without anything new in core, > given that the main requirement from core about the parse tree is to be > able to whack it around in any release (even minor) without concern's > about API nor ABI stability. Well, the point is that if you have a function that maps a parse tree onto an object name, any API or ABI changes can be reflected in an updated definition for that function. So suppose I have the command "CREATE TABLE public.foo (a int)". 
And we have a call pg_target_object_namespace(), which will return "public" given the parse tree for the foregoing command. And we have a call pg_target_object_name(), which will return "foo". We can whack around the underlying parse tree representation all we want and still not break anything - because any imaginable parse tree representation will allow the object name and object namespace to be extracted. Were that not possible it could scarcely be called a parse tree any longer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
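[Editor's note: the accessor-style interface described in this message is a design proposal only — neither pg_target_object_namespace() nor pg_target_object_name() exists — but its use from a PL/pgSQL event trigger might look like the following sketch, which also assumes a hypothetical TG_PARSETREE variable carrying the opaque parse-tree value.]

```sql
-- Purely illustrative: accessor functions over an opaque parse tree,
-- as proposed upthread. None of these names is an existing API.
CREATE FUNCTION log_target() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
BEGIN
  -- for "CREATE TABLE public.foo (a int)" this would report public.foo
  RAISE NOTICE 'command targets %.%',
    pg_target_object_namespace(TG_PARSETREE),
    pg_target_object_name(TG_PARSETREE);
END;
$$;
```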
On 13-01-26 11:11 PM, Robert Haas wrote: > On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: >> My understanding is that if the command string we give to event triggers >> is ambiguous (sub-object names, schema qualifications, etc), it comes >> useless for logical replication use. I'll leave it to the consumers of >> that to speak up now. > > Yeah, that's probably true. I think it might be useful for other > purposes, but I think we need a bunch of infrastructure we don't have > yet to make logical replication of DDL a reality. > I agree. Does anyone have a specific use case other than DDL replication where an ambiguous command string would be useful? Even for use cases like automatically removing a table from replication when it is dropped, I would want to be able to determine which table is being dropped unambiguously. Could I determine that from an oid? I suspect so, but parsing a command string and then trying to figure out the table from the search_path doesn't sound very appealing. > Well, the point is that if you have a function that maps a parse tree > onto an object name, any API or ABI changes can be reflected in an > updated definition for that function. So suppose I have the command > "CREATE TABLE public.foo (a int)". And we have a call > pg_target_object_namespace(), which will return "public" given the > parse tree for the foregoing command. And we have a call > pg_target_object_name(), which will return "foo". We can whack around > the underlying parse tree representation all we want and still not > break anything - because any imaginable parse tree representation will > allow the object name and object namespace to be extracted. Were that > not possible it could scarcely be called a parse tree any longer. How do you get the fully qualified type of the first column? col1=pg_target_get_column(x, 0) pg_target_get_type(col1); or something similar. 
I think that could work but we would be adding a lot of API functions to get all the various bits of info one would want the API to expose. I also suspect executing triggers that had to make lots of function calls to walk a tree would be much slower than an extension that could just walk the parse-tree or some other abstract tree like structure. Steve
On Sun, Jan 27, 2013 at 12:08 PM, Steve Singer <ssinger@ca.afilias.info> wrote: > On 13-01-26 11:11 PM, Robert Haas wrote: >> On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine >> <dimitri@2ndquadrant.fr> wrote: >>> >>> My understanding is that if the command string we give to event triggers >>> is ambiguous (sub-object names, schema qualifications, etc), it comes >>> useless for logical replication use. I'll leave it to the consumers of >>> that to speak up now. >> >> Yeah, that's probably true. I think it might be useful for other >> purposes, but I think we need a bunch of infrastructure we don't have >> yet to make logical replication of DDL a reality. > > I agree. Does anyone have a specific use case other than DDL replication > where an ambiguous command string would be useful? Even for use cases like > automatically removing a table from replication when it is dropped, I would > want to be able to determine which table is being dropped unambiguously. > Could I determine that from an oid? I suspect so, but parsing a command > string and then trying to figure out the table from the search_path doesn't > sound very appealing. I was thinking about logging or auditing applications - e.g. log all command strings whose first word is "drop". We do get customers coming to us with that sort of request from time to time. >> Well, the point is that if you have a function that maps a parse tree >> onto an object name, any API or ABI changes can be reflected in an >> updated definition for that function. So suppose I have the command >> "CREATE TABLE public.foo (a int)". And we have a call >> pg_target_object_namespace(), which will return "public" given the >> parse tree for the foregoing command. And we have a call >> pg_target_object_name(), which will return "foo". 
We can whack around >> the underlying parse tree representation all we want and still not >> break anything - because any imaginable parse tree representation will >> allow the object name and object namespace to be extracted. Were that >> not possible it could scarcely be called a parse tree any longer. > How do you get the fully qualified type of the first column? > col1=pg_target_get_column(x, 0) > pg_target_get_type(col1); > > or something similar. Or maybe return all the data for all the columns as an array of records... > I think that could work but we would be adding a lot of API functions to get > all the various bits of info one would want the API to expose. Yeah, that's the principal disadvantage of this method, AFAICS. OTOH, I am not sure anything else we might do is any better. You can avoid the complexity of defining and specifying all that interface by just taking the parse tree and running nodeToString() on it and lobbing it over the fence, but that is basically abdicating the responsibility of defining a sane interface in favor of just chucking whatever you happen to have handy over the wall and hoping the user is OK with calling that an interface. Assuming you think that's a bad idea (and Tom and I do, at least), you've got to put in the effort to design something, and that thing, however constituted, is going to be fairly extensive. > I also > suspect executing triggers that had to make lots of function calls to walk a > tree would be much slower than an extension that could just walk the > parse-tree or some other abstract tree-like structure. I am kind of doubtful that this is a real problem - why should it be any slower than, say, parsing a JSON object into an AST? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> The current patch implementation is to fill in the object id, name and >> schema with NULL when we have something other than a single object as the >> target. I did that when I realized we have a precedent with statement >> triggers and that we would maybe share the implementation of the "record >> set variable" facility for PLs here. > > But I don't see how *that's* going to be any good for logical > replication either. If the raw query string isn't useful, getting the > OID in some cases but not all surely can't be any better. Well, I would think that "we don't support dropping several objects at once yet in 9.3" is an acceptable answer. And for DROP, it's a special case anyway, as what happens is that you *stop* receiving any data from those tables/objects, so the problem is one of cleanup. >> So we have two proposals here: >> >> - Have the cascading drop calls back to process utility with a new >> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein >> you only stuff the ObjectAddress, and provide event triggers support >> for this new cascade command; >> >> - Implement a new event called "sql_drop" where you provide the same >> amount of information as in a ddl_command event (same lookups, >> etc), but without any parsetree nor I suppose the command string >> that the user would have to type to drop just that object. >> >> You objected to the first on modularity violation grounds, and on the >> second on development cycle timing grounds. And now you're saying that >> because we don't have a practical solution, I'm not sure, apparently >> it's dead, but what is? >> >> Please help me decipher your process of thoughts and conclusions. > > OK, so I object to the first one on modularity grounds (as did Tom). > I don't object to the second one at all except that, AFAIK, nobody's > written the code yet. Maybe I'm misunderstanding something. 
I mean object as in "not in 9.3, too late", which you seem to me confirming here. So what do we want to provide in 9.3 exactly? Note that I have enough time allocated on finishing that patch that I surely can come up with an implementation of whatever we finally decide within this commit fest, if the implementation is straightforward enough. As I've been playing in and out around those topics for 2 years now, plenty of things are a SMOP now: the problem is reaching an agreement about how to implement things. > Actually, I think we could probably use a similar set of > infrastructure *either* to implement sql_drop OR to provide the > information to ddl_command_end. What I'm thinking about is that we > might have something sort of like after-trigger queue that we use for > ordinary triggers. Instead of calling a trigger as the drops happen, > we queue up a bunch of events that say "xyz got dropped". And then, > we can either fire something like ddl_command_end (and pass in the > whole list as an array or a table) or we can call something like > sql_drop (n times, passing details one one object per invocation). In > either approach, the triggers fire *after* the drops rather than in > the middle of them - it's just a question of whether you want one call > for all the drops or one call for each drop. So, do you want to see a patch implementing that to be able to provide information about DROP commands targeting more than one object and same information in other cases (object id, name, schema, operation name, object type)? That could be the next patch of the series, by Thursday. > Well, the point is that if you have a function that maps a parse tree > onto an object name, any API or ABI changes can be reflected in an > updated definition for that function. So suppose I have the command > "CREATE TABLE public.foo (a int)". And we have a call > pg_target_object_namespace(), which will return "public" given the > parse tree for the foregoing command. 
And we have a call > pg_target_object_name(), which will return "foo". We can whack around > the underlying parse tree representation all we want and still not > break anything - because any imaginable parse tree representation will > allow the object name and object namespace to be extracted. Were that > not possible it could scarcely be called a parse tree any longer. In my view, we have no DDL operation where we don't want to be able to get at the Object ID, name and schema of the operation. So while more complex cases might well require a full function-based API, I think a set of magic variables TG_* for basic information is going to make life easier for everybody. Then, the more complex cases are left alone in 9.3 and we have a whole development cycle to address them, with some data type and a set of functions maybe. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
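[Editor's note: a sketch of how the proposed TG_* magic variables might read from PL/pgSQL. The variable names below (TG_OBJECTID, TG_OBJECTNAME, TG_SCHEMANAME) follow the kind of basic information the patch proposes to expose and are not a released API.]

```sql
-- Illustrative only: magic variables in the style the patch proposes.
CREATE FUNCTION log_ddl_target() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE '% on %.% (oid %)',
    TG_TAG, TG_SCHEMANAME, TG_OBJECTNAME, TG_OBJECTID;
END;
$$;

CREATE EVENT TRIGGER log_ddl ON ddl_command_end
  EXECUTE PROCEDURE log_ddl_target();
```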
On Sun, Jan 27, 2013 at 12:57 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> The current patch implementation is to fill in the object id, name and >>> schema with NULL when we have something else than a single object as the >>> target. I did that when I realized we have a precedent with statement >>> triggers and that we would maybe share the implementation of the "record >>> set variable" facility for PLs here. >> >> But I don't see how *that's* going to be any good for logical >> replication either. If the raw query string isn't useful, getting the >> OID in some cases but not all surely can't be any better. > > Well I would think that "we don't support droping several objects at > once yet in 9.3" is an acceptable answer. On that point, I disagree. >>> So we have two proposals here: >>> >>> - Have the cascading drop calls back to process utility with a new >>> context value of PROCESS_UTILITY_CASCADE and its parse node, wherein >>> you only stuff the ObjectAdress, and provide event triggers support >>> for this new cascade command; >>> >>> - Implement a new event called "sql_drop" where you provide the same >>> amount of information than in a ddl_command event (same lookups, >>> etc), but without any parsetree nor I suppose the command string >>> that the user would have to type to drop just that object. >>> >>> You objected to the first on modularity violation grounds, and on the >>> second on development cycle timing grounds. And now you're saying that >>> because we don't have a practical solution, I'm not sure, apparently >>> it's dead, but what is? >>> >>> Please help me decipher your process of thoughs and conclusions. >> >> OK, so I object to the first one on modularity grounds (as did Tom). >> I don't object to the second one at all except that, AFAIK, nobody's >> written the code yet. Maybe I'm misunderstanding something. 
> > I mean object as in "not in 9.3, too late", which you seem to me > confirming here. So what do we want to provide in 9.3 exactly? > > Note that I have enough time allocated on finishing that patch that I > surely can come up with an implementation of whatever we finally decide > within this commit fest, if the implementation is straightforward > enough. As I've been playing in and out around those topics for 2 years > now, plenty of things are a SMOP now: the problem is reaching an > agreement about how to implement things. I don't really know how to respond to this. The CommitFest time is supposed to be when you stop working on your own stuff and work on reviewing other people's stuff, and yet, both this year and last year, you seem to have no compunctions whatever about continuing to code right through the CommitFest, which to me misses the point. The CommitFest, and especially the last one, is supposed to be the time to commit what is committable now, not to go write new code. And it is not as if I didn't make these same design points last year around this time. On the other hand, if it's really true that you can bang this out in a couple of days and come up with something committable, will I commit it? Yeah, probably. But I think that is likely to take several months of work, not several days, and the longer we go on with this patch the less time there is for me to review other things that might be a whole lot more ready to go than code that isn't written yet. >> Actually, I think we could probably use a similar set of >> infrastructure *either* to implement sql_drop OR to provide the >> information to ddl_command_end. What I'm thinking about is that we >> might have something sort of like after-trigger queue that we use for >> ordinary triggers. Instead of calling a trigger as the drops happen, >> we queue up a bunch of events that say "xyz got dropped". 
And then, >> we can either fire something like ddl_command_end (and pass in the >> whole list as an array or a table) or we can call something like >> sql_drop (n times, passing details one one object per invocation). In >> either approach, the triggers fire *after* the drops rather than in >> the middle of them - it's just a question of whether you want one call >> for all the drops or one call for each drop. > > So, do you want to see a patch implementing that to be able to provide > information about DROP commands targeting more than one object and same > information in other cases (object id, name, schema, operation name, > object type)? That could be the next patch of the series, by Thursday. I think this is approximately correct as a next logical step, but are we really clear on what the design for that looks like? I thought you were saying upthread that you wanted to make this information available via some kind of pseudo-table for 9.4. In any event, adding a new event type (sql_drop) is a separate patch, and adding information to the existing types of event triggers is a separate patch, so you should try to do one of those two, not both. >> Well, the point is that if you have a function that maps a parse tree >> onto an object name, any API or ABI changes can be reflected in an >> updated definition for that function. So suppose I have the command >> "CREATE TABLE public.foo (a int)". And we have a call >> pg_target_object_namespace(), which will return "public" given the >> parse tree for the foregoing command. And we have a call >> pg_target_object_name(), which will return "foo". We can whack around >> the underlying parse tree representation all we want and still not >> break anything - because any imaginable parse tree representation will >> allow the object name and object namespace to be extracted. Were that >> not possible it could scarcely be called a parse tree any longer. 
> > In my view, we have no DDL operation where we don't want to be able to > get at the Object ID, name and schema of the operation. So while more > complex cases might well require a full function based API, I think a > set of magic variables TG_* for basic information is going to make life > easier for everybody. > > Then, more complex cases are left alone in 9.3 and we have whole > development cycle to address them, with some data type and a set of > functions maybe. Doing some of it one way and some of it the other way seems kind of grotty, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
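[Editorial sketch] Robert's accessor-API idea quoted above could be illustrated in use like this; both pg_target_object_namespace() and pg_target_object_name() are names from his sketch only and do not exist in PostgreSQL:

```sql
-- Hypothetical accessor API from Robert's sketch (not a real PostgreSQL API).
-- For the command "CREATE TABLE public.foo (a int)":
CREATE FUNCTION report_target() RETURNS event_trigger AS $$
BEGIN
  -- pg_target_object_namespace() would return 'public' and
  -- pg_target_object_name() would return 'foo', regardless of how the
  -- underlying parse tree representation is later whacked around.
  RAISE NOTICE 'target object: %.%',
    pg_target_object_namespace(), pg_target_object_name();
END;
$$ LANGUAGE plpgsql;
```

The point of the design is that the function layer insulates user code from parse-tree ABI changes.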
Robert Haas <robertmhaas@gmail.com> writes: >> Well I would think that "we don't support dropping several objects at >> once yet in 9.3" is an acceptable answer. > > On that point, I disagree. OK, I will provide a patch for that soon, then. I did give higher priority to exposing more information; OK to switch. Read on for the caveats. > I don't really know how to respond to this. The CommitFest time is > supposed to be when you stop working on your own stuff and work on > reviewing other people's stuff, and yet, both this year and last year, > you seem to have no compunctions whatever about continuing to code > right through the CommitFest, which to me misses the point. The You might realize that I'm not a committer, so a commit fest is when I get some review and a chance to react quickly, so that we get enough momentum to reach a consensus and maybe commit something. Also, it seems to me that I've been very careful to review as many patches as I could in all commit fests where I had a patch involved. Also, please consider that if I only allowed myself to write code in between commit fests, I would still have zero patches in PostgreSQL. None. Zilch. > CommitFest, and especially the last one, is supposed to be the time to > commit what is committable now, not to go write new code. And it is > not as if I didn't make these same design points last year around this > time. Well, a commit fest for me is the only time when we can reach a common design, call it a consensus, and have matching code to evaluate and maybe commit. And it's not as if I didn't rewrite this patch series several times already from scratch to follow your design reviews. The points where I didn't rewrite the code to match your ideas are those that I didn't give priority to, or for which I still think my proposal is better than yours, and so we haven't reached a consensus yet.
It seems to me to be a pretty normal state of affairs, and again, if I am to follow your advice and never come up with new code while a commit fest is in progress, I'd better just stop trying to contribute to PostgreSQL. >> So, do you want to see a patch implementing that to be able to provide >> information about DROP commands targeting more than one object and the same >> information in other cases (object id, name, schema, operation name, >> object type)? That could be the next patch of the series, by Thursday. > > I think this is approximately correct as a next logical step, but are > we really clear on what the design for that looks like? I thought you > were saying upthread that you wanted to make this information > available via some kind of pseudo-table for 9.4. In any event, adding > a new event type (sql_drop) is a separate patch, and adding > information to the existing types of event triggers is a separate > patch, so you should try to do one of those two, not both. That's exactly what I've been saying too from the beginning, arguing that the first step should be providing enough information to user functions called by event triggers. If I'm to provide a solution for multiple-object drops without any specific information associated with firing a trigger, I don't much see the point of the exercise. So in my view, either we have specific information and proper DROP CASCADE support, or only specific information without support for multiple DROP targets, or both of them. Support for CASCADE with no information about the objects being dropped? Why would we want that? > Doing some of it one way and some of it the other way seems kind of > grotty, though. It's only following existing design: http://www.postgresql.org/docs/9.2/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING 39.6.6.1.
Obtaining information about an error There are two ways to get information about the current exception in PL/pgSQL: special variables and the GET STACKED DIAGNOSTICS command. And I think that having special variables for the most common cases (ID, name, schema) and a more complex API (later) for the more complex cases only makes sense. And in 9.3, for the most complex cases, you will need to code your Event Trigger in C and walk the parse tree. What's wrong with that, in detail? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
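[Editorial sketch] For reference, the PL/pgSQL precedent cited above really does mix both styles; this is a minimal example of the existing 9.2 behaviour, combining the special variable SQLSTATE with GET STACKED DIAGNOSTICS:

```sql
DO $$
DECLARE
  msg text;
  ctx text;
BEGIN
  PERFORM 1/0;
EXCEPTION WHEN division_by_zero THEN
  -- special variables (SQLSTATE, SQLERRM) and the command form coexist
  GET STACKED DIAGNOSTICS msg = MESSAGE_TEXT,
                          ctx = PG_EXCEPTION_CONTEXT;
  RAISE NOTICE 'SQLSTATE=%, message=%', SQLSTATE, msg;
END;
$$;
```

The analogy being drawn is: TG_* magic variables for the common event-trigger cases, a richer function-based API later for the complex ones.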
I'm poking at event triggers a bit; would like to set up some examples (and see if they work, or break down badly; both are interesting results) to do some validation of schema for Slony. What I'm basically thinking about is to set up some event triggers that run on DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication side of things (e.g. - inject a request to drop the table/sequence from replication). I have a bit of a complaint as to what documentation is included; I don't see any references in the documentation to ddl_command_start / ddl_command_end, which seem to be necessary values for event triggers. I'd tend to think that there should be a new subsection in the "man page" for CREATE TRIGGER that includes at least two fully formed examples of event triggers, involving the two events in question. Is change of that sort in progress? -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
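[Editorial sketch] The kind of trigger Christopher describes could be set up roughly like this with what exists at the time of the thread; the "queue a drop request" step is a hypothetical placeholder, since the Slony-side table is whatever the replication system defines:

```sql
CREATE OR REPLACE FUNCTION on_drop_cleanup() RETURNS event_trigger AS $$
BEGIN
  -- TG_EVENT is e.g. 'ddl_command_end'; TG_TAG is e.g. 'DROP TABLE'
  RAISE NOTICE 'event %, tag %', TG_EVENT, TG_TAG;
  -- Here one would inject a "drop from replication" request, e.g. by
  -- inserting into a replication work queue (hypothetical table name):
  -- INSERT INTO replication_cleanup_queue(tag) VALUES (TG_TAG);
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER drop_cleanup
  ON ddl_command_end
  WHEN TAG IN ('DROP TABLE', 'DROP SEQUENCE')
  EXECUTE PROCEDURE on_drop_cleanup();
```

Note that at this point the trigger only learns the command tag, not which object was dropped, which is exactly the gap discussed below.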
Christopher Browne <cbbrowne@gmail.com> writes: > I'm poking at event triggers a bit; would like to set up some examples > (and see if they > work, or break down badly; both are interesting results) to do some > validation of schema > for Slony. Cool, thanks! > What I'm basically thinking about is to set up some event triggers that run on > DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication > side of things (e.g. - inject a request to drop the table/sequence > from replication). Sure. In what got committed from the current patch series, you will only know that a DROP TABLE (or DROP SEQUENCE) occurred, and we're trying to reach an agreement with Robert on whether we should prefer to add visibility to such events when they occur in a CASCADE statement, or rather add the OID (and maybe the name) of the object that's going to be dropped. Your opinion is worth a lot on that matter, if you have one to share! :) > I have a bit of a complaint as to what documentation is included; I don't see > any references in the documentation to ddl_command_start / ddl_command_end, > which seem to be necessary values for event triggers. What we have now is here: http://www.postgresql.org/docs/devel/static/event-trigger-matrix.html http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER Is it not visible enough, or really missing the point? > I'd tend to think that there should be a new subsection in the "man page" for > CREATE TRIGGER that includes at least two fully formed examples of event > triggers, involving the two events in question. Is change of that > sort in progress? The event triggers are addressed in a whole new chapter of the docs, maybe that's why you didn't find the docs? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Jan 28, 2013 at 6:19 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Christopher Browne <cbbrowne@gmail.com> writes: >> I'm poking at event triggers a bit; would like to set up some examples >> (and see if they >> work, or break down badly; both are interesting results) to do some >> validation of schema >> for Slony. > > Cool, thanks! > >> What I'm basically thinking about is to set up some event triggers that run on >> DROP TABLE / DROP SEQUENCE, and see about cleaning up the replication >> side of things (e.g. - inject a request to drop the table/sequence >> from replication). > > Sure. In what got committed from the current patch series, you will only > know that a DROP TABLE (or DROP SEQUENCE) occurred, and we're trying to > reach an agreement with Robert on whether we should prefer to add visibility to > such events when they occur in a CASCADE statement, or rather add the OID > (and maybe the name) of the object that's going to be dropped. > > Your opinion is worth a lot on that matter, if you have one to share! :) Hmm. I think some information about the object is pretty needful. For the immediate case I'm poking at, namely looking for dropped tables, I could determine which object is gone by inference; if I run the trigger as part of the ddl_command_end event, then I could run a query that searches the slony table sl_table, and if I find any tables for which there is no longer a corresponding table in pg_catalog.pg_class, then I infer which table got dropped. But I think I'd really rather know more explicitly which table is being dropped. Having the oid available in some trigger variable should suffice. It appears to me as though it's relevant to return an OID for all of the command tags. Something useful to clarify in the documentation is what differences are meaningful between ddl_command_start and ddl_command_end, to make it easier to determine which event one would most want to use. Musing a bit... 
It seems to me that it might be a slick idea to run a trigger at both _start and _end, capturing metadata about the object into temp tables at both times, which would then allow the _end function to compare the data in the temp table to figure out what to do next. I wouldn't think that's apropos as default behaviour; that's something for the crafty developer that's building a trigger function to do. Having a parse tree for the query that initiates the event would be mighty useful, as would be a canonicalized form of the query. I think we could add some useful "protection" (e.g. - such as my example of an event trigger that generates "DROP TABLE FROM REPLICATION") using the present functionality, even perhaps without OIDs, but I don't think I'd want to get into trying to forward arbitrary DDL without having the canonicalized query available. >> I have a bit of a complaint as to what documentation is included; I don't see >> any references in the documentation to ddl_command_start / ddl_command_end, >> which seem to be necessary values for event triggers. > > What we have now here: > > http://www.postgresql.org/docs/devel/static/event-trigger-matrix.html > http://www.postgresql.org/docs/devel/static/sql-createeventtrigger.html > http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER > > Is it not visible enough, or really missing the point? Ah, I missed the second one; I was looking under CREATE TRIGGER, didn't notice that CREATE EVENT TRIGGER was separately available; that resolves most of what I thought was missing. I think a bit more needs to be said about the meanings of the events and the command tags, but what I imagined missing wasn't. >> I'd tend to think that there should be a new subsection in the "man page" for >> CREATE TRIGGER that includes at least two fully formed examples of event >> triggers, involving the two events in question. Is change of that >> sort in progress? 
> > The event triggers are addressed in a whole new chapter of the docs, > maybe that's why you didn't find the docs? I found that chapter, just not the command :-). -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
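[Editorial sketch] Christopher's inference approach could be written as an anti-join between Slony's catalog and pg_class; the cluster schema name `_main` is illustrative (it depends on how the Slony cluster was named), the sl_table column layout is the standard Slony-I one:

```sql
-- Tables still registered for replication in Slony but no longer
-- present in the PostgreSQL catalogs, i.e. candidates that got dropped:
SELECT t.tab_id, t.tab_nspname, t.tab_relname
FROM _main.sl_table t            -- '_main' is an illustrative cluster name
WHERE NOT EXISTS (
  SELECT 1
  FROM pg_catalog.pg_class c
  WHERE c.oid = t.tab_reloid
);
```

Run from a ddl_command_end trigger, this recovers the dropped table only indirectly, which is why having the OID exposed in a trigger variable is the preferred design.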
Christopher Browne <cbbrowne@gmail.com> writes: > Hmm. I think some information about the object is pretty needful. > > For the immediate case I'm poking at, namely looking for dropped tables, I > could determine which object is gone by inference; if I run the trigger > as part of the ddl_command_end event, then I could run a query that > searches the slony table sl_table, and if I find any tables for which there > is no longer a corresponding table in pg_catalog.pg_class, then I infer > which table got dropped. > > But I think I'd really rather know more explicitly which table is being dropped. > > Having the oid available in some trigger variable should suffice. Ack. I think I'm going to send a patch tomorrow with support for exposing the OID of the object (when it still exists at the time the event trigger fires) and support for DROP CASCADE objects, thanks to an SRF that you will be able to call from your event trigger function. > It appears to me as though it's relevant to return an OID for all of the command > tags. Yeah, I've been working on that before and the refactoring in utility.c is already committed by Robert, so that's definitely in-scope. > Something useful to clarify in the documentation is what differences are > meaningful between ddl_command_start and ddl_command_end to make > it easier to determine which event one would most want to use. Will add. > Musing a bit... It seems to me that it might be a slick idea to run a > trigger at > both _start and _end, capturing metadata about the object into temp tables > at both times, which would then allow the _end function to compare the data > in the temp table to figure out what to do next. I wouldn't think > that's apropos > as default behaviour; that's something for the crafty developer that's building > a trigger function to do. Please remember that at the _end of a DROP the object no longer exists in the catalogs, so you will get a NULL for its OID, and the same at the _start of a CREATE command. 
For implementation reasons, the ALTER case behaves the same as CREATE here. > Having a parse tree for the query that initiates the event would be > mighty useful, > as would be a canonicalized form of the query. > > I think we could add some useful "protection" (e.g. - such as my example of > an event trigger that generates "DROP TABLE FROM REPLICATION") using > the present functionality, even perhaps without OIDs, but I don't > think I'd want > to get into trying to forward arbitrary DDL without having the canonicalized > query available. Thanks. As soon as the OID+CASCADE patch is done, I'm back to working on Command String Normalisation. > I think a bit more needs to be said about the meanings of the events > and the command tags, > but what I imagined missing wasn't. Definitely, another pass on the docs will be needed here. I'd like to be able to postpone that to beta times, as I'm arranging my schedule so as to be available and participating then. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
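[Editorial sketch] The SRF-based design Dimitri promises above could be consumed from a trigger function roughly like this; the function name pg_event_trigger_dropped_objects() and its column names are illustrative of the proposal, not a committed API at the time of this thread:

```sql
CREATE OR REPLACE FUNCTION on_sql_drop() RETURNS event_trigger AS $$
DECLARE
  obj record;
BEGIN
  -- Hypothetical SRF returning one row per dropped object,
  -- including objects dropped indirectly via CASCADE.
  FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
  LOOP
    RAISE NOTICE 'dropped % %.%',
      obj.object_type, obj.schema_name, obj.object_name;
    -- a replication system would queue its cleanup work here
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```

This shape answers both of Christopher's requests at once: explicit per-object information, and visibility into the full set of objects a single DROP ... CASCADE removed.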