Re: Event Triggers: adding information

From:        Robert Haas
Subject:     Re: Event Triggers: adding information
Date:
Msg-id:      CA+TgmoYimdGCiKFbKFbcZ_pJ5jT+sryxN40p24i8efq8j6BNgA@mail.gmail.com
In reply to: Re: Event Triggers: adding information (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
Responses:   Re: Event Triggers: adding information (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
List:        pgsql-hackers
On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> - Context exposing and filtering
>
> My preference for that point is to include it and limit it to event
> triggers written in C, leaving the door open to add new events in
> the future, and allowing existing consumers of that framework to see
> through implementation details should they need it.
>
> It's better than just a ProcessUtility_hook because you can see it
> in \dy, decide to disable it with ALTER EVENT TRIGGER (e.g. in a
> transaction to avoid recursive calling into itself), and it won't
> run in single user mode even when in shared_preload_libraries.
>
> Another idea is to protect it with a GUC.
>
> Yet another idea is to add a new property in the CREATE EVENT
> TRIGGER command so that the user can request explicitly to have that
> and know the feature will break at next release and all the ones
> after that:
>
>   CREATE EVENT TRIGGER foo ON ddl_command_start
>     WHEN show_internals IN ('Yes I know what I''m doing', 'Thanks')
>     EXECUTE PROCEDURE …();
>
> That particular phrasing and option choice might be revisited before
> commit, though; I'm not wedded to it.

I'm not feeling very sanguine about any of this. I feel strongly that we should try to do something that's more principled than just throwing random junk into ProcessUtility.

> - Additional Information to PL/pgSQL
>
> We're talking about:
>
> - operation (CREATE, ALTER, DROP)
> - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
> - object OID (when there's a single object target)
> - object schema name (when there's a single object target)
> - object name (when there's a single object target)

In ddl_command_start, I believe you can only expose the information that is available in the parse tree. That is, if the user typed CREATE TABLE foo, you can tell the event trigger that the user didn't specify a schema and that they mentioned the name foo.
You cannot predict what schema foo will actually end up in at that point, and we shouldn't try. I don't have a problem with exposing the information we have at this point, with the documented caveat that if you rely on it to mean something it doesn't, you get to keep both pieces.

But I wonder: wouldn't it be better to just expose the raw string the user typed? I mean, in all the cases we really care about, the information we can *reliably* expose at this point is going to be pretty nearly the same as extracting the third space-delimited word out of the command text and splitting it on a period. And you can do that easily enough even in PL/pgsql. You could also extract a whole lot of OTHER potentially useful information from the command text that you couldn't get if we just exposed the given schema name and object name. For example, this would allow you to write an event trigger that enforces a policy that all column names must use Hungarian notation. Note that this is a terrible idea and you shouldn't do it, but you could if you wanted to. And the trigger would be relatively long and complex, and you might have bugs, but that's still better than the status quo, where you are simply out of luck.

Now, in a ddl_command_end trigger, there's a lot more information that you can usefully expose. In theory, if the command records "what it did" somewhere, you can extract that information with as much precision as it cared to record. However, I'm pretty skeptical of the idea of exposing a single OID. I mean, if the "main" use case here is replication, there are holes big enough to drive a truck through there. It's not really better for other use cases, either: if you're auditing or logging or whatever, you can drive the same truck through the same hole. It seems to me that to get anything really useful here, you need some more complex way of passing data, like, oh, say, an array of OIDs.
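The "third space-delimited word" extraction suggested earlier can be sketched in a few lines (a hypothetical illustration in Python rather than PL/pgSQL; the function name is made up, and real DDL parsing would also have to cope with quoted identifiers, comments, and multi-word object types):

```python
def naive_object_name(command_text):
    """Extract the (possibly schema-qualified) object name from a
    simple DDL command by taking the third space-delimited word and
    splitting it on a period.  Deliberately ignores quoted
    identifiers, comments, and multi-word object types such as
    TEXT SEARCH DICTIONARY."""
    words = command_text.strip().split()
    target = words[2]  # e.g. "public.foo" in "CREATE TABLE public.foo (...)"
    if "." in target:
        schema, name = target.split(".", 1)
    else:
        schema, name = None, target
    return schema, name

print(naive_object_name("CREATE TABLE foo (a text)"))         # (None, 'foo')
print(naive_object_name("CREATE TABLE public.foo (a text)"))  # ('public', 'foo')
```

The point being made above is that this is roughly all the *reliable* information available at ddl_command_start time anyway, since the target schema has not been resolved yet.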
Really, you're going to need something quite a bit more complicated than that to describe something like ALTER TABLE, but even for pretty simple cases a single OID doesn't cut it.

> - Additional Information to PL/*
>
> It's separated away because you did opt-out from reviewing that
> part, so I guess we will continue to make that a separate patch, and
> it will still need to happen. I've been told that I shouldn't be on
> the hook to provide for that, and I wonder, but anyway I did already
> implement it in a previous version so I can do it again as soon as
> we know what we expose.

I'm not sure what you mean that you shouldn't be on the hook to provide that. I don't think that you're obliged to submit patches to make this work for other PLs - indeed, I think it's too late for that for 9.3 - but I don't think anyone else is required to do it, either. It doesn't seem like it would be too hard to get the other PLs up to the same level that PL/pgsql is at; I just didn't want to try to familiarize myself with PL/perl, PL/python, and PL/tcl - none of which I know the guts of, and most of which I don't compile regularly - at the same time I was trying to grok everything else in the patch.

> - Command String Normalisation
>
> I removed it already from the main patch because we needed to talk
> about it more, and it's kind of a big part in itself. Meanwhile it
> seems to me that an agreement has been reached and that I will be
> able to follow the consensus and resume working on that part.

I'm not sure we've reached a consensus here (what is it?), but let me run an idea or two up the flagpole and see if anyone salutes.
It seems to me that there's some consensus that reverse-parsing may be a reasonable thing to include in some form, but there are a lot of unsolved problems there, including (1) that the parse tree might not contain enough information for the reverse-parser to describe what the command actually did, as opposed to what the user asked for, and (2) that there may be important decisions which were made at execution time, which are needed for correct replication, but which can't actually be represented in the syntax at all (I am not sure what these would be, but Tom seemed to think that such cases exist, so they need to be understood and thought about).

If we're going to do reverse parsing, I would argue that we need to put some effort into making it more flexible than the designs I've seen so far. For example, I believe your concept of command string normalization includes the idea that we'd fully schema-qualify the object name. That is, instead of saying CREATE TABLE foo (a text), we'd say CREATE TABLE public.foo (a text). That would guarantee that when we replay the command on another machine, it ends up in the same schema. But who is to say that's what the user wants? You can imagine, for example, that a user might want to replicate schema public on node A into schema node_a on node B. If we're back to parsing the SQL at that point, we are still ahead - but not very far ahead.

And, of course, the proper canonicalization is really CREATE TABLE public.foo (a pg_catalog.text), because there could be another type called text in some other schema on node B. That may seem like a borderline case when the name of the type is "text", but if you substitute "bar" for "text" it suddenly starts to seem totally plausible that there could be more than one bar.
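The canonicalization hazard described above can be illustrated with a toy name resolver (a hypothetical sketch in Python; the function and the catalog layout are invented for illustration). The same unqualified name resolves to different objects on different nodes once their search paths or schema contents diverge, which is why a normalized command string has to pin every name to a schema:

```python
def resolve(name, search_path, catalog):
    """Resolve an unqualified name against a search path, roughly the
    way PostgreSQL resolves an unqualified type name.  catalog maps
    schema -> set of names defined in that schema."""
    if "." in name:            # already schema-qualified: nothing to do
        return name
    for schema in search_path:
        if name in catalog.get(schema, set()):
            return f"{schema}.{name}"
    raise LookupError(name)

# On node A only public defines "bar"; on node B a schema earlier in
# the search path defines another "bar".
node_a = {"pg_catalog": {"text"}, "public": {"bar"}}
node_b = {"pg_catalog": {"text"}, "app": {"bar"}, "public": {"bar"}}

print(resolve("bar", ["app", "public"], node_a))  # public.bar
print(resolve("bar", ["app", "public"], node_b))  # app.bar - a different object!
```

Replaying the un-normalized command on node B would therefore bind to the wrong object, while fully qualifying everything bakes in the source node's layout - the tension the paragraph above is pointing at.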
Of course, if you substitute "hstore" for "text" but hstore is in a different schema in the source and target databases, then you've got a problem if you DO schema-qualify things; and if the goal is just to pretty-print things for the user's benefit, all of this may be overkill. I recognize that it's probably impossible or unrealistic to get an exactly-perfect solution to all of these problems, but I'm working my way around to a proposal.

Suppose we added a type pg_parse_tree, and a bunch of accessor functions that operate on it. You could have things like get_operation(pg_parse_tree) and get_object_type(pg_parse_tree) and get_array_of_object_names_and_schemas(pg_parse_tree), which would largely obviate the need to pass 27 magic variables for all of the things somebody might want - anything we decide we want to expose just becomes another accessor function. Deparsing is also a function, which means we don't incur the overhead of doing it unless the user asks for it. And you might even be able to have mutators, so that you can, for example, take a pg_parse_tree, make a new one by changing the schema name that's mentioned, and then deparse that. I'm sure Tom is scared now, if he's reading this, but maybe it's got some potential.

I certainly don't think this solves every problem. In particular, we run the risk of exposing every internal detail of the parse tree, something Tom and I have both vigorously resisted in every discussion of event triggers since time immemorial. On the other hand, the discussion of this topic suggests that the main thing people want out of event triggers is precisely to be able to get at information from the parse tree, so we're kind of damned if we do and damned if we don't. At least something like this provides a reasonable abstraction boundary that allows the information to be exposed in a controlled way.
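The opaque-type-plus-accessors proposal above might look something like this as a sketch (hypothetical Python standing in for the proposed C/SQL machinery; pg_parse_tree is only proposed in this thread, and all the field names here are invented). The key property is that callers only ever go through accessor functions, a mutator returns a new tree rather than changing the old one, and deparsing is just another function you pay for only on demand:

```python
import copy

class PgParseTree:
    """Opaque wrapper: the raw dict stands in for PostgreSQL's internal
    parse tree, which callers never touch directly."""
    def __init__(self, raw):
        self._raw = raw

# Accessor functions: each exposed detail is one function, so internal
# field names (which differ between DDL constructs) never leak out.
def get_operation(tree):
    return tree._raw["operation"]          # e.g. "CREATE"

def get_object_type(tree):
    return tree._raw["objtype"]            # e.g. "TABLE"

def get_object_names_and_schemas(tree):
    return [(o["schema"], o["name"]) for o in tree._raw["objects"]]

# Mutator: returns a NEW tree with the schema rewritten, leaving the
# original untouched, so a trigger could retarget replication.
def set_schema(tree, schema):
    raw = copy.deepcopy(tree._raw)
    for o in raw["objects"]:
        o["schema"] = schema
    return PgParseTree(raw)

# Deparsing is just another function, invoked only when asked for.
def deparse(tree):
    o = tree._raw["objects"][0]
    return f'{get_operation(tree)} {get_object_type(tree)} {o["schema"]}.{o["name"]} (...)'

t = PgParseTree({"operation": "CREATE", "objtype": "TABLE",
                 "objects": [{"schema": "public", "name": "foo"}]})
print(deparse(set_schema(t, "node_a")))    # CREATE TABLE node_a.foo (...)
```

This is what makes the public-to-node_a replication case workable: the trigger mutates the schema and deparses, instead of string-munging the normalized command text.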
For example, if we add a function that tells you what object name the user typed, it doesn't need to expose the fact that the parse tree names that field differently in different DDL constructs, which certainly increases our odds of not going insane after a release or two of maintaining this code.

The other question is what to do about the information that's gathered as the command runs and that you afterwards wish to provide to the event trigger - for example, for CREATE TABLE foo (), you want to reveal the actually-selected creation schema. One idea is to have DDL commands annotate the parse tree with these sorts of details, but the disadvantage of this is that it might not be what everybody wants for all purposes; and you still have the question of what to do with information that can't be effectively represented in the parse tree at all. I'm inclined to think a better approach might be to pass around some kind of "container" data structure to which the DDL command can glue in whatever bits of information it wants to include, and then dump the result out as a 2-D text array or a JSON object or something. I'm waving my hands at this problem a bit, because of course one important use case is to be able to do a reverse-parse that takes this information into account, but we could do that, I think, by allowing this information to be passed to the deparse API along with the pg_parse_tree object. I'm waving my hands here, but maybe not any harder than any other design thus far proposed, and a fuller explanation would make this interminably long email even longer, so I'll stop here for now.

> - ddl_event_trace
>
> Now is the time to kill it (for 9.3) or allow it in.
>
> It means another cache lookup at DDL Event time (so 2 cache lookups
> per DDL in 9.3); and allow users who just want to have the
> information not to have to care about when it is available.
Unless we get some of the other stuff, this doesn't have much value anyway, and depending on how other parts of the design shake out, it might seem less relevant by the time we get done.

> - Non DDL related Events
>
> Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best,
> unless someone else wants to have at it, maybe?

I'd like to take a crack at adding support for some kind of really cool event for 9.4 - I suspect this one is not what I'd pick as a first target, just because it seems hard. But I'll mull it over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company