Discussion: Functional dependencies and GROUP BY
I have developed a patch that partially implements the "functional dependency" feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The full functional dependency deduction rules are pretty big and arcane, so I concentrated on getting a useful subset working. In particular:

When grouping by the primary key, the other columns can be omitted, e.g.,

CREATE TABLE tab1 (a int PRIMARY KEY, b int);
SELECT a, b FROM tab1 GROUP BY a;

This is frequently requested by MySQL converts (and possibly others).

Also, when a column is compared with a constant, it can appear ungrouped:

SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

For lack of a better idea, I have made it so that merge-joinable operators qualify as equality operators. Better ideas welcome. Other rules could be added over time (but I'm currently not planning to work on that myself).

At this point, this patch could use some review and testing with unusual queries that break my implementation. ;-)
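The guarantee behind the primary-key case can be illustrated outside SQL: when the grouping column is unique, each group contains exactly one row, so every other column has exactly one value per group and is unambiguous to select. A minimal Python sketch of that invariant (toy data, not anything from the patch itself):

```python
from collections import defaultdict

# rows of tab1 (a int PRIMARY KEY, b int); 'a' is unique
rows = [(1, 10), (2, 20), (3, 30)]

groups = defaultdict(list)
for a, b in rows:
    groups[a].append(b)

# because 'a' is a primary key, each group holds exactly one row,
# so selecting the ungrouped column 'b' is well-defined per group
assert all(len(vals) == 1 for vals in groups.values())
result = [(a, vals[0]) for a, vals in groups.items()]
```

Without the uniqueness guarantee, `vals` could hold several different values of `b` per group, which is exactly why GROUP BY normally rejects the query.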
Attachments
On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > I have developed a patch that partially implements the "functional > dependency" feature Nice! :) -- greg
2010/6/8 Peter Eisentraut <peter_e@gmx.net>:
> I have developed a patch that partially implements the "functional
> dependency" feature that allows some columns to be omitted from the
> GROUP BY clause if it can be shown that the columns are functionally
> dependent on the columns in the group by clause and therefore guaranteed
> to be unique per group. The full functional dependency deduction rules
> are pretty big and arcane, so I concentrated on getting a useful subset
> working. In particular:

> Also, when a column is compared with a constant, it can appear
> ungrouped:
>
> SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

I don't see why this should be allowed. I understand the argument that y must have a unique value, so it is OK for it to be ungrouped, but the point of the discussion is far from that; semantically, y is not a grouping key.

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

or there could be more complicated examples including JOIN cases. I don't believe we can detect all such cases. If the simple case is allowed, users won't understand why the complicated case is sometimes not allowed, so it will not be consistent.

Finally, it may hide unintended bugs. ORM tools may generate a WHERE clause under some conditions and not under others.

Regards,

-- 
Hitoshi Harada
* Hitoshi Harada (umi.tanuki@gmail.com) wrote:
> I don't see why this should be allowed. I understand the argument that
> y must have a unique value, so it is OK for it to be ungrouped, but the
> point of the discussion is far from that; semantically, y is not a
> grouping key.

Ignoring the fact that it's terribly useful- isn't it part of the SQL spec?

> In addition, what if y is implicitly a constant? For example,
>
> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

Not sure I see the issue here?

> Finally, it may hide unintended bugs. ORM tools may generate a WHERE
> clause under some conditions and not under others.

Yeah, this one I really just don't buy.. If an ORM tool doesn't write correct SQL, then it's the ORM's fault, not ours.

Thanks,

Stephen
* Peter Eisentraut (peter_e@gmx.net) wrote:
> This is frequently requested by MySQL converts (and possibly others).

I'd certainly love to see it- but let's not confuse people by implying that it would actually act the way MySQL does. It wouldn't, because what MySQL does is a lot closer to 'distinct on' and is patently insane to boot. Again, I'd *love* to see this be done in PG, but when we document it and tell people about it, *please* don't say it's similar in any way to MySQL's "oh, we'll just pick a random value from the columns that aren't group'd on" implementation.

> At this point, this patch could use some review and testing with unusual
> queries that break my implementation. ;-)

I'll give it a shot... :)

Thanks!

Stephen
Peter Eisentraut <peter_e@gmx.net> writes:
> I have developed a patch that partially implements the "functional
> dependency" feature that allows some columns to be omitted from the
> GROUP BY clause if it can be shown that the columns are functionally
> dependent on the columns in the group by clause and therefore guaranteed
> to be unique per group.

The main objection to this is the same one I've had all along: it makes the syntactic validity of a query dependent on what indexes exist for the table. At minimum, that means that enforcing the check at parse time is the Wrong Thing.

The var-compared-with-constant case seems like a big crock. Are we really required to provide such a thing per spec?

I'm also fairly concerned about the performance of a check implemented this way --- it's going to do a lot of work, and do it over and over again as it traverses the query tree. At least some of that could be alleviated after you move the check to the planner, just by virtue of the index information already having been acquired ... but I'd still suggest expending more than no effort on caching the results. For instance, given "SELECT * FROM a_very_wide_table GROUP BY pk" you shouldn't have to prove more than once that a_very_wide_table is grouped by its PK.

regards, tom lane
2010/6/7 Greg Stark <gsstark@mit.edu>:
> On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> I have developed a patch that partially implements the "functional
>> dependency" feature
>
> Nice! :)

I like this idea too. It can simplify some queries, and I believe it is a very good marketing bonus for us.

Pavel
On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:
> > Also, when a column is compared with a constant, it can appear
> > ungrouped:
> >
> > SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;
>
> I don't see why this should be allowed. I understand the argument that
> y must have a unique value, so it is OK for it to be ungrouped, but the
> point of the discussion is far from that; semantically, y is not a
> grouping key.

I'm not sure what your argument is. If y is uniquely determined within each group, then it's OK for it to be ungrouped. What other criteria do you have in mind for determining that instead? It looks like you are going by aesthetics. ;-)

> In addition, what if y is implicitly a constant? For example,
>
> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;
>
> or there could be more complicated examples including JOIN cases. I
> don't believe we can detect all such cases. If the simple case is
> allowed, users won't understand why the complicated case is sometimes
> not allowed, so it will not be consistent.

Yes, as I said, my implementation is incomplete in the sense that it only recognizes some functional dependencies. To recognize the sort of thing you show, you would need some kind of complex deduction or proof engine, and that doesn't seem worthwhile, at least for me, at this point.
On Mon, Jun 7, 2010 at 6:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Peter Eisentraut (peter_e@gmx.net) wrote:
>> This is frequently requested by MySQL converts (and possibly others).
>
> I'd certainly love to see it- but let's not confuse people by implying
> that it would actually act the way MySQL does. It wouldn't, because
> what MySQL does is a lot closer to 'distinct on' and is patently insane
> to boot. Again, I'd *love* to see this be done in PG, but when we
> document it and tell people about it, *please* don't say it's similar in
> any way to MySQL's "oh, we'll just pick a random value from the columns
> that aren't group'd on" implementation.

Preface: I work as a MySQL DBA (yeah, yeah, laugh it up...).

It has been my experience that the vast majority of the time, when MySQL users make use of the "fine feature" which allows them to use unaggregated columns that are not present in the GROUP BY clause in an aggregating query, they have made an error because they do not understand GROUP BY. I have found this lack of understanding to be very widespread across the MySQL developer and *DBA* communities.

I also would really hesitate to compare this useful feature to the *fine feature* present in MySQL. Due to a long-standing bug (http://bugs.mysql.com/bug.php?id=8510) it really is not possible to get MySQL to behave sanely. It is my impression that many programs of significant size that interact with MySQL have errors because of this issue, and it would be good not to claim to have made Postgres compatible.

That said, I imagine if this feature could make it into the Postgres tree it would be very useful. Would I be correct in assuming that while this feature would make query planning more expensive, it would also often decrease the cost of execution?

Best,

Rob Wultsch
wultsch@gmail.com
On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> I have developed a patch that partially implements the "functional >> dependency" feature that allows some columns to be omitted from the >> GROUP BY clause if it can be shown that the columns are functionally >> dependent on the columns in the group by clause and therefore guaranteed >> to be unique per group. > > The main objection to this is the same one I've had all along: it makes > the syntactic validity of a query dependent on what indexes exist for > the table. At minimum, that means that enforcing the check at parse > time is the Wrong Thing. It also needs to ensure that the plan is invalidated if the constraint is dropped, which I assume amounts to the same thing. -- greg
Greg Stark <gsstark@mit.edu> writes:
> On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The main objection to this is the same one I've had all along: it makes
>> the syntactic validity of a query dependent on what indexes exist for
>> the table. At minimum, that means that enforcing the check at parse
>> time is the Wrong Thing.

> It also needs to ensure that the plan is invalidated if the constraint
> is dropped, which I assume amounts to the same thing.

Well, no, any cached plan will get invalidated if the index goes away. The big problem with this implementation is that you could create a *rule* (eg a view) containing a query whose validity depends on the existence of an index. Dropping the index will not cause the rule to be invalidated.

Perhaps the correct fix would be to mark stored query trees as having a dependency on the index, so that dropping the index/constraint would force a drop of the rule too. Just pushing the check to plan time, as I suggested yesterday, isn't a very nice fix because it would result in the rule unexpectedly starting to fail at execution.

regards, tom lane
On Tue, Jun 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, no, any cached plan will get invalidated if the index goes away. > The big problem with this implementation is that you could create a > *rule* (eg a view) containing a query whose validity depends on the > existence of an index. Dropping the index will not cause the rule > to be invalidated. Hm, I was incorrectly thinking of this as analogous to the cases of plans that could be optimized based on the existence of a constraint. For example removing columns from a sort key because they're unique. But this is different because not just the plan but the validity of the query itself is dependent on the constraint. -- greg
Peter Eisentraut <peter_e@gmx.net> writes: > On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote: >> In addition, what if y is implicitly a constant? For example, >> >> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; > Yes, as I said, my implementation is incomplete in the sense that it > only recognizes some functional dependencies. To recognize the sort of > thing you show, you would need some kind of complex deduction or proof > engine, and that doesn't seem worthwhile, at least for me, at this > point. The question is why bother to recognize *any* cases of this form. I find it really semantically ugly to have the parser effectively doing one deduction of this form when the main engine for that type of deduction is elsewhere; so unless there is a really good argument why we have to do this case (and NOT "it was pretty easy"), I don't want to do it. As far as I recall, at least 99% of the user requests for this type of behavior, maybe 100%, would be satisfied by recognizing the group-by-primary-key case. So I think we should do that and be happy. regards, tom lane
On 6/8/10 5:21 PM +0300, Tom Lane wrote: > Peter Eisentraut<peter_e@gmx.net> writes: >> On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote: >>> In addition, what if y is implicitly a constant? For example, >>> >>> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; > >> Yes, as I said, my implementation is incomplete in the sense that it >> only recognizes some functional dependencies. To recognize the sort of >> thing you show, you would need some kind of complex deduction or proof >> engine, and that doesn't seem worthwhile, at least for me, at this >> point. > > The question is why bother to recognize *any* cases of this form. > I find it really semantically ugly to have the parser effectively > doing one deduction of this form when the main engine for that type > of deduction is elsewhere; so unless there is a really good argument > why we have to do this case (and NOT "it was pretty easy"), I don't > want to do it. > > As far as I recall, at least 99% of the user requests for this type > of behavior, maybe 100%, would be satisfied by recognizing the > group-by-primary-key case. So I think we should do that and be happy. +1 Regards, Marko Tiikkaja
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Perhaps the correct fix would be to mark stored query trees as having a
> dependency on the index, so that dropping the index/constraint would
> force a drop of the rule too. Just pushing the check to plan time, as
> I suggested yesterday, isn't a very nice fix because it would result
> in the rule unexpectedly starting to fail at execution.

Alternatively, we could rewrite the rule (not unlike what we do for "SELECT *") to actually add on the other implicitly grouped-by columns.. I don't know if that's better or worse than creating a dependency, since if the constraint were dropped/changed, people might expect the rule's output to change. Of course, as you mention, the alternative would really be for the rule to just start failing..

Still, if I wanted to change the constraint, it'd be a lot nicer to just be able to change it and, presuming I'm just adding a column to it or doing some other change which wouldn't invalidate the rule, not have to drop/recreate the rule.

Thanks,

Stephen
On Tue, Jun 8, 2010 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The question is why bother to recognize *any* cases of this form.
> I find it really semantically ugly to have the parser effectively
> doing one deduction of this form when the main engine for that type
> of deduction is elsewhere; so unless there is a really good argument
> why we have to do this case (and NOT "it was pretty easy"), I don't
> want to do it.

Well, it does appear to be there:

4.18.11 Known functional dependencies in the result of a <where clause>
...
If AP is an equality AND-component of the <search condition> simply contained in the <where clause> and one comparand of AP is a column reference CR, and the other comparand of AP is a <literal>, then let CRC be the counterpart of CR in R. {} ↦ {CRC} is a known functional dependency in R, where {} denotes the empty set.

NOTE 43 — An SQL-implementation may also choose to recognize {} ↦ {CRC} as a known functional dependency if the other comparand is a deterministic expression containing no column references.
...

Since Peter's not eager to implement the whole section -- which does seem pretty baroque -- it's up to us to draw the line where we stop coding and declare it good enough. I think we're all agreed that grouping by a pk is clearly the most important case. It may be important to get some other cases just so that the PK property carries through other clauses such as joins and group bys. But ultimately the only thing stopping us from implementing the whole thing is our threshold of pain for writing and maintaining the extra code.

-- 
greg
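The rule quoted above inspects only equality AND-components at the top level of the <where clause>, which is why a derived constant such as `y = a AND a = 5` falls outside it. A rough sketch of that limited detection, using a toy expression representation rather than PostgreSQL's actual parse trees:

```python
# toy representation of a WHERE clause as a nested tuple tree:
#   ('and', left, right) | ('eq', comparand, comparand)
# a column reference is a str, a literal is an int

def and_components(expr):
    """Flatten top-level ANDs into their conjuncts."""
    if expr[0] == 'and':
        return and_components(expr[1]) + and_components(expr[2])
    return [expr]

def constant_columns(where):
    """Columns equated with a literal in some top-level conjunct."""
    cols = set()
    for comp in and_components(where):
        if comp[0] == 'eq':
            a, b = comp[1], comp[2]
            if isinstance(a, str) and isinstance(b, int):
                cols.add(a)
            elif isinstance(b, str) and isinstance(a, int):
                cols.add(b)
    return cols

# WHERE y = 5: y is detected as constant
assert constant_columns(('eq', 'y', 5)) == {'y'}
# WHERE y = a AND a = 5: only a is detected; deducing y would
# require the transitive reasoning Hitoshi's example points out
assert constant_columns(('and', ('eq', 'y', 'a'), ('eq', 'a', 5))) == {'a'}
```

The second assertion shows exactly the inconsistency raised earlier in the thread: the rule catches the direct comparison but not the implied one.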
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Perhaps the correct fix would be to mark stored query trees as having a >> dependency on the index, so that dropping the index/constraint would >> force a drop of the rule too. > Alternatively, we could rewrite the rule (not unlike what we do for > "SELECT *") to actually add on the other implicitly grouped-by columns.. > I don't know if that's better or worse than creating a dependency, > since if the constraint were dropped/changed, people might expect the > rule's output to change. Hm. The problem with that is that one of the benefits we'd like to get from this is an efficiency win: the generated plan ought to only group by the PK, not uselessly sort/group by everything in the row. I suppose we could have the planner reverse-engineer its way to that, but it seems awfully slow and clunky to add on the extra columns and then reason our way to removing them again. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Hm. The problem with that is that one of the benefits we'd like to get
> from this is an efficiency win: the generated plan ought to only group
> by the PK, not uselessly sort/group by everything in the row. I suppose
> we could have the planner reverse-engineer its way to that, but it seems
> awfully slow and clunky to add on the extra columns and then reason our
> way to removing them again.

That's certainly a good point.

Another issue that I realized when thinking about this again- if someone wanted to *drop* a column that's part of a PK (since it turned out to not be necessary, for example), and then wanted to recreate the rule based on what was stored in the catalog, they wouldn't be able to without modifying it, and that would certainly be annoying too.

Guess my 2c would be for creating the dependency. I really dislike the idea of the rule just all of a sudden breaking.

Thanks,

Stephen
On tis, 2010-06-08 at 10:21 -0400, Tom Lane wrote: > The question is why bother to recognize *any* cases of this form. > I find it really semantically ugly to have the parser effectively > doing one deduction of this form when the main engine for that type > of deduction is elsewhere; so unless there is a really good argument > why we have to do this case (and NOT "it was pretty easy"), I don't > want to do it. Yeah, I'm not horribly attached to it. I began to structure the code to support multiple kinds of checks, and at the end only two kinds were reasonably doable and useful. We can remove it if no one is interested in it, which appears to be the case.
On tis, 2010-06-08 at 10:05 -0400, Tom Lane wrote:
> Perhaps the correct fix would be to mark stored query trees as having a
> dependency on the index, so that dropping the index/constraint would
> force a drop of the rule too. Just pushing the check to plan time, as
> I suggested yesterday, isn't a very nice fix because it would result
> in the rule unexpectedly starting to fail at execution.

There are actually pretty explicit instructions about this in the SQL standard:

<drop table constraint definition>

4) If QS is a <query specification> that contains an implicit or explicit <group by clause> and that contains a column reference to a column C in its <select list> that is not contained in an aggregated argument of a <set function specification>, and if G is the set of grouping columns of QS, and if the table constraint TC is needed to conclude that G ↦ C is a known functional dependency in QS, then QS is said to be dependent on TC.

5) If V is a view that contains a <query specification> that is dependent on a table constraint TC, then V is said to be dependent on TC.

So the dependency between the view/rule and the constraint/index needs to be stored in the dependency system, and RESTRICT/CASCADE will take effect.
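The RESTRICT/CASCADE behavior the standard asks for is the same mechanism PostgreSQL's dependency system already provides for other object kinds. A toy sketch of that mechanism (the object names and the flat dict standing in for pg_depend are invented for illustration):

```python
# toy dependency store: maps an object to the objects that depend on it
deps = {'tab1_pkey': ['view_v']}
objects = {'tab1_pkey', 'view_v'}

def drop(obj, cascade=False):
    dependents = deps.get(obj, [])
    if dependents and not cascade:
        raise RuntimeError(f"cannot drop {obj}: {dependents} depend on it")
    for d in dependents:
        drop(d, cascade=True)   # drop dependents first
    objects.discard(obj)

# RESTRICT behavior: the drop is refused while the view exists
try:
    drop('tab1_pkey')
except RuntimeError:
    pass
assert 'view_v' in objects

# CASCADE behavior: the dependent view is dropped with the constraint
drop('tab1_pkey', cascade=True)
assert objects == set()
```

Recording the view-on-constraint dependency this way is what prevents the "rule unexpectedly starting to fail at execution" problem Tom describes.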
On mån, 2010-06-07 at 21:33 +0300, Peter Eisentraut wrote: > I have developed a patch that partially implements the "functional > dependency" feature that allows some columns to be omitted from the > GROUP BY clause if it can be shown that the columns are functionally > dependent on the columns in the group by clause and therefore > guaranteed to be unique per group. Second version: I stripped out all checks except the primary key/unique constraint checks. Views whose existence depends on one of those constraints get a dependency recorded. This depends on the patch currently in the commit fest to record not null constraints in pg_constraint, so that the dependencies on not-null constraints can be recorded. I haven't done any caching of index lookups yet. Some testing with 1600-column tables didn't show any effect. I'll test this a little more.
Attachments
On Fri, Jun 25, 2010 at 14:06, Peter Eisentraut <peter_e@gmx.net> wrote:
> Second version:

Hi! I've looked this over. Looks great! I have some nits about the documentation and comments (non-issues like referencing primary keys when it really means not null unique indexes :-P), but on the whole it works and looks good.

The only corner case I have run into is creating a view with what I would call an implicit 'not null' constraint. Demonstration below:

create table nn (a int4 not null, b int4, unique (a));

select * from nn group by a; -- should this work? I think not?
 a | b
---+---
(0 rows)

create view vv as select a, b from nn group by a;
select * from vv;
 a | b
---+---
(0 rows)

=# alter table nn alter column a drop not null;
=# select * from nn group by a; -- ok, broken makes sense
ERROR:  column "nn.b" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT * from nn group by a;

=# select * from vv; -- yipes, should be broken?
 a | b
---+---
(0 rows)

I'm thinking we should not allow the "select * from nn group by a;" to work. Thoughts?

(FYI I do plan on doing some performance testing with large columns later, any other requests?)
On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
> The only corner case I have run into is creating a view with what I
> would call an implicit 'not null' constraint. Demonstration below:
>
> create table nn (a int4 not null, b int4, unique (a));
> select * from nn group by a; -- should this work? I think not?

I believe I referred to this upthread. There is another patch in the commitfest about explicitly representing NOT NULL constraints in pg_constraint. Then this case would create a dependency on those constraints. So we need to get that other patch in first.
On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut <peter_e@gmx.net> wrote:
> On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
>> The only corner case I have run into is creating a view with what I
>> would call an implicit 'not null' constraint. Demonstration below:
>>
>> create table nn (a int4 not null, b int4, unique (a));
>> select * from nn group by a; -- should this work? I think not?
>
> I believe I referred to this upthread.

Aww, and here I thought I had just been diligent :).

In other news, it's really no surprise that your test with 1600 columns had little effect. It loops over the indexes, then the index keys, and then the group by items, right? So I would expect more indexes or more group by items to slow it down, not so much the number of columns. Right?

Anyhow, it sounds like I should try it on top of the other patch and see if it works. I assume it might still need some fixups to work with that other patch? Or do you expect it to just work?
On Fri, Jul 16, 2010 at 22:29, Alex Hunsaker <badalex@gmail.com> wrote:
> (FYI I do plan on doing some performance testing with large columns
> later, any other requests?)

And here are the results. All tests are with an empty table with 1500 int4 columns. There is a unique non-null index on the first column. (non-assert build)

A: select count(1) from (select * from test group by ...1500 columns...) as res;
B: select count(1) from (select * from test group by a_0) as res; -- a_0 has the not null unique index

CVS A:   360ms
PATCH A: 370ms
PATCH B:  60ms

1500 indexes (one per column, on the column):

CVS A:   670ms
PATCH A: 850ms
PATCH B: 561ms

So it seems for tables with lots of columns the patch is faster, at least when you omit all the columns from the group by. I suspect for most "normal" (5-20 column) usage it should be a wash.

(Stupid question) Does anyone know why HEAD is quite a bit slower when there are lots of indexes? Do we end up looping and perhaps locking them or something?
On Sat, Jul 17, 2010 at 11:13, Alex Hunsaker <badalex@gmail.com> wrote:
> On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
>>> The only corner case I have run into is creating a view with what I
>>> would call an implicit 'not null' constraint. Demonstration below:
>>>
>>> create table nn (a int4 not null, b int4, unique (a));
>>> select * from nn group by a; -- should this work? I think not?
>>
>> I believe I referred to this upthread.

Yeah, I went back and reread the thread and um... yep, it's right where you posted patch 2. I think I read it, forgot about it, and then it bubbled up to my subconscious while testing :).

> Anyhow, it sounds like I should try it on top of the other patch and
> see if it works. I assume it might still need some fixups to work
> with that other patch? Or do you expect it to just work?

[ referring to the not null pg_constraint patch ]

Probably no surprise to you, I tried it on top of the not null pg_constraint patch and it did not work 'out of the box'. Mainly I was curious if there was some magic going on that I did not see.

In any event, do you think it's worth adding a regression test for this?
On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
> It's really no surprise that your test with 1600 columns had little
> effect. It loops over the indexes, then the index keys, and then the
> group by items, right? So I would expect more indexes or more group by
> items to slow it down, not so much the number of columns. Right?

At the outer level (which is not visible in this patch) it loops over all columns in the select list, and then it looks up the indexes each time. So the concern was that if the select list was * with a wide table, looking up the indexes each time would be slow.

> Anyhow, it sounds like I should try it on top of the other patch and
> see if it works. I assume it might still need some fixups to work
> with that other patch? Or do you expect it to just work?

There is some work necessary to integrate the two.
On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut <peter_e@gmx.net> wrote:
> On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
>> So I would expect more indexes or more group by items to slow it
>> down, not so much the number of columns. Right?
>
> At the outer level (which is not visible in this patch) it loops over
> all columns in the select list, and then it looks up the indexes each
> time. So the concern was that if the select list was * with a wide
> table, looking up the indexes each time would be slow.

Thanks for the explanation.

>> Anyhow, it sounds like I should try it on top of the other patch and
>> see if it works. I assume it might still need some fixups to work
>> with that other patch? Or do you expect it to just work?
>
> There is some work necessary to integrate the two.

I just read that patch is getting pushed till at least the next commitfest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php

Should we push this patch back too? Alternatively we could make it work with just primary keys until the other patch gets in. I think that makes sense; find that attached. Thoughts?

Note I axed the index not-null attribute checking. I have not thought too deeply about it, other than that if it's a primary key, its attributes can't be null.... Right? I may have missed something subtle, hence the heads up.
Attachments
On Fri, Jul 23, 2010 at 11:00, Alex Hunsaker <badalex@gmail.com> wrote: > Alternatively we could make it > work with just primary keys until the other patch gets in. I think > that makes sense, find that attached. Thoughts? *sigh*, find attached a version that removes talk of unique not null constraints from the docs
Attachments
On fre, 2010-07-23 at 11:00 -0600, Alex Hunsaker wrote:
> I just read that patch is getting pushed till at least the next commit
> fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php
>
> Should we push this patch back too? Alternatively we could make it
> work with just primary keys until the other patch gets in. I think
> that makes sense; find that attached. Thoughts?

I was thinking the same thing.

> Note I axed the index not-null attribute checking. I have not thought
> too deeply about it, other than that if it's a primary key, its
> attributes can't be null.... Right? I may have missed something
> subtle, hence the heads up.

Another open question I thought of was whether we should put the dependency record on the pg_index row, or the pg_constraint row, or perhaps the pg_class row. Right now, it is using pg_index, because that was easiest to code up, but I suspect that once we have not-null constraints in pg_constraint, it will be more consistent to make all dependencies go against pg_constraint rather than a mix of several catalogs.
On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut <peter_e@gmx.net> wrote: > Another open question I thought of was whether we should put the > dependency record on the pg_index row, or the pg_constraint row, or > perhaps the pg_class row. Right now, it is using pg_index, because that > was easiest to code up, but I suspect that once we have not-null > constraints in pg_constraint, it will be more consistent to make all > dependencies go against pg_constraint rather than a mix of several > catalogs. I think for primary keys pg_index is OK. However for the not-null case we have to use pg_constraint... So given that we end up having to code that anyways, it seems like it will end up being cleaner/consistent to always use the pg_constraint row(s). So +1 for using pg_constraint instead of pg_index from me.
On mån, 2010-07-26 at 10:46 -0600, Alex Hunsaker wrote: > On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut <peter_e@gmx.net> wrote: > > > Another open question I thought of was whether we should put the > > dependency record on the pg_index row, or the pg_constraint row, or > > perhaps the pg_class row. Right now, it is using pg_index, because that > > was easiest to code up, but I suspect that once we have not-null > > constraints in pg_constraint, it will be more consistent to make all > > dependencies go against pg_constraint rather than a mix of several > > catalogs. > > I think for primary keys pg_index is OK. However for the not-null > case we have to use pg_constraint... So given that we end up having to > code that anyways, it seems like it will end up being > cleaner/consistent to always use the pg_constraint row(s). So +1 for > using pg_constraint instead of pg_index from me. Next version. Changed dependencies to pg_constraint, removed handling of unique constraints for now, and made some enhancements so that views track dependencies on constraints even in subqueries. Should be close to final now. :-)
Attachments
Peter Eisentraut <peter_e@gmx.net> writes: > Next version. Changed dependencies to pg_constraint, removed handling > of unique constraints for now, and made some enhancements so that views > track dependencies on constraints even in subqueries. Should be close > to final now. :-) I've committed this with some revisions, notably: The view.c changes were fundamentally wrong. The right place to extract dependencies from a parsetree is in dependency.c, specifically find_expr_references_walker. The way you were doing it meant that dependencies on constraints would only be noticed for views, not for other cases such as stored plans. I rewrote the catalog search to look only at pg_constraint, not using pg_index at all. I think this will be easier to extend to the case of looking for UNIQUE + NOT NULL, whenever we get around to doing that. I also moved the search into catalog/pg_constraint.c, because it didn't seem to belong in parse_agg (as hinted by the large number of #include additions you had to make to put it there). I put in a bit of caching logic to prevent repeating the search for multiple Vars of the same relation. Tests or no tests, I can't believe that's going to be cheap enough that we want to repeat it over and over... regards, tom lane
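The caching Tom describes amounts to memoizing the proof per relation, so that a SELECT * over a wide table consults the catalog once rather than once per Var. A hypothetical sketch of the idea (the names and the catalog stand-in are invented for illustration, not the committed code):

```python
from functools import lru_cache

# hypothetical stand-in for the catalog lookup: which columns form
# the primary key of each relation (not the real pg_constraint layout)
PRIMARY_KEYS = {'a_very_wide_table': frozenset({'pk'})}

@lru_cache(maxsize=None)
def grouped_by_pk(relation, group_cols):
    """Prove once per (relation, grouping set) that the PK is covered."""
    pk = PRIMARY_KEYS.get(relation)
    return pk is not None and pk <= group_cols

# checks for the many Vars of the same relation hit the cache after
# the first lookup instead of repeating the catalog search
for _ in range(1600):
    assert grouped_by_pk('a_very_wide_table', frozenset({'pk'}))
```

With 1600 ungrouped columns, only the first check does any work; the remaining 1599 are cache hits, which is the effect the committed caching logic is after.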
On 7 August 2010 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Next version.  Changed dependencies to pg_constraint, removed handling
>> of unique constraints for now, and made some enhancements so that views
>> track dependencies on constraints even in subqueries.  Should be close
>> to final now. :-)
>
> I've committed this with some revisions, notably:
>
> The view.c changes were fundamentally wrong.  The right place to
> extract dependencies from a parsetree is in dependency.c, specifically
> find_expr_references_walker.  The way you were doing it meant that
> dependencies on constraints would only be noticed for views, not for
> other cases such as stored plans.
>
> I rewrote the catalog search to look only at pg_constraint, not using
> pg_index at all.  I think this will be easier to extend to the case of
> looking for UNIQUE + NOT NULL, whenever we get around to doing that.
> I also moved the search into catalog/pg_constraint.c, because it didn't
> seem to belong in parse_agg (as hinted by the large number of #include
> additions you had to make to put it there).
>
> I put in a bit of caching logic to prevent repeating the search for
> multiple Vars of the same relation.  Tests or no tests, I can't believe
> that's going to be cheap enough that we want to repeat it over and
> over...
>
> regards, tom lane

I was testing out this feature this morning and discovered that the
results may be non-deterministic if the PK is deferrable.

I think that check_functional_grouping() should exclude any deferrable
constraints, since in general, any inference based on a deferrable
constraint can't be trusted when planning a query, since the constraint
can't be guaranteed to be valid when the query is executed.  That's
also consistent with the SQL spec.

The original version of the patch had that check in it, but it vanished
from the final committed version.  Was that just an oversight, or an
intentional change?

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I was testing out this feature this morning and discovered that the
> results may be non-deterministic if the PK is deferrable.

Good point.

> The original version of the patch had that check in it, but it
> vanished from the final committed version.  Was that just an oversight,
> or an intentional change?

I don't recall having thought about it one way or the other.  What did
the check look like?

regards, tom lane
On 5 September 2010 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> I was testing out this feature this morning and discovered that the
>> results may be non-deterministic if the PK is deferrable.
>
> Good point.
>
>> The original version of the patch had that check in it, but it
>> vanished from the final committed version.  Was that just an oversight,
>> or an intentional change?
>
> I don't recall having thought about it one way or the other.  What did
> the check look like?

Well, originally it was searching indexes rather than constraints, and
funcdeps_check_pk() included the following check:

    if (!indexStruct->indisprimary || !indexStruct->indimmediate)
        continue;

Now it's looping over pg_constraint entries, so I guess anything with
con->condeferrable == true should be ignored.

Regards,
Dean
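The proposed filter can be modelled in a few lines.  This is a self-contained sketch only: ConstraintInfo and constraint_usable_for_grouping are illustrative stand-ins, not the actual Form_pg_constraint struct or the committed check_functional_grouping() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the two pg_constraint fields involved. */
typedef struct ConstraintInfo
{
    char contype;        /* 'p' = primary key, 'u' = unique, ... */
    bool condeferrable;  /* constraint is deferrable */
} ConstraintInfo;

/*
 * A constraint can justify leaving a column ungrouped only if it is a
 * primary key that is not deferrable: a deferrable PK may be
 * transiently violated when the query runs, so the planner cannot rely
 * on it.
 */
static bool
constraint_usable_for_grouping(const ConstraintInfo *con)
{
    if (con->contype != 'p')
        return false;    /* only primary keys qualify for now */
    if (con->condeferrable)
        return false;    /* deferrable: not trustworthy at plan time */
    return true;
}
```

This mirrors the original indisprimary/indimmediate test, translated to the pg_constraint representation.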
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Well, originally it was searching indexes rather than constraints, and
> funcdeps_check_pk() included the following check:
>
>     if (!indexStruct->indisprimary || !indexStruct->indimmediate)
>         continue;
>
> Now it's looping over pg_constraint entries, so I guess anything with
> con->condeferrable == true should be ignored.

Seems reasonable, will fix.  Thanks for the report!

regards, tom lane
On sön, 2010-09-05 at 11:35 -0400, Tom Lane wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> Well, originally it was searching indexes rather than constraints, and
>> funcdeps_check_pk() included the following check:
>>
>>     if (!indexStruct->indisprimary || !indexStruct->indimmediate)
>>         continue;
>>
>> Now it's looping over pg_constraint entries, so I guess anything with
>> con->condeferrable == true should be ignored.
>
> Seems reasonable, will fix.  Thanks for the report!

Yes, the SQL standard explicitly requires the constraint in question to
be immediate.