Discussion: Functional dependencies and GROUP BY
I have developed a patch that partially implements the "functional dependency" feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The full functional dependency deduction rules are pretty big and arcane, so I concentrated on getting a useful subset working. In particular:

When grouping by the primary key, the other columns can be omitted, e.g.,

CREATE TABLE tab1 (a int PRIMARY KEY, b int);
SELECT a, b FROM tab1 GROUP BY a;

This is frequently requested by MySQL converts (and possibly others).

Also, when a column is compared with a constant, it can appear ungrouped:

SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

For lack of a better idea, I have made it so that merge-joinable operators qualify as equality operators. Better ideas welcome. Other rules could be added over time (but I'm currently not planning to work on that myself).

At this point, this patch could use some review and testing with unusual queries that break my implementation. ;-)
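The guarantee behind the primary-key case can be illustrated outside SQL: when the grouping column is unique, each group contains exactly one row, so every other column has exactly one value per group and is unambiguous to select. A minimal Python sketch of that invariant (toy data, not anything from the patch itself):

```python
from collections import defaultdict

# rows of tab1 (a int PRIMARY KEY, b int); 'a' is unique
rows = [(1, 10), (2, 20), (3, 30)]

groups = defaultdict(list)
for a, b in rows:
    groups[a].append(b)

# because 'a' is a primary key, each group holds exactly one row,
# so selecting the ungrouped column 'b' is well-defined per group
assert all(len(vals) == 1 for vals in groups.values())
result = [(a, vals[0]) for a, vals in groups.items()]
```

Without the uniqueness guarantee, `vals` could hold several different values of `b` per group, which is exactly why GROUP BY normally rejects the query.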
Attachments
On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > I have developed a patch that partially implements the "functional > dependency" feature Nice! :) -- greg
2010/6/8 Peter Eisentraut <peter_e@gmx.net>:
> I have developed a patch that partially implements the "functional
> dependency" feature that allows some columns to be omitted from the
> GROUP BY clause if it can be shown that the columns are functionally
> dependent on the columns in the group by clause and therefore guaranteed
> to be unique per group. The full functional dependency deduction rules
> are pretty big and arcane, so I concentrated on getting a useful subset
> working. In particular:

> Also, when a column is compared with a constant, it can appear
> ungrouped:
>
> SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

I don't see why this should be allowed. I understand the argument that y must have a unique value, so it is OK for it to be ungrouped, but the point of the discussion is far from that; semantically, y is not a grouping key.

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

or there could be more complicated examples including JOIN cases. I don't believe we can detect all such cases. If the simple case is allowed, users won't understand why the complicated case is sometimes not allowed, so it will not be consistent.

Finally, it may hide unintended bugs. ORM tools may generate a WHERE clause under some conditions and not under others.

Regards,

-- 
Hitoshi Harada
* Hitoshi Harada (umi.tanuki@gmail.com) wrote:
> I don't see why this should be allowed. I understand the argument that
> y must have a unique value, so it is OK for it to be ungrouped, but the
> point of the discussion is far from that; semantically, y is not a
> grouping key.

Ignoring the fact that it's terribly useful- isn't it part of the SQL spec?

> In addition, what if y is implicitly a constant? For example,
>
> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

Not sure I see the issue here?

> Finally, it may hide unintended bugs. ORM tools may generate a WHERE
> clause under some conditions and not under others.

Yeah, this one I really just don't buy.. If an ORM tool doesn't write correct SQL, then it's the ORM's fault, not ours.

Thanks,

Stephen
* Peter Eisentraut (peter_e@gmx.net) wrote:
> This is frequently requested by MySQL converts (and possibly others).

I'd certainly love to see it- but let's not confuse people by implying that it would actually act the way MySQL does. It wouldn't, because what MySQL does is a lot closer to 'distinct on' and is patently insane to boot. Again, I'd *love* to see this be done in PG, but when we document it and tell people about it, *please* don't say it's similar in any way to MySQL's "oh, we'll just pick a random value from the columns that aren't group'd on" implementation.

> At this point, this patch could use some review and testing with unusual
> queries that break my implementation. ;-)

I'll give it a shot... :)

Thanks!

Stephen
Peter Eisentraut <peter_e@gmx.net> writes:
> I have developed a patch that partially implements the "functional
> dependency" feature that allows some columns to be omitted from the
> GROUP BY clause if it can be shown that the columns are functionally
> dependent on the columns in the group by clause and therefore guaranteed
> to be unique per group.

The main objection to this is the same one I've had all along: it makes the syntactic validity of a query dependent on what indexes exist for the table. At minimum, that means that enforcing the check at parse time is the Wrong Thing.

The var-compared-with-constant case seems like a big crock. Are we really required to provide such a thing per spec?

I'm also fairly concerned about the performance of a check implemented this way --- it's going to do a lot of work, and do it over and over again as it traverses the query tree. At least some of that could be alleviated after you move the check to the planner, just by virtue of the index information already having been acquired ... but I'd still suggest expending more than no effort on caching the results. For instance, given "SELECT * FROM a_very_wide_table GROUP BY pk" you shouldn't have to prove more than once that a_very_wide_table is grouped by its PK.

regards, tom lane
2010/6/7 Greg Stark <gsstark@mit.edu>:
> On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> I have developed a patch that partially implements the "functional
>> dependency" feature
>
> Nice! :)

I like this idea too. It can simplify some queries, and I believe it is a very good marketing bonus for us.

Pavel
On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:
> > Also, when a column is compared with a constant, it can appear
> > ungrouped:
> >
> > SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;
>
> I don't see why this should be allowed. I understand the argument that
> y must have a unique value, so it is OK for it to be ungrouped, but the
> point of the discussion is far from that; semantically, y is not a
> grouping key.

I'm not sure what your argument is. If y is uniquely determined within each group, then it's OK for it to be ungrouped. What other criteria do you have in mind for determining that instead? It looks like you are going by aesthetics. ;-)

> In addition, what if y is implicitly a constant? For example,
>
> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;
>
> or there could be more complicated examples including JOIN cases. I
> don't believe we can detect all such cases. If the simple case is
> allowed, users won't understand why the complicated case is sometimes
> not allowed, so it will not be consistent.

Yes, as I said, my implementation is incomplete in the sense that it only recognizes some functional dependencies. To recognize the sort of thing you show, you would need some kind of complex deduction or proof engine, and that doesn't seem worthwhile, at least for me, at this point.
On Mon, Jun 7, 2010 at 6:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Peter Eisentraut (peter_e@gmx.net) wrote:
>> This is frequently requested by MySQL converts (and possibly others).
>
> I'd certainly love to see it- but let's not confuse people by implying
> that it would actually act the way MySQL does. It wouldn't, because
> what MySQL does is a lot closer to 'distinct on' and is patently insane
> to boot. Again, I'd *love* to see this be done in PG, but when we
> document it and tell people about it, *please* don't say it's similar in
> any way to MySQL's "oh, we'll just pick a random value from the columns
> that aren't group'd on" implementation.

Preface: I work as a MySQL DBA (yeah, yeah, laugh it up...).

It has been my experience that the vast majority of the time, when MySQL users make use of the "fine feature" which allows them to use unaggregated columns that are not present in the GROUP BY clause in an aggregating query, they have made an error because they do not understand GROUP BY. I have found this lack of understanding to be very widespread across the MySQL developer and *DBA* communities.

I also would really hesitate to compare this useful feature to the *fine feature* present in MySQL. Due to a long-standing bug (http://bugs.mysql.com/bug.php?id=8510) it really is not possible to get MySQL to behave sanely. It is my impression that many programs of significant size that interact with MySQL have errors because of this issue, and it would be good not to claim to have made Postgres compatible.

That said, I imagine if this feature could make it into the Postgres tree it would be very useful. Would I be correct in assuming that while this feature would make query planning more expensive, it would also often decrease the cost of execution?

Best,

Rob Wultsch
wultsch@gmail.com
On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> I have developed a patch that partially implements the "functional >> dependency" feature that allows some columns to be omitted from the >> GROUP BY clause if it can be shown that the columns are functionally >> dependent on the columns in the group by clause and therefore guaranteed >> to be unique per group. > > The main objection to this is the same one I've had all along: it makes > the syntactic validity of a query dependent on what indexes exist for > the table. At minimum, that means that enforcing the check at parse > time is the Wrong Thing. It also needs to ensure that the plan is invalidated if the constraint is dropped, which I assume amounts to the same thing. -- greg
Greg Stark <gsstark@mit.edu> writes:
> On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The main objection to this is the same one I've had all along: it makes
>> the syntactic validity of a query dependent on what indexes exist for
>> the table. At minimum, that means that enforcing the check at parse
>> time is the Wrong Thing.

> It also needs to ensure that the plan is invalidated if the constraint
> is dropped, which I assume amounts to the same thing.

Well, no, any cached plan will get invalidated if the index goes away. The big problem with this implementation is that you could create a *rule* (eg a view) containing a query whose validity depends on the existence of an index. Dropping the index will not cause the rule to be invalidated.

Perhaps the correct fix would be to mark stored query trees as having a dependency on the index, so that dropping the index/constraint would force a drop of the rule too. Just pushing the check to plan time, as I suggested yesterday, isn't a very nice fix because it would result in the rule unexpectedly starting to fail at execution.

regards, tom lane
On Tue, Jun 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, no, any cached plan will get invalidated if the index goes away. > The big problem with this implementation is that you could create a > *rule* (eg a view) containing a query whose validity depends on the > existence of an index. Dropping the index will not cause the rule > to be invalidated. Hm, I was incorrectly thinking of this as analogous to the cases of plans that could be optimized based on the existence of a constraint. For example removing columns from a sort key because they're unique. But this is different because not just the plan but the validity of the query itself is dependent on the constraint. -- greg
Peter Eisentraut <peter_e@gmx.net> writes: > On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote: >> In addition, what if y is implicitly a constant? For example, >> >> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; > Yes, as I said, my implementation is incomplete in the sense that it > only recognizes some functional dependencies. To recognize the sort of > thing you show, you would need some kind of complex deduction or proof > engine, and that doesn't seem worthwhile, at least for me, at this > point. The question is why bother to recognize *any* cases of this form. I find it really semantically ugly to have the parser effectively doing one deduction of this form when the main engine for that type of deduction is elsewhere; so unless there is a really good argument why we have to do this case (and NOT "it was pretty easy"), I don't want to do it. As far as I recall, at least 99% of the user requests for this type of behavior, maybe 100%, would be satisfied by recognizing the group-by-primary-key case. So I think we should do that and be happy. regards, tom lane
On 6/8/10 5:21 PM +0300, Tom Lane wrote: > Peter Eisentraut<peter_e@gmx.net> writes: >> On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote: >>> In addition, what if y is implicitly a constant? For example, >>> >>> SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; > >> Yes, as I said, my implementation is incomplete in the sense that it >> only recognizes some functional dependencies. To recognize the sort of >> thing you show, you would need some kind of complex deduction or proof >> engine, and that doesn't seem worthwhile, at least for me, at this >> point. > > The question is why bother to recognize *any* cases of this form. > I find it really semantically ugly to have the parser effectively > doing one deduction of this form when the main engine for that type > of deduction is elsewhere; so unless there is a really good argument > why we have to do this case (and NOT "it was pretty easy"), I don't > want to do it. > > As far as I recall, at least 99% of the user requests for this type > of behavior, maybe 100%, would be satisfied by recognizing the > group-by-primary-key case. So I think we should do that and be happy. +1 Regards, Marko Tiikkaja
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Perhaps the correct fix would be to mark stored query trees as having a
> dependency on the index, so that dropping the index/constraint would
> force a drop of the rule too. Just pushing the check to plan time, as
> I suggested yesterday, isn't a very nice fix because it would result
> in the rule unexpectedly starting to fail at execution.

Alternatively, we could rewrite the rule (not unlike what we do for "SELECT *") to actually add on the other implicitly grouped-by columns.. I don't know if that's better or worse than creating a dependency, since if the constraint were dropped/changed, people might expect the rule's output to change. Of course, as you mention, the alternative would really be for the rule to just start failing..

Still, if I wanted to change the constraint, it'd be a lot nicer to just be able to change it and, presuming I'm just adding a column to it or doing some other change which wouldn't invalidate the rule, not have to drop/recreate the rule.

Thanks,

Stephen
On Tue, Jun 8, 2010 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The question is why bother to recognize *any* cases of this form.
> I find it really semantically ugly to have the parser effectively
> doing one deduction of this form when the main engine for that type
> of deduction is elsewhere; so unless there is a really good argument
> why we have to do this case (and NOT "it was pretty easy"), I don't
> want to do it.

Well, it does appear to be there:

4.18.11 Known functional dependencies in the result of a <where clause>
...
If AP is an equality AND-component of the <search condition> simply contained in the <where clause> and one comparand of AP is a column reference CR, and the other comparand of AP is a <literal>, then let CRC be the counterpart of CR in R. {} ↦ {CRC} is a known functional dependency in R, where {} denotes the empty set.

NOTE 43 — An SQL-implementation may also choose to recognize {} ↦ {CRC} as a known functional dependency if the other comparand is a deterministic expression containing no column references.
...

Since Peter's not eager to implement the whole section -- which does seem pretty baroque -- it's up to us to draw the line where we stop coding and declare it good enough. I think we're all agreed that grouping by a pk is clearly the most important case. It may be important to get some other cases just so that the PK property carries through other clauses such as joins and group bys. But ultimately the only thing stopping us from implementing the whole thing is our threshold of pain for writing and maintaining the extra code.

-- 
greg
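The rule quoted above inspects only equality AND-components at the top level of the <where clause>, which is why a derived constant such as `y = a AND a = 5` falls outside it. A rough sketch of that limited detection, using a toy expression representation rather than PostgreSQL's actual parse trees:

```python
# toy representation of a WHERE clause as a nested tuple tree:
#   ('and', left, right) | ('eq', comparand, comparand)
# a column reference is a str, a literal is an int

def and_components(expr):
    """Flatten top-level ANDs into their conjuncts."""
    if expr[0] == 'and':
        return and_components(expr[1]) + and_components(expr[2])
    return [expr]

def constant_columns(where):
    """Columns equated with a literal in some top-level conjunct."""
    cols = set()
    for comp in and_components(where):
        if comp[0] == 'eq':
            a, b = comp[1], comp[2]
            if isinstance(a, str) and isinstance(b, int):
                cols.add(a)
            elif isinstance(b, str) and isinstance(a, int):
                cols.add(b)
    return cols

# WHERE y = 5: y is detected as constant
assert constant_columns(('eq', 'y', 5)) == {'y'}
# WHERE y = a AND a = 5: only a is detected; deducing y would
# require the transitive reasoning Hitoshi's example points out
assert constant_columns(('and', ('eq', 'y', 'a'), ('eq', 'a', 5))) == {'a'}
```

The second assertion shows exactly the inconsistency raised earlier in the thread: the rule catches the direct comparison but not the implied one.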
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Perhaps the correct fix would be to mark stored query trees as having a >> dependency on the index, so that dropping the index/constraint would >> force a drop of the rule too. > Alternatively, we could rewrite the rule (not unlike what we do for > "SELECT *") to actually add on the other implicitly grouped-by columns.. > I don't know if that's better or worse than creating a dependency, > since if the constraint were dropped/changed, people might expect the > rule's output to change. Hm. The problem with that is that one of the benefits we'd like to get from this is an efficiency win: the generated plan ought to only group by the PK, not uselessly sort/group by everything in the row. I suppose we could have the planner reverse-engineer its way to that, but it seems awfully slow and clunky to add on the extra columns and then reason our way to removing them again. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Hm. The problem with that is that one of the benefits we'd like to get
> from this is an efficiency win: the generated plan ought to only group
> by the PK, not uselessly sort/group by everything in the row. I suppose
> we could have the planner reverse-engineer its way to that, but it seems
> awfully slow and clunky to add on the extra columns and then reason our
> way to removing them again.

That's certainly a good point.

Another issue that I realized when thinking about this again- if someone wanted to *drop* a column that's part of a PK (since it turned out to not be necessary, for example), and then wanted to recreate the rule based on what was stored in the catalog, they wouldn't be able to without modifying it, and that would certainly be annoying too.

Guess my 2c would be for creating the dependency. I really dislike the idea of the rule just all of a sudden breaking.

Thanks,

Stephen
On tis, 2010-06-08 at 10:21 -0400, Tom Lane wrote: > The question is why bother to recognize *any* cases of this form. > I find it really semantically ugly to have the parser effectively > doing one deduction of this form when the main engine for that type > of deduction is elsewhere; so unless there is a really good argument > why we have to do this case (and NOT "it was pretty easy"), I don't > want to do it. Yeah, I'm not horribly attached to it. I began to structure the code to support multiple kinds of checks, and at the end only two kinds were reasonably doable and useful. We can remove it if no one is interested in it, which appears to be the case.
On tis, 2010-06-08 at 10:05 -0400, Tom Lane wrote:
> Perhaps the correct fix would be to mark stored query trees as having a
> dependency on the index, so that dropping the index/constraint would
> force a drop of the rule too. Just pushing the check to plan time, as
> I suggested yesterday, isn't a very nice fix because it would result
> in the rule unexpectedly starting to fail at execution.

There are actually pretty explicit instructions about this in the SQL standard:

<drop table constraint definition>

4) If QS is a <query specification> that contains an implicit or explicit <group by clause> and that contains a column reference to a column C in its <select list> that is not contained in an aggregated argument of a <set function specification>, and if G is the set of grouping columns of QS, and if the table constraint TC is needed to conclude that G ↦ C is a known functional dependency in QS, then QS is said to be dependent on TC.

5) If V is a view that contains a <query specification> that is dependent on a table constraint TC, then V is said to be dependent on TC.

So the dependency between the view/rule and the constraint/index needs to be stored in the dependency system, and RESTRICT/CASCADE will take effect.
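The RESTRICT/CASCADE behavior the standard asks for is the same mechanism PostgreSQL's dependency system already provides for other object kinds. A toy sketch of that mechanism (the object names and the flat dict standing in for pg_depend are invented for illustration):

```python
# toy dependency store: maps an object to the objects that depend on it
deps = {'tab1_pkey': ['view_v']}
objects = {'tab1_pkey', 'view_v'}

def drop(obj, cascade=False):
    dependents = deps.get(obj, [])
    if dependents and not cascade:
        raise RuntimeError(f"cannot drop {obj}: {dependents} depend on it")
    for d in dependents:
        drop(d, cascade=True)   # drop dependents first
    objects.discard(obj)

# RESTRICT behavior: the drop is refused while the view exists
try:
    drop('tab1_pkey')
except RuntimeError:
    pass
assert 'view_v' in objects

# CASCADE behavior: the dependent view is dropped with the constraint
drop('tab1_pkey', cascade=True)
assert objects == set()
```

Recording the view-on-constraint dependency this way is what prevents the "rule unexpectedly starting to fail at execution" problem Tom describes.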
On mån, 2010-06-07 at 21:33 +0300, Peter Eisentraut wrote: > I have developed a patch that partially implements the "functional > dependency" feature that allows some columns to be omitted from the > GROUP BY clause if it can be shown that the columns are functionally > dependent on the columns in the group by clause and therefore > guaranteed to be unique per group. Second version: I stripped out all checks except the primary key/unique constraint checks. Views whose existence depends on one of those constraints get a dependency recorded. This depends on the patch currently in the commit fest to record not null constraints in pg_constraint, so that the dependencies on not-null constraints can be recorded. I haven't done any caching of index lookups yet. Some testing with 1600-column tables didn't show any effect. I'll test this a little more.
Attachments
On Fri, Jun 25, 2010 at 14:06, Peter Eisentraut <peter_e@gmx.net> wrote:
> Second version:

Hi! I've looked this over. Looks great! I have some nits about the documentation and comments (non-issues like referencing primary keys when it really means not null unique indexes :-P), but on the whole it works and looks good.

The only corner case I have run into is creating a view with what I would call an implicit 'not null' constraint. Demonstration below:

create table nn (a int4 not null, b int4, unique (a));

select * from nn group by a; -- should this work? I think not?
 a | b
---+---
(0 rows)

create view vv as select a, b from nn group by a;
select * from vv;
 a | b
---+---
(0 rows)

=# alter table nn alter column a drop not null;
=# select * from nn group by a; -- ok, broken makes sense
ERROR:  column "nn.b" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT * from nn group by a;

=# select * from vv; -- yipes, should be broken?
 a | b
---+---
(0 rows)

I'm thinking we should not allow the "select * from nn group by a;" to work. Thoughts?

(FYI I do plan on doing some performance testing with large columns later, any other requests?)
On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
> The only corner case I have run into is creating a view with what I
> would call an implicit 'not null' constraint. Demonstration below:
>
> create table nn (a int4 not null, b int4, unique (a));
> select * from nn group by a; -- should this work? I think not?

I believe I referred to this upthread. There is another patch in the commitfest about explicitly representing NOT NULL constraints in pg_constraint. Then this case would create a dependency on those constraints. So we need to get that other patch in first.
On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut <peter_e@gmx.net> wrote:
> On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
>> The only corner case I have run into is creating a view with what I
>> would call an implicit 'not null' constraint. Demonstration below:
>>
>> create table nn (a int4 not null, b int4, unique (a));
>> select * from nn group by a; -- should this work? I think not?
>
> I believe I referred to this upthread.

Aww, and here I thought I had just been diligent :).

In other news, it's really no surprise that your test with 1600 columns had little effect. It loops over the indexes, then the index keys, and then the group by items, right? So I would expect more indexes or more group by items to slow it down, not so much the number of columns. Right?

Anyhow, it sounds like I should try it on top of the other patch and see if it works. I assume it might still need some fixups to work with that other patch? Or do you expect it to just work?
On Fri, Jul 16, 2010 at 22:29, Alex Hunsaker <badalex@gmail.com> wrote:
> (FYI I do plan on doing some performance testing with large columns
> later, any other requests?)

And here are the results. All tests are with an empty table with 1500 int4 columns. There is a unique non-null index on the first column. (non-assert build)

A: select count(1) from (select * from test group by ...1500 columns...) as res;
B: select count(1) from (select * from test group by a_0) as res; -- a_0 has the not null unique index

CVS A:   360ms
PATCH A: 370ms
PATCH B:  60ms

1500 indexes (one per column, on the column):

CVS A:   670ms
PATCH A: 850ms
PATCH B: 561ms

So it seems for tables with lots of columns the patch is faster, at least when you omit all the columns from the group by. I suspect for most "normal" (5-20 column) usage it should be a wash.

(Stupid question) Does anyone know why HEAD is quite a bit slower when there are lots of indexes? Do we end up looping and perhaps locking them or something?
On Sat, Jul 17, 2010 at 11:13, Alex Hunsaker <badalex@gmail.com> wrote:
> On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
>>> The only corner case I have run into is creating a view with what I
>>> would call an implicit 'not null' constraint. Demonstration below:
>>>
>>> create table nn (a int4 not null, b int4, unique (a));
>>> select * from nn group by a; -- should this work? I think not?
>>
>> I believe I referred to this upthread.

Yeah, I went back and reread the thread and um... yep, it's right where you posted patch 2. I think I read it, forgot about it, and then it bubbled up to my subconscious while testing :).

> Anyhow, it sounds like I should try it on top of the other patch and
> see if it works. I assume it might still need some fixups to work
> with that other patch? Or do you expect it to just work?

[ referring to the not null pg_constraint patch ]

Probably no surprise to you, I tried it on top of the not null pg_constraint patch and it did not work 'out of the box'. Mainly I was curious if there was some magic going on that I did not see.

In any event, do you think it's worth adding a regression test for this?
On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
> It's really no surprise that your test with 1600 columns had little
> effect. It loops over the indexes, then the index keys, and then the
> group by items, right? So I would expect more indexes or more group by
> items to slow it down, not so much the number of columns. Right?

At the outer level (which is not visible in this patch) it loops over all columns in the select list, and then it looks up the indexes each time. So the concern was that if the select list was * with a wide table, looking up the indexes each time would be slow.

> Anyhow, it sounds like I should try it on top of the other patch and
> see if it works. I assume it might still need some fixups to work
> with that other patch? Or do you expect it to just work?

There is some work necessary to integrate the two.
On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut <peter_e@gmx.net> wrote:
> On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
>> So I would expect more indexes or more group by items to slow it
>> down, not so much the number of columns. Right?
>
> At the outer level (which is not visible in this patch) it loops over
> all columns in the select list, and then it looks up the indexes each
> time. So the concern was that if the select list was * with a wide
> table, looking up the indexes each time would be slow.

Thanks for the explanation.

>> Anyhow, it sounds like I should try it on top of the other patch and
>> see if it works. I assume it might still need some fixups to work
>> with that other patch? Or do you expect it to just work?
>
> There is some work necessary to integrate the two.

I just read that patch is getting pushed till at least the next commitfest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php

Should we push this patch back too? Alternatively we could make it work with just primary keys until the other patch gets in. I think that makes sense; find that attached. Thoughts?

Note I axed the index not-null attribute checking. I have not thought too deeply about it, other than that if it's a primary key, its attributes can't be null.... Right? I may have missed something subtle, hence the heads up.
Attachments
On Fri, Jul 23, 2010 at 11:00, Alex Hunsaker <badalex@gmail.com> wrote: > Alternatively we could make it > work with just primary keys until the other patch gets in. I think > that makes sense, find that attached. Thoughts? *sigh*, find attached a version that removes talk of unique not null constraints from the docs
Attachments
On fre, 2010-07-23 at 11:00 -0600, Alex Hunsaker wrote:
> I just read that patch is getting pushed till at least the next commit
> fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php
>
> Should we push this patch back too? Alternatively we could make it
> work with just primary keys until the other patch gets in. I think
> that makes sense; find that attached. Thoughts?

I was thinking the same thing.

> Note I axed the index not-null attribute checking. I have not thought
> too deeply about it, other than that if it's a primary key, its
> attributes can't be null.... Right? I may have missed something
> subtle, hence the heads up.

Another open question I thought of was whether we should put the dependency record on the pg_index row, or the pg_constraint row, or perhaps the pg_class row. Right now, it is using pg_index, because that was easiest to code up, but I suspect that once we have not-null constraints in pg_constraint, it will be more consistent to make all dependencies go against pg_constraint rather than a mix of several catalogs.
On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut <peter_e@gmx.net> wrote: > Another open question I thought of was whether we should put the > dependency record on the pg_index row, or the pg_constraint row, or > perhaps the pg_class row. Right now, it is using pg_index, because that > was easiest to code up, but I suspect that once we have not-null > constraints in pg_constraint, it will be more consistent to make all > dependencies go against pg_constraint rather than a mix of several > catalogs. I think for primary keys pg_index is OK. However for the not-null case we have to use pg_constraint... So given that we end up having to code that anyways, it seems like it will end up being cleaner/consistent to always use the pg_constraint row(s). So +1 for using pg_constraint instead of pg_index from me.
On mån, 2010-07-26 at 10:46 -0600, Alex Hunsaker wrote: > On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut <peter_e@gmx.net> wrote: > > > Another open question I thought of was whether we should put the > > dependency record on the pg_index row, or the pg_constraint row, or > > perhaps the pg_class row. Right now, it is using pg_index, because that > > was easiest to code up, but I suspect that once we have not-null > > constraints in pg_constraint, it will be more consistent to make all > > dependencies go against pg_constraint rather than a mix of several > > catalogs. > > I think for primary keys pg_index is OK. However for the not-null > case we have to use pg_constraint... So given that we end up having to > code that anyways, it seems like it will end up being > cleaner/consistent to always use the pg_constraint row(s). So +1 for > using pg_constraint instead of pg_index from me. Next version. Changed dependencies to pg_constraint, removed handling of unique constraints for now, and made some enhancements so that views track dependencies on constraints even in subqueries. Should be close to final now. :-)
Attachments
Peter Eisentraut <peter_e@gmx.net> writes: > Next version. Changed dependencies to pg_constraint, removed handling > of unique constraints for now, and made some enhancements so that views > track dependencies on constraints even in subqueries. Should be close > to final now. :-) I've committed this with some revisions, notably: The view.c changes were fundamentally wrong. The right place to extract dependencies from a parsetree is in dependency.c, specifically find_expr_references_walker. The way you were doing it meant that dependencies on constraints would only be noticed for views, not for other cases such as stored plans. I rewrote the catalog search to look only at pg_constraint, not using pg_index at all. I think this will be easier to extend to the case of looking for UNIQUE + NOT NULL, whenever we get around to doing that. I also moved the search into catalog/pg_constraint.c, because it didn't seem to belong in parse_agg (as hinted by the large number of #include additions you had to make to put it there). I put in a bit of caching logic to prevent repeating the search for multiple Vars of the same relation. Tests or no tests, I can't believe that's going to be cheap enough that we want to repeat it over and over... regards, tom lane
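The caching Tom describes amounts to memoizing the proof per relation, so that a SELECT * over a wide table consults the catalog once rather than once per Var. A hypothetical sketch of the idea (the names and the catalog stand-in are invented for illustration, not the committed code):

```python
from functools import lru_cache

# hypothetical stand-in for the catalog lookup: which columns form
# the primary key of each relation (not the real pg_constraint layout)
PRIMARY_KEYS = {'a_very_wide_table': frozenset({'pk'})}

@lru_cache(maxsize=None)
def grouped_by_pk(relation, group_cols):
    """Prove once per (relation, grouping set) that the PK is covered."""
    pk = PRIMARY_KEYS.get(relation)
    return pk is not None and pk <= group_cols

# checks for the many Vars of the same relation hit the cache after
# the first lookup instead of repeating the catalog search
for _ in range(1600):
    assert grouped_by_pk('a_very_wide_table', frozenset({'pk'}))
```

With 1600 ungrouped columns, only the first check does any work; the remaining 1599 are cache hits, which is the effect the committed caching logic is after.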
On 7 August 2010 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Next version.  Changed dependencies to pg_constraint, removed handling
>> of unique constraints for now, and made some enhancements so that views
>> track dependencies on constraints even in subqueries.  Should be close
>> to final now. :-)
>
> I've committed this with some revisions, notably:
>
> The view.c changes were fundamentally wrong.  The right place to
> extract dependencies from a parsetree is in dependency.c, specifically
> find_expr_references_walker.  The way you were doing it meant that
> dependencies on constraints would only be noticed for views, not for
> other cases such as stored plans.
>
> I rewrote the catalog search to look only at pg_constraint, not using
> pg_index at all.  I think this will be easier to extend to the case of
> looking for UNIQUE + NOT NULL, whenever we get around to doing that.
> I also moved the search into catalog/pg_constraint.c, because it didn't
> seem to belong in parse_agg (as hinted by the large number of #include
> additions you had to make to put it there).
>
> I put in a bit of caching logic to prevent repeating the search for
> multiple Vars of the same relation.  Tests or no tests, I can't believe
> that's going to be cheap enough that we want to repeat it over and
> over...
>
> regards, tom lane

I was testing out this feature this morning and discovered that the
results may be non-deterministic if the PK is deferrable.

I think that check_functional_grouping() should exclude any deferrable
constraints, since in general, any inference based on a deferrable
constraint can't be trusted when planning a query, since the constraint
can't be guaranteed to be valid when the query is executed.  That's
also consistent with the SQL spec.

The original version of the patch had that check in it, but it vanished
from the final committed version.  Was that just an oversight, or an
intentional change?

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I was testing out this feature this morning and discovered that the
> results may be non-deterministic if the PK is deferrable.

Good point.

> The original version of the patch had that check in it, but it
> vanished from the final committed version.  Was that just an oversight,
> or an intentional change?

I don't recall having thought about it one way or the other.  What did
the check look like?

regards, tom lane
On 5 September 2010 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> I was testing out this feature this morning and discovered that the
>> results may be non-deterministic if the PK is deferrable.
>
> Good point.
>
>> The original version of the patch had that check in it, but it
>> vanished from the final committed version.  Was that just an oversight,
>> or an intentional change?
>
> I don't recall having thought about it one way or the other.  What did
> the check look like?

Well, originally it was searching indexes rather than constraints, and
funcdeps_check_pk() included the following check:

    if (!indexStruct->indisprimary || !indexStruct->indimmediate)
        continue;

Now it's looping over pg_constraint entries, so I guess anything with
con->condeferrable == true should be ignored.

Regards,
Dean
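The proposed filter can be modelled in a few lines.  This is a self-contained sketch only: ConstraintInfo and constraint_usable_for_grouping are illustrative stand-ins, not the actual Form_pg_constraint struct or the committed check_functional_grouping() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the two pg_constraint fields involved. */
typedef struct ConstraintInfo
{
    char contype;        /* 'p' = primary key, 'u' = unique, ... */
    bool condeferrable;  /* constraint is deferrable */
} ConstraintInfo;

/*
 * A constraint can justify leaving a column ungrouped only if it is a
 * primary key that is not deferrable: a deferrable PK may be
 * transiently violated when the query runs, so the planner cannot rely
 * on it.
 */
static bool
constraint_usable_for_grouping(const ConstraintInfo *con)
{
    if (con->contype != 'p')
        return false;    /* only primary keys qualify for now */
    if (con->condeferrable)
        return false;    /* deferrable: not trustworthy at plan time */
    return true;
}
```

This mirrors the original indisprimary/indimmediate test, translated to the pg_constraint representation.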
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Well, originally it was searching indexes rather than constraints, and
> funcdeps_check_pk() included the following check:
>
>     if (!indexStruct->indisprimary || !indexStruct->indimmediate)
>         continue;
>
> Now it's looping over pg_constraint entries, so I guess anything with
> con->condeferrable == true should be ignored.

Seems reasonable, will fix.  Thanks for the report!

regards, tom lane
On sön, 2010-09-05 at 11:35 -0400, Tom Lane wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> Well, originally it was searching indexes rather than constraints, and
>> funcdeps_check_pk() included the following check:
>>
>>     if (!indexStruct->indisprimary || !indexStruct->indimmediate)
>>         continue;
>>
>> Now it's looping over pg_constraint entries, so I guess anything with
>> con->condeferrable == true should be ignored.
>
> Seems reasonable, will fix.  Thanks for the report!

Yes, the SQL standard explicitly requires the constraint in question to
be immediate.