Обсуждение: GROUPING SETS revisited
In case anyone's interested, I've taken the CTE-based grouping sets patch from [1] and made it apply to 9.1, attached. I haven't yet done things like checked it for whitespace consistency, style conformity, or anything else, but (tuits permitting) hope to figure out how it works and get it closer to commitability in some upcoming commitfest. I mention it here so that if someone else is working on it, we can avoid duplicated effort, and to see if a CTE-based grouping sets implementation is really the way we think we want to go. [1] http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
Вложения
Hello 2010/8/3 Joshua Tolley <eggyknap@gmail.com>: > In case anyone's interested, I've taken the CTE-based grouping sets patch from > [1] and made it apply to 9.1, attached. I haven't yet done things like checked > it for whitespace consistency, style conformity, or anything else, but (tuits > permitting) hope to figure out how it works and get it closer to commitability > in some upcoming commitfest. > > I mention it here so that if someone else is working on it, we can avoid > duplicated effort, and to see if a CTE-based grouping sets implementation is > really the way we think we want to go. > I am plaing with it now :). I have a plan to replace CTE with similar but explicit executor node. The main issue of existing patch was using just transformation to CTE. I agree, so it isn't too much extensiable in future. Now I am cleaning identifiers little bit. Any colaboration is welcome. My plan: 1) clean CTE patch 2) replace CTE with explicit executor node, but still based on tuplestore 3) when will be possible parallel processing based on hash agg - then we don't need to use tuplestore comments?? Regards Pavel > [1] http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php > > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAkxXrggACgkQRiRfCGf1UMMlCQCglaIdtPj8Qe6G60V2LHn5pFNn > kgIAniXRgIQEbVrK/eDVZnmKCzw33lT9 > =XVVV > -----END PGP SIGNATURE----- > >
2010/8/3 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > 2010/8/3 Joshua Tolley <eggyknap@gmail.com>: >> In case anyone's interested, I've taken the CTE-based grouping sets patch from >> [1] and made it apply to 9.1, attached. I haven't yet done things like checked >> it for whitespace consistency, style conformity, or anything else, but (tuits >> permitting) hope to figure out how it works and get it closer to commitability >> in some upcoming commitfest. >> >> I mention it here so that if someone else is working on it, we can avoid >> duplicated effort, and to see if a CTE-based grouping sets implementation is >> really the way we think we want to go. >> > > I am plaing with it now :). I have a plan to replace CTE with similar > but explicit executor node. The main issue of existing patch was using > just transformation to CTE. I agree, so it isn't too much extensiable > in future. Now I am cleaning identifiers little bit. Any colaboration > is welcome. > > My plan: > 1) clean CTE patch > 2) replace CTE with explicit executor node, but still based on tuplestore > 3) when will be possible parallel processing based on hash agg - then > we don't need to use tuplestore Couldn't you explain what exactly "explicit executor node"? I hope we can share your image to develop it further than only transformation to CTE. Regards, -- Hitoshi Harada
2010/8/3 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/8/3 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> 2010/8/3 Joshua Tolley <eggyknap@gmail.com>: >>> In case anyone's interested, I've taken the CTE-based grouping sets patch from >>> [1] and made it apply to 9.1, attached. I haven't yet done things like checked >>> it for whitespace consistency, style conformity, or anything else, but (tuits >>> permitting) hope to figure out how it works and get it closer to commitability >>> in some upcoming commitfest. >>> >>> I mention it here so that if someone else is working on it, we can avoid >>> duplicated effort, and to see if a CTE-based grouping sets implementation is >>> really the way we think we want to go. >>> >> >> I am plaing with it now :). I have a plan to replace CTE with similar >> but explicit executor node. The main issue of existing patch was using >> just transformation to CTE. I agree, so it isn't too much extensiable >> in future. Now I am cleaning identifiers little bit. Any colaboration >> is welcome. >> >> My plan: >> 1) clean CTE patch >> 2) replace CTE with explicit executor node, but still based on tuplestore >> 3) when will be possible parallel processing based on hash agg - then >> we don't need to use tuplestore > > Couldn't you explain what exactly "explicit executor node"? I hope we > can share your image to develop it further than only transformation to > CTE. I have a one reason Implementation based on CTE doesn't create space for possible optimalisations (I think now, maybe it isn't true). It is good for initial or referencial implementation - but it can be too complex, when we will try to append some optimalizations - like parallel hash agg processing, direct data reading without tuplestore. If you are, as CTE author, thinking so these features are possible in non recursive CTE too, I am not agains. I hope so this week I'll have a CTE based patch - and we can talk about next direction. I see as possible performance issue using a tuplestore - there are lot of cases where repeating of source query can be faster. If I remember well, Tom has a objection, so transformation to CTE is too early - in parser. So It will be first change. Executor node can be CTE. regards Pavel p.s. I am sure, so there are lot of task, that can be solved together with non recursive CTE. > > > Regards, > > -- > Hitoshi Harada >
On Mon, Aug 02, 2010 at 11:50:00PM -0600, Josh Tolley wrote: > In case anyone's interested, I've taken the CTE-based grouping sets > patch from [1] and made it apply to 9.1, attached. I haven't yet > done things like checked it for whitespace consistency, style > conformity, or anything else, but (tuits permitting) hope to figure > out how it works and get it closer to commitability in some upcoming > commitfest. > > I mention it here so that if someone else is working on it, we can > avoid duplicated effort, and to see if a CTE-based grouping sets > implementation is really the way we think we want to go. > > [1] > http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php I've added back one file in the patch enclosed here. I'm still getting compile fails from CC="ccache gcc" ./configure --prefix=$PG_PREFIX --with-pgport=$PGPORT --with-perl --with-libxml --enable-debug --enable-cassert make Log from that also enclosed. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
On Tue, Aug 03, 2010 at 12:58:03PM -0700, David Fetter wrote: > On Mon, Aug 02, 2010 at 11:50:00PM -0600, Josh Tolley wrote: > > In case anyone's interested, I've taken the CTE-based grouping sets > > patch from [1] and made it apply to 9.1, attached. I haven't yet > > done things like checked it for whitespace consistency, style > > conformity, or anything else, but (tuits permitting) hope to figure > > out how it works and get it closer to commitability in some upcoming > > commitfest. > > > > I mention it here so that if someone else is working on it, we can > > avoid duplicated effort, and to see if a CTE-based grouping sets > > implementation is really the way we think we want to go. > > > > [1] > > http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php > > I've added back one file in the patch enclosed here. I'm still > getting compile fails from > > CC="ccache gcc" ./configure --prefix=$PG_PREFIX --with-pgport=$PGPORT --with-perl --with-libxml --enable-debug --enable-cassert > make > > Log from that also enclosed. > Yeah, I seem to have done a poor job of producing the patch based on the repository I was working from. That said, it seems Pavel's working actively on a patch anyway, so perhaps my updating the old one isn't all that worthwhile. Pavel, is your code somewhere that we can get to it? -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
2010/8/3 Joshua Tolley <eggyknap@gmail.com>: > On Tue, Aug 03, 2010 at 12:58:03PM -0700, David Fetter wrote: >> On Mon, Aug 02, 2010 at 11:50:00PM -0600, Josh Tolley wrote: >> > In case anyone's interested, I've taken the CTE-based grouping sets >> > patch from [1] and made it apply to 9.1, attached. I haven't yet >> > done things like checked it for whitespace consistency, style >> > conformity, or anything else, but (tuits permitting) hope to figure >> > out how it works and get it closer to commitability in some upcoming >> > commitfest. >> > >> > I mention it here so that if someone else is working on it, we can >> > avoid duplicated effort, and to see if a CTE-based grouping sets >> > implementation is really the way we think we want to go. >> > >> > [1] >> > http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php >> >> I've added back one file in the patch enclosed here. I'm still >> getting compile fails from >> >> CC="ccache gcc" ./configure --prefix=$PG_PREFIX --with-pgport=$PGPORT --with-perl --with-libxml --enable-debug --enable-cassert >> make >> >> Log from that also enclosed. >> > > Yeah, I seem to have done a poor job of producing the patch based on the > repository I was working from. That said, it seems Pavel's working actively on > a patch anyway, so perhaps my updating the old one isn't all that worthwhile. > Pavel, is your code somewhere that we can get to it? > not now. please wait a week. Regards Pavel > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAkxYeiQACgkQRiRfCGf1UMPlEQCff+I4sCGtR+lzUs6Wb5JKi7Uu > 3qYAnjLHzHzyMSHHX55QsphkaBbEJ0Zf > =uRqV > -----END PGP SIGNATURE----- > >
On Wed, Aug 04, 2010 at 04:44:05AM +0200, Pavel Stehule wrote: > > Yeah, I seem to have done a poor job of producing the patch based on the > > repository I was working from. That said, it seems Pavel's working actively on > > a patch anyway, so perhaps my updating the old one isn't all that worthwhile. > > Pavel, is your code somewhere that we can get to it? > > > > not now. please wait a week. That works for me. I'm glad to try doing a better job of putting together my version of the patch, if anyone thinks it's useful, but it seems that since Pavel's code is due to appear sometime in the foreseeable future, there's not much point in my doing that. -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
2010/8/4 Joshua Tolley <eggyknap@gmail.com>: > On Wed, Aug 04, 2010 at 04:44:05AM +0200, Pavel Stehule wrote: >> > Yeah, I seem to have done a poor job of producing the patch based on the >> > repository I was working from. That said, it seems Pavel's working actively on >> > a patch anyway, so perhaps my updating the old one isn't all that worthwhile. >> > Pavel, is your code somewhere that we can get to it? >> > >> >> not now. please wait a week. > > That works for me. I'm glad to try doing a better job of putting together my > version of the patch, if anyone thinks it's useful, but it seems that since > Pavel's code is due to appear sometime in the foreseeable future, there's not > much point in my doing that. > I hope, so next week you can do own work on this job - I am not a native speaker, and my code will need a checking and fixing comments Regards Pavel > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAkxZkp8ACgkQRiRfCGf1UMMUcwCfcPayQbWRUYwhpCF1f24LsdD9 > H/gAnRzCEq6yLX/RVLLi88ROhurOzbhK > =gUPx > -----END PGP SIGNATURE----- > >
On Thu, Aug 05, 2010 at 06:21:18AM +0200, Pavel Stehule wrote: > I hope, so next week you can do own work on this job - I am not a > native speaker, and my code will need a checking and fixing comments I haven't entirely figured out how the code in the old patch works, but I promise I *can* edit comments/docs :) -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
Hello I am sending a updated version. i hope so there is more comments, longer and more descriptive identifiers and I fixed a few bugs. But I found some new bugs :( What is ok: create table cars(name varchar, place varchar, count integer); insert into cars values('skoda', 'czech rep.', 10000); insert into cars values('skoda', 'germany', 5000); insert into cars values('bmw', 'czech rep.', 100); insert into cars values('bmw', 'germany', 1000); insert into cars values('opel', 'czech rep.', 7000); insert into cars values('opel', 'germany', 7000); postgres=# select name, place, sum(count) from cars group by (); name | place | sum ------+-------+------- | | 30100 (1 row) postgres=# select name, place, sum(count) from cars group by cube(name, place); name | place | sum -------+------------+------- bmw | czech rep. | 100 skoda | germany | 5000 opel | czech rep. | 7000 opel | germany | 7000 skoda | czech rep. | 10000 bmw | germany | 1000 bmw | | 1100 skoda | | 15000 opel | | 14000 | germany | 13000 | czech rep. | 17100 | | 30100 (12 rows) postgres=# select name, place, sum(count) from cars group by grouping sets(name, place),(); name | place | sum -------+------------+------- bmw | | 1100 skoda | | 15000 opel | | 14000 | germany | 13000 | czech rep. | 17100 | | 30100 (6 rows) postgres=# select name, place, sum(count) from cars group by grouping sets(name, place,()),(); name | place | sum -------+------------+------- bmw | | 1100 skoda | | 15000 opel | | 14000 | germany | 13000 | czech rep. | 17100 | | 30100 (6 rows) postgres=# select name, place, sum(count), grouping(name) from cars group by grouping sets(name); name | place | sum | grouping -------+-------+-------+---------- bmw | | 1100 | 0 skoda | | 15000 | 0 opel | | 14000 | 0 (3 rows) what is wrong: postgres=# select name, place from cars group by (); name | place -------+------------ skoda | czech rep. skoda | germany bmw | czech rep. bmw | germany opel | czech rep. opel | germany (6 rows) have to be NULL, NULL postgres=# select name, place, sum(count), grouping(name) from cars group by grouping sets(name) having grouping(name) = 1; ERROR: unrecognized node type: 934 my rewriting rule is applied too late and maybe isn't optimal. I replace a grouping(x) by const. maybe is better to use a variable. Same issue is with ORDER BY clause. So Joshua, can you look on code? Regards Pavel Stehule 2010/8/5 Joshua Tolley <eggyknap@gmail.com>: > On Thu, Aug 05, 2010 at 06:21:18AM +0200, Pavel Stehule wrote: >> I hope, so next week you can do own work on this job - I am not a >> native speaker, and my code will need a checking and fixing comments > > I haven't entirely figured out how the code in the old patch works, but I > promise I *can* edit comments/docs :) > > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAkxaSjEACgkQRiRfCGf1UMM9dQCZASYJUmXLe5i7L4aQnMicwMfy > cu8An3fMdR/ISezw5YV3KsCAOM+BILO1 > =uZb+ > -----END PGP SIGNATURE----- > >
Вложения
On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote: > So Joshua, can you look on code? Sure... thanks :) -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
I found other issue :( postgres=# select name, place from cars group by grouping sets(name, place,());name | place -------+------------bmw |skoda |opel | | germany | czech rep.skoda | czech rep.skoda | germanybmw | czechrep.bmw | germanyopel | czech rep.opel | germany (11 rows) postgres=# explain select name, place from cars group by grouping sets(name, place,()); QUERY PLAN ------------------------------------------------------------------------------Append (cost=36.98..88.55 rows=1230 width=54) CTE GroupingSets -> Seq Scan on cars (cost=0.00..18.30 rows=830 width=68) -> HashAggregate (cost=18.68..20.68rows=200 width=32) -> CTE Scan on "GroupingSets" (cost=0.00..16.60 rows=830 width=32) -> HashAggregate (cost=18.68..20.68 rows=200 width=32) -> CTE Scan on "GroupingSets" (cost=0.00..16.60 rows=830 width=32) -> CTE Scan on "GroupingSets" (cost=0.00..16.60 rows=830 width=64) (8 rows) the combination of nonagregates and empty sets do a problems - because we can't ensure agg mode without aggregates or group by. But it is only minor issue 2010/8/5 Joshua Tolley <eggyknap@gmail.com>: > On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote: >> So Joshua, can you look on code? > > Sure... thanks :) > > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAkxa1NsACgkQRiRfCGf1UMPwzQCgjz52P86Yx4ac4aRkKwjn8OHK > 6/EAoJ/CjXEyPaLpx39SI5bKQPz+AwBR > =Mi2J > -----END PGP SIGNATURE----- > >
On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote: > I am sending a updated version. I've been looking at the changes to gram.y, and noted the comment under func_expr where you added CUBE and ROLLUP definitions. It says that CUBE can't be a reserved keyword because it's already used in the cube contrib module. But then the changes to kwlist.h include this: + PG_KEYWORD("cube", CUBE, RESERVED_KEYWORD) ... + PG_KEYWORD("rollup", ROLLUP, RESERVED_KEYWORD) ...and CUBE and ROLLUP are added in gram.y under the reserved_keyword list. I realize things like CURRENT_TIME, that also have special entries in the func_expr grammar, are also reserved keywords, but this all seems at odds with the comment. What am I missing? Is the comment simply pointing out that the designation of CUBE and ROLLUP as reserved keywords will have to change at some point, but it hasn't been implemented yet (or no one has figured out how to do it)? -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
2010/8/7 Joshua Tolley <eggyknap@gmail.com>: > On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote: >> I am sending a updated version. > > I've been looking at the changes to gram.y, and noted the comment under func_expr > where you added CUBE and ROLLUP definitions. It says that CUBE can't be a > reserved keyword because it's already used in the cube contrib module. But > then the changes to kwlist.h include this: > I am little bit confused now - it's bad comment - and I have to verify it. What I remember, we cannot to use a two parser's rules, because it going to a conflict. So there have to be used a trick with a moving to decision to transform stage, where we have a context info. I have to recheck a minimal level - probably it can't be a RESERVED_KEYWORD. Because then we can't to create a function "cube". > + PG_KEYWORD("cube", CUBE, RESERVED_KEYWORD) > ... > + PG_KEYWORD("rollup", ROLLUP, RESERVED_KEYWORD) > > ...and CUBE and ROLLUP are added in gram.y under the reserved_keyword list. I > realize things like CURRENT_TIME, that also have special entries in the > func_expr grammar, are also reserved keywords, but this all seems at odds with > the comment. What am I missing? Is the comment simply pointing out that the > designation of CUBE and ROLLUP as reserved keywords will have to change at > some point, but it hasn't been implemented yet (or no one has figured out how > to do it)? > > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAkxcjSIACgkQRiRfCGf1UMPpCwCcCHBh/1NiLykIcVYgPyfbIegF > xq0AoID75rCPiW8yf29OSkaJVza1FQt5 > =PcLs > -----END PGP SIGNATURE----- > >
Hello I was confused when I though so I found a solution of 1 shift/reduce conflict :( All identificators used for buildin functions have to be a col_name_keywords or reserved keyword. There is conflict with our (probably obsolete) feature SELECT colname(tabname). So for this moment the real solution is removing CUBE and ROLLUP from keywords and dynamically testing a funcname in transformation stage - what is slower and more ugly. ideas? Regards Pavel Stehule 2010/8/7 Pavel Stehule <pavel.stehule@gmail.com>: > 2010/8/7 Joshua Tolley <eggyknap@gmail.com>: >> On Thu, Aug 05, 2010 at 04:46:51PM +0200, Pavel Stehule wrote: >>> I am sending a updated version. >> >> I've been looking at the changes to gram.y, and noted the comment under func_expr >> where you added CUBE and ROLLUP definitions. It says that CUBE can't be a >> reserved keyword because it's already used in the cube contrib module. But >> then the changes to kwlist.h include this: >> > > I am little bit confused now - it's bad comment - and I have to verify > it. What I remember, we cannot to use a two parser's rules, because it > going to a conflict. So there have to be used a trick with a moving to > decision to transform stage, where we have a context info. I have to > recheck a minimal level - probably it can't be a RESERVED_KEYWORD. > Because then we can't to create a function "cube". > >> + PG_KEYWORD("cube", CUBE, RESERVED_KEYWORD) >> ... >> + PG_KEYWORD("rollup", ROLLUP, RESERVED_KEYWORD) >> >> ...and CUBE and ROLLUP are added in gram.y under the reserved_keyword list. I >> realize things like CURRENT_TIME, that also have special entries in the >> func_expr grammar, are also reserved keywords, but this all seems at odds with >> the comment. What am I missing? Is the comment simply pointing out that the >> designation of CUBE and ROLLUP as reserved keywords will have to change at >> some point, but it hasn't been implemented yet (or no one has figured out how >> to do it)? >> >> -- >> Joshua Tolley / eggyknap >> End Point Corporation >> http://www.endpoint.com >> >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.9 (GNU/Linux) >> >> iEYEARECAAYFAkxcjSIACgkQRiRfCGf1UMPpCwCcCHBh/1NiLykIcVYgPyfbIegF >> xq0AoID75rCPiW8yf29OSkaJVza1FQt5 >> =PcLs >> -----END PGP SIGNATURE----- >> >> >
Hello I found a break in GROUPING SETS implementation. Now I am playing with own executor and planner node and I can't to go forward :(. Probably this feature will need a significant update of our agg implementation. Probably needs a some similar structure like CTE but it can be a little bit reduced - there are a simple relation between source query and result query - I am not sure, if this has to be implemented via subqueries? The second question is relative big differencies between GROUP BY behave and GROUP BY GROUPING SETS behave. Now I don't know about way to join GROUP BY and GROUPING SETS together Any ideas welcome Regards Pavel