Обсуждение: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)
Hi, while reviewing one of the 'abbreviated keys' patches, I noticed that the ordered set aggregates are broken when using tuplesort with multiple runs. ISTM this got broken by 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802: Don't use abbreviated keys for the final merge pass. When we write tuples out to disk and read them back in, the abbreviated keys become non-abbreviated, because the readtup routines don't know anything about abbreviation. But without this fix, the rest of the code still thinks the abbreviation-aware compartor should be used, so chaos ensues. Triggering it is quite simple: CREATE TABLE stuff AS SELECT random()::text AS randtext FROM generate_series(1,1000000); ANALYZEstuff; SET work_mem = '1MB'; SELECT percentile_disc(0) WITHIN GROUP (ORDER BY randtext) FROM stuff; which fails like this: ---------------------------------------------------------------------- Program received signal SIGSEGV, Segmentation fault. 0x0000000000931bf6 in mergeruns (state=0x1dd3b58) at tuplesort.c:2175 2175 if (state->sortKeys->abbrev_converter) (gdb) print state $1 = (Tuplesortstate *) 0x1dd3b58 (gdb) print state->sortKeys $2 = (SortSupport) 0x0 (gdb) print state->onlyKey $3 = (SortSupport) 0x1dd3d70 ---------------------------------------------------------------------- (gdb) bt #0 0x0000000000931bf6 in mergeruns (state=0x1dd3b58) at tuplesort.c:2175 #1 0x00000000009309c1 in tuplesort_performsort (state=0x1dd3b58) at tuplesort.c:1563 #2 0x0000000000864038 in percentile_disc_final (fcinfo=0x7fff2c9f4bc0) at orderedsetaggs.c:443 #3 0x0000000000664f23 in finalize_aggregate (aggstate=0x1dcbc40, peraggstate=0x1dcdb38, pergroupstate=0x1dce350, resultVal=0x1dcd308, resultIsNull=0x1dcd328 "") at nodeAgg.c:865 #4 0x0000000000665826 in agg_retrieve_direct (aggstate=0x1dcbc40) at nodeAgg.c:1295 #5 0x000000000066551a in ExecAgg (node=0x1dcbc40) at nodeAgg.c:1119 ---------------------------------------------------------------------- This seems to happen because ordered_set_startup() calls tuplesort_begin_datum() when (use_tuples == true), which only sets 'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it does not expect that. -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)
От
Peter Geoghegan
Дата:
On Fri, Feb 20, 2015 at 11:58 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > This seems to happen because ordered_set_startup() calls > tuplesort_begin_datum() when (use_tuples == true), which only sets > 'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it > does not expect that. Oops. You're right. Attached patch fixes the issue, by making tuplesort_begin_datum() consistent with other comparable routines for other tuple cases. I think it makes sense to have the 'sortKeys' field always set, and the 'onlyKey' field also set when that additional optimization makes sense. That was what I understood the API to be, so arguably this is a pre-existing issue with tuplesort_begin_datum(). Thanks for the report! -- Peter Geoghegan
Вложения
On Fri, Feb 20, 2015 at 4:01 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Fri, Feb 20, 2015 at 11:58 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> This seems to happen because ordered_set_startup() calls >> tuplesort_begin_datum() when (use_tuples == true), which only sets >> 'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it >> does not expect that. > > Oops. You're right. Attached patch fixes the issue, by making > tuplesort_begin_datum() consistent with other comparable routines for > other tuple cases. I think it makes sense to have the 'sortKeys' field > always set, and the 'onlyKey' field also set when that additional > optimization makes sense. That was what I understood the API to be, so > arguably this is a pre-existing issue with tuplesort_begin_datum(). This doesn't completely fix the issue. Even with this patch applied: CREATE TABLE stuff AS SELECT random()::text AS randtext FROM generate_series(1,50000000); set maintenance_work_mem='1024'; create index on stuff using hash (randtext); ...wait for a bit, server crash... I find your statement that this is a pre-existing issue in tuplesort_begin_datum() to be pretty misleading, unless what you mean by it is "pre-existing since November, when an earlier patch by Peter Geoghegan changed the comments without changing the code to match". Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3, changed the comments for the sortKey field to say that it would be set in all cases except for the hash index case, but it did not make that statement true. Subsequently, another of your patches, which I committed as 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on state->sortKeys always being set. Now you've got this patch here, which you claim makes sure that sortKeys is always set, but it actually doesn't, which is why the above still crashes. There are not so many tuplesort_begin_xxx routines that you couldn't audit them all before submitting your patches. Anyway, I propose the attached instead, which makes both Tomas's test case and the one above pass. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)
От
Peter Geoghegan
Дата:
On Tue, Mar 3, 2015 at 3:53 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I find your statement that this is a pre-existing issue in > tuplesort_begin_datum() to be pretty misleading, unless what you mean > by it is "pre-existing since November, when an earlier patch by Peter > Geoghegan changed the comments without changing the code to match". > Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3, changed the comments > for the sortKey field to say that it would be set in all cases except > for the hash index case, but it did not make that statement true. Agreed. I believe that is in fact what I meant. I'm not trying to deflect blame - just to explain my understanding of the problem. > Subsequently, another of your patches, which I committed as > 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on > state->sortKeys always being set. Now you've got this patch here, > which you claim makes sure that sortKeys is always set, but it > actually doesn't, which is why the above still crashes. There are not > so many tuplesort_begin_xxx routines that you couldn't audit them all > before submitting your patches. Sorry. I should have caught the hash index case too. > Anyway, I propose the attached instead, which makes both Tomas's test > case and the one above pass. My patch actually matches Andrew Gierth's datumsort patch, in that it also uses this convention, as I believe it should. For that reason, I'd prefer to make the comment added in November true, rather than changing the comment...I feel it ought to be true anyway (which is what I meant about a pre-existing issue). You'll still need the defenses you've added for the the hash case, of course. You might want to also put a comment specifying why it's necessary, since the hash tuple case is the oddball cases that doesn't always have "sortKeys" set. In other words, I suggest that you commit the union of our two patches, less your changes to the comments around "sortKeys'. Thanks -- Peter Geoghegan
On Tue, Mar 3, 2015 at 7:14 PM, Peter Geoghegan <pg@heroku.com> wrote: > My patch actually matches Andrew Gierth's datumsort patch, in that it > also uses this convention, as I believe it should. For that reason, > I'd prefer to make the comment added in November true, rather than > changing the comment...I feel it ought to be true anyway (which is > what I meant about a pre-existing issue). You'll still need the > defenses you've added for the the hash case, of course. You might want > to also put a comment specifying why it's necessary, since the hash > tuple case is the oddball cases that doesn't always have "sortKeys" > set. > > In other words, I suggest that you commit the union of our two > patches, less your changes to the comments around "sortKeys'. I think we should commit my patch, and if a future patch needs sortKeys set in more places, it can make that change itself. There's no reason why it's needed with the code as it is today, and no reason to let bits of future changes leak into a bug-fix patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)
От
Peter Geoghegan
Дата:
On Wed, Mar 4, 2015 at 8:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I think we should commit my patch, and if a future patch needs > sortKeys set in more places, it can make that change itself. There's > no reason why it's needed with the code as it is today, and no reason > to let bits of future changes leak into a bug-fix patch. It's not as if I feel strongly about it. It's trivial to change the code to make the comment true, rather than change the comment to make the code true, and we need to do so anyway. I defer to you. -- Peter Geoghegan
On Wed, Mar 4, 2015 at 1:55 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Mar 4, 2015 at 8:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I think we should commit my patch, and if a future patch needs >> sortKeys set in more places, it can make that change itself. There's >> no reason why it's needed with the code as it is today, and no reason >> to let bits of future changes leak into a bug-fix patch. > > It's not as if I feel strongly about it. It's trivial to change the > code to make the comment true, rather than change the comment to make > the code true, and we need to do so anyway. I defer to you. I did it my way. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company