Обсуждение: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

Поиск
Список
Период
Сортировка

failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

От
Tomas Vondra
Дата:
Hi,

while reviewing one of the 'abbreviated keys' patches, I noticed that
the ordered set aggregates are broken when using tuplesort with multiple
runs.

ISTM this got broken by 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802:
   Don't use abbreviated keys for the final merge pass.
   When we write tuples out to disk and read them back in, the   abbreviated keys become non-abbreviated, because the
readtup  routines don't know anything about abbreviation.  But without   this fix, the rest of the code still thinks
the  abbreviation-aware compartor should be used, so chaos ensues.
 

Triggering it is quite simple:
   CREATE TABLE stuff AS SELECT random()::text AS randtext                         FROM generate_series(1,1000000);
ANALYZEstuff;
 
   SET work_mem = '1MB';
   SELECT percentile_disc(0) WITHIN GROUP (ORDER BY randtext)     FROM stuff;

which fails like this:

----------------------------------------------------------------------
Program received signal SIGSEGV, Segmentation fault.
0x0000000000931bf6 in mergeruns (state=0x1dd3b58) at tuplesort.c:2175
2175            if (state->sortKeys->abbrev_converter)
(gdb) print state
$1 = (Tuplesortstate *) 0x1dd3b58
(gdb) print state->sortKeys
$2 = (SortSupport) 0x0
(gdb) print state->onlyKey
$3 = (SortSupport) 0x1dd3d70
----------------------------------------------------------------------
(gdb) bt
#0  0x0000000000931bf6 in mergeruns (state=0x1dd3b58) at tuplesort.c:2175
#1  0x00000000009309c1 in tuplesort_performsort (state=0x1dd3b58) at
tuplesort.c:1563
#2  0x0000000000864038 in percentile_disc_final (fcinfo=0x7fff2c9f4bc0)
at orderedsetaggs.c:443
#3  0x0000000000664f23 in finalize_aggregate (aggstate=0x1dcbc40,
peraggstate=0x1dcdb38, pergroupstate=0x1dce350, resultVal=0x1dcd308,
resultIsNull=0x1dcd328 "") at nodeAgg.c:865
#4  0x0000000000665826 in agg_retrieve_direct (aggstate=0x1dcbc40) at
nodeAgg.c:1295
#5  0x000000000066551a in ExecAgg (node=0x1dcbc40) at nodeAgg.c:1119
----------------------------------------------------------------------

This seems to happen because ordered_set_startup() calls
tuplesort_begin_datum() when (use_tuples == true), which only sets
'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it
does not expect that.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

От
Peter Geoghegan
Дата:
On Fri, Feb 20, 2015 at 11:58 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> This seems to happen because ordered_set_startup() calls
> tuplesort_begin_datum() when (use_tuples == true), which only sets
> 'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it
> does not expect that.

Oops. You're right. Attached patch fixes the issue, by making
tuplesort_begin_datum() consistent with other comparable routines for
other tuple cases. I think it makes sense to have the 'sortKeys' field
always set, and the 'onlyKey' field also set when that additional
optimization makes sense. That was what I understood the API to be, so
arguably this is a pre-existing issue with tuplesort_begin_datum().

Thanks for the report!
--
Peter Geoghegan

Вложения

Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

От
Robert Haas
Дата:
On Fri, Feb 20, 2015 at 4:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Feb 20, 2015 at 11:58 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> This seems to happen because ordered_set_startup() calls
>> tuplesort_begin_datum() when (use_tuples == true), which only sets
>> 'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it
>> does not expect that.
>
> Oops. You're right. Attached patch fixes the issue, by making
> tuplesort_begin_datum() consistent with other comparable routines for
> other tuple cases. I think it makes sense to have the 'sortKeys' field
> always set, and the 'onlyKey' field also set when that additional
> optimization makes sense. That was what I understood the API to be, so
> arguably this is a pre-existing issue with tuplesort_begin_datum().

This doesn't completely fix the issue.  Even with this patch applied:

CREATE TABLE stuff AS SELECT random()::text AS randtext FROM
generate_series(1,50000000);
set maintenance_work_mem='1024';
create index on stuff using hash (randtext);
...wait for a bit, server crash...

I find your statement that this is a pre-existing issue in
tuplesort_begin_datum() to be pretty misleading, unless what you mean
by it is "pre-existing since November, when an earlier patch by Peter
Geoghegan changed the comments without changing the code to match".
Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3, changed the comments
for the sortKey field to say that it would be set in all cases except
for the hash index case, but it did not make that statement true.
Subsequently, another of your patches, which I committed as
5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on
state->sortKeys always being set.  Now you've got this patch here,
which you claim makes sure that sortKeys is always set, but it
actually doesn't, which is why the above still crashes.  There are not
so many tuplesort_begin_xxx routines that you couldn't audit them all
before submitting your patches.

Anyway, I propose the attached instead, which makes both Tomas's test
case and the one above pass.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

От
Peter Geoghegan
Дата:
On Tue, Mar 3, 2015 at 3:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I find your statement that this is a pre-existing issue in
> tuplesort_begin_datum() to be pretty misleading, unless what you mean
> by it is "pre-existing since November, when an earlier patch by Peter
> Geoghegan changed the comments without changing the code to match".
> Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3, changed the comments
> for the sortKey field to say that it would be set in all cases except
> for the hash index case, but it did not make that statement true.

Agreed. I believe that is in fact what I meant. I'm not trying to
deflect blame - just to explain my understanding of the problem.

> Subsequently, another of your patches, which I committed as
> 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on
> state->sortKeys always being set.  Now you've got this patch here,
> which you claim makes sure that sortKeys is always set, but it
> actually doesn't, which is why the above still crashes.  There are not
> so many tuplesort_begin_xxx routines that you couldn't audit them all
> before submitting your patches.

Sorry. I should have caught the hash index case too.

> Anyway, I propose the attached instead, which makes both Tomas's test
> case and the one above pass.

My patch actually matches Andrew Gierth's datumsort patch, in that it
also uses this convention, as I believe it should. For that reason,
I'd prefer to make the comment added in November true, rather than
changing the comment...I feel it ought to be true anyway (which is
what I meant about a pre-existing issue). You'll still need the
defenses you've added for the the hash case, of course. You might want
to also put a comment specifying why it's necessary, since the hash
tuple case is the oddball cases that doesn't always have "sortKeys"
set.

In other words, I suggest that you commit the union of our two
patches, less your changes to the comments around "sortKeys'.

Thanks
-- 
Peter Geoghegan



Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

От
Robert Haas
Дата:
On Tue, Mar 3, 2015 at 7:14 PM, Peter Geoghegan <pg@heroku.com> wrote:
> My patch actually matches Andrew Gierth's datumsort patch, in that it
> also uses this convention, as I believe it should. For that reason,
> I'd prefer to make the comment added in November true, rather than
> changing the comment...I feel it ought to be true anyway (which is
> what I meant about a pre-existing issue). You'll still need the
> defenses you've added for the the hash case, of course. You might want
> to also put a comment specifying why it's necessary, since the hash
> tuple case is the oddball cases that doesn't always have "sortKeys"
> set.
>
> In other words, I suggest that you commit the union of our two
> patches, less your changes to the comments around "sortKeys'.

I think we should commit my patch, and if a future patch needs
sortKeys set in more places, it can make that change itself.  There's
no reason why it's needed with the code as it is today, and no reason
to let bits of future changes leak into a bug-fix patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

От
Peter Geoghegan
Дата:
On Wed, Mar 4, 2015 at 8:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think we should commit my patch, and if a future patch needs
> sortKeys set in more places, it can make that change itself.  There's
> no reason why it's needed with the code as it is today, and no reason
> to let bits of future changes leak into a bug-fix patch.

It's not as if I feel strongly about it. It's trivial to change the
code to make the comment true, rather than change the comment to make
the code true, and we need to do so anyway. I defer to you.


-- 
Peter Geoghegan



Re: failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

От
Robert Haas
Дата:
On Wed, Mar 4, 2015 at 1:55 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 4, 2015 at 8:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think we should commit my patch, and if a future patch needs
>> sortKeys set in more places, it can make that change itself.  There's
>> no reason why it's needed with the code as it is today, and no reason
>> to let bits of future changes leak into a bug-fix patch.
>
> It's not as if I feel strongly about it. It's trivial to change the
> code to make the comment true, rather than change the comment to make
> the code true, and we need to do so anyway. I defer to you.

I did it my way.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company