On Wed, Feb 19, 2020 at 08:16:36PM +0100, Tomas Vondra wrote:
> 5) Assert(nbuckets > 0);
> ...
> This however quickly fails on this assert in BuildTupleHashTableExt (see
> backtrace1.txt):
>
> Assert(nbuckets > 0);
>
> The value is computed in hash_choose_num_buckets, and there seem to be
> no protections against returning bogus values like 0. So maybe we should
> return
>
> Min(nbuckets, 1024)
>
> or something like that, similarly to hash join. OTOH maybe it's simply
> due to agg_refill_hash_table() passing bogus values to the function?
>
>
> 6) Another thing that occurred to me was what happens to grouping sets,
> which we can't spill to disk. So I did this:
>
> create table t2 (a int, b int, c int);
>
> -- run repeatedly, until there are about 20M rows in t2 (1GB)
> with tx as (select array_agg(a) as a, array_agg(b) as b
> from (select a, b from t order by random()) foo),
> ty as (select array_agg(a) AS a
> from (select a from t order by random()) foo)
> insert into t2 select unnest(tx.a), unnest(ty.a), unnest(tx.b)
> from tx, ty;
>
> analyze t2;
> ...
>
> which fails with segfault at execution time:
>
> tuplehash_start_iterate (tb=0x18, iter=iter@entry=0x2349340)
> 870 for (i = 0; i < tb->size; i++)
> (gdb) bt
> #0 tuplehash_start_iterate (tb=0x18, iter=iter@entry=0x2349340)
> #1 0x0000000000654e49 in agg_retrieve_hash_table_in_memory ...
>
> That's not surprising, because 0x18 pointer is obviously bogus. I guess
> this is simply an offset 18B added to a NULL pointer?
I did some investigation. Did you disable the assert when this panic
happened? If so, it's the same issue as 5) (nbuckets == 0): a zero size
gets passed to the allocator when creating the hashtable, and that is
what ends up as the bogus 0x18 pointer.

Sorry, my testing environment is acting up right now, so I haven't
reproduced it yet.
--
Adam Lee