Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From: Tomas Vondra
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Msg-id: b337cd3c-7091-2342-8e66-36919d91f70c@2ndquadrant.com
In reply to: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  ("Todd A. Cook" <tcook@blackducksoftware.com>)
Responses: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs

On 01/25/2018 11:31 PM, Todd A. Cook wrote:
> On 11/27/17 14:17, Tomas Vondra wrote:
>> Hi,
>>
>> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      14932
>>> Logged by:          Todd Cook
>>> Email address:      tcook@blackducksoftware.com
>>> PostgreSQL version: 10.1
>>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>>> Description:
>>>
>>> It hangs on a table with 167834 rows, though it works fine with only
>>> 167833 rows.  When it hangs, CTRL-C does not interrupt it, and the
>>> backend has to be killed to stop it.
>>>
>>
>> Can you share the query and data, so that we can reproduce the issue?
>>
>> Based on the stack traces this smells like a bug in the simplehash,
>> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
>> which gets triggered for 167834 rows (but not for 167833).
> 
> FWIW, changing the guts of hashint8() to
> 
> +       if (val >= INT32_MIN && val <= INT32_MAX)
> +               return hash_uint32((uint32) val);
> +       else
> +               return hash_any((unsigned char *) &val, sizeof(val));
> 
> allows us to process a full-sized data set of around 900 million rows.
> However, memory usage seemed to be rather excessive (we can only run 7
> of these jobs in parallel on a 128GB system before the OOM killer kicked
> in, rather than the usual 24); if there's any interest, I can try to
> measure exactly how excessive.
> 

I suspect you're right that the hash is biased toward the lohalf bits, as
you wrote in your 19/12 message. In fact, I think it's a direct
consequence of the requirement that hashint8() must produce the same hash
as the int2 and int4 hash functions for logically equivalent values.
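
To make the bias concrete, here is a minimal sketch of the folding step,
paraphrased from hashint8() as best I recall it. fold_int8 is a made-up
name, and the final hash_uint32() mixing is left out because it cannot
separate keys that already fold to the same 32-bit word.

```c
#include <stdint.h>

/*
 * Sketch of the 64-to-32-bit fold done before mixing (hypothetical helper,
 * not a PostgreSQL function).  For values that fit in int32 the high half
 * is all zeros (or all ones for negatives), so the fold is a no-op and the
 * result matches what the int4 hash would see.
 */
static uint32_t
fold_int8(int64_t val)
{
    uint32_t lohalf = (uint32_t) val;
    uint32_t hihalf = (uint32_t) ((uint64_t) val >> 32);

    /* XOR the high half in; invert it for negatives so sign extension cancels */
    lohalf ^= (val >= 0) ? hihalf : ~hihalf;

    return lohalf;
}
```

With this fold, every non-negative key whose high and low halves are equal,
such as ((int64_t) 7 << 32) | 7, collapses to 0, the same word as the key 0.
Whether the reported data set has that shape is not something the thread
establishes; the sketch only shows that such collisions are possible.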

Out of curiosity, could you try replacing the hash_any call in hashint8
with a hash function like murmur3, and see if it improves the behavior?

That obviously breaks hashint8 for cross-type hash joins, but it would be
an interesting bit of information, I think.
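
For the experiment, something along these lines might be enough. This is a
sketch only: it uses murmur3's 64-bit finalizer (fmix64) rather than the
full murmur3 hash, and hash_int8_murmur is a hypothetical name, not an
existing PostgreSQL function.

```c
#include <stdint.h>

/* murmur3's 64-bit finalizer; a bijection on 64-bit values */
static uint64_t
murmur3_fmix64(uint64_t k)
{
    k ^= k >> 33;
    k *= UINT64_C(0xff51afd7ed558ccd);
    k ^= k >> 33;
    k *= UINT64_C(0xc4ceb9fe1a85ec53);
    k ^= k >> 33;
    return k;
}

/*
 * Hypothetical replacement for the experiment: mix all 64 bits first, then
 * truncate to 32 bits.  Unlike the current code, this does NOT hash a small
 * int8 value the same way as the equal int4 value.
 */
static uint32_t
hash_int8_murmur(int64_t val)
{
    return (uint32_t) murmur3_fmix64((uint64_t) val);
}
```

Because all 64 input bits are mixed before truncation, keys that differ
only in their high half no longer depend on a simple XOR of the two halves.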

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services