Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From	Tomas Vondra
Subject	Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date	30 January 2018, 05:17:12
Msg-id	be9132a8-67c9-533e-b0ce-6617d24a7464@2ndquadrant.com
In reply to	Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-bugs

On 01/29/2018 08:11 PM, Tom Lane wrote:
> One other point here is that it's not really clear to me what a
> randomly varying IV is supposed to accomplish. Surely we're not
> intending that it prevents somebody from crafting a data set that
> causes bad hash performance. If a user with DB access wants to cause
> a performance problem, there are and always will be plenty of other
> avenues to making that happen.

While I'm not sure the random IV is something we want/need, I don't
think "there are other ways to harm the database" is a convincing argument.

I agree it's fairly easy to set work_mem to an insane value, or craft a
query that eats all memory (many sorts using work_mem each, hashagg used
for data with many groups, ...). Basically, if you don't know how to
crash the database, you're not a senior DBA/engineer.

But not all attacks are equal. All the examples I named require a direct
database connection, the ability to construct SQL queries, etc.

The case discussed in this thread does not require that - it's enough to
control the data input, say by uploading a CSV file with crafted data to
some web application (as Greg mentions). Of course, you need to make
some assumptions while crafting the data (that it's using Postgres, and
that it'll use hash aggregate on the data), but that's about it.
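
A minimal sketch of that scenario, under loud assumptions: the keyed
BLAKE2b hash, the 64-bucket table, and the names `bucket`,
`craft_colliding_values`, `KNOWN_IV` and `SECRET_IV` are all hypothetical
stand-ins, not PostgreSQL's hash functions or hash aggregate code. It only
shows that someone who knows the IV can prepare input values offline that
all collide, and that the same values spread out under an IV they did not
know.

```python
import hashlib
from collections import Counter

NBUCKETS = 64  # toy table size, chosen only for illustration


def bucket(value: str, iv: bytes) -> int:
    """Map a value to a bucket using a hash keyed by `iv` (stand-in hash)."""
    digest = hashlib.blake2b(value.encode(), digest_size=8, key=iv).digest()
    return int.from_bytes(digest, "big") % NBUCKETS


def craft_colliding_values(iv: bytes, count: int) -> list[str]:
    """Brute-force `count` values that all land in bucket 0 under a known IV."""
    crafted = []
    candidate = 0
    while len(crafted) < count:
        value = f"row-{candidate}"
        if bucket(value, iv) == 0:
            crafted.append(value)
        candidate += 1
    return crafted


def max_bucket_load(values: list[str], iv: bytes) -> int:
    """Size of the fullest bucket, a proxy for the longest collision chain."""
    return max(Counter(bucket(v, iv) for v in values).values())


KNOWN_IV = b"fixed-iv"         # fixed and public: can be precomputed against
SECRET_IV = b"per-run-secret"  # stands in for a randomly chosen IV

# The crafted values could be shipped as a CSV column; no SQL access needed.
crafted = craft_colliding_values(KNOWN_IV, 200)

print("fullest bucket, known IV: ", max_bucket_load(crafted, KNOWN_IV))
print("fullest bucket, secret IV:", max_bucket_load(crafted, SECRET_IV))
```

Under the known IV every crafted value shares one bucket by construction;
under the other IV the values are no longer aligned with any bucket.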

> If the idea is that for a data set that otherwise would have bad hash
> performance, choosing a different IV would (almost always) fix it,
> that sounds good but you're ignoring the inverse case: for a data
> set that works fine, there would be some choices of IV that create a
> problem where there was none before. I see no reason to think that
> the probability of the former kind of situation is higher than the
> latter.

I don't think this thread is really about randomly generated data sets,
but about something that is generated in a way that increases the number
of collisions - either intentionally or unintentionally. That's
certainly the case with my data set, which was designed to trigger one of
the issues in the code.

I agree we need to check if the randomization might cause more harm than
good by causing random failures. But my feeling is the probability will
be very low.
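
One rough way to check that feeling is to hash an ordinary data set under
many different IVs and look at the fullest bucket each time. The sketch
below does that with a toy model; the keyed BLAKE2b hash, the 64-bucket
table, the 200 trial IVs and the `customer-N` values are assumptions made
for illustration. A cryptographic keyed hash is likely to behave better
here than a fast non-cryptographic one, so this does not settle the
question for PostgreSQL's actual hash functions.

```python
import hashlib
from collections import Counter

NBUCKETS = 64  # toy table size, chosen only for illustration


def bucket(value: str, iv: bytes) -> int:
    """Map a value to a bucket using a hash keyed by `iv` (stand-in hash)."""
    digest = hashlib.blake2b(value.encode(), digest_size=8, key=iv).digest()
    return int.from_bytes(digest, "big") % NBUCKETS


def max_bucket_load(values: list[str], iv: bytes) -> int:
    """Size of the fullest bucket for this data set under this IV."""
    return max(Counter(bucket(v, iv) for v in values).values())


# Ordinary, non-crafted data.
values = [f"customer-{i}" for i in range(1000)]

# Try 200 distinct IVs and record the fullest bucket for each one.
loads = [max_bucket_load(values, i.to_bytes(4, "big")) for i in range(1, 201)]

print("ideal load per bucket:", len(values) / NBUCKETS)
print("best fullest bucket:  ", min(loads))
print("worst fullest bucket: ", max(loads))
```

If some IV turned this benign data set into a problem, it would show up as
a worst-case load far above the ideal one.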

> So I'm on board with using the extended hash functions when
> available, but I'm not convinced that a varying IV buys us anything
> but trouble.

I don't have a clear opinion on the random IV yet.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
