Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From: Tomas Vondra
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date:
Msg-id: be9132a8-67c9-533e-b0ce-6617d24a7464@2ndquadrant.com
In reply to: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs

On 01/29/2018 08:11 PM, Tom Lane wrote:
> One other point here is that it's not really clear to me what a
> randomly varying IV is supposed to accomplish. Surely we're not
> intending that it prevents somebody from crafting a data set that
> causes bad hash performance. If a user with DB access wants to cause
> a performance problem, there are and always will be plenty of other
> avenues to making that happen.

While I'm not sure the random IV is something we want/need, I don't
think "there are other ways to harm the database" is a convincing argument.

I agree it's fairly easy to set work_mem to an insane value, or craft a
query that eats all memory (many sorts using work_mem each, hashagg used
for data with many groups, ...). Basically, if you don't know how to
crash the database, you're not a senior DBA/engineer.

But not all attacks are equal. All the examples I named require a direct
database connection, the ability to construct SQL queries, etc.

The case discussed in this thread does not require that - it's enough to
control the data input, say by uploading a CSV file with crafted data to
some web application (as Greg mentions). Of course, you need to make
some assumptions while crafting the data (that it's using Postgres, and
that it'll use hash aggregate on the data), but that's about it.
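
To make the distinction concrete, here is a minimal sketch of the idea,
not of PostgreSQL's code. It assumes a toy hash table with 256 buckets and
uses BLAKE2b from Python's hashlib as a stand-in for a seeded hash
function; the bucket count, the key format and the two seed values are
all hypothetical. With a seed the attacker knows, keys that all land in
one bucket can be found by brute force ahead of time and shipped as plain
data. Under a seed the attacker could not predict, the same keys should
spread out again.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 256  # hypothetical table size, chosen only for illustration


def bucket(key: str, seed: int) -> int:
    """Map a key to a bucket using a seeded hash (BLAKE2b as a stand-in)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % NUM_BUCKETS


def craft_colliding_keys(seed: int, count: int) -> list[str]:
    """Brute-force keys that all land in bucket 0 for a seed the attacker knows."""
    keys = []
    i = 0
    while len(keys) < count:
        candidate = f"row-{i}"
        if bucket(candidate, seed) == 0:
            keys.append(candidate)
        i += 1
    return keys


def max_bucket_load(keys: list[str], seed: int) -> int:
    """Size of the fullest bucket, i.e. the longest collision chain."""
    return max(Counter(bucket(k, seed) for k in keys).values())


KNOWN_SEED = 0      # stands in for a fixed, publicly known IV
OTHER_SEED = 12345  # stands in for an IV the attacker could not predict

# The crafted keys need no SQL access: they could arrive in an uploaded file.
crafted = craft_colliding_keys(KNOWN_SEED, 200)
print("fullest bucket with the known IV:", max_bucket_load(crafted, KNOWN_SEED))
print("fullest bucket with another IV: ", max_bucket_load(crafted, OTHER_SEED))
```

The sketch only illustrates why a predictable hash makes data-only
attacks possible; it says nothing about how hard crafting such input is
against the real hash functions.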


> If the idea is that for a data set that otherwise would have bad hash
> performance, choosing a different IV would (almost always) fix it,
> that sounds good but you're ignoring the inverse case: for a data
> set that works fine, there would be some choices of IV that create a
> problem where there was none before. I see no reason to think that
> the probability of the former kind of situation is higher than the
> latter.

I don't think this thread is really about randomly generated data sets,
but about data generated in a way that increases the number of
collisions, either intentionally or unintentionally. That's certainly
the case with my data set, which was designed to trigger one of the
issues in the code.

I agree we need to check if the randomization might cause more harm than
good by causing random failures. But my feeling is the probability will
be very low.
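
One way such a check could look, as a rough sketch only: take a benign
key set, hash it under many randomly chosen seeds, and see how bad the
fullest bucket ever gets. Everything here is an assumption made for
illustration (256 buckets, 1000 keys of the form "customer-N", 200 seeds,
BLAKE2b as the seeded hash), and a cryptographic hash is far better
behaved than a fast non-cryptographic one, so the numbers would not carry
over to PostgreSQL's hash functions.

```python
import hashlib
import random
from collections import Counter

NUM_BUCKETS = 256  # hypothetical table size
NUM_SEEDS = 200    # how many random IVs to try


def max_bucket_load(keys, seed):
    """Size of the fullest bucket when hashing `keys` with the given seed."""
    counts = Counter()
    for key in keys:
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "little")
        ).digest()
        counts[int.from_bytes(digest, "little") % NUM_BUCKETS] += 1
    return max(counts.values())


# A benign data set: nothing about it was chosen with the hash in mind.
benign_keys = [f"customer-{i}" for i in range(1000)]

rng = random.Random(42)  # fixed generator seed so the run is repeatable
seeds = [rng.getrandbits(64) for _ in range(NUM_SEEDS)]
loads = [max_bucket_load(benign_keys, s) for s in seeds]

print("average keys per bucket:", len(benign_keys) / NUM_BUCKETS)
print("fullest bucket, best seed: ", min(loads))
print("fullest bucket, worst seed:", max(loads))
```

If the worst seed is only modestly worse than the best one, that supports
the intuition that a random IV rarely turns a harmless data set into a
problematic one, at least for this toy setup.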

> So I'm on board with using the extended hash functions when
> available, but I'm not convinced that a varying IV buys us anything
> but trouble.

I don't have a clear opinion on the random IV yet.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

