Greg Stark <stark@mit.edu> writes:
> On 29 January 2018 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One other point here is that it's not really clear to me what a randomly
>> varying IV is supposed to accomplish. Surely we're not intending that
>> it prevents somebody from crafting a data set that causes bad hash
>> performance.
> I actually think that is a real live issue that we will be forced to
> deal with one day. And I think that day is coming soon.
> It's not hard to imagine a user of a web site intentionally naming
> their objects such that they all hash to the same value. On most
> systems the worst case is probably a query that takes a few seconds,
> or even tens of seconds, but if you get lucky you could run a server
> out of memory.
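For illustration, here is a minimal Python sketch of the attack Greg describes, using a hypothetical toy hash — nothing below is PostgreSQL's actual hashing code; the function and key names are made up. With a fixed, publicly known hash and no secret seed, an attacker can precompute as many distinct names as they like that all land in one bucket, degrading hash-table lookups from O(1) to an O(n) chain scan:

```python
# Toy sketch (NOT PostgreSQL's hash code): with a fixed, unseeded hash,
# an attacker can brute-force object names that all fall in one bucket.
NBUCKETS = 1024

def toy_hash(key, seed=0):
    """Deterministic multiply/xor string hash; `seed` stands in for an IV."""
    h = seed & 0xFFFFFFFF
    for ch in key:
        h = ((h * 31) ^ ord(ch)) & 0xFFFFFFFF
    return h

def craft_colliding_keys(n, seed=0):
    """Brute-force n distinct names whose hashes all fall in bucket 0.
    Feasible only because the hash function and seed are known up front."""
    keys, i = [], 0
    while len(keys) < n:
        name = "obj%d" % i
        if toy_hash(name, seed) % NBUCKETS == 0:
            keys.append(name)
        i += 1
    return keys

keys = craft_colliding_keys(8)
buckets = {toy_hash(k) % NBUCKETS for k in keys}
print(buckets)  # → {0}: every crafted key lands in the same bucket
```

A randomly varying IV (a seed the attacker cannot predict) invalidates any such precomputed key set, since the crafted names no longer collide under the new seed — which is what a varying IV buys, even though, as noted below, it cannot remove a hash algorithm's weak spots entirely.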
By their very nature, hash algorithms have weak spots. Pretending that
they do not, or that you can 100% remove them, is a fool's errand.
You could always "set enable_hashjoin = off", and deal with mergejoin's
weak spots instead; but that just requires a different data set to
expose its performance shortcomings ...
regards, tom lane