Daniel Newman <dtnewman@gmail.com> writes:
> Yes. If i delete the index and recreate it, the bug is replicated.
So the answer is that this got broken by commit 9f03ca915196dfc8,
which appears to have imagined that _hash_form_tuple() is just an
alias for index_form_tuple(). But it ain't. As a result, construction
of hash indexes larger than shared_buffers is broken in 9.5 and up:
what gets entered into the index is garbage, because we are taking
raw data as if it were already hashed. (In fact, in this example,
we seem to be taking *pointers* to raw data as the hash values.)
> Interestingly, I modified the pg_dump file a bit. At the end, it says:
>> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
>> (hash_issue_column);
>> DROP INDEX hash_issue_index;
>> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
>> (hash_issue_column);
> This is because the issue was not replicating (for some reason) when it
> built the index the first time.
I think what's happening there is that the first CREATE INDEX
underestimates the size of the table and decides it doesn't need to
use the _h_spool() code path. The other path isn't broken.
We can either revert the aforesaid commit so far as it affects hash,
or do something to break _hash_form_tuple's encapsulation of the
hash-value-for-data substitution. I don't immediately see a non-messy
way to do the latter.
regards, tom lane