Discussion: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
danielnewman@umich.edu
Date:
The following bug has been logged on the website:

Bug reference:      14210
Logged by:          Daniel Newman
Email address:      danielnewman@umich.edu
PostgreSQL version: 9.5.3
Operating system:   OS X 10.11.5
Description:

I have a table with one column and a hash index:

          Table "public.hash_issue_table"
      Column       |       Type        | Modifiers
-------------------+-------------------+-----------
 hash_issue_column | character varying | not null
Indexes:
    "hash_issue_index" hash (hash_issue_column)

When I run the query `select * from hash_issue_table where hash_issue_column
like '2184';` I get 701 results. When I run the query `select * from
hash_issue_table where hash_issue_column = '2184';`, I get 0 results.
However, if I drop the hash index and rerun the second query, I get the same
results.

I have created a sql dump file containing the data needed to replicate the
issue, but it is too large to copy/paste here. Is there somewhere I can
upload or send that file?
danielnewman@umich.edu writes:
> When I run the query `select * from hash_issue_table where hash_issue_column
> like '2184';` I get 701 results. When I run the query
> When I run the query `select * from hash_issue_table where hash_issue_column
> = '2184';`, I get 0 results. However, if I drop the hash index and rerun the
> second query, I get the same results.

Sounds like a corrupted hash index to me; which is not terribly surprising
given hash indexes' lack of WAL support.  Can you reproduce this after
rebuilding the hash index?
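
Using the index name from the report, such a rebuild could look like this
(a sketch for reference; `REINDEX` is equivalent to dropping and recreating
the index with the same definition):

```sql
-- Rebuild the suspect index in place.
REINDEX INDEX hash_issue_index;

-- Retry the lookup that previously returned no rows.
SELECT * FROM hash_issue_table WHERE hash_issue_column = '2184';
```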

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Daniel Newman
Date:
Yes. If I delete the index and recreate it, the bug is replicated. I have
uploaded a pg_dump .sql file to
https://www.dropbox.com/s/4dxuo2uthj721od/hash_issue_db.sql?dl=0 that you
can use to recreate the issue.

After creating the database when I run:

>>> select * from hash_issue_table where hash_issue_column = '1';

I get no results.

When I run:
>>> drop index hash_issue_index;
>>> select * from hash_issue_table where hash_issue_column = '1';

I get 531 rows of results.

When I run:

>>> create index hash_issue_index on hash_issue_table using
hash(hash_issue_column);
>>> select * from hash_issue_table where hash_issue_column = '1';

I get 0 results again. I have repeated this successfully multiple times and
on a coworker's machine as well.
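
Taken together, the steps above amount to this script (table and index
names as given earlier in the thread):

```sql
-- With the corrupt hash index present: 0 rows (wrong).
SELECT * FROM hash_issue_table WHERE hash_issue_column = '1';

-- Without the index, the same predicate matches 531 rows (right).
DROP INDEX hash_issue_index;
SELECT * FROM hash_issue_table WHERE hash_issue_column = '1';

-- Rebuilding the hash index brings the wrong answer back.
CREATE INDEX hash_issue_index ON hash_issue_table USING hash (hash_issue_column);
SELECT * FROM hash_issue_table WHERE hash_issue_column = '1';
```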

Interestingly, I modified the pg_dump file a bit. At the end, it says:

> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
> (hash_issue_column);
>
> DROP INDEX hash_issue_index;
>
> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
> (hash_issue_column);

This is because the issue was not replicating (for some reason) when it
built the index the first time.


On Thu, Jun 23, 2016 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> danielnewman@umich.edu writes:
> > When I run the query `select * from hash_issue_table where
> hash_issue_column
> > like '2184';` I get 701 results. When I run the query
> > When I run the query `select * from hash_issue_table where
> hash_issue_column
> > = '2184';`, I get 0 results. However, if I drop the hash index and rerun
> the
> > second query, I get the same results.
>
> Sounds like a corrupted hash index to me; which is not terribly surprising
> given hash indexes' lack of WAL support.  Can you reproduce this after
> rebuilding the hash index?
>
>                         regards, tom lane
>
Daniel Newman <dtnewman@gmail.com> writes:
> Yes. If i delete the index and recreate it, the bug is replicated.

So the answer is that this got broken by commit 9f03ca915196dfc8,
which appears to have imagined that _hash_form_tuple() is just an
alias for index_form_tuple().  But it ain't.  As a result, construction
of hash indexes larger than shared_buffers is broken in 9.5 and up:
what gets entered into the index is garbage, because we are taking
raw data as if it were already hashed.  (In fact, in this example,
we seem to be taking *pointers* to raw data as the hash values.)

> Interestingly, I modified the pg_dump file a bit. At the end, it says:
>> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
>> (hash_issue_column);
>> DROP INDEX hash_issue_index;
>> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
>> (hash_issue_column);
> This is because the issue was not replicating (for some reason) when it
> built the index the first time.

I think what's happening there is that the first CREATE INDEX
underestimates the size of the table and decides it doesn't need to
use the _h_spool() code path.  The other path isn't broken.

We can either revert the aforesaid commit so far as it affects hash,
or do something to break _hash_form_tuple's encapsulation of the
hash-value-for-data substitution.  I don't immediately see a non-messy
way to do the latter.

            regards, tom lane
I wrote:
> So the answer is that this got broken by commit 9f03ca915196dfc8,
> which appears to have imagined that _hash_form_tuple() is just an
> alias for index_form_tuple().  But it ain't.  As a result, construction
> of hash indexes larger than shared_buffers is broken in 9.5 and up:

BTW, an easy way to prove that the _h_spool code path is broken is to
do this:

diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 49a6c81..162cb63 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -125,7 +125,7 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
      * NOTE: this test will need adjustment if a bucket is ever different from
      * one page.
      */
-    if (num_buckets >= (uint32) NBuffers)
+    if (1)
         buildstate.spool = _h_spoolinit(heap, index, num_buckets);
     else
         buildstate.spool = NULL;


and then run the regression tests.  I'm thinking it would be a good
idea to provide some less-painful way to force that code path to be
taken for testing purposes.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Daniel Newman
Date:
Wow, thank you for such a quick response! And thanks for pointing out the
commit... I'm definitely interested in having a look at the code.

In the meantime, I can solve the specific problem I'm having by switching
to a btree index :)

Thank you for this and all the rest of your great work.

Best,
Daniel Newman

On Thu, Jun 23, 2016 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Daniel Newman <dtnewman@gmail.com> writes:
> > Yes. If i delete the index and recreate it, the bug is replicated.
>
> So the answer is that this got broken by commit 9f03ca915196dfc8,
> which appears to have imagined that _hash_form_tuple() is just an
> alias for index_form_tuple().  But it ain't.  As a result, construction
> of hash indexes larger than shared_buffers is broken in 9.5 and up:
> what gets entered into the index is garbage, because we are taking
> raw data as if it were already hashed.  (In fact, in this example,
> we seem to be taking *pointers* to raw data as the hash values.)
>
> > Interestingly, I modified the pg_dump file a bit. At the end, it says:
> >> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
> >> (hash_issue_column);
> >> DROP INDEX hash_issue_index;
> >> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
> >> (hash_issue_column);
> > This is because the issue was not replicating (for some reason) when it
> > built the index the first time.
>
> I think what's happening there is that the first CREATE INDEX
> underestimates the size of the table and decides it doesn't need to
> use the _h_spool() code path.  The other path isn't broken.
>
> We can either revert the aforesaid commit so far as it affects hash,
> or do something to break _hash_form_tuple's encapsulation of the
> hash-value-for-data substitution.  I don't immediately see a non-messy
> way to do the latter.
>
>                         regards, tom lane
>

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Robert Haas
Date:
On Thu, Jun 23, 2016 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Newman <dtnewman@gmail.com> writes:
>> Yes. If i delete the index and recreate it, the bug is replicated.
>
> So the answer is that this got broken by commit 9f03ca915196dfc8,
> which appears to have imagined that _hash_form_tuple() is just an
> alias for index_form_tuple().  But it ain't.  As a result, construction
> of hash indexes larger than shared_buffers is broken in 9.5 and up:
> what gets entered into the index is garbage, because we are taking
> raw data as if it were already hashed.  (In fact, in this example,
> we seem to be taking *pointers* to raw data as the hash values.)

Oops.

>> Interestingly, I modified the pg_dump file a bit. At the end, it says:
>>> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
>>> (hash_issue_column);
>>> DROP INDEX hash_issue_index;
>>> CREATE INDEX hash_issue_index ON hash_issue_table USING hash
>>> (hash_issue_column);
>> This is because the issue was not replicating (for some reason) when it
>> built the index the first time.
>
> I think what's happening there is that the first CREATE INDEX
> underestimates the size of the table and decides it doesn't need to
> use the _h_spool() code path.  The other path isn't broken.
>
> We can either revert the aforesaid commit so far as it affects hash,
> or do something to break _hash_form_tuple's encapsulation of the
> hash-value-for-data substitution.  I don't immediately see a non-messy
> way to do the latter.

Would it work to have something like _hash_form_tuple() except that
instead of returning an index tuple it would just return the Datum and
isnull flags and let the caller decide what to do with them?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 23, 2016 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We can either revert the aforesaid commit so far as it affects hash,
>> or do something to break _hash_form_tuple's encapsulation of the
>> hash-value-for-data substitution.  I don't immediately see a non-messy
>> way to do the latter.

> Would it work to have something like _hash_form_tuple() except that
> instead of returning an index tuple it would just return the Datum and
> isnull flags and let the caller decide what to do with them?

Yeah.  Let's define it as a function that transforms input values/isnull
arrays into output values/isnull arrays.  It seems all right for the
callers to know the latter are always of length 1, so they can be local
variables in the callers.  I'll go make it so.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Thu, Jun 23, 2016 at 4:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So the answer is that this got broken by commit 9f03ca915196dfc8,
> which appears to have imagined that _hash_form_tuple() is just an
> alias for index_form_tuple().  But it ain't.

Can we add test coverage that will detect when comparetup_index_hash()
gives wrong answers, please? It's been broken at least 2 times, that
I'm aware of. I'll write a patch, if that helps.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Thu, Jun 23, 2016 at 4:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So the answer is that this got broken by commit 9f03ca915196dfc8,
>> which appears to have imagined that _hash_form_tuple() is just an
>> alias for index_form_tuple().  But it ain't.

> Can we add test coverage that will detect when comparetup_index_hash()
> gives wrong answers, please? It's been broken at least 2 times, that
> I'm aware of. I'll write a patch, if that helps.

I agree that we need to do something, but I'm not very clear on what
the something is.  I considered inventing a debug #define that would
reduce the _h_spool threshold to the minimum workable amount (2, looks
like), but I'm not sure how much that will help.  What did you have
in mind?

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Fri, Jun 24, 2016 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that we need to do something, but I'm not very clear on what
> the something is.  I considered inventing a debug #define that would
> reduce the _h_spool threshold to the minimum workable amount (2, looks
> like), but I'm not sure how much that will help.  What did you have
> in mind?

I think we should do that. Why do you say that you're not sure how
much it will help?

It may have escaped your attention, so I'll point out that amcheck
regression tests are my solution to testing external sorting, an area
which is similarly sorely lacking.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Fri, Jun 24, 2016 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I agree that we need to do something, but I'm not very clear on what
>> the something is.  I considered inventing a debug #define that would
>> reduce the _h_spool threshold to the minimum workable amount (2, looks
>> like), but I'm not sure how much that will help.  What did you have
>> in mind?

> I think we should do that. Why do you say that you're not sure how
> much it will help?

Well, it won't run automatically --- unless someone spins up a buildfarm
machine that adds that symbol to CPPFLAGS, and even then we'd only have
coverage on one platform.

However, the next step up would be to invent an index storage parameter or
some such to force the decision, letting the standard regression tests
include a reasonably-cheap case that would exercise _h_spool.  But that
seems like rather a large amount of work compared to the likely benefits.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Fri, Jun 24, 2016 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think we should do that. Why do you say that you're not sure how
>> much it will help?
>
> Well, it won't run automatically --- unless someone spins up a buildfarm
> machine that adds that symbol to CPPFLAGS, and even then we'd only have
> coverage on one platform.

I think that this is an argument against further proliferation of
#define's for this kind of thing, not an argument against adding a way
to force tuplesort based hash index builds.


--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Fri, Jun 24, 2016 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, it won't run automatically --- unless someone spins up a buildfarm
>> machine that adds that symbol to CPPFLAGS, and even then we'd only have
>> coverage on one platform.

> I think that this is an argument against further proliferation of
> #define's for this kind of thing, not an argument against adding a way
> to force tuplesort based hash index builds.

So ... what's your point?  This statement doesn't seem to lead to a
conclusion in favor of either of the alternatives I mentioned.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Sat, Jun 25, 2016 at 8:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think that this is an argument against further proliferation of
>> #define's for this kind of thing, not an argument against adding a way
>> to force tuplesort based hash index builds.
>
> So ... what's your point?  This statement doesn't seem to lead to a
> conclusion in favor of either of the alternatives I mentioned.

I'm in favor of just adding a debug option. We should introduce a more
general idea of a #define that enables a variety of debugging options,
because it's silly to have to worry about buildfarm coverage each and
every time this crops up. I'm not entirely sure what level that should
be scoped at. The status quo is impractical.

Incidentally, did you see to it that there is buildfarm coverage for
RAW_EXPRESSION_COVERAGE_TEST?

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Sat, Jun 25, 2016 at 8:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So ... what's your point?  This statement doesn't seem to lead to a
>> conclusion in favor of either of the alternatives I mentioned.

> I'm in favor of just adding a debug option. We should introduce a more
> general idea of a #define that enables a variety of debugging options,
> because it's silly to have to worry about buildfarm coverage each and
> every time this crops up. I'm not entirely sure what level that should
> be scoped at. The status quo is impractical.

I don't exactly see how that leads to any improvement --- you still end
up having to manually configure some buildfarm critters for coverage, no?

> Incidentally, did you see to it that there is buildfarm coverage for
> RAW_EXPRESSION_COVERAGE_TEST?

Yes, see dromedary.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Sat, Jun 25, 2016 at 12:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm in favor of just adding a debug option. We should introduce a more
>> general idea of a #define that enables a variety of debugging options,
>> because it's silly to have to worry about buildfarm coverage each and
>> every time this crops up. I'm not entirely sure what level that should
>> be scoped at. The status quo is impractical.
>
> I don't exactly see how that leads to any improvement --- you still end
> up having to manually configure some buildfarm critters for coverage, no?

Yes, but you only have to do it once. I don't see the problem with
creating a #define that represents a grab-bag of debug options that
should run on one or two buildfarm animals. That doesn't preclude also
allowing things like RAW_EXPRESSION_COVERAGE_TEST to be enabled more
selectively.

In short, I want to make it easier to add custom debugging options
that could trip a regression test failure.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Sat, Jun 25, 2016 at 12:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't exactly see how that leads to any improvement --- you still end
>> up having to manually configure some buildfarm critters for coverage, no?

> Yes, but you only have to do it once. I don't see the problem with
> creating a #define that represents a grab-bag of debug options that
> should run on one or two buildfarm animals. That doesn't preclude also
> allowing things like RAW_EXPRESSION_COVERAGE_TEST to be enabled more
> selectively.

Meh.  I do not really think that dromedary's test options
(-DCOPY_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST) represent a model
to follow for this problem.  In both of those cases, the value of having
the option turned on is that it will exercise code that hasn't even been
written yet.  So those options have to be enabled across the entire test
suite in order to be of much use.  Furthermore, there's no particular
reason to think that they'd expose platform-dependent behavior, so there's
no real downside to having just one or two buildfarm critters using them.

But exercising a non-default code path in hash index build is something we
only need in one place in the tests, and at least in my judgment there is
a possibility for platform-specific behavior.  So I think some way of
turning it on dynamically, and then adding that as a standard test case,
would be a considerable improvement.  I just don't quite want to go to the
effort of inventing a GUC or reloption.  Is there some intermediate
answer?

            regards, tom lane
I wrote:
> But exercising a non-default code path in hash index build is something we
> only need in one place in the tests, and at least in my judgment there is
> a possibility for platform-specific behavior.  So I think some way of
> turning it on dynamically, and then adding that as a standard test case,
> would be a considerable improvement.  I just don't quite want to go to the
> effort of inventing a GUC or reloption.  Is there some intermediate
> answer?

A small-footprint answer that just occurred to me is to redefine the
threshold as being, not NBuffers, but maintenance_work_mem.  Then a
reasonably-sized test case could be made by reducing that to its minimum
allowed value.  This seems like it'd be fairly unsurprising for users
since there's lots of precedent for maintenance_work_mem affecting the
behavior of CREATE INDEX.
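
If the threshold were tied to maintenance_work_mem as proposed, a test
could force the sorted build path on a modest table simply by lowering
that setting; a sketch of the idea (reusing the table from this report,
not a committed test):

```sql
-- 1MB is the minimum allowed value; under the proposed threshold, any
-- hash index build larger than this would go through the _h_spool() path.
SET maintenance_work_mem = '1MB';
CREATE INDEX hash_issue_index ON hash_issue_table USING hash (hash_issue_column);
RESET maintenance_work_mem;
```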

If maintenance_work_mem exceeds shared_buffers then you'd get a certain
amount of thrashing between shared buffers and kernel disk cache, but it
wouldn't be really awful unless maintenance_work_mem exceeded available
RAM, which would be a certifiably Bad Idea anyway.  I guess we could
forestall the buffer-thrashing scenario and preserve testability by
setting the sort threshold to min(NBuffers, maintenance_work_mem).

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Mon, Jun 27, 2016 at 7:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Meh.  I do not really think that dromedary's test options
> (-DCOPY_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST) represent a model
> to follow for this problem.  In both of those cases, the value of having
> the option turned on is that it will exercise code that hasn't even been
> written yet.  So those options have to be enabled across the entire test
> suite in order to be of much use.  Furthermore, there's no particular
> reason to think that they'd expose platform-dependent behavior, so there's
> no real downside to having just one or two buildfarm critters using them.

You think that the hash index tuplesort code should be possible to
exercise in the regression tests dynamically, without any new #define
whatsoever. Fair enough, but I still think that enabling debug #define
options should be possible at a coarser granularity, even if that ends
up covering a set of cases that are not very crisply defined. Maybe
that's a discussion for the next time something like
RAW_EXPRESSION_COVERAGE_TEST is proposed, though.

--
Peter Geoghegan

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Mon, Jun 27, 2016 at 8:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A small-footprint answer that just occurred to me is to redefine the
> threshold as being, not NBuffers, but maintenance_work_mem.  Then a
> reasonably-sized test case could be made by reducing that to its minimum
> allowed value.  This seems like it'd be fairly unsurprising for users
> since there's lots of precedent for maintenance_work_mem affecting the
> behavior of CREATE INDEX.

I like this idea. Should I write a patch?

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Mon, Jun 27, 2016 at 8:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A small-footprint answer that just occurred to me is to redefine the
>> threshold as being, not NBuffers, but maintenance_work_mem.  Then a
>> reasonably-sized test case could be made by reducing that to its minimum
>> allowed value.  This seems like it'd be fairly unsurprising for users
>> since there's lots of precedent for maintenance_work_mem affecting the
>> behavior of CREATE INDEX.

> I like this idea. Should I write a patch?

Please.

            regards, tom lane
Peter Geoghegan <pg@heroku.com> writes:
> I like this idea. Should I write a patch?

BTW, while you're at it: it strikes me that the threshold should be
either min(NBuffers, maintenance_work_mem) or
min(NLocBuffer, maintenance_work_mem), depending on whether we're
talking about a regular or temp table/index.  That is, there's a
pre-existing bug here that when NLocBuffer is a good deal less than
NBuffers (which is the typical case nowadays) you'll get a lot of
thrashing between local buffers and kernel cache, if the index isn't
quite big enough to trigger the sorting code.  This might not manifest
as actual I/O, but it's not the intended behavior either.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Mon, Jun 27, 2016 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> I like this idea. Should I write a patch?
>
> BTW, while you're at it: it strikes me that the threshold should be
> either min(NBuffers, maintenance_work_mem) or
> min(NLocBuffer, maintenance_work_mem), depending on whether we're
> talking about a regular or temp table/index.  That is, there's a
> pre-existing bug here that when NLocBuffer is a good deal less than
> NBuffers (which is the typical case nowadays) you'll get a lot of
> thrashing between local buffers and kernel cache, if the index isn't
> quite big enough to trigger the sorting code.  This might not manifest
> as actual I/O, but it's not the intended behavior either.

Understood. It's on my TODO list for the week, although I'll
prioritize isolating and fixing that UPSERT "attempted to delete
invisible tuple" bug.


--
Peter Geoghegan

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Mon, Jun 27, 2016 at 1:31 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Understood. It's on my TODO list for the week, although I'll
> prioritize isolating and fixing that UPSERT "attempted to delete
> invisible tuple" bug.

I would like to start work on this immediately, but feel it's
premature until my other tuplesort testing patch gets some review [1].

[1] https://www.postgresql.org/message-id/CAM3SWZSgxehDkDMq1FdiW2A0Dxc79wH0hz1x-TnGy=1BXEL+nw@mail.gmail.com
--
Peter Geoghegan

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Mon, Jun 27, 2016 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> I like this idea. Should I write a patch?
>
> BTW, while you're at it: it strikes me that the threshold should be
> either min(NBuffers, maintenance_work_mem) or
> min(NLocBuffer, maintenance_work_mem), depending on whether we're
> talking about a regular or temp table/index.  That is, there's a
> pre-existing bug here that when NLocBuffer is a good deal less than
> NBuffers (which is the typical case nowadays) you'll get a lot of
> thrashing between local buffers and kernel cache, if the index isn't
> quite big enough to trigger the sorting code.  This might not manifest
> as actual I/O, but it's not the intended behavior either.

Finally found time for this. Attached patch tests hash tuplesorts
along the lines we discussed. It adds one new tuplesort operation,
which does not spill to disk. It also asserts that hash values
retrieved through the tuplesort interface are in fact in sorted order.
I wanted to have something reliably trip when comparetup_index_hash()
gives wrong answers, even when a corrupt final index is perhaps not a
consequence of the underlying bug.

Due to how the hash build code works, maintenance_work_mem will need
to be 2 blocks smaller than the final index size in order to force a
sort (see code comments). I think that this does not matter, since
none of this is documented currently.

Having reviewed the coverage report for tuplesort.c [1], I think that
the only notable testing omission once this latest patch is committed
will be that there is no coverage for "backwards get tuple" cases
(this is more noticeable when looking at coverage statistics for
logtape.c, actually). This seems less important to me, and would be
awkward to test in any case, given my constraints.

[1] http://coverage.postgresql.org/src/backend/utils/sort/tuplesort.c.gcov.html
--
Peter Geoghegan

Attachments
Peter Geoghegan <pg@heroku.com> writes:
> Finally found time for this. Attached patch tests hash tuplesorts
> along the lines we discussed. It adds one new tuplesort operation,
> which does not spill to disk. It also asserts that hash values
> retrieved through the tuplesort interface are in fact in sorted order.
> I wanted to have something reliably trip when comparetup_index_hash()
> gives wrong answers, even when a corrupt final index is perhaps not a
> consequence of the underlying bug.

Pushed with some adjustments, mostly paranoia concerning integer overflows
in the calculation around maintenance_work_mem.  I did not like the way
you'd set up the test case though: tenk1 is *very* heavily used in the
regression tests, and having an extra index on it that might capture query
plans for unrelated test cases doesn't seem like a good idea.  Especially
not when you drop the index in the middle of a large group of parallel
tests, so that any capturing that did happen would be timing-sensitive.
So I just made the create_index test create and immediately drop the
index.  We have other tests that are supposed to exercise searches of
hash indexes, anyway.
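
The create-and-immediately-drop shape described here is roughly as follows
(index name and details are illustrative, not the committed test text; the
fillfactor setting is mentioned later in the thread):

```sql
SET maintenance_work_mem = '1MB';  -- force the tuplesort build path
CREATE INDEX hash_tuplesort_idx ON tenk1
    USING hash (stringu1) WITH (fillfactor = 10);
DROP INDEX hash_tuplesort_idx;
RESET maintenance_work_mem;
```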

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Sat, Jul 16, 2016 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I just made the create_index test create and immediately drop the
> index.  We have other tests that are supposed to exercise searches of
> hash indexes, anyway.

I suppose that the assertion made the new hash index searches redundant.

Thanks

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Sat, Jul 16, 2016 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I just made the create_index test create and immediately drop the
>> index.  We have other tests that are supposed to exercise searches of
>> hash indexes, anyway.

> I suppose that the assertion made the new hash index searches redundant.

[ thinks about it a bit more... ]  Actually, maybe it doesn't.  The thing
that made 9f03ca915196dfc8 really nasty is that it built an index that was
actually corrupt, with incorrect hash codes stored for entries.  I think
your assertion would probably have caught that, but only accidentally,
because the problem had little to do with whether the sort satisfied its
charter.  Maybe we should stick at least one simple test query in there
before we drop the index again.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Sat, Jul 16, 2016 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I suppose that the assertion made the new hash index searches redundant.
>
> [ thinks about it a bit more... ]  Actually, maybe it doesn't.  The thing
> that made 9f03ca915196dfc8 really nasty is that it built an index that was
> actually corrupt, with incorrect hash codes stored for entries.  I think
> your assertion would probably have caught that, but only accidentally,
> because the problem had little to do with whether the sort satisfied its
> charter.  Maybe we should stick at least one simple test query in there
> before we drop the index again.

That's what I was thinking, but wasn't sure offhand which way it would
happen. Couldn't hurt much to just add one more test.

It is pretty awkward how the hash index searches are in
hash_index.sql, which must coordinate with create_index.sql, but I
didn't want to go against the grain and just put it all in one place.
Maybe I should have.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Sat, Jul 16, 2016 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... Maybe we should stick at least one simple test query in there
>> before we drop the index again.

> That's what I was thinking, but wasn't sure offhand which way it would
> happen. Couldn't hurt much to just add one more test.

Right.  Done.

> It is pretty awkward how the hash index searches are in
> hash_index.sql, which must coordinate with create_index.sql, but I
> didn't want to go against the grain and just put it all in one place.
> Maybe I should have.

Well, that structure is intended to work with indexes that are built and
then left in place throughout the tests.  For a short-lived index, the
best thing is to create it, test it, drop it, and really you can do that
about anywhere.  But again, doing that on tenk1 is problematic because
of its popularity in all sorts of query-planning tests; if you're going
to add a short-lived index you need to do that in a test that's not meant
to run concurrently with a lot of other stuff.

Another approach we could have taken is to construct a table just for
this purpose.  But I think the way it is now is fine.

In the longer term, if hash indexes ever become less of a second-class
citizen, maybe we'd actually want to include a hash index on tenk1
throughout the tests.  But I don't want to go there today.

            regards, tom lane

Re: BUG #14210: filter by "=" constraint doesn't work when hash index is present on a column

From
Peter Geoghegan
Date:
On Sat, Jul 16, 2016 at 1:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It is pretty awkward how the hash index searches are in
>> hash_index.sql, which must coordinate with create_index.sql, but I
>> didn't want to go against the grain and just put it all in one place.
>> Maybe I should have.
>
> Well, that structure is intended to work with indexes that are built and
> then left in place throughout the tests.  For a short-lived index, the
> best thing is to create it, test it, drop it, and really you can do that
> about anywhere.  But again, doing that on tenk1 is problematic because
> of its popularity in all sorts of query-planning tests; if you're going
> to add a short-lived index you need to do that in a test that's not meant
> to run concurrently with a lot of other stuff.

I did think that the "fillfactor = 10" bloat would have prevented
problems with plan stability, but I accept that that might not be good
enough.

> Another approach we could have taken is to construct a table just for
> this purpose.

I actually started with that.

Anyway, test coverage of tuplesort.c seems in much better shape now.
The strategy of not always moving the tests towards the code, but
rather moving the code towards the tests (improving code testability)
seems to have been more effective than I'd anticipated.

My unpublished parallel CREATE INDEX patch forces a single worker,
single leader (deterministic division of labor) "serial parallel" sort
for every CREATE INDEX statement when force_parallel_mode is set to
'regress'. That's a nice thing to have for testing, because the
determinism allows one to line things up just right, and reliably
tickle certain classes of bug (this made a certain amount of TDD
possible). This configuration is only useful for testing, and is not
that different from forcing an external sort, the notable difference
being that the worker process is still reliably subject to the general
limitations of parallel sort worker processes.

--
Peter Geoghegan