Обсуждение: [PERFORM] index of only not null, use function index?

Поиск
Список
Период
Сортировка

[PERFORM] index of only not null, use function index?

От
Ariel
Дата:
I need to be able to quickly find rows where a column is not null (only a
small percent of the rows will have that column not null).

Should I do:

CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL

or:

CREATE INDEX ON table (col) WHERE col IS NOT NULL

I'm thinking the first index will make a smaller, simpler, index since I
don't actually need to index the value of the column. But are there any
drawbacks I may not be aware of? Or perhaps there are no actual benefits?

      -Ariel


Re: [PERFORM] index of only not null, use function index?

От
Merlin Moncure
Дата:
On Mon, May 22, 2017 at 10:17 AM, Ariel <aspostgresql@dsgml.com> wrote:
>
> I need to be able to quickly find rows where a column is not null (only a
> small percent of the rows will have that column not null).
>
> Should I do:
>
> CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL
>
> or:
>
> CREATE INDEX ON table (col) WHERE col IS NOT NULL
>
> I'm thinking the first index will make a smaller, simpler, index since I
> don't actually need to index the value of the column. But are there any
> drawbacks I may not be aware of? Or perhaps there are no actual benefits?


You are correct.  I don't see any downside to converting to bool; this
will be more efficient especially if 'col' is large at the small cost
of some generality.  Having said that, what I typically do in such
cases (this comes a lot in database driven work queues) something like
this:

CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL;

Where "OrderCol" is some field that defines some kind of order to the
items that you are marking off.  This will give very good performance
of queries in the form of:

SELECT Col FROM table WHERE col IS NOT NULL ORDER BY OrderCol LIMIT 1;

merlin


Re: [PERFORM] index of only not null, use function index?

От
Tom Lane
Дата:
Merlin Moncure <mmoncure@gmail.com> writes:
> On Mon, May 22, 2017 at 10:17 AM, Ariel <aspostgresql@dsgml.com> wrote:
>> Should I do:
>>
>> CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL
>>
>> or:
>>
>> CREATE INDEX ON table (col) WHERE col IS NOT NULL
>>
>> I'm thinking the first index will make a smaller, simpler, index since I
>> don't actually need to index the value of the column. But are there any
>> drawbacks I may not be aware of? Or perhaps there are no actual benefits?

> You are correct.  I don't see any downside to converting to bool; this
> will be more efficient especially if 'col' is large at the small cost
> of some generality.

Depends on the datatype really.  Because of alignment considerations,
the index tuples will be the same size for any column value <= 4 bytes,
or <= 8 bytes on 64-bit hardware.  So if this is an integer column,
or even bigint on 64-bit, you won't save any space with the first
index definition.  If it's a text column with an average width larger
than what I just mentioned, you could save some space that way.

In general, indexes on expressions are a tad more expensive to maintain
than indexes on plain column values.  And the second index at least has
the potential to be useful for other queries than the one you're thinking
about.  So personally I'd go with the second definition unless you can
show that there's a really meaningful space savings with the first one.

> Having said that, what I typically do in such
> cases (this comes a lot in database driven work queues) something like
> this:
> CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL;

Right, you can frequently get a lot of mileage out of indexing something
that's unrelated to the predicate condition, but is also needed by the
query you want to optimize.

            regards, tom lane


Re: [PERFORM] index of only not null, use function index?

От
Jeremy Finzel
Дата:
Normally, I find that in these situations, it makes sense to index the primary key of the table WHERE col is not null, because it will usually cover the largest number of cases, and is much better than a two-value boolean index, for example.

On Thu, Jun 8, 2017 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
> On Mon, May 22, 2017 at 10:17 AM, Ariel <aspostgresql@dsgml.com> wrote:
>> Should I do:
>>
>> CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL
>>
>> or:
>>
>> CREATE INDEX ON table (col) WHERE col IS NOT NULL
>>
>> I'm thinking the first index will make a smaller, simpler, index since I
>> don't actually need to index the value of the column. But are there any
>> drawbacks I may not be aware of? Or perhaps there are no actual benefits?

> You are correct.  I don't see any downside to converting to bool; this
> will be more efficient especially if 'col' is large at the small cost
> of some generality.

Depends on the datatype really.  Because of alignment considerations,
the index tuples will be the same size for any column value <= 4 bytes,
or <= 8 bytes on 64-bit hardware.  So if this is an integer column,
or even bigint on 64-bit, you won't save any space with the first
index definition.  If it's a text column with an average width larger
than what I just mentioned, you could save some space that way.

In general, indexes on expressions are a tad more expensive to maintain
than indexes on plain column values.  And the second index at least has
the potential to be useful for other queries than the one you're thinking
about.  So personally I'd go with the second definition unless you can
show that there's a really meaningful space savings with the first one.

> Having said that, what I typically do in such
> cases (this comes a lot in database driven work queues) something like
> this:
> CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL;

Right, you can frequently get a lot of mileage out of indexing something
that's unrelated to the predicate condition, but is also needed by the
query you want to optimize.

                        regards, tom lane


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index of only not null, use function index?

От
Merlin Moncure
Дата:
On Thu, Jun 8, 2017 at 11:05 AM, Jeremy Finzel <finzelj@gmail.com> wrote:
> On Thu, Jun 8, 2017 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Merlin Moncure <mmoncure@gmail.com> writes:
>> > Having said that, what I typically do in such
>> > cases (this comes a lot in database driven work queues) something like
>> > this:
>> > CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL;
>>
>> Right, you can frequently get a lot of mileage out of indexing something
>> that's unrelated to the predicate condition, but is also needed by the
>> query you want to optimize.

> Normally, I find that in these situations, it makes sense to index the
> primary key of the table WHERE col is not null, because it will usually
> cover the largest number of cases, and is much better than a two-value
> boolean index, for example.

[meta note: please try to avoid top-posting]

Yeah, if you index the primary key and query it like this:

CREATE INDEX ON table (pkey) WHERE col IS NOT NULL;

SELECT pkey FROM table WHERE col IS NOT NULL
ORDER BY pkey LIMIT n;

This can give the best possible results since this can qualify for an
index only scan :-).

merlin