Обсуждение: BUG #9833: daterange is not utilizing index correctly

Поиск
Список
Период
Сортировка

BUG #9833: daterange is not utilizing index correctly

От
shaharhd@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      9833
Logged by:          Shahar Hadas
Email address:      shaharhd@gmail.com
PostgreSQL version: 9.3.3
Operating system:   Mac OSX - Postgres.app
Description:

Simple table which has a user_birthday field with a type of date (can be
NULL value)

there's an index (btree) defined on that field, with the rule of NOT
user_birthday IS NULL.

compare the following queries:

explain analyze SELECT *
FROM users
WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)')

explain analyze SELECT *
FROM users
WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date

at first glance both should have the same execution plan, but for some
reason, here are the results:

"Seq Scan on users  (cost=0.00..52314.25 rows=11101 width=241) (actual
time=0.014..478.983 rows=208886 loops=1)"
"  Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)"
"  Rows Removed by Filter: 901214"
"Total runtime: 489.584 ms"

"Bitmap Heap Scan on users  (cost=4468.01..46060.53 rows=210301 width=241)
(actual time=57.104..489.785 rows=209019 loops=1)"
"  Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday
<= '1983-03-01'::date))"
"  Rows Removed by Index Recheck: 611375"
"  ->  Bitmap Index Scan on ix_users_birthday  (cost=0.00..4415.44
rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)"
"        Index Cond: ((user_birthday >= '1978-07-15'::date) AND
(user_birthday <= '1983-03-01'::date))"
"Total runtime: 500.983 ms"

as you can see, the <@ daterange is not utilizing the existing index, while
the between does.

(note that the actual use case for this rule is in a more complex query,
which doesn't result in the Recheck Cond and Bitmap Heap scan)

Is this a bug? or how the daterange was designed to function?

Re: BUG #9833: daterange is not utilizing index correctly

От
bricklen
Дата:
On Wed, Apr 2, 2014 at 2:10 PM, <shaharhd@gmail.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      9833
> Logged by:          Shahar Hadas
> Email address:      shaharhd@gmail.com
> PostgreSQL version: 9.3.3
> Operating system:   Mac OSX - Postgres.app
> Description:
>
> Simple table which has a user_birthday field with a type of date (can be
> NULL value)
>
> there's an index (btree) defined on that field, with the rule of NOT
> user_birthday IS NULL.
>
> compare the following queries:
>
> explain analyze SELECT *
> FROM users
> WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)')
>
> explain analyze SELECT *
> FROM users
> WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date
>
> at first glance both should have the same execution plan, but for some
> reason, here are the results:
>
> "Seq Scan on users  (cost=0.00..52314.25 rows=11101 width=241) (actual
> time=0.014..478.983 rows=208886 loops=1)"
> "  Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)"
> "  Rows Removed by Filter: 901214"
> "Total runtime: 489.584 ms"
>
> "Bitmap Heap Scan on users  (cost=4468.01..46060.53 rows=210301 width=241)
> (actual time=57.104..489.785 rows=209019 loops=1)"
> "  Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday
> <= '1983-03-01'::date))"
> "  Rows Removed by Index Recheck: 611375"
> "  ->  Bitmap Index Scan on ix_users_birthday  (cost=0.00..4415.44
> rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)"
> "        Index Cond: ((user_birthday >= '1978-07-15'::date) AND
> (user_birthday <= '1983-03-01'::date))"
> "Total runtime: 500.983 ms"
>
> as you can see, the <@ daterange is not utilizing the existing index, while
> the between does.
>
> (note that the actual use case for this rule is in a more complex query,
> which doesn't result in the Recheck Cond and Bitmap Heap scan)
>
> Is this a bug? or how the daterange was designed to function?
>

Three things off the top of my head:
1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up to
- but not including - 1983-03-01", whereas BETWEEN is inclusive.
2). I haven't tested, but wouldn't you need to create a GiST index for <@
to use the index?
3). Related to #2, have you tried installing the btree_gist extension to
allow the query planner to use the index?

Re: BUG #9833: daterange is not utilizing index correctly

От
Greg Stark
Дата:
On 3 Apr 2014 21:39, "bricklen" <bricklen@gmail.com> wrote:
>

> 1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up
to - but not including - 1983-03-01", whereas BETWEEN is inclusive.

That means we can't simply rewrite the with as a BETWEEN clause. However
conceptually btree could be extended to handle @> operators like this.

This is a perfectly reasonable feature request but it's just not something
btree can handle currently. Btree operator classes can handle a few
specific operators <, <=, =, >=, and > and that's it I believe. Adding more
is non trivial work and range data types are fairly new.

> 2). I haven't tested, but wouldn't you need to create a GiST index for <@
to use the index?
> 3). Related to #2, have you tried installing the btree_gist extension to
allow the query planner to use the index?

Gist is more flexible about adding new operators. I don't know if that
means it has this one though. If it does you would have to build a new
index using the btree_gist operator class which is may not be worth it's
weight in your database.

Re: BUG #9833: daterange is not utilizing index correctly

От
Shahar
Дата:
I have a more elaborate documentation of the issue in :
http://stackoverflow.com/questions/22824314/postgresql-daterange-not-using-index-correctly

As per your questions:
1. Doesn't really make a difference - its just a one day range, different
dates will give the same result
2. Yes, tried to create a gist index on user_birthday, didn't have any
affect.
3. The btree_gist in loaded. It must be if you want to create a gist index
on a date field (simple types)

this time the email with Reply-All


On Thu, Apr 3, 2014 at 9:39 PM, bricklen <bricklen@gmail.com> wrote:

>
> On Wed, Apr 2, 2014 at 2:10 PM, <shaharhd@gmail.com> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:      9833
>> Logged by:          Shahar Hadas
>> Email address:      shaharhd@gmail.com
>> PostgreSQL version: 9.3.3
>> Operating system:   Mac OSX - Postgres.app
>> Description:
>>
>> Simple table which has a user_birthday field with a type of date (can be
>> NULL value)
>>
>> there's an index (btree) defined on that field, with the rule of NOT
>> user_birthday IS NULL.
>>
>> compare the following queries:
>>
>> explain analyze SELECT *
>> FROM users
>> WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)')
>>
>> explain analyze SELECT *
>> FROM users
>> WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date
>>
>> at first glance both should have the same execution plan, but for some
>> reason, here are the results:
>>
>> "Seq Scan on users  (cost=0.00..52314.25 rows=11101 width=241) (actual
>> time=0.014..478.983 rows=208886 loops=1)"
>> "  Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)"
>> "  Rows Removed by Filter: 901214"
>> "Total runtime: 489.584 ms"
>>
>> "Bitmap Heap Scan on users  (cost=4468.01..46060.53 rows=210301 width=241)
>> (actual time=57.104..489.785 rows=209019 loops=1)"
>> "  Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday
>> <= '1983-03-01'::date))"
>> "  Rows Removed by Index Recheck: 611375"
>> "  ->  Bitmap Index Scan on ix_users_birthday  (cost=0.00..4415.44
>> rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)"
>> "        Index Cond: ((user_birthday >= '1978-07-15'::date) AND
>> (user_birthday <= '1983-03-01'::date))"
>> "Total runtime: 500.983 ms"
>>
>> as you can see, the <@ daterange is not utilizing the existing index,
>> while
>> the between does.
>>
>> (note that the actual use case for this rule is in a more complex query,
>> which doesn't result in the Recheck Cond and Bitmap Heap scan)
>>
>> Is this a bug? or how the daterange was designed to function?
>>
>
> Three things off the top of my head:
> 1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up to
> - but not including - 1983-03-01", whereas BETWEEN is inclusive.
> 2). I haven't tested, but wouldn't you need to create a GiST index for <@
> to use the index?
> 3). Related to #2, have you tried installing the btree_gist extension to
> allow the query planner to use the index?
>

Re: BUG #9833: daterange is not utilizing index correctly

От
Shahar
Дата:
On Fri, Apr 4, 2014 at 7:36 AM, Greg Stark <stark@mit.edu> wrote:

>
> On 3 Apr 2014 21:39, "bricklen" <bricklen@gmail.com> wrote:
> >
>
> > 1). [1978-07-15,1983-03-01) means "include 1978-07-15 and everything up
> to - but not including - 1983-03-01", whereas BETWEEN is inclusive.
>
> That means we can't simply rewrite the with as a BETWEEN clause. However
> conceptually btree could be extended to handle @> operators like this.
>
You might have over looked something.
When using a '[<some date>, <some date>]' range, postgres automatically
change it to '[<some date>, <some date> + 1)'
The explain shows it:
EXPLAIN ANALYZE SELECT *
FROM users
WHERE user_birthday <@ daterange('[1978-07-15,1983-03-02]')

results in:
"Seq Scan on users  (cost=0.00..53248.25 rows=11101 width=241) (actual
time=0.023..546.617 rows=209154 loops=1)"
"  Filter: (user_birthday <@ '[1978-07-15,1983-03-03)'::daterange)"
"  Rows Removed by Filter: 900946"
"Total runtime: 558.557 ms"

>

> This is a perfectly reasonable feature request but it's just not something
> btree can handle currently. Btree operator classes can handle a few
> specific operators <, <=, =, >=, and > and that's it I believe. Adding more
> is non trivial work and range data types are fairly new.
>
My assumption was, that due to the fact that a range has an upper and lower
limit, it would be more efficient than between, as the planner can always
assume that the upper limit is always equal or above the lower limit.
Wouldn't that make the index search much faster, as by the time you found
the index for the lower limit, you're search options for the upper limit
are limited and then easier to plan on.

> > 2). I haven't tested, but wouldn't you need to create a GiST index for
> <@ to use the index?
>
How do you do that? create a GiST index for a data field with <@ operator?
that's the relevant table:

CREATE TABLE users
(
  user_id bigserial NOT NULL,
  user_email text NOT NULL,
  user_password text,
  user_first_name text NOT NULL,
  user_middle_name text,
  user_last_name text NOT NULL,
  user_birthday date,
  CONSTRAINT pk_users PRIMARY KEY (user_id)
)

The users table has 1.1 million records with full randomness on the dates
between the dates: 1974-04-01 and 1998-04-03.
all I was able to do was create this, which wouldn't affect this query...

CREATE INDEX ix_users_birthday_gist
  ON glances.users
  USING gist
  (user_birthday)
  WHERE NOT user_birthday IS NULL;

>  > 3). Related to #2, have you tried installing the btree_gist extension
> to allow the query planner to use the index?
>
> Gist is more flexible about adding new operators. I don't know if that
> means it has this one though. If it does you would have to build a new
> index using the btree_gist operator class which is may not be worth it's
> weight in your database.
>
btree_gist is enabled and loaded, otherwise I wouldn't be able to create
the GiST index on user_birthday.
but as I said - it didn't had any affect.

Shahar