Обсуждение: Indexing on a circle datatype

Поиск
Список
Период
Сортировка

Indexing on a circle datatype

От
Gavin Love
Дата:
Hey,

I seem to be unable to get postgres to use a gist index we have on a
circle data type.

Table "public.tradesmen_profiles"
       Column         |            Type             |       Modifiers
-----------------------+-----------------------------+-----------------------

id                          | integer                     | not null
work_area             | circle                       |
Indexes:
   "tradesmen_profiles_pkey" PRIMARY KEY, btree (id)
    "tradesmen_profiles_test" gist (work_area)

We are then trying to do the following query

SELECT  id FROM  tradesmen_profiles WHERE tradesmen_profiles.work_area
@> point(0.0548691728419,51.5404384172);

Which produces the following:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------


Seq Scan on tradesmen_profiles  (cost=0.00..3403.55 rows=14942 width=4)
(actual time=0.042..31.427 rows=5898 loops=1)
  Filter: (work_area @> '(0.0548691728419,51.5404384172)'::point)
Total runtime: 39.556 ms

I have also vacuum'd  and reindexed the table after building the index

VACUUM ANALYZE VERBOSE tradesmen_profiles;
REINDEX TABLE tradesmen_profiles;

So am I just trying to do something that is not possible or have I just
made a mistake with what I am trying to do?
This is not a big problem just now but as our data set grows I am
worried that having to do a sequence scan on this table every time will
be a serious performance overhead.

Thanks for your help,

Gavin

Re: Indexing on a circle datatype

От
Matthew Wakeling
Дата:
On Mon, 24 Aug 2009, Gavin Love wrote:
> I seem to be unable to get postgres to use a gist index we have on a circle
> data type.

> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------- 
> Seq Scan on tradesmen_profiles  (cost=0.00..3403.55 rows=14942 width=4)
> (actual time=0.042..31.427 rows=5898 loops=1)
> Filter: (work_area @> '(0.0548691728419,51.5404384172)'::point)
> Total runtime: 39.556 ms

If a sequential scan takes 39 ms, and returns 5898 rows, I'd say it's much
quicker than an index scan could ever be. Postgres assumes that a
sequential scan can access disc at a reasonable rate, but an index scan
involves lots of seeking, which can be a lot slower. You would be looking
at 6000 seeks here if the data wasn't in the cache, which could take tens
of seconds.

> This is not a big problem just now but as our data set grows I am worried
> that having to do a sequence scan on this table every time will be a serious
> performance overhead.

Try with a lot more data, like a thousand times as much. You will probably
find that Postgres will automatically switch over to an index scan when it
becomes beneficial.

Alternatively, if you really want to force its hand (just for testing
purposes), then try running:

SET enable_seqscan TO off;

and see what happens.

Matthew

--
 When I first started working with sendmail, I was convinced that the cf
 file had been created by someone bashing their head on the keyboard. After
 a week, I realised this was, indeed, almost certainly the case.
        -- Unknown

Re: Indexing on a circle datatype

От
Tom Lane
Дата:
Gavin Love <gavin@splicer.org.uk> writes:
> I seem to be unable to get postgres to use a gist index we have on a
> circle data type.
> SELECT  id FROM  tradesmen_profiles WHERE tradesmen_profiles.work_area
> @> point(0.0548691728419,51.5404384172);

So far as I can see, the member operators of gist circle_ops are

 gist         | circle_ops         | <<(circle,circle)
 gist         | circle_ops         | &<(circle,circle)
 gist         | circle_ops         | &>(circle,circle)
 gist         | circle_ops         | >>(circle,circle)
 gist         | circle_ops         | <@(circle,circle)
 gist         | circle_ops         | @>(circle,circle)
 gist         | circle_ops         | ~=(circle,circle)
 gist         | circle_ops         | &&(circle,circle)
 gist         | circle_ops         | |>>(circle,circle)
 gist         | circle_ops         | <<|(circle,circle)
 gist         | circle_ops         | &<|(circle,circle)
 gist         | circle_ops         | |&>(circle,circle)
 gist         | circle_ops         | @(circle,circle)
 gist         | circle_ops         | ~(circle,circle)

(this is extracted from the output of the query shown in 8.4 docs
section 11.9).  So, circle @> point is out of luck.  Try using a
zero- or small-radius circle on the right.

            regards, tom lane

Re: Indexing on a circle datatype

От
Gavin Love
Дата:
Tom Lane wrote:
> Gavin Love <gavin@splicer.org.uk> writes:
>> I seem to be unable to get postgres to use a gist index we have on a
>> circle data type.
>> SELECT  id FROM  tradesmen_profiles WHERE tradesmen_profiles.work_area
>> @> point(0.0548691728419,51.5404384172);
>
> So far as I can see, the member operators of gist circle_ops are
>
>  gist         | circle_ops         | <<(circle,circle)
>  gist         | circle_ops         | &<(circle,circle)
>  gist         | circle_ops         | &>(circle,circle)
>  gist         | circle_ops         | >>(circle,circle)
>  gist         | circle_ops         | <@(circle,circle)
>  gist         | circle_ops         | @>(circle,circle)
>  gist         | circle_ops         | ~=(circle,circle)
>  gist         | circle_ops         | &&(circle,circle)
>  gist         | circle_ops         | |>>(circle,circle)
>  gist         | circle_ops         | <<|(circle,circle)
>  gist         | circle_ops         | &<|(circle,circle)
>  gist         | circle_ops         | |&>(circle,circle)
>  gist         | circle_ops         | @(circle,circle)
>  gist         | circle_ops         | ~(circle,circle)
>
> (this is extracted from the output of the query shown in 8.4 docs
> section 11.9).  So, circle @> point is out of luck.  Try using a
> zero- or small-radius circle on the right.
>

I thought that might be the case but was unsure from the documentation I
could find. With a small circle it does indeed use the index.

Thanks for your help.

EXPLAIN ANALYZE
SELECT  tradesmen_profiles.id FROM  tradesmen_profiles WHERE
tradesmen_profiles.work_area  @> circle
'((0.0548691728419,51.5404384172),0)';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on tradesmen_profiles  (cost=4.50..115.92 rows=30
width=4) (actual time=2.339..18.495 rows=5898 loops=1)
    Filter: (work_area @> '<(0.0548691728419,51.5404384172),0>'::circle)
    ->  Bitmap Index Scan on tradesmen_profiles_test  (cost=0.00..4.49
rows=30 width=0) (actual time=1.927..1.927 rows=6404 loops=1)
          Index Cond: (work_area @>
'<(0.0548691728419,51.5404384172),0>'::circle)
  Total runtime: 26.554 ms
(5 rows)