Обсуждение: index on a box
I've got a site with a ton of geometric data and I'm using a
little of postgresql's geometrical types. I've got very large polygons,
up to 12kilopoints or so, in individual rows with floats for my x and y
values. I'm calculating a box that contains all of my points and am using
the @ operator to find my polygons by a point.
I was wondering, however, if there's a way I can use an index to
avoid table scanning for this. The relevant parts of my sample table look
like this:
create table tmp (
id integer,
name text,
b box
)
and I added the following index:
create index tmp_bybox on tmp using rtree(b);
I've got 33,507 rows currently (still loading data).
Here are the problems I'm having:
explain select name from tmp where point(-121,37) @ b order by area(b);
Sort (cost=2428.02..2428.02 rows=16754 width=44)
-> Seq Scan on tmp (cost=0.00..969.84 rows=16754 width=44)
Any ideas that might help me speed things up?
--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
Dustin Sallings <dustin@spy.net> writes:
> I was wondering, however, if there's a way I can use an index to
> avoid table scanning for this.
If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
get an indexscan plan?
I don't recommend doing such a SET for production purposes, but if this
works then the problem is just inaccurate selectivity/cost estimation.
I see that the on_pb operator has no selectivity estimator defined at
all :-( ... as a quick hack, try setting its oprrest and oprjoin to be
areasel and areajoinsel.
regards, tom lane
Around 11:08 on Jun 21, 2001, Tom Lane said:
# If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
# get an indexscan plan?
Seq Scan on tmp (cost=100000000.00..100002500.84 rows=30434 width=92)
# I don't recommend doing such a SET for production purposes, but if
# this works then the problem is just inaccurate selectivity/cost
# estimation. I see that the on_pb operator has no selectivity estimator
# defined at all :-( ... as a quick hack, try setting its oprrest and
# oprjoin to be areasel and areajoinsel.
I'm not sure what you just said. :)
--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
Dustin Sallings <dustin+postgres@spy.net> writes:
> Around 11:08 on Jun 21, 2001, Tom Lane said:
> # If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
> # get an indexscan plan?
> Seq Scan on tmp (cost=100000000.00..100002500.84 rows=30434 width=92)
Drat.
> I'm not sure what you just said. :)
Never mind, it wouldn't work anyway.
The problem is that "point @ box" isn't an rtree-indexable operation.
You could use "box @ box" instead, where the lefthand box is a zero-area
box with all corners at the point of interest. Crufty, but unless you
want to go in and teach rtree about a new operator type...
regards, tom lane
Around 17:21 on Jun 21, 2001, Tom Lane said:
# Never mind, it wouldn't work anyway.
#
# The problem is that "point @ box" isn't an rtree-indexable operation.
# You could use "box @ box" instead, where the lefthand box is a
# zero-area box with all corners at the point of interest. Crufty, but
# unless you want to go in and teach rtree about a new operator type...
select * from tmp where box(point(-121,37),point(-121,37)) @ b;
Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)
Hmm... That's interesting. It seems that point @ box would be
more generally useful than box @ box. Then again, I've only used this for
this one particular task I'm doing right now. :) It seems that rtree
already knows how to do what I'm trying to do if all I've got to do is
make a box containing the point twice to get the lookup to be fast.
If anyone's interested in what I'm doing with this, you can see it
in action here:
http://bleu.west.spy.net/~dustin/geo/pointinfoform.jsp
I've loaded about 60k polygons (consisting of a total of about
seven million points) describing the shape of various geographical areas
in the United States. I've got a table with the descriptions of the
polygons and box boundaries, then another table with the actual ordered
polygon data. I do a box match on the first table to get a list of
candidates, then examine them all in more detail with a point-in-polygon
algorithm in my application. It's currently pretty slow because I'm don't
actually have a box column on my first table, just the boundaries, which I
cast to a box and use point @ box(point(),point()) with a table scan. It
looks like, when the index works, it'll be as fast as it was when I had
very little data in the tables again. :)
Thanks for the help!
--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
Dustin Sallings <dustin+postgres@spy.net> writes:
> create function box(point) returns box as
> 'select box($1, $1)'
> language 'sql';
> misc=# explain select * from tmp where box(point(-121, 37)) @ b;
> NOTICE: QUERY PLAN:
> Seq Scan on tmp (cost=0.00..2653.01 rows=61 width=92)
You'd better declare the function as 'iscachable'. As is, the planner
doesn't trust it to return a constant.
regards, tom lane
Around 15:14 on Jun 21, 2001, Dustin Sallings said:
# select * from tmp where box(point(-121,37),point(-121,37)) @ b;
#
# Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)
Ugh, any idea here?
create function box(point) returns box as
'select box($1, $1)'
language 'sql';
misc=# explain select * from tmp where box(point(-121, 37)) @ b;
NOTICE: QUERY PLAN:
Seq Scan on tmp (cost=0.00..2653.01 rows=61 width=92)
misc=# explain select * from tmp
misc-# where box(point(-121,37),point(-121,37)) @ b;
NOTICE: QUERY PLAN:
Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)
misc=# select box(point(-121, 37));
box
---------------------
(-121,37),(-121,37)
misc=# select box(point(-121,37),point(-121,37));
box
---------------------
(-121,37),(-121,37)
misc=# select 1 where box(point(-121,37),point(-121,37))=box(point(-121,
37));
?column?
----------
1
--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
Around 18:26 on Jun 21, 2001, Tom Lane said:
# You'd better declare the function as 'iscachable'. As is, the planner
# doesn't trust it to return a constant.
Got it! Tried a few variations on the create until I got this:
create function box(point) returns box as
'select box($1, $1)'
language 'sql'
with (iscachable);
misc=# explain select * from tmp where box(point(-121, 37)) @ b;
NOTICE: QUERY PLAN:
Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92)
Thanks! :)
(oh, and would it be reasonable to list at least some of the
possible attributes in psql?)
--
SPY My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
Tom Lane, Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all!
On Tue, Jun 26, 2001 at 10:35:00PM +0800, some SMTP stream spewed forth: > Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all! > You probably want pg_shadow, accessible by a database superuser. gh
On Tue, 26 Jun 2001, [ISO-8859-1] ������ wrote: > Tom Lane�� > Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all! > pg_user (rather obvious i guess) t. > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html >
Tom Lane, Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all!
select * from pg_users; >From: ������ <lilixin@cqu.edu.cn> >Reply-To: lilixin@cqu.edu.cn >To: Tom Lane <tgl@sss.pgh.pa.us>, Dustin Sallings <dustin+postgres@spy.net> >CC: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org> >Subject: Re: Re: [GENERAL] index on a box >Date: Sun, 08 Jul 2001 21:02:32 +0800 > >Tom Lane�� > Does anyone who knows a user's information is storeed in what pgSQL's >system table? Thank all! > > > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/users-lounge/docs/faq.html _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.