Обсуждение: How to get RTREE performance from GIST index?

Поиск
Список
Период
Сортировка

How to get RTREE performance from GIST index?

От
Clive Page
Дата:
I have been using Postgres for some years, in particular the RTREE
indexes to perform spatial queries on astronomical datasets.  I
misguidedly got our system manager to install Postgres 8.4 and I find
that I can no longer use rtrees - the system gives me a message

substituting access method "gist" for obsolete method "rtree"

The performance has dropped by at least a factor of 100 (I am not sure
how much more, because the relevant bit of my SQL is still running after
more than an hour, previously it took a minute or so to do this bit of
the script).

The relevant bits of SQL I have been using are:

CREATE TEMPORARY TABLE cat4p AS
   SELECT longid, srcid, ra, dec, poserr,
    BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
        POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
    FROM cat4;
CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

CREATE TEMPORARY TABLE apair AS
   SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr
   FROM avcatpos AS a, cat4p AS c
   WHERE a.errbox && c.errbox AND
     gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) <
       LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr))
     AND a.srcid <> c.srcid;

It is this latter query, involving the && operator to find where two
rectangular boxes overlap, which seems to be taking the huge amount of time.

Is there a way of forcing the use of Rtree indexing in v8.4, or any
other work-around?

Regards

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.

Re: How to get RTREE performance from GIST index?

От
Thom Brown
Дата:
2009/11/21 Clive Page <clive.page@cantab.net>
I have been using Postgres for some years, in particular the RTREE indexes to perform spatial queries on astronomical datasets.  I misguidedly got our system manager to install Postgres 8.4 and I find that I can no longer use rtrees - the system gives me a message

substituting access method "gist" for obsolete method "rtree"

The performance has dropped by at least a factor of 100 (I am not sure how much more, because the relevant bit of my SQL is still running after more than an hour, previously it took a minute or so to do this bit of the script).

The relevant bits of SQL I have been using are:

CREATE TEMPORARY TABLE cat4p AS
 SELECT longid, srcid, ra, dec, poserr,
  BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
      POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
  FROM cat4;
CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

CREATE TEMPORARY TABLE apair AS
 SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr
 FROM avcatpos AS a, cat4p AS c
 WHERE a.errbox && c.errbox AND
   gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) <
     LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr))
   AND a.srcid <> c.srcid;

It is this latter query, involving the && operator to find where two rectangular boxes overlap, which seems to be taking the huge amount of time.

Is there a way of forcing the use of Rtree indexing in v8.4, or any other work-around?



Rtree was reimplemented into GiST as of PostgreSQL version 8.2. There should no advantages of using Rtree, so I'm not sure why you're experiencing problems.  Hopefully someone can provide insight into what's causing the slow down.

Thom

Re: How to get RTREE performance from GIST index?

От
Tom Lane
Дата:
Clive Page <clive.page@cantab.net> writes:
> Is there a way of forcing the use of Rtree indexing in v8.4,

No, because the rtree code is gone entirely.  We took it out on the
basis of tests showing that the gist implementation performed as well
or better.  I'm not sure why it's not working for you, but if you
can provide a more complete test case, we could look into it.

One thing to check into right away is whether the system is even
trying to use the index --- what does EXPLAIN show about it?
Do you by any chance have EXPLAIN output for the same query on the
old system?  What was the old PG version, anyway?

            regards, tom lane

Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
On 22/11/2009 05:40, Tom Lane wrote:
> No, because the rtree code is gone entirely.  We took it out on the
> basis of tests showing that the gist implementation performed as well
> or better.  I'm not sure why it's not working for you, but if you
> can provide a more complete test case, we could look into it.
>
> One thing to check into right away is whether the system is even
> trying to use the index --- what does EXPLAIN show about it?
> Do you by any chance have EXPLAIN output for the same query on the
> old system?  What was the old PG version, anyway?

Tom

Thanks for your reply.  I should have said that I was using v8.1.  After
I posted my question, I retried with
   CREATE INDEX ... USING GIST(errbox box_ops)
and left it to run overnight.  The query using the index, which finds
overlaps between rectangular boxes using the && operator, took 10228
seconds, whereas using RTREES in v8.1 it took around 50 seconds.  I have
several such queries to do, and cannot afford to wait for hours.   I
discovered the "box_ops" syntax only by reading lots of disparate bits
of documentation: it is very unsatisfactory that your indexing options
are so very poorly documented.  I saw that as well as GIST indexing
there is something called GIN indexing but failed to find anything
useful about these at all.  I tried to use them, but without success.
There is no point in having these facilities if they are not documented
adequately.

I am truly sorry that you made the decision to remove R-trees from
Postgres and had no regard for backward compatibility.  The availability
and high performance of R-trees was one of the main reasons I switched
to Postgres and have been using it for the last few years.  I realise
that if I take the time to experiment and use the EXPLAIN command and
play around for a week or two I *might* be able to restore something
like the earlier performance, but unfortunately I have a job I want to
get done in the next day or two.

Fortunately I have a simple work-around: Postgres v8.1 is still
installed here, and I'll use it right away.  For the longer term, I may
have to switch to MySQL, which had R-trees but not implemented very
efficiently (the last time I checked).  No doubt the new owners of MySQL
will have tried hard to get them working properly.  I'm truly sorry that
you don't take the need for R-tree indexing seriously.  I would have
thought that geometric queries such as the ones that I've been doing
would be more and more important in the real world.


--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.

Re: How to get RTREE performance from GIST index?

От
Martijn van Oosterhout
Дата:
On Sun, Nov 22, 2009 at 10:23:29AM +0000, Clive Page wrote:
> Thanks for your reply.  I should have said that I was using v8.1.  After
> I posted my question, I retried with
>   CREATE INDEX ... USING GIST(errbox box_ops)
> and left it to run overnight.  The query using the index, which finds
> overlaps between rectangular boxes using the && operator, took 10228
> seconds, whereas using RTREES in v8.1 it took around 50 seconds.  I have
> several such queries to do, and cannot afford to wait for hours.

If it really is that much slower, then it's a bug and should be fixed.
However, you have not provided not nearly enough information to work
out what the problem is. Could you show the EXPLAIN output from 8.1 and
8.4 at least so we have some handle on what your problem is.

PostgreSQL is used extensively for geometric queries, see postgis. They
abandoned rtree a while back because the GiST rtree support was better,
see

http://postgis.refractions.net/documentation/manual-1.3/ch03.html#id2570697

You are AFAICR the first person to have a problem is this area, but if
you can't take the few minutes needed to run EXPLAIN on before and
after then there is zero chance of it being fixed either.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Вложения

Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
On 22/11/2009 10:44, Martijn van Oosterhout wrote:
> PostgreSQL is used extensively for geometric queries, see postgis. They
> abandoned rtree a while back because the GiST rtree support was better,

Maybe the support is better, but the performance is obviously not.  And
when there is a difference between under a minute and 3 hours, then
performance matters.  At least it does to me and my colleagues.

> You are AFAICR the first person to have a problem is this area, but if

I find it extremely hard to believe that.  All I am doing is finding
whether pairs of rectangular boxes overlap or not.  That is the most
trivial use of R-trees possible.  Surely someone thought to time that
using GIST?

> you can't take the few minutes needed to run EXPLAIN on before and
> after then there is zero chance of it being fixed either.

Unfortunately it isn't a "few minutes".  To re-run in v8.1 I have to
reload many tables into a different installation using v8.1: some of the
tables have a few million rows and hundreds of columns.  Then I have
change some scripts to add an EXPLAIN command and log the resulting
output (rather than getting the results that I actually want).  This
will take hours.  I will try to do it soon, but cannot do it instantly.
  I have some data that I want to process first.

I agree that this is a bug in Postgres - the bug was removing code that
worked perfectly well and upon which some users depended.  I simply
don't understand why the Rtree code could not have been left in there,
for those who found that the new-fangled GIST indexing did not work.

Regards

--
Clive Page

Re: How to get RTREE performance from GIST index?

От
Thom Brown
Дата:
2009/11/22 Clive Page <cgp@star.le.ac.uk>

Unfortunately it isn't a "few minutes".  To re-run in v8.1 I have to reload many tables into a different installation using v8.1: some of the tables have a few million rows and hundreds of columns.  Then I have change some scripts to add an EXPLAIN command and log the resulting output (rather than getting the results that I actually want).  This will take hours.  I will try to do it soon, but cannot do it instantly.  I have some data that I want to process first.

I agree that this is a bug in Postgres - the bug was removing code that worked perfectly well and upon which some users depended.  I simply don't understand why the Rtree code could not have been left in there, for those who found that the new-fangled GIST indexing did not work.



Hi Clive,

Since this is a performance issue, this should probably have been sent to the pgsql-performance mailing list.  But in any case, the vast majority of performance issues require an EXPLAIN output, or preferably with ANALYZE also as there is nothing to help diagnose what the query planner it attempting to do.


It's always of interest to developers if there are corner cases that might identify overlooked scenarios, so it's always appreciated if enough information is provided to recreate the conditions to see whether there is a genuine problem in the software itself.

Regards

Thom

Re: How to get RTREE performance from GIST index?

От
Alban Hertroys
Дата:
On 22 Nov 2009, at 11:55, Clive Page wrote:

> On 22/11/2009 10:44, Martijn van Oosterhout wrote:
>> PostgreSQL is used extensively for geometric queries, see postgis. They
>> abandoned rtree a while back because the GiST rtree support was better,
>
> Maybe the support is better, but the performance is obviously not.  And when there is a difference between under a
minuteand 3 hours, then performance matters.  At least it does to me and my colleagues. 

Before getting all worked up about the performance of GiST indexes, did you verify that your tables were analysed?
Did the person doing the upgrade tune the database? Was it tuned the same or differently?
It may well be that the resources GiST indexes require aren't exactly the same as what RTrees require, so maybe tuning
needsto be different or you need to add some hardware (which means you probably were close to the limits before and
wouldlikely have to do this in the near future anyway - still an unpleasant surprise of course). 

A query going from "mere" minutes to several hours usually points to the resource starvation or a particularly poor
queryplan. Don't assume GiST indexes are that much slower than RTrees, they wouldn't have replaced them if that were
thecase. 

That is why people are asking for the query plan (EXPLAIN, or preferably EXPLAIN ANALYSE, but that actually performs
the3h query), so that we have an idea where your query is going wrong. In almost all cases we see on this ML the
problemis not a bug in Postgres; it usually boils down to lack of maintenance, improper tuning or just plain
inefficientqueries. 

>> you can't take the few minutes needed to run EXPLAIN on before and
>> after then there is zero chance of it being fixed either.
>
> Unfortunately it isn't a "few minutes".  To re-run in v8.1 I have to reload many tables into a different installation
usingv8.1: some of the tables have a few million rows and hundreds of columns.  Then I have change some scripts to add
anEXPLAIN command and log the resulting output (rather than getting the results that I actually want).  This will take
hours. I will try to do it soon, but cannot do it instantly.  I have some data that I want to process first. 

You posted a number of queries in your original mail. Don't those expose the problem? Running them in psql connected to
eitherdatabase shouldn't take much time at all. 

We just would like to see where your performance issues are coming from, we don't necessarily need the results of your
actualcalculations for those. 

> I agree that this is a bug in Postgres - the bug was removing code that worked perfectly well and upon which some
usersdepended.  I simply don't understand why the Rtree code could not have been left in there, for those who found
thatthe new-fangled GIST indexing did not work. 


I'm not one of the developers, but I expect it was duplicating code or unmaintained and there was no evidence that
therewere any regressions when replacing RTree with GiST (ISTR that GiST is in fact some form of RTree) and therefore
deemeddeprecated. It is not unusual to remove deprecated features between major versions of a software product. 

I'm speculating here, but I'm quite convinced you would have seen the same regression in the performance of your
databaseif RTree would have still been in the database. It's simply not likely that the difference in index is causing
yourtrouble. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b09295211736876095208!



Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
On 22/11/2009 11:52, Thom Brown wrote:
> Since this is a performance issue, this should probably have been sent
> to the pgsql-performance mailing list.  But in any case, the vast
> majority of performance issues require an EXPLAIN output, or preferably
> with ANALYZE also as there is nothing to help diagnose what the query
> planner it attempting to do.

You may think it's a performance issue, but it stems from the decision
to remove from Postgres an essential facility, that of generating and
using R-trees.

I'm currently trying to generate test cases, together with EXPLAIN output.

Regards

--
Clive Page

Re: How to get RTREE performance from GIST index?

От
Alban Hertroys
Дата:
On 21 Nov 2009, at 23:57, Clive Page wrote:

> CREATE TEMPORARY TABLE cat4p AS
>  SELECT longid, srcid, ra, dec, poserr,
>   BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
>       POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
>   FROM cat4;
> CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

ANALYSE cat4p;

> CREATE TEMPORARY TABLE apair AS
>  SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr
>  FROM avcatpos AS a, cat4p AS c
>  WHERE a.errbox && c.errbox AND
>    gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) <
>      LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr))
>    AND a.srcid <> c.srcid;


If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update the
planner'sstatistics on their contents. If you don't you get the default query plan that's often not efficient. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b0929e511732016739697!



Re: How to get RTREE performance from GIST index?

От
Martijn van Oosterhout
Дата:
On Sun, Nov 22, 2009 at 10:55:21AM +0000, Clive Page wrote:
>> You are AFAICR the first person to have a problem is this area, but if
>
> I find it extremely hard to believe that.  All I am doing is finding
> whether pairs of rectangular boxes overlap or not.  That is the most
> trivial use of R-trees possible.  Surely someone thought to time that
> using GIST?

Ofcourse, there were benchmarks, which showed the GiST version to be
faster or the same.

http://www.sai.msu.su/~megera/postgres/gist/code/rtree/README.rtree_gist

Not surprising really, since conceptually the GiST version resembles the
old rtree code very closely, which is why people are surprised you're
seeing a difference. Hence people as suspecting that the problem lies
elsewhere.

(GiST is basically the extension of rtree of non-geometric types, there
really isn't that much difference between the two).

> I agree that this is a bug in Postgres - the bug was removing code that
> worked perfectly well and upon which some users depended.  I simply
> don't understand why the Rtree code could not have been left in there,
> for those who found that the new-fangled GIST indexing did not work.

I think there would be some disagreements about whether the old code
was "working well", it was broken enough that people didn't want to
maintain it.

Looking forward to your explain output.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Вложения

Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
On 22/11/2009 12:09, Alban Hertroys wrote:
> If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update
theplanner's statistics on their contents. If you don't you get the default query plan that's often not efficient. 

Alban

Thanks - I didn't know that.  I'll try removing the TEMPORARY tag.

Is it documented somewhere that I should have seen?

Regards

--
Clive Page

Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
On 22/11/2009 12:15, Martijn van Oosterhout wrote:

> Looking forward to your explain output.

Here it is (I wrapped some of the longer lines as might not have
survived the translation to email):

Postgres v8.1.0
   EXPLAIN SELECT a.longid AS longid, b.longid AS blongid,
          gcdist(a.ra, a.dec, b.ra, b.dec) AS dist
   FROM pos AS a, pos AS b
   WHERE a.errbox && b.errbox
     AND gcdist(a.ra, a.dec, b.ra, b.dec) <
     LEAST(0.9*a.dist_nn, 0.9*b.dist_nn, 7.0, 3.0 * (a.poserr + b.poserr) )
     AND (a.obsid <> b.obsid OR a.longid = b.longid) ;
                           QUERY PLAN
---------------------------------------------------------------
  Nested Loop  (cost=22.16..1241963555.61 rows=205459449 width=48)
    Join Filter: ((gcdist("outer".ra, "outer"."dec", "inner".ra,
"inner"."dec") <
    LEAST((0.9::double precision * "outer".dist_nn), (0.9
    ::double precision * "inner".dist_nn), 7::double precision,
    (3::double precision * ("outer".poserr + "inner".poserr)))) AND
(("outer".
obsid <> "inner".obsid) OR ("outer".longid = "inner".longid)))
    ->  Seq Scan on pos a  (cost=0.00..8213.83 rows=351983 width=68)
    ->  Bitmap Heap Scan on pos b  (cost=22.16..3469.79 rows=1760 width=68)
          Recheck Cond: ("outer".errbox && b.errbox)
          ->  Bitmap Index Scan on pos_errbox  (cost=0.00..22.16
rows=1760 width=0)
                Index Cond: ("outer".errbox && b.errbox)
(7 rows)
Actual timing using v8.1.0:
SELECT
Time: 71351.102 ms


Postgres 8.4.1
EXPLAIN output:
---------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..235836993.78 rows=205459449 width=48)
    Join Filter: (((a.obsid <> b.obsid) OR (a.longid = b.longid)) AND
    (gcdist(a.ra, a."dec", b.ra, b."dec") < LEAST((0.9::double precision
* a.dist_nn),
     (0.9::double precision * b.dist_nn), 7::double precision,
     (3::double precision * (a.poserr + b.poserr)))))
    ->  Seq Scan on pos a  (cost=0.00..8032.83 rows=351983 width=68)
    ->  Index Scan using pos_errbox on pos b  (cost=0.00..31.27
rows=1760 width=68)
          Index Cond: (a.errbox && b.errbox)
(5 rows)
Actual timing using v8.4.1 was 10228 seconds (sorry didn't record the
milliseconds).
It only worked when I left it running overnight!

Regards

--
Clive Page

Re: How to get RTREE performance from GIST index?

От
Alban Hertroys
Дата:
On 22 Nov 2009, at 13:19, Clive Page wrote:

> On 22/11/2009 12:09, Alban Hertroys wrote:
>> If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update
theplanner's statistics on their contents. If you don't you get the default query plan that's often not efficient. 
>
> Alban
>
> Thanks - I didn't know that.  I'll try removing the TEMPORARY tag.
>
> Is it documented somewhere that I should have seen?


It's not just temporary tables, it goes for all tables in fact. The difference is that with normal tables there is time
forautovacuum to pick them up as needing maintenance, whereas temporary tables are usually queried immediately after
they'recreated so that autovacuum is too late. 

This specific case for using ANALYSE isn't explicitly documented, it more or less follows from the usage pattern of
temporarytables. From the notes on the documentation of the ANALYZE command
(http://www.postgresql.org/docs/8.4/interactive/sql-analyze.html):

"In the default PostgreSQL configuration, The Autovacuum Daemon takes care of automatic analyzing of tables when they
arefirst loaded with data, and as they change throughout regular operation. When autovacuum is disabled, it is a good
ideato run ANALYZE periodically, or just after making major changes in the contents of a table." 

That last line isn't explicit about temporary tables, but the reason for running ANALYZE in both cases is the same.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b092e5911731012678321!



Re: How to get RTREE performance from GIST index?

От
Alban Hertroys
Дата:
On 21 Nov 2009, at 23:57, Clive Page wrote:
> The relevant bits of SQL I have been using are:
>
> CREATE TEMPORARY TABLE cat4p AS
>  SELECT longid, srcid, ra, dec, poserr,
>   BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
>       POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
>   FROM cat4;
> CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);

Looking closer at this, that errbox calculation looks like its formula wouldn't change between sessions. If you use it
frequentlyenough it's a good candidate to put a functional index on or, if your SELECT vs INSERT/UPDATE/DELETE ratio
leansto the former, add a column with the value pre-calculated (and indexed of course). 

You can automate keeping that column up to date by using a few simple BEFORE INSERT and BEFORE UPDATE triggers (they
reallyonly need to calculate the box-value and override that column's value). Insert/Update performance will decrease
(there'sa function call and an extra calculation after all), but Select performance will probably improve and there's
sufficienttime for autovacuum to pick up any changes in the data. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b09327a11731713516847!



Re: How to get RTREE performance from GIST index?

От
Martijn van Oosterhout
Дата:
On Sun, Nov 22, 2009 at 12:24:50PM +0000, Clive Page wrote:
> On 22/11/2009 12:15, Martijn van Oosterhout wrote:
>
>> Looking forward to your explain output.
>
> Here it is (I wrapped some of the longer lines as might not have
> survived the translation to email):

Ok, very interesting, since this shows that the plan is essentially
identical between the two versions. Which kind of rules out problems
with statistics and missing ANALYSE.

My next thought goes to configuration, in particular work_mem,
maintainence_work_mem and shared_buffers. Are they the same between 8.1
and 8.4?

Can you give some idea of the density of the rectangle? What would be a
typical number of overlapping boxes for this query?

Have a nice day,

> Postgres v8.1.0
>   EXPLAIN SELECT a.longid AS longid, b.longid AS blongid,
>          gcdist(a.ra, a.dec, b.ra, b.dec) AS dist
>   FROM pos AS a, pos AS b
>   WHERE a.errbox && b.errbox
>     AND gcdist(a.ra, a.dec, b.ra, b.dec) <
>     LEAST(0.9*a.dist_nn, 0.9*b.dist_nn, 7.0, 3.0 * (a.poserr + b.poserr) )
>     AND (a.obsid <> b.obsid OR a.longid = b.longid) ;
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Nested Loop  (cost=22.16..1241963555.61 rows=205459449 width=48)
>    Join Filter: ((gcdist("outer".ra, "outer"."dec", "inner".ra,
> "inner"."dec") <
>    LEAST((0.9::double precision * "outer".dist_nn), (0.9
>    ::double precision * "inner".dist_nn), 7::double precision,
>    (3::double precision * ("outer".poserr + "inner".poserr)))) AND
> (("outer".
> obsid <> "inner".obsid) OR ("outer".longid = "inner".longid)))
>    ->  Seq Scan on pos a  (cost=0.00..8213.83 rows=351983 width=68)
>    ->  Bitmap Heap Scan on pos b  (cost=22.16..3469.79 rows=1760 width=68)
>          Recheck Cond: ("outer".errbox && b.errbox)
>          ->  Bitmap Index Scan on pos_errbox  (cost=0.00..22.16
> rows=1760 width=0)
>                Index Cond: ("outer".errbox && b.errbox)
> (7 rows)
> Actual timing using v8.1.0:
> SELECT
> Time: 71351.102 ms
>
>
> Postgres 8.4.1
> EXPLAIN output:
> ---------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..235836993.78 rows=205459449 width=48)
>    Join Filter: (((a.obsid <> b.obsid) OR (a.longid = b.longid)) AND
>    (gcdist(a.ra, a."dec", b.ra, b."dec") < LEAST((0.9::double precision
> * a.dist_nn),
>     (0.9::double precision * b.dist_nn), 7::double precision,
>     (3::double precision * (a.poserr + b.poserr)))))
>    ->  Seq Scan on pos a  (cost=0.00..8032.83 rows=351983 width=68)
>    ->  Index Scan using pos_errbox on pos b  (cost=0.00..31.27 rows=1760
> width=68)
>          Index Cond: (a.errbox && b.errbox)
> (5 rows)
> Actual timing using v8.4.1 was 10228 seconds (sorry didn't record the
> milliseconds).
> It only worked when I left it running overnight!
>
> Regards
>
> --
> Clive Page
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Вложения

Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
Thanks to all those who responded to my posting yesterday.

I have now tried a simple simulation of joining tables with partly
overlapping rectangular boxes using Rtrees (with GIST automatically
replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the
latter a bit faster.  But my original data processing script still hangs
(or takes 3 hours, much the same thing as far as getting work done is
concerned) at various points when using v8.4.1.  The identical script
works fine using a 8.1.0 server, which fortunately we still have
available.  I have now inserted ANALYSE table commands before each
SELECT that depends upon an R-tree (GIST) index.  This doesn't seem to help.

It will obviously take a lot of time and effort to track this down.  For
the moment I shall stick to using v8.1.0, as there's really no alternative.

I still think it a great pity that rather than merely deprecating R-tree
indexing or making GIST the default but still allowing R-trees to be
used if one really wanted them, you actually removed Rtrees from the
code.  No doubt some tests show GIST to work and work faster than Rtrees
in test cases; clearly from my experience when using complicated
real-world data that's not necessarily true.

Maybe there's some magic spell that can be used to restore the earlier
performance, but I really don't have time at present to do the necessary
experimenting.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.

Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
Thanks to all those who responded to my posting yesterday.

I have now tried a simple simulation of joining tables with partly
overlapping rectangular boxes using Rtrees (with GIST automatically
replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the
latter a bit faster.  But my original data processing script still hangs
(or takes 3 hours, much the same thing as far as getting work done is
concerned) at various points when using v8.4.1.  The identical script
works fine using a 8.1.0 server, which fortunately we still have
available.  I have now inserted ANALYSE table commands before each
SELECT that depends upon an R-tree (GIST) index.  This doesn't seem to help.

It will obviously take a lot of time and effort to track this down.  For
the moment I shall stick to using v8.1.0, as there's really no alternative.

I still think it a great pity that rather than merely deprecating R-tree
indexing or making GIST the default but still allowing R-trees to be
used if one really wanted them, you actually removed Rtrees from the
code.  No doubt some tests show GIST to work and work faster than Rtrees
in test cases; clearly from my experience when using complicated
real-world data that's not necessarily true.

Maybe there's some magic spell that can be used to restore the earlier
performance, but I really don't have time at present to do the necessary
experimenting.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.

--
Clive Page

Re: How to get RTREE performance from GIST index?

От
Alvaro Herrera
Дата:
Alban Hertroys wrote:

> "In the default PostgreSQL configuration, The Autovacuum Daemon takes
> care of automatic analyzing of tables when they are first loaded with
> data, and as they change throughout regular operation. When autovacuum
> is disabled, it is a good idea to run ANALYZE periodically, or just
> after making major changes in the contents of a table."
>
> That last line isn't explicit about temporary tables, but the reason
> for running ANALYZE in both cases is the same.

Actually, autovacuum doesn't process temp tables at all because it
cannot get to them; they might live solely in the creating process'
private memory area.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: How to get RTREE performance from GIST index?

От
Clive Page
Дата:
> Actually, autovacuum doesn't process temp tables at all because it
> cannot get to them; they might live solely in the creating process'
> private memory area.
>

Does that mean that, in between creating a temporary table and actually
using it in a complicate query, it is desirable to run an ANALYZE
command on it?

I haven't been doing that, because I didn't know.

Regards

--
Clive Page

Re: How to get RTREE performance from GIST index?

От
Alban Hertroys
Дата:
On 2 Dec 2009, at 21:12, Clive Page wrote:
>> Actually, autovacuum doesn't process temp tables at all because it
>> cannot get to them; they might live solely in the creating process'
>> private memory area.
>
> Does that mean that, in between creating a temporary table and actually using it in a complicate query, it is
desirableto run an ANALYZE command on it? 
>
> I haven't been doing that, because I didn't know.


Only if you created an index on it. If you didn't it doesn't matter as in that case you always get a sequential scan,
unlessI'm mistaken. 

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.


!DSPAM:737,4b16f67511731227681557!