Re: GiST index performance

Поиск
Список
Период
Сортировка
От Yeb Havinga
Тема Re: GiST index performance
Дата
Msg-id e4961fb71003200916r6c9913dag28c04aef290f877d@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GiST index performance  (Kenneth Marshall <ktm@rice.edu>)
Ответы Re: GiST index performance  (Matthew Wakeling <matthew@flymine.org>)
Список pgsql-performance
On Fri, Mar 19, 2010 at 10:16 PM, Kenneth Marshall <ktm@rice.edu> wrote:
Hi Yeb,

I have not looked at the gist code, but would it be possible to
make virtual pages that have a size that is 1/power-of-2 * blocksize.
Then the leaf node could be 1/8 or even 1/16 the size of the full
pagesize.

Hello Ken,

The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not changed, it might be that with the disk pages getting larger, gist indexing has therefore become unexpectedly inefficient.

But I am also not really into the core-gist code, but do have a motivation to dive into it (more than 200% performance increase in Mathew's test case). However I'd like to verify for community support before working on it.

Maybe Theodor or Oleg could say something about how easy or hard it is to do?

regards,
Yeb Havinga
 

Regards,
Ken

On Fri, Mar 19, 2010 at 09:49:30PM +0100, Yeb Havinga wrote:
> Yeb Havinga wrote:
>>
>> Since the gistpagesize is derived from the database blocksize, it might be
>> wise to set the blocksize low for this case, I'm going to play with this a
>> bit more.
> Ok, one last mail before it turns into spam: with a 1KB database blocksize,
> the query now runs in 30 seconds (original 70 on my machine, shared buffers
> 240MB).
> The per inner loop access time now 24 microsecs compared to on my machine
> original 74 microsecs with 8KB size and 8 for the btree scan. Not a bad
> speedup with such a simple parameter :-)
>
> postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b
> + 2;
>                                                        QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------
> Nested Loop  (cost=0.00..4169159462.20 rows=111109777668 width=8) (actual
> time=0.184..29540.355 rows=2999997 loops=1)
>   ->  Seq Scan on b  (cost=0.00..47037.62 rows=999962 width=4) (actual
> time=0.024..1783.484 rows=1000000 loops=1)
>   ->  Index Scan using a_a on a  (cost=0.00..2224.78 rows=111114 width=4)
> (actual time=0.021..0.024 rows=3 loops=1000000)
>         Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2)))
> Total runtime: 30483.303 ms
> (5 rows)
>
>
> postgres=# select gist_stat('a_a');
>                 gist_stat
> -------------------------------------------
> Number of levels:          5             +
> Number of pages:           47618         +
> Number of leaf pages:      45454         +
> Number of tuples:          1047617       +
> Number of invalid tuples:  0             +
> Number of leaf tuples:     1000000       +
> Total size of tuples:      21523756 bytes+
> Total size of leaf tuples: 20545448 bytes+
> Total size of index:       48760832 bytes+
> (1 row)
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

В списке pgsql-performance по дате отправления:

Предыдущее
От: Reydan Cankur
Дата:
Сообщение: pgbench installation
Следующее
От: Andy Colson
Дата:
Сообщение: Re: mysql to postgresql, performance questions