Thread: WIP: SP-GiST, Space-Partitioned GiST

WIP: SP-GiST, Space-Partitioned GiST

From
Oleg Bartunov
Date:
Hi there,

Attached is a WIP patch against the 9.2 development source tree that
implements SP-GiST as a core feature (the prototype
was presented at PGCon 2011; see
http://www.pgcon.org/2011/schedule/events/309.en.html and the presentation
for details). The main differences from the prototype version:

1. It is now part of the pg core, not a contrib module
2. It provides more operations for quadtree and suffix tree
3. It uses an algorithm for clustering nodes on disk and makes much
better use of disk space. Fillfactor is supported
4. Some corner cases were eliminated
5. It supports concurrency and recovery (inserts are
logged; support for deletes and log replay will be added really
soon)

So the code now carries almost all the overhead of production code,
and we ask hackers to test performance on real data sets. We expect
the same performance on random data (since there are almost no overlaps) and
much better performance on real-life data, plus much better index
creation time. We would also appreciate your comments and suggestions about
the API.

Regards,

Oleg

Attachments

Re: WIP: SP-GiST, Space-Partitioned GiST

From
Alexander Korotkov
Date:
Hi!

I expect some problems with supporting comparison operators for text, because locale-aware string comparison can behave unexpectedly.
Here is an example. Create a table of words and add an extra leading space to some of them.

test=# create table dict(id serial, word text);
NOTICE:  CREATE TABLE will create implicit sequence "dict_id_seq" for serial column "dict.id"
CREATE TABLE
test=# \copy dict(word) from '/usr/share/dict/american-english';
test=# update dict set word = ' '||word where id%2=0;
UPDATE 49284

I use Ubuntu 11.04 with the ru_RU.utf8 locale, so the comparison operators ignore leading spaces.

test=# select * from dict where word between 'cart' and 'cary';
  id   |      word      
-------+----------------
  3029 | Carter
  3031 | Cartesian
  3033 | Carthage's
  3035 | Cartier
  3037 | Cartwright
  3039 | Caruso
  3041 | Carver
 28419 | cart
 28421 | carted
 28423 | cartel's
 28425 | cartilage
 28427 | cartilages
 28429 | carting
 28431 | cartographer's
 28433 | cartography
 28435 | carton
 28437 | cartons
 28439 | cartoon's
 28441 | cartooning
 28443 | cartoonist's
 28445 | cartoons
 28447 | cartridge's
 28449 | carts
 28451 | cartwheel's
 28453 | cartwheeling
 28455 | carve
 28457 | carver
 28459 | carvers
 28461 | carving
 28463 | carvings
  3030 |  Carter's
  3032 |  Carthage
  3034 |  Carthaginian
  3036 |  Cartier's
  3038 |  Cartwright's
  3040 |  Caruso's
  3042 |  Carver's
 28420 |  cart's
 28422 |  cartel
 28424 |  cartels
 28426 |  cartilage's
 28428 |  cartilaginous
 28430 |  cartographer
 28432 |  cartographers
 28434 |  cartography's
 28436 |  carton's
 28438 |  cartoon
 28440 |  cartooned
 28442 |  cartoonist
 28444 |  cartoonists
 28446 |  cartridge
 28448 |  cartridges
 28450 |  cartwheel
 28452 |  cartwheeled
 28454 |  cartwheels
 28456 |  carved
 28458 |  carver's
 28460 |  carves
 28462 |  carving's
(59 rows)

But if I create an spgist index, the query result differs.

test=# create index dict_idx on dict using spgist (word);
CREATE INDEX
test=# select * from dict where word between 'cart' and 'cary';
  id   |      word      
-------+----------------
 28419 | cart
 28421 | carted
 28423 | cartel's
 28425 | cartilage
 28427 | cartilages
 28429 | carting
 28431 | cartographer's
 28433 | cartography
 28435 | carton
 28437 | cartons
 28439 | cartoon's
 28441 | cartooning
 28443 | cartoonist's
 28445 | cartoons
 28447 | cartridge's
 28449 | carts
 28451 | cartwheel's
 28453 | cartwheeling
 28455 | carve
 28457 | carver
 28459 | carvers
 28461 | carving
 28463 | carvings
(23 rows)
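
The difference between the two result sets can be reproduced outside the database with a small model. The sketch below is only an illustration and not PostgreSQL code: `locale_like_key` is a hypothetical, much simplified stand-in for the ru_RU.utf8 collation (it drops spaces and punctuation and ignores case), and `between_bytewise` assumes that the index path behaves like a plain byte-order comparison, which is a guess based on the 23 rows returned above.

```python
# Hypothetical, simplified model of the two comparison rules; not PostgreSQL code.

def locale_like_key(s: str) -> str:
    """Crude stand-in for a locale-aware collation:
    drop spaces and punctuation, then ignore case."""
    return "".join(ch for ch in s if ch.isalnum()).casefold()

def between_locale_like(word: str, lo: str, hi: str) -> bool:
    """BETWEEN under the simplified locale-like ordering."""
    return locale_like_key(lo) <= locale_like_key(word) <= locale_like_key(hi)

def between_bytewise(word: str, lo: str, hi: str) -> bool:
    """BETWEEN under plain byte order (assumed behaviour of the index path)."""
    return lo.encode("utf-8") <= word.encode("utf-8") <= hi.encode("utf-8")

# A few words taken from the query output above.
words = ["Carter", " Carter's", "cart", " cart's", "carted", " carved", "carvings"]

locale_hits = [w for w in words if between_locale_like(w, "cart", "cary")]
byte_hits = [w for w in words if between_bytewise(w, "cart", "cary")]

print("locale-like:", locale_hits)
print("byte-wise:  ", byte_hits)
```

Under the locale-like rule every sample word falls in the range, as in the sequential-scan result. Under byte order only the lowercase words without a leading space qualify, which matches the shape of the result obtained with the index.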

------
With best regards,
Alexander Korotkov.