Обсуждение: WIP: SP-GiST, Space-Partitioned GiST
Hi there, attached is WIP-patch for 9.2 development source tree, which provides implementation of SP-GiST (prototype was presented at PGCon-2011, see http://www.pgcon.org/2011/schedule/events/309.en.html and presentation for details) as a core feature. Main differences from prototype version: 1. Now it's part of pg core, not contrib module 2. It provides more operations for quadtree and suffix tree 3. It uses clustering algorithm of nodes on disk and has much better utilization of disk space. Fillfactor is supported 4. Some corner cases were eliminated 5. It provides support for concurency and recovery (inserts are logged, supports for deletes, and log replay will be added really soon) So, now code contains almost all possible overhead of production code and we ask hackers to test performance on real data sets. We expect the same performance for random data (since almost no overlaps) and much better performance on real-life data, plus much better index creation time. Also, we appreciate your comments and suggestions about API. Regards, Oleg
Вложения
Hi!
Ie expect some problems in support of comparison operators for text, because locale string comparison can have unexpected behaviour.
Let's see the example. Create table with words and add extra leading space to some of them.
test=# create table dict(id serial, word text);
NOTICE: CREATE TABLE will create implicit sequence "dict_id_seq" for serial column "dict.id"
CREATE TABLE
test=# \copy dict(word) from '/usr/share/dict/american-english';
test=# update dict set word = ' '||word where id%2=0;
UPDATE 49284
I use Ubuntu 11.04 with ru_RU.utf8 locale. So, comparison operators ignores leading spaces.
test=# select * from dict where word between 'cart' and 'cary';
id | word
-------+----------------
3029 | Carter
3031 | Cartesian
3033 | Carthage's
3035 | Cartier
3037 | Cartwright
3039 | Caruso
3041 | Carver
28419 | cart
28421 | carted
28423 | cartel's
28425 | cartilage
28427 | cartilages
28429 | carting
28431 | cartographer's
28433 | cartography
28435 | carton
28437 | cartons
28439 | cartoon's
28441 | cartooning
28443 | cartoonist's
28445 | cartoons
28447 | cartridge's
28449 | carts
28451 | cartwheel's
28453 | cartwheeling
28455 | carve
28457 | carver
28459 | carvers
28461 | carving
28463 | carvings
3030 | Carter's
3032 | Carthage
3034 | Carthaginian
3036 | Cartier's
3038 | Cartwright's
3040 | Caruso's
3042 | Carver's
28420 | cart's
28422 | cartel
28424 | cartels
28426 | cartilage's
28428 | cartilaginous
28430 | cartographer
28432 | cartographers
28434 | cartography's
28436 | carton's
28438 | cartoon
28440 | cartooned
28442 | cartoonist
28444 | cartoonists
28446 | cartridge
28448 | cartridges
28450 | cartwheel
28452 | cartwheeled
28454 | cartwheels
28456 | carved
28458 | carver's
28460 | carves
28462 | carving's
(59 rows)
But if I create spgist index query result differs.
test=# create index dict_idx on dict using spgist (word);
CREATE INDEX
test=# select * from dict where word between 'cart' and 'cary';
id | word
-------+----------------
28419 | cart
28421 | carted
28423 | cartel's
28425 | cartilage
28427 | cartilages
28429 | carting
28431 | cartographer's
28433 | cartography
28435 | carton
28437 | cartons
28439 | cartoon's
28441 | cartooning
28443 | cartoonist's
28445 | cartoons
28447 | cartridge's
28449 | carts
28451 | cartwheel's
28453 | cartwheeling
28455 | carve
28457 | carver
28459 | carvers
28461 | carving
28463 | carvings
(23 rows)
------
With best regards,Alexander Korotkov.