Thread: WIP: SP-GiST, Space-Partitioned GiST

WIP: SP-GiST, Space-Partitioned GiST

From

Oleg Bartunov

Date:

August 31, 2011, 21:25:09

Hi there,

attached is a WIP patch for the 9.2 development source tree, which provides
an implementation of SP-GiST (the prototype
was presented at PGCon-2011, see
http://www.pgcon.org/2011/schedule/events/309.en.html and the presentation
for details) as a core feature.  Main differences from the prototype version:

1. Now it's part of pg core, not a contrib module
2. It provides more operations for quadtree and suffix tree
3. It uses a clustering algorithm for nodes on disk and has much better
utilization of disk space. Fillfactor is supported
4. Some corner cases were eliminated
5. It provides support for concurrency and recovery (inserts are
logged; support for deletes and log replay will be added really
soon)

So the code now carries almost all the overhead of production code,
and we ask hackers to test performance on real data sets. We expect
the same performance on random data (since there are almost no overlaps) and
much better performance on real-life data, plus much better index
creation time. We would also appreciate your comments and suggestions about
the API.

Regards,

Oleg

Attachments

spgist_patch-0.84.gz

Re: WIP: SP-GiST, Space-Partitioned GiST

From

Alexander Korotkov

Date:

September 1, 2011, 12:14:48

Hi!

I expect some problems in supporting comparison operators for text, because locale-aware string comparison can behave unexpectedly.

Let's look at an example. Create a table of words and add an extra leading space to some of them.

test=# create table dict(id serial, word text);
NOTICE: CREATE TABLE will create implicit sequence "dict_id_seq" for serial column "dict.id"
CREATE TABLE
test=# \copy dict(word) from '/usr/share/dict/american-english';
test=# update dict set word = ' '||word where id%2=0;
UPDATE 49284

I use Ubuntu 11.04 with the ru_RU.utf8 locale, so the comparison operators ignore leading spaces.

test=# select * from dict where word between 'cart' and 'cary';
  id   |      word
-------+----------------
  3029 | Carter
  3031 | Cartesian
  3033 | Carthage's
  3035 | Cartier
  3037 | Cartwright
  3039 | Caruso
  3041 | Carver
 28419 | cart
 28421 | carted
 28423 | cartel's
 28425 | cartilage
 28427 | cartilages
 28429 | carting
 28431 | cartographer's
 28433 | cartography
 28435 | carton
 28437 | cartons
 28439 | cartoon's
 28441 | cartooning
 28443 | cartoonist's
 28445 | cartoons
 28447 | cartridge's
 28449 | carts
 28451 | cartwheel's
 28453 | cartwheeling
 28455 | carve
 28457 | carver
 28459 | carvers
 28461 | carving
 28463 | carvings
  3030 | Carter's
  3032 | Carthage
  3034 | Carthaginian
  3036 | Cartier's
  3038 | Cartwright's
  3040 | Caruso's
  3042 | Carver's
 28420 | cart's
 28422 | cartel
 28424 | cartels
 28426 | cartilage's
 28428 | cartilaginous
 28430 | cartographer
 28432 | cartographers
 28434 | cartography's
 28436 | carton's
 28438 | cartoon
 28440 | cartooned
 28442 | cartoonist
 28444 | cartoonists
 28446 | cartridge
 28448 | cartridges
 28450 | cartwheel
 28452 | cartwheeled
 28454 | cartwheels
 28456 | carved
 28458 | carver's
 28460 | carves
 28462 | carving's
(59 rows)
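
The result above contains capitalized words and words whose stored value begins with a space, even though neither sorts between 'cart' and 'cary' byte by byte. The Python sketch below mimics that effect outside the database. It is only an illustration under stated assumptions: `simplified_locale_key` is a hypothetical stand-in that drops spaces and punctuation and folds case, which roughly approximates how a locale such as ru_RU.utf8 ranks strings at the first comparison level. It is not the real glibc collation. `bytewise_key` compares raw UTF-8 bytes, as the C locale would.

```python
def bytewise_key(s: str) -> bytes:
    """Sort key equivalent to a plain byte-by-byte comparison (C locale)."""
    return s.encode("utf-8")


def simplified_locale_key(s: str) -> str:
    """Hypothetical approximation of a locale-aware key.

    Assumption: spaces and punctuation are ignored and case is folded.
    Real collations apply further tie-breaking levels that are omitted here.
    """
    return "".join(ch for ch in s if ch.isalnum()).casefold()


def between(words, lo, hi, key):
    """Return the words w with lo <= w <= hi under the given sort key."""
    return [w for w in words if key(lo) <= key(w) <= key(hi)]


# A few words in the spirit of the example; some carry a leading space.
words = ["Carter", " Carter's", "cart", " cart's", "carve", "cat"]

bytewise_hits = between(words, "cart", "cary", bytewise_key)
locale_hits = between(words, "cart", "cary", simplified_locale_key)

print(bytewise_hits)  # ['cart', 'carve']
print(locale_hits)    # ['Carter', " Carter's", 'cart', " cart's", 'carve']
```

Under the byte-wise key only 'cart' and 'carve' fall in the range. Under the simplified locale key the capitalized and space-prefixed words are included as well, which matches the shape of the 59-row result.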

But if I create an spgist index, the query result differs.

test=# create index dict_idx on dict using spgist (word);
CREATE INDEX
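
The index internals are not shown at this point in the thread. As a rough illustration of how such a mismatch can arise, here is a toy prefix tree in Python. `TrieNode`, `insert`, and `range_scan` are hypothetical names, and this is not SP-GiST code. The scan prunes subtrees by comparing prefixes in code-point order, so words that a locale-aware comparison would place inside the range but that start with 'C' or a space are never visited.

```python
class TrieNode:
    """One node of a toy character trie (illustrative, not SP-GiST)."""

    def __init__(self):
        self.children = {}     # next character -> TrieNode
        self.terminal = False  # True if a stored word ends at this node


def insert(root, word):
    """Add a word to the trie, one character per level."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True


def range_scan(node, lo, hi, prefix=""):
    """Yield stored words w with lo <= w <= hi in code-point order."""
    if node.terminal and lo <= prefix <= hi:
        yield prefix
    for ch in sorted(node.children):
        child_prefix = prefix + ch
        n = len(child_prefix)
        # Prune: no word under this prefix can lie inside [lo, hi].
        if child_prefix < lo[:n] or child_prefix > hi[:n]:
            continue
        yield from range_scan(node.children[ch], lo, hi, child_prefix)


root = TrieNode()
for w in ["Carter", " Carter's", "cart", " cart's", "carve", "cat"]:
    insert(root, w)

scan_hits = list(range_scan(root, "cart", "cary"))
print(scan_hits)  # ['cart', 'carve']
```

A tree organized this way answers range queries consistently only when the comparison operator agrees with the order used for pruning, which is the concern raised here for text under a non-C locale.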