Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

Поиск
Список
Период
Сортировка
От Kisung Kim
Тема Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
Дата
Msg-id CABF0Rr07yJop9OEgaWz5jFYoqqo_-cAv6g-Qy92VV9u5WMMwTA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)  (Lukas Fittl <lukas@fittl.com>)
Список pgsql-hackers
Hi,

I've run YCSB(Yahoo! Cloud Service Benchmark) on PostgreSQL and MongoDB with WiredTiger.
And I found some interesting results and some issues(maybe) on Btree index of PostgreSQL.

Here is my experiments and results.
YCSB is for document store benchmark and I build following schema in PG.

CREATE TABLE usertable (
    YCSB_KEY varchar(30) primary key,
    FIELDS jsonb);

And the benchmark generated avg-300-byte-length Json documents and loaded 100M rows in PG and Mongo.

First I found that the size difference between PG and Mongo:
I configured Mongo not to use any compression for both storage and index.

MongoDB index size: 2.1 GB
PostgreSQL index size: 5.5 GB

When I used the index bloating estimation script in https://github.com/ioguix/pgsql-bloat-estimation,
the result is as follows:
 current_database | schemaname |     tblname      |              idxname              | real_size  | extra_size |    extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na 
------------------+------------+------------------+-----------------------------------+------------+------------+-------------------+------------+------------+-------------------+-------
ycsb             | public     | usertable        | usertable_pkey                    | 5488852992 | 2673713152 |  48.7116917850949 |         90 | 2362122240 |  43.0348971532448 | f

It says that the bloat_ratio is 42 for the index.

So, I rebuilded the index and the result was changed:

 current_database | schemaname |     tblname      |              idxname              | real_size  | extra_size |    extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na 
------------------+------------+------------------+-----------------------------------+------------+------------+-------------------+------------+------------+-------------------+-------
 ycsb             | public     | usertable        | usertable_pkey                    | 3154264064 |  339132416 |  10.7515543758863 |         90 |   27533312 | 0.872891788428275 | f


I am curious about the results
1) why the index was bloated even though rows were only inserted not removed or updated.
2) And then why the bloating is removed when it is rebuilded

I guess that the splits of Btree nodes during inserting rows caused the bloating but I don't know exact reasons.
And given that MongoDB's index size is much smaller than PG after they handled the same workload (only insert),
then is there any chances to improve PG's index behavior.

Thank you very much. 


--

                                                                                                                                                       

Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : kskim@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Marko Tiikkaja
Дата:
Сообщение: Re: Assertion failure in REL9_5_STABLE
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)