Re: Zedstore - compressed in-core columnar storage
From: Konstantin Knizhnik
Subject: Re: Zedstore - compressed in-core columnar storage
Msg-id: 3978b57e-fe25-ca6b-f56c-48084417e115@postgrespro.ru
In reply to: Re: Zedstore - compressed in-core columnar storage (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers
On 09.04.2019 18:08, Heikki Linnakangas wrote:
> On 09/04/2019 18:00, Konstantin Knizhnik wrote:
>> On 09.04.2019 17:09, Konstantin Knizhnik wrote:
>>> standard Postgres heap and my VOPS extension.
>>> As test data I used the TPC-H benchmark (actually only the lineitem
>>> table, generated with the tpch-dbgen utility at scale factor 10,
>>> ~8GB database).
>>> I attached the script which I used to populate the data (you have to
>>> download, build and run tpch-dbgen yourself; you can also comment out
>>> the code related to VOPS).
>
> Cool, thanks!
>
>>> Unfortunately I failed to load data into zedstore:
>>>
>>> postgres=# insert into zedstore_lineitem_projection (select
>>> l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag::"char",l_linestatus::"char"
>>> from lineitem);
>>> psql: ERROR: compression failed. what now?
>>> Time: 237804.775 ms (03:57.805)
>
> Yeah, it's still early days, it will crash and burn in a lot of cases.
> We wanted to publish this early, to gather ideas and comments on the
> high level design, and to validate that the table AM API that's in v12
> is usable.
>
>> Looks like the original problem was caused by the internal postgres
>> compressor: I had not configured Postgres to use lz4.
>> When I configured Postgres --with-lz4, data was correctly inserted
>> into the zedstore table, but it looks like it is not compressed at all:
>>
>> postgres=# select pg_relation_size('zedstore_lineitem_projection');
>>  pg_relation_size
>> ------------------
>>        9363010640
>
> The single-insert codepath isn't very optimized yet. If you populate
> the table with a large "INSERT ... SELECT ...", you end up with a huge
> undo log. Try loading it with COPY.
> You can also see how many pages of each type there are with:
>
> select count(*), pg_zs_page_type('zedstore_lineitem_projection', g)
>   from generate_series(0,
>        pg_table_size('zedstore_lineitem_projection') / 8192 - 1) g
>  group by 2;
>
> - Heikki

postgres=# copy zedstore_lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;
COPY 59986052
Time: 232802.257 ms (03:52.802)

postgres=# select pg_relation_size('zedstore_lineitem');
 pg_relation_size
------------------
      10346504192
(1 row)

postgres=# select count(*), pg_zs_page_type('zedstore_lineitem', g)
  from generate_series(0, pg_table_size('zedstore_lineitem') / 8192 - 1) g
 group by 2;
  count  | pg_zs_page_type
---------+-----------------
       1 | META
 1262308 | BTREE
     692 | UNDO
(3 rows)

And now performance is much worse:

Time: 99819.476 ms (01:39.819)

That is strange, because the main advantage of a columnar store is that
it has to fetch only the accessed columns.

What I see is that in non-parallel mode (max_parallel_workers_per_gather
= 0) the backend consumes about 11GB of memory. It fits in my desktop
RAM (16GB) and the query takes ~58 seconds. But once I start 4 parallel
workers, they cause huge swapping:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28195 knizhnik  20   0 11.823g 6.553g 5.072g D   7.6 42.2   0:17.19 postgres
28074 knizhnik  20   0 11.848g 6.726g 5.223g D   7.3 43.3   4:14.96 postgres
28192 knizhnik  20   0 11.854g 6.586g 5.075g D   7.3 42.4   0:17.18 postgres
28193 knizhnik  20   0 11.870g 6.594g 5.064g D   7.3 42.4   0:17.19 postgres
28194 knizhnik  20   0 11.854g 6.589g 5.078g D   7.3 42.4   0:17.09 postgres

which is also strange, because the data should be present in shared buffers.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
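As a cross-check, the page counts above account exactly for the reported relation size: (1 META + 1262308 BTREE + 692 UNDO) pages × 8192 bytes = 10346504192 bytes, matching pg_relation_size. A per-type size breakdown can be computed directly with a small extension of Heikki's query (a sketch; it assumes the pg_zs_page_type() function from the zedstore patch and the default 8kB block size):

```sql
-- Sketch: bytes consumed per zedstore page type.
-- Assumes pg_zs_page_type() from the zedstore patch and BLCKSZ = 8192.
select pg_zs_page_type('zedstore_lineitem', g) as page_type,
       count(*)                                as pages,
       count(*) * 8192                         as bytes
  from generate_series(0,
         pg_table_size('zedstore_lineitem') / 8192 - 1) g
 group by 1
 order by 2 desc;
```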