Re: [Proposal] Page Compression for OLTP

From: chenhj
Subject: Re: [Proposal] Page Compression for OLTP
Msg-id: 4b3ebc9b.118.17649764c77.Coremail.chjischj@163.com
In reply to: Re: [Proposal] Page Compression for OLTP  (chenhj <chjischj@163.com>)
Responses: Re: [Proposal] Page Compression for OLTP  (chenhj <chjischj@163.com>)
List: pgsql-hackers

Hi hackers,

I have further improved this patch, adjusted parts of the design, and added related changes
(pg_rewind, replication, checksum, backup) as well as basic tests. Any suggestions are welcome.

The patch can also be obtained from here:
https://github.com/ChenHuajun/postgres/tree/page_compress_14

# 1. Page storage

Each compressed data block is stored in one or more chunks of the compressed data file,
and the size of each chunk is 1/8, 1/4, or 1/2 of the block size.
The storage location of each compressed data block is represented by an array of chunk numbers (chunkno),
which is stored in the compressed address file.

## 1.1 Page compressed address file (_pca)

          blk0    1       2       3
+=======+=======+=======+=======+=======+
| head  |  1    |    2  | 3,4   |   5   |
+=======+=======+=======+=======+=======+

## 1.2 Page compressed data file (_pcd)

chunk1    2         3          4         5
+=========+=========+==========+=========+=========+
| blk0    | blk1    | blk2_1   | blk2_2  | blk3    |
+=========+=========+==========+=========+=========+
|    4K   |
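
As a rough illustration of the two diagrams, the sketch below turns a chunk number taken from the address file into a byte offset in the data file. It assumes chunk numbers are 1-based (as drawn above) and that chunks are laid out back to back from the start of the data file; `chunk_offset` is a hypothetical helper name, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of a 1-based chunk number in the _pcd file.
 * ASSUMPTION: the data file has no header of its own, so chunk 1 starts at 0.
 */
static uint64_t
chunk_offset(uint32_t chunkno, uint32_t chunk_size)
{
    assert(chunkno >= 1);       /* chunk numbers are 1-based in this sketch */
    return (uint64_t) (chunkno - 1) * chunk_size;
}
```

With 4K chunks as in the diagram, blk2 (chunks 3 and 4) would occupy bytes 8192 to 16383 of the data file under these assumptions.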


# 2. Usage

## 2.1 Enable compression through storage parameters of tables and indexes

- compresstype

  Whether to compress, and which compression algorithm to use. Supported values: none, pglz, zstd.

- compresslevel

  The compression level (only supported by zstd).

- compress_chunk_size

  A chunk is the smallest unit of storage space allocated for compressed pages.
  The chunk size can only be 1/2, 1/4 or 1/8 of BLCKSZ.

- compress_prealloc_chunks

  The number of chunks pre-allocated for each page. The maximum allowed value is BLCKSZ/compress_chunk_size - 1.
  If a compressed page needs fewer than `compress_prealloc_chunks` chunks,
  `compress_prealloc_chunks` chunks are allocated anyway, to avoid storage fragmentation later when the page needs more space.

example:
CREATE TABLE tbl_pc(id int, c1 text) WITH(compresstype=zstd, compresslevel=0, compress_chunk_size=1024, compress_prealloc_chunks=2);
CREATE INDEX tbl_pc_idx1 on tbl_pc(c1) WITH(compresstype=zstd, compresslevel=1, compress_chunk_size=4096, compress_prealloc_chunks=0);
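
To make the interaction between `compress_chunk_size` and `compress_prealloc_chunks` concrete, here is a minimal sketch of the arithmetic described above. It assumes the default BLCKSZ of 8192 and ignores any per-page header the patch may store alongside the compressed data; the function names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192             /* PostgreSQL default block size */

/* Returns 1 if chunk_size is 1/2, 1/4 or 1/8 of BLCKSZ, else 0. */
static int
chunk_size_is_valid(uint32_t chunk_size)
{
    return chunk_size == BLCKSZ / 2 ||
           chunk_size == BLCKSZ / 4 ||
           chunk_size == BLCKSZ / 8;
}

/* Largest compress_prealloc_chunks allowed for a given chunk size. */
static uint32_t
max_prealloc_chunks(uint32_t chunk_size)
{
    assert(chunk_size_is_valid(chunk_size));
    return BLCKSZ / chunk_size - 1;
}

/*
 * Hypothetical helper: chunks to allocate for a page whose compressed image
 * is compressed_size bytes. The page gets at least prealloc_chunks chunks.
 */
static uint32_t
chunks_to_allocate(uint32_t compressed_size, uint32_t chunk_size,
                   uint32_t prealloc_chunks)
{
    uint32_t needed;

    assert(prealloc_chunks <= max_prealloc_chunks(chunk_size));
    needed = (compressed_size + chunk_size - 1) / chunk_size;  /* round up */
    return needed < prealloc_chunks ? prealloc_chunks : needed;
}
```

For the table above (chunk size 1024, two pre-allocated chunks), a page that compresses to 500 bytes would still get 2 chunks, while one that compresses to 3234 bytes would need 4.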


## 2.2 Set default compression options for tables created in a given tablespace

- default_compresstype
- default_compresslevel
- default_compress_chunk_size
- default_compress_prealloc_chunks

Note: temporary tables and unlogged tables are not affected by the above four parameters.

example:
ALTER TABLESPACE pg_default SET(default_compresstype=zstd, default_compresslevel=2, default_compress_chunk_size=1024, default_compress_prealloc_chunks=2);


## 2.3 View the storage location of each block of the compressed table

Some functions have been added to pageinspect to inspect compressed relations:

- get_compress_address_header(relname text, segno integer)
- get_compress_address_items(relname text, segno integer)

example:
SELECT nblocks, allocated_chunks, chunk_size, algorithm FROM get_compress_address_header('test_compressed',0);
 nblocks | allocated_chunks | chunk_size | algorithm 
---------+------------------+------------+-----------
       1 |               20 |       1024 |         1
(1 row)

SELECT * FROM get_compress_address_items('test_compressed',0);
 blkno | nchunks | allocated_chunks |   chunknos    
-------+---------+------------------+---------------
     0 |       0 |                4 | {1,2,3,4}
     1 |       0 |                4 | {5,6,7,8}
     2 |       0 |                4 | {9,10,11,12}
     3 |       0 |                4 | {13,14,15,16}
     4 |       0 |                4 | {17,18,19,20}
(5 rows)

## 2.4 Compare the compression ratio of different compression algorithms and compression levels

A new function in pageinspect makes it possible to compare the compression ratios of different compression algorithms and compression levels.
This helps in deciding which compression parameters to use.

- page_compress(page bytea, algorithm text, level integer)

example:
postgres=# SELECT blk,octet_length(page_compress(get_raw_page('test_compressed', 'main', blk), 'pglz', 0)) compressed_size from generate_series(0,4) blk;
 blk | compressed_size
-----+-----------------
   0 |            3234
   1 |            3516
   2 |            3515
   3 |            3515
   4 |            1571
(5 rows)

postgres=# SELECT blk,octet_length(page_compress(get_raw_page('test_compressed', 'main', blk), 'zstd', 0)) compressed_size from generate_series(0,4) blk;
 blk | compressed_size
-----+-----------------
   0 |            1640
   1 |            1771
   2 |            1801
   3 |            1813
   4 |             806
(5 rows)


# 3. How crash safety is ensured
For ease of implementation, no WAL is written when chunk space is allocated in the compressed address file.
Therefore, if postgres crashes during space allocation,
incomplete data may remain in the compressed address file.

To keep the data in the compressed address file consistent, the following measures have been taken:

1. The compressed address file is divided into 512-byte areas. The address data of each data block is stored in exactly one area
   and never crosses an area boundary, so it cannot happen that half of a block's addresses is persisted and the other half is not.
2. When chunk space is allocated, the address information is written to the address file in a fixed order, so that an interrupted allocation does not leave inconsistent data. The order is as follows:

   - Add the new chunks to the total number of allocated chunks in the header (PageCompressHeader.allocated_chunks)
   - Write the chunkno array in the address entry of the data block (PageCompressAddr.chunknos)
   - Write the number of allocated chunks in the address entry of the data block (PageCompressAddr.nchunks)
   - Update the global number of blocks in the header (PageCompressHeader.nblocks)

typedef struct PageCompressHeader
{
    pg_atomic_uint32 nblocks;                       /* number of total blocks in this segment */
    pg_atomic_uint32 allocated_chunks;              /* number of total allocated chunks in data area */
    uint16           chunk_size;                    /* size of each chunk, must be 1/2, 1/4 or 1/8 of BLCKSZ */
    uint8            algorithm;                     /* compress algorithm, 1=pglz, 2=zstd */
    pg_atomic_uint32 last_synced_nblocks;           /* last synced nblocks */
    pg_atomic_uint32 last_synced_allocated_chunks;  /* last synced allocated_chunks */
    TimestampTz      last_recovery_start_time;      /* postmaster start time of last recovery */
} PageCompressHeader;

typedef struct PageCompressAddr
{
    volatile uint8 nchunks;             /* number of chunks for this block */
    volatile uint8 allocated_chunks;    /* number of allocated chunks for this block */

    /* variable-length field: 1-based chunk number array for this block; array size must be 2, 4 or 8 */
    pc_chunk_number_t chunknos[FLEXIBLE_ARRAY_MEMBER];
} PageCompressAddr;

3. Once a chunk is allocated, it always belongs to the same data block until the relation is truncated (or vacuum truncates the tail blocks),
   which avoids frequent changes to the address information.
4. When WAL is replayed during recovery after a postgres crash, the address file of every compressed relation is checked the first time it is opened,
   and any inconsistent data is repaired (see the check_and_repair_compress_address function).
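
The fixed write order can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Model*` types and both functions are hypothetical, the model does not distinguish pre-allocated chunks from used ones, and `model_repair` is one guess at what a repair pass could do with this ordering. It is not the logic of check_and_repair_compress_address.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHUNKS_PER_BLOCK 8  /* BLCKSZ divided by the smallest chunk size */
#define MAX_BLOCKS 16           /* tiny model, not a real segment */

typedef struct { uint32_t nblocks; uint32_t allocated_chunks; } ModelHeader;

typedef struct
{
    uint8_t  nchunks;
    uint8_t  allocated_chunks;
    uint32_t chunknos[MAX_CHUNKS_PER_BLOCK];    /* 1-based, 0 = unused */
} ModelAddr;

typedef struct { ModelHeader header; ModelAddr addrs[MAX_BLOCKS]; } ModelAddressFile;

/* Allocate n chunks for a fresh block, following the write order listed above. */
static void
model_allocate(ModelAddressFile *f, uint32_t blkno, uint8_t n)
{
    ModelAddr *addr;
    uint32_t   first;
    uint8_t    i;

    assert(blkno < MAX_BLOCKS && n <= MAX_CHUNKS_PER_BLOCK);
    addr = &f->addrs[blkno];
    assert(addr->allocated_chunks == 0);    /* model handles fresh blocks only */

    first = f->header.allocated_chunks + 1;
    f->header.allocated_chunks += n;        /* step 1: header chunk total */
    for (i = 0; i < n; i++)
        addr->chunknos[i] = first + i;      /* step 2: chunkno array */
    addr->allocated_chunks = n;             /* step 3: per-block counts */
    addr->nchunks = n;
    if (blkno + 1 > f->header.nblocks)
        f->header.nblocks = blkno + 1;      /* step 4: header block count */
}

/* ASSUMPTION: rebuild the header totals from the published per-block counts. */
static void
model_repair(ModelAddressFile *f)
{
    uint32_t max_chunkno = 0, nblocks = 0, blkno;
    uint8_t  i;

    for (blkno = 0; blkno < MAX_BLOCKS; blkno++)
    {
        ModelAddr *addr = &f->addrs[blkno];

        /* chunknos written in step 2 but never published by step 3 are dropped */
        for (i = addr->allocated_chunks; i < MAX_CHUNKS_PER_BLOCK; i++)
            addr->chunknos[i] = 0;
        for (i = 0; i < addr->allocated_chunks; i++)
            if (addr->chunknos[i] > max_chunkno)
                max_chunkno = addr->chunknos[i];
        if (addr->nchunks > 0)
            nblocks = blkno + 1;
    }
    f->header.allocated_chunks = max_chunkno;
    f->header.nblocks = nblocks;
}
```

Because the header total is bumped first and the per-block count is written after the chunkno array, a crash between steps can only leave chunks that are reserved but not yet owned by any block. In this model such chunks are simply reclaimed by the repair pass.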


# 4. Problems

- When compress_chunk_size=1024, about 4MB of space is needed to store the addresses,
  so a small file can end up larger after compression.
  Compression should therefore not be enabled for small tables.
- The zstd library needs to be installed separately. Could the zstd source code be copied into the postgres tree?
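
The address space figure in the first item can be roughly reproduced from the 512-byte area rule in section 3. The sketch below is a back-of-the-envelope estimate, not code from the patch. It assumes the default BLCKSZ (8192) and 1GB segments, a 32-bit `pc_chunk_number_t`, and that the two `uint8` counters are padded to 4 bytes in front of the chunkno array. It also ignores the header, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ           8192    /* default block size */
#define RELSEG_BLOCKS    131072  /* blocks in a default 1GB segment */
#define AREA_SIZE        512     /* area size from section 3 */
#define CHUNKNO_BYTES    4       /* ASSUMPTION: pc_chunk_number_t is 32 bits */
#define ADDR_FIXED_BYTES 4       /* ASSUMPTION: two uint8 counters plus padding */

/* Bytes taken by one block's address entry for a given chunk size. */
static uint32_t
addr_entry_size(uint32_t chunk_size)
{
    assert(chunk_size > 0 && BLCKSZ % chunk_size == 0);
    return ADDR_FIXED_BYTES + (BLCKSZ / chunk_size) * CHUNKNO_BYTES;
}

/* Offset of a block's entry from the start of the first area; entries never straddle areas. */
static uint64_t
addr_entry_offset(uint32_t blkno, uint32_t chunk_size)
{
    uint32_t entry = addr_entry_size(chunk_size);
    uint32_t per_area = AREA_SIZE / entry;

    return (uint64_t) (blkno / per_area) * AREA_SIZE + (blkno % per_area) * entry;
}

/* Bytes of address areas needed to cover one full segment. */
static uint64_t
addr_space_per_segment(uint32_t chunk_size)
{
    uint32_t per_area = AREA_SIZE / addr_entry_size(chunk_size);
    uint32_t areas = (RELSEG_BLOCKS + per_area - 1) / per_area;   /* round up */

    return (uint64_t) areas * AREA_SIZE;
}
```

Under these assumptions a chunk size of 1024 gives 36-byte entries, 14 entries per area, and 4,793,856 bytes of address areas per segment, which is in the same range as the figure quoted above. A chunk size of 4096 would need about a third of that.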


# 5. TODO list

1. docs
2. improve code style, error messages, and so on
3. more tests

BTW:
If anyone finds this patch valuable, I hope we can improve it together.


Best Regards
Chen Huajun
