Compressing temporary files

From Andrey Borodin
Subject Compressing temporary files
Msg-id 4F368CFD-306C-483E-968E-072D225A8DCE@yandex-team.ru
Replies Re: Compressing temporary files  (Bruce Momjian <bruce@momjian.us>)
Re: Compressing temporary files  (Robert Haas <robertmhaas@gmail.com>)
Re: Compressing temporary files  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Compressing temporary files  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi hackers!

There are a lot of compression discussions nowadays. And that's cool!
Recently, in a private discussion, Naresh Chainani shared with me the idea of compressing temporary files on disk.
And I was thrilled to find no evidence that this interesting idea has already been implemented.

I've prototyped a Random Access Compressed File for fun [0]. The code is a very dirty proof of concept.
I compress the BufFile one block at a time. There are directory pages that store the size of each
compressed block. If any byte of a block is changed, the whole block is recompressed. Wasted space is never reused. If
a compressed block is larger than BLCKSZ, unknown bad things will happen :)
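
To make the layout above concrete, here is a minimal Python sketch of a block-compressed random-access file. It is not the code from the prototype [0]: the class name CompressedBlockFile, the use of zlib, and the in-memory directory (the prototype keeps its directory in pages inside the file) are all assumptions made for illustration. It does mirror the described behaviour: any write recompresses the whole block, the new image is appended, and the old image becomes wasted space that is never reused.

```python
import io
import zlib

BLOCK_SIZE = 8192  # assumed logical block size, mirroring PostgreSQL's default BLCKSZ


class CompressedBlockFile:
    """Toy random-access compressed file: one compressed chunk per logical block.

    Hypothetical illustration only; the directory lives in memory here.
    """

    def __init__(self, backing=None):
        self.backing = backing if backing is not None else io.BytesIO()
        self.directory = {}  # block number -> (offset, compressed length)
        self.end = 0         # append position in the backing store
        self.wasted = 0      # bytes occupied by superseded block images

    def _read_block(self, blkno):
        entry = self.directory.get(blkno)
        if entry is None:
            # Never-written blocks read back as zeros.
            return bytearray(BLOCK_SIZE)
        offset, length = entry
        self.backing.seek(offset)
        return bytearray(zlib.decompress(self.backing.read(length)))

    def _write_block(self, blkno, data):
        compressed = zlib.compress(bytes(data))
        old = self.directory.get(blkno)
        if old is not None:
            # The previous image stays in the file; its space is never reused.
            self.wasted += old[1]
        self.backing.seek(self.end)
        self.backing.write(compressed)
        self.directory[blkno] = (self.end, len(compressed))
        self.end += len(compressed)

    def write(self, pos, payload):
        # Changing any byte of a block recompresses the whole block.
        while payload:
            blkno, start = divmod(pos, BLOCK_SIZE)
            n = min(len(payload), BLOCK_SIZE - start)
            block = self._read_block(blkno)
            block[start:start + n] = payload[:n]
            self._write_block(blkno, block)
            pos += n
            payload = payload[n:]

    def read(self, pos, size):
        out = bytearray()
        while size > 0:
            blkno, start = divmod(pos, BLOCK_SIZE)
            n = min(size, BLOCK_SIZE - start)
            out += self._read_block(blkno)[start:start + n]
            pos += n
            size -= n
        return bytes(out)
```

Because each directory entry stores an explicit length, this sketch tolerates a compressed block that comes out larger than BLOCK_SIZE; a layout with fixed-size slots would need a fallback such as storing that block uncompressed.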

Here are some of my observations.

0. The idea seems feasible. The fd.c API used by buffile.c can easily be abstracted for compressed temporary files.
Seeks are necessary, but they are not very frequent. It's easy to make temp file compression GUC-controlled.

1. Temp file footprint can easily be reduced. For example, the query
create unlogged table y as select random()::text t from generate_series(0,9999999) g;
uses 140000000 bytes of temp files for the toast index build. With the patch this value is reduced to 40841704 (about 3.4 times smaller).
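
As a rough cross-check of the figures above, the following Python sketch recomputes the reported ratio and measures how well text resembling random()::text compresses when each 8192-byte block is compressed independently. The helper name temp_file_footprint, the choice of zlib, and the synthetic sample are assumptions; the sample is not the actual temp file contents of the index build, so its ratio is not expected to match the patch's numbers.

```python
import random
import zlib

BLOCK_SIZE = 8192  # assumed block size, mirroring PostgreSQL's default BLCKSZ


def temp_file_footprint(data, block_size=BLOCK_SIZE):
    """Return (raw bytes, total bytes after compressing each block separately)."""
    raw = len(data)
    compressed = sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, raw, block_size)
    )
    return raw, compressed


# Ratio implied by the two byte counts reported in the message.
reported_ratio = 140_000_000 / 40_841_704

# Synthetic stand-in for random()::text values: "0." followed by 15 digits.
rng = random.Random(0)
sample = "".join(f"{rng.random():.15f}" for _ in range(20000)).encode("ascii")

raw, compressed = temp_file_footprint(sample)
sample_ratio = raw / compressed
print(f"reported: {reported_ratio:.2f}x, synthetic sample: {sample_ratio:.2f}x")
```

Decimal digits carry only about 3.3 bits of information per 8-bit character, which is why even purely random numeric text shrinks noticeably under a general-purpose compressor.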

2. I have not found any evidence of a performance improvement. I've only benchmarked the patch on my laptop, and RAM (page
cache) erased any difference between writing compressed and uncompressed blocks.

What do you think: is it worth pursuing the idea? OLTP systems rarely rely on data spilled to disk.
Are there any known good random-access compressed file libraries, so that we could avoid reinventing the wheel?
Maybe someone has tried this approach before?

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/x4m/postgres_g/commit/426cd767694b88e64f5e6bee99fc653c45eb5abd

