[PATCH] Compression and on-disk sorting

From Martijn van Oosterhout
Subject [PATCH] Compression and on-disk sorting
Msg-id 20060517161730.GI15180@svana.org
Responses Re: [PATCH] Compression and on-disk sorting
List pgsql-patches
Pursuant to the discussions currently on -hackers, here's a patch that
uses zlib to compress the tapes as they go to disk. It defaults to
compression level 3 (think gzip -3).

Please speed test all you like. I *think* it's bug-free, but you never
know.

Outstanding questions:

- I use zlib because the builtin pg_lzcompress can't do what zlib does.
Here we set up input and output buffers and zlib processes as much as
it can (until the input is empty or the output is full). This means no
marshalling is required. We can compress the whole file without having
it all in memory.

- zlib allocates memory for compression and decompression; I don't know
how much. However, it allocates via the postgres mcxt system, so it
shouldn't be too hard to find out. Simon pointed out that we'll need to
track this because we might allow hundreds of tapes.

- Each tape is compressed as one long compressed stream. Currently no
seeking is allowed, so only sorts, no joins! (As Tom said, quick and
dirty numbers.) This should show the idea in its best light, but if we
want to support seeking we're going to need to change that.
Maybe no compression on the last pass?

- It's probable that the benefits are strongly correlated with the speed
of your disk subsystem. We need to measure this effect. I can't
accurately measure this because my compiler doesn't inline any of the
functions in tuplesort.c.
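
The buffer-driven approach described in the first question above (zlib
consumes input and fills output until one side runs out) can be
illustrated with a minimal sketch. This is not the patch's C code: it
uses Python's standard zlib module, and the names compress_stream,
decompress_stream and CHUNK are hypothetical, chosen only for this
example. Level 3 matches the default mentioned above.

```python
import io
import zlib

CHUNK = 8192  # hypothetical buffer size, not taken from the patch


def compress_stream(src, dst, level=3):
    """Compress src into dst one chunk at a time; return (bytes_in, bytes_out)."""
    comp = zlib.compressobj(level)
    bytes_in = bytes_out = 0
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        bytes_in += len(chunk)
        # compress() may return b"" while zlib is still buffering input
        bytes_out += dst.write(comp.compress(chunk))
    # flush() emits whatever zlib still holds and ends the stream
    bytes_out += dst.write(comp.flush())
    return bytes_in, bytes_out


def decompress_stream(src, dst):
    """Inverse of compress_stream, reading compressed input chunk by chunk."""
    decomp = zlib.decompressobj()
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        dst.write(decomp.decompress(chunk))
    dst.write(decomp.flush())


if __name__ == "__main__":
    # Highly repetitive stand-in for tape data
    raw = b"0123456789abcdef" * 100_000
    packed, unpacked = io.BytesIO(), io.BytesIO()
    n_in, n_out = compress_stream(io.BytesIO(raw), packed)
    packed.seek(0)
    decompress_stream(packed, unpacked)
    assert unpacked.getvalue() == raw
    print(f"{n_in} bytes in, {n_out} bytes out")
```

Because the whole tape is a single stream here, reading from the middle
would mean decompressing from the start, which is the seeking
limitation noted in the third question.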

In my test, with a compression ratio of around 100-to-1, on 160MB of data
with tiny work_mem on my 5 year old laptop, it speeds it up by 60%, so
it's obviously not a complete waste of time. Of course, YMMV :)
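
On the open question of how much memory zlib allocates per tape: the
comments in zlib's zconf.h give formulas for the main buffers, so a
rough estimate is possible. The sketch below assumes zlib's default
windowBits=15 and memLevel=8 (the patch may use other settings), ignores
the "few kilobytes for small objects" that zconf.h mentions on top of
these figures, and uses a made-up tape count of 500 purely to show the
scale.

```python
def deflate_bytes(window_bits=15, mem_level=8):
    """Approximate memory for one compression stream, per zconf.h."""
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))


def inflate_bytes(window_bits=15):
    """Approximate memory for one decompression stream, per zconf.h."""
    return 1 << window_bits


if __name__ == "__main__":
    tapes = 500  # hypothetical number of tapes, for illustration only
    kib, mib = 1024, 1024 * 1024
    print(f"deflate per stream: {deflate_bytes() // kib} KiB")  # 256 KiB
    print(f"inflate per stream: {inflate_bytes() // kib} KiB")  # 32 KiB
    print(f"{tapes} deflate streams: {tapes * deflate_bytes() / mib} MiB")
    print(f"{tapes} inflate streams: {tapes * inflate_bytes() / mib} MiB")
```

With a tiny work_mem, a few hundred compression streams at these sizes
could exceed the sort's own memory budget, which is presumably why the
allocation needs tracking.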

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
