Re: Significantly larger toast tables on 8.4?

From: Gregory Maxwell
Subject: Re: Significantly larger toast tables on 8.4?
Date:
Msg-id: e692861c0901070644y6f55f441gb39397ab4aca736b@mail.gmail.com
In response to: Re: Significantly larger toast tables on 8.4?  (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
On Fri, Jan 2, 2009 at 5:48 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> So you compromise. You split the data into say 1MB blobs and compress
> each individually. Then if someone does a substring at offset 3MB you
> can find it quickly. This barely costs you anything in the compression
> ratio mostly.
>
> Implementation though, that's harder. The size of the blobs is tunable
> also. I imagine the optimal value will probably be around 100KB. (12
> blocks uncompressed).

Or have the database do that internally: with the available fast
compression algorithms (zlib, lzo, lzf, etc.) the diminishing returns
from larger compression block sizes kick in rather quickly. Other
algorithms like LZMA or bzip2 gain more from bigger block sizes, but I
expect all of them are too slow to ever consider using in PostgreSQL.

So I expect that the compression loss from compressing in chunks of
64 kB would be minimal. The database could then store a list of
offsets for the 64 kB chunks at the beginning of the field, or
something like that.  A short substring would then require
decompressing just one or two chunks, far less overhead than
decompressing everything.
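
A rough sketch of what I have in mind, in plain C with zlib. This is
not proposed backend code; the 64 kB chunk size, struct layout, and
function names are all invented for illustration:

/*
 * Hypothetical sketch (not PostgreSQL internals): compress a datum in
 * fixed-size chunks and keep a per-chunk offset table so a substring
 * read only decompresses the chunk(s) it touches.
 * Build with: cc chunked.c -lz
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define CHUNK_RAW 65536                 /* 64 kB of raw input per chunk */

typedef struct
{
    size_t  nchunks;
    size_t  raw_len;                    /* total uncompressed length */
    size_t *comp_off;                   /* start of each compressed chunk */
    size_t *comp_len;                   /* length of each compressed chunk */
    unsigned char *comp_data;           /* concatenated compressed chunks */
} ChunkedBlob;

/* Compress 'raw' in 64 kB chunks, remembering where each chunk starts. */
static ChunkedBlob *
chunked_compress(const unsigned char *raw, size_t raw_len)
{
    ChunkedBlob *b = malloc(sizeof(*b));
    size_t pos = 0;

    b->nchunks = (raw_len + CHUNK_RAW - 1) / CHUNK_RAW;
    b->raw_len = raw_len;
    b->comp_off = malloc(b->nchunks * sizeof(size_t));
    b->comp_len = malloc(b->nchunks * sizeof(size_t));
    b->comp_data = malloc(b->nchunks * compressBound(CHUNK_RAW));

    for (size_t i = 0; i < b->nchunks; i++)
    {
        size_t in_len = (i == b->nchunks - 1) ? raw_len - i * CHUNK_RAW
                                              : CHUNK_RAW;
        uLongf out_len = compressBound(in_len);

        compress2(b->comp_data + pos, &out_len,
                  raw + i * CHUNK_RAW, in_len, Z_BEST_SPEED);
        b->comp_off[i] = pos;
        b->comp_len[i] = out_len;
        pos += out_len;
    }
    return b;
}

/* Copy raw bytes [off, off+len) into 'out', decompressing only the
 * chunks that overlap the requested range. */
static void
chunked_substring(const ChunkedBlob *b, size_t off, size_t len,
                  unsigned char *out)
{
    unsigned char chunk[CHUNK_RAW];

    for (size_t i = off / CHUNK_RAW; i <= (off + len - 1) / CHUNK_RAW; i++)
    {
        uLongf raw_len = CHUNK_RAW;
        size_t chunk_start = i * CHUNK_RAW;
        size_t from = (off > chunk_start) ? off - chunk_start : 0;
        size_t dst = chunk_start + from - off;
        size_t n;

        uncompress(chunk, &raw_len, b->comp_data + b->comp_off[i],
                   b->comp_len[i]);
        n = raw_len - from;
        if (n > len - dst)
            n = len - dst;
        memcpy(out + dst, chunk + from, n);
    }
}

int
main(void)
{
    size_t raw_len = 1024 * 1024;               /* 1 MB of sample data */
    unsigned char *raw = malloc(raw_len);
    unsigned char piece[100];

    for (size_t i = 0; i < raw_len; i++)
        raw[i] = (unsigned char) (i % 251);

    ChunkedBlob *b = chunked_compress(raw, raw_len);

    /* A 100-byte substring at offset 700000 decompresses only chunk 10. */
    chunked_substring(b, 700000, sizeof(piece), piece);
    printf("piece[0] = %d (expected %d)\n", piece[0], (int) (700000 % 251));
    return 0;
}

The offset table only costs a couple of words per 64 kB chunk, so the
index itself is negligible next to the data.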

It would probably be worthwhile to graph compression ratio vs. block
size for some reasonable input.  I'd offer to do it, but I doubt I
have a reasonable test set for this.
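
If someone does have a suitable corpus, a throwaway harness along
these lines would produce the numbers (the file name and the chunk
sizes are placeholders); it just recompresses the same input at
several chunk sizes and prints the resulting ratio:

/*
 * Hypothetical harness: read a test file and report the zlib
 * compression ratio at several chunk sizes, to see where the
 * diminishing returns set in.  Build with: cc ratio.c -lz
 */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int
main(int argc, char **argv)
{
    static const size_t sizes[] = {8192, 16384, 32768, 65536, 131072, 262144};
    FILE *f = fopen(argc > 1 ? argv[1] : "sample.dat", "rb");
    unsigned char *raw;
    size_t raw_len;

    if (!f)
    {
        perror("fopen");
        return 1;
    }

    fseek(f, 0, SEEK_END);
    raw_len = ftell(f);
    rewind(f);
    raw = malloc(raw_len);
    fread(raw, 1, raw_len, f);
    fclose(f);

    for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++)
    {
        size_t chunk = sizes[s];
        size_t total = 0;
        unsigned char *out = malloc(compressBound(chunk));

        /* Compress each chunk independently and sum the compressed sizes. */
        for (size_t off = 0; off < raw_len; off += chunk)
        {
            uLongf out_len = compressBound(chunk);
            size_t in_len = (raw_len - off < chunk) ? raw_len - off : chunk;

            compress2(out, &out_len, raw + off, in_len, Z_BEST_SPEED);
            total += out_len;
        }
        printf("%7zu-byte chunks: %.1f%% of original\n",
               chunk, 100.0 * total / raw_len);
        free(out);
    }
    return 0;
}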

