Re: Compression and on-disk sorting

From: Greg Stark
Subject: Re: Compression and on-disk sorting
Date:
Msg-id: 87ejysr3fs.fsf@stark.xeocode.com
In reply to: Re: Compression and on-disk sorting (Andrew Piskorski <atp@piskorski.com>)
Responses: Re: Compression and on-disk sorting
List: pgsql-hackers
Andrew Piskorski <atp@piskorski.com> writes:

> Things like enums and 1 bit booleans certainly could be useful, but
> they cannot take advantage of duplicate values across multiple rows at
> all, even if 1000 rows have the exact same value in their "date"
> column and are all in the same disk block, right?

That's an interesting direction to go in. Generic algorithms would still help
in that case: since the identical value occurs more frequently than other
values, it gets encoded as a smaller symbol. But there's a limit to how much
compression that can achieve.

The ideal way to handle the situation you're describing would be to interleave
the tuples so that you have all 1000 values of the first column, followed by
all 1000 values of the second column and so on. Then you run a generic
algorithm on this and it achieves very high compression rates since there are
a lot of repeating patterns.
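As a minimal sketch of the idea (not PostgreSQL code; the rows, values, and use of Python's zlib are all illustrative assumptions), compare compressing the same 1000 tuples serialized row-by-row versus column-by-column:

```python
import zlib

# Hypothetical table: 1000 tuples of (date, id, flag), where the "date"
# column holds the same value in every row, as in the scenario above.
rows = [("2006-05-17", i, i % 2) for i in range(1000)]

# Row-oriented serialization: values from different columns interleave,
# so the repeated date is broken up by the varying id and flag values.
row_major = "\n".join(",".join(map(str, r)) for r in rows).encode()

# Column-oriented serialization: all 1000 values of one column are
# adjacent, giving a generic compressor long repeating runs to exploit.
cols = list(zip(*rows))
col_major = "\n".join(",".join(map(str, c)) for c in cols).encode()

print("row-major compressed:", len(zlib.compress(row_major)))
print("col-major compressed:", len(zlib.compress(col_major)))
```

On data like this the column-major layout compresses noticeably smaller, since the duplicated date column and the near-constant flag column collapse almost entirely.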

I don't see how you build a working database with data in this form, however.
For example, a single insert would require updating small pieces of data
across the entire table. Perhaps there's some middle ground, such as
interleaving the tuples within a single compressed page?

-- 
greg



In the pgsql-hackers list, by date sent:

Previous
From: Tom Lane
Date:
Message: Re: [GENERAL] Querying libpq compile time options
Next
From: "Larry Rosenman"
Date:
Message: Re: [GENERAL] Querying libpq compile time options