Re: Compression and on-disk sorting

From: Greg Stark
Subject: Re: Compression and on-disk sorting
Date:
Msg-id: 87ejysr3fs.fsf@stark.xeocode.com
In reply to: Re: Compression and on-disk sorting (Andrew Piskorski <atp@piskorski.com>)
Responses: Re: Compression and on-disk sorting
List: pgsql-hackers
Andrew Piskorski <atp@piskorski.com> writes:

> Things like enums and 1 bit booleans certainly could be useful, but
> they cannot take advantage of duplicate values across multiple rows at
> all, even if 1000 rows have the exact same value in their "date"
> column and are all in the same disk block, right?

That's an interesting direction to go in. Generic algorithms would still help
in that case: since the identical value occurs more frequently than other
values, it gets encoded as a smaller symbol. But there's a limit to how much
compression that can achieve.

The ideal way to handle the situation you're describing would be to interleave
the tuples so that you have all 1000 values of the first column, followed by
all 1000 values of the second column and so on. Then you run a generic
algorithm on this and it achieves very high compression rates since there are
a lot of repeating patterns.
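As a minimal sketch of the idea (not PostgreSQL code; the rows, values, and use of Python's zlib are all illustrative assumptions), compare compressing the same 1000 tuples serialized row-by-row versus column-by-column:

```python
import zlib

# Hypothetical table: 1000 tuples of (date, id, flag), where the "date"
# column holds the same value in every row, as in the scenario above.
rows = [("2006-05-17", i, i % 2) for i in range(1000)]

# Row-oriented serialization: values from different columns interleave,
# so the repeated date is broken up by the varying id and flag values.
row_major = "\n".join(",".join(map(str, r)) for r in rows).encode()

# Column-oriented serialization: all 1000 values of one column are
# adjacent, giving a generic compressor long repeating runs to exploit.
cols = list(zip(*rows))
col_major = "\n".join(",".join(map(str, c)) for c in cols).encode()

print("row-major compressed:", len(zlib.compress(row_major)))
print("col-major compressed:", len(zlib.compress(col_major)))
```

On data like this the column-major layout compresses noticeably smaller, since the duplicated date column and the near-constant flag column collapse almost entirely.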

I don't see how you build a working database with data in this form, however.
For example, a single insert would require updating small pieces of data
across the entire table. Perhaps there's some middle ground, such as
interleaving the tuples within a single compressed page?

-- 
greg



In the pgsql-hackers list, by date sent:

Previous
From: Tom Lane
Date:
Message: Re: [GENERAL] Querying libpq compile time options
Next
From: "Larry Rosenman"
Date:
Message: Re: [GENERAL] Querying libpq compile time options