Re: Add ZSON extension to /contrib/


From	Andrew Dunstan
Subject	Re: Add ZSON extension to /contrib/
Date	28 May 2021 17:22:26
Msg-id	09ec9e70-d901-40a6-cbca-b0e375eebb73@dunslane.net
In reply to	Re: Add ZSON extension to /contrib/ (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Replies	Re: Add ZSON extension to /contrib/
List	pgsql-hackers


On 5/28/21 6:35 AM, Tomas Vondra wrote:
>
>>
>> IMO the main benefit of having different dictionaries is that you
>> could have a small dictionary for small and very structured JSONB
>> fields (e.g. some time-series data), and a large one for large /
>> unstructured JSONB fields, without having the significant performance
>> impact of having that large and varied dictionary on the
>> small&structured field. Although a binary search is log(n) and thus
>> still quite cheap even for large dictionaries, the extra size is
>> certainly not free, and you'll be touching more memory in the process.
>>
> I'm sure we can think of various other arguments for allowing separate
> dictionaries. For example, what if you drop a column? With one huge
> dictionary you're bound to keep the data forever. With per-column dicts
> you can just drop the dict and free disk space / memory.
>
> I also find it hard to believe that no one needs 2**16 strings. I mean,
> 65k is not that much, really. To give an example, I've been toying with
> storing bitcoin blockchain in a database - one way to do that is storing
> each block as a single JSONB document. But each "item" (e.g. transaction)
> is identified by a unique hash, so that means (tens of) thousands of
> unique strings *per document*.
>
> Yes, it's a bit silly and extreme, and maybe the compression would not
> help much in this case. But it shows that 2**16 is damn easy to hit.
>
> In other words, this seems like a nice example of survivor bias, where
> we only look at cases for which the existing limitations are acceptable,
> ignoring the (many) remaining cases eliminated by those limitations.
>
>

I don't think we should lightly discard the use of 2-byte keys, though.
Maybe we could use a scheme similar to what we use for text lengths,
where the first bit indicates whether we have a 1-byte or 4-byte length
indicator. Many dictionaries will have fewer than 2^15-1 entries, so they
would use exclusively the smaller keys.
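A minimal sketch of the flag-bit idea above, applied to dictionary keys rather than text lengths. It rests on several assumptions that are illustrative only and are not ZSON's or PostgreSQL's actual on-disk format:

- Keys are packed big-endian, so the flag lands in the high bit of the first byte.
- A clear flag means a 2-byte key with 15 payload bits, covering ids 0..32767.
- A set flag means a 4-byte key with 31 payload bits.
- The names `encode_key` and `decode_key` are hypothetical.

```python
import struct

SHORT_LIMIT = 1 << 15  # ids 0..32767 fit in a 2-byte key (15 payload bits)
LONG_LIMIT = 1 << 31   # ids up to 2**31 - 1 fit in a 4-byte key (31 payload bits)


def encode_key(key_id: int) -> bytes:
    """Hypothetical encoder: 2 bytes with the high bit clear, else 4 bytes with it set."""
    if not 0 <= key_id < LONG_LIMIT:
        raise ValueError(f"dictionary id out of range: {key_id}")
    if key_id < SHORT_LIMIT:
        return struct.pack(">H", key_id)             # big-endian, flag bit is 0
    return struct.pack(">I", key_id | 0x80000000)    # big-endian, flag bit is 1


def decode_key(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (dictionary id, bytes consumed) for the key starting at buf[offset]."""
    if buf[offset] & 0x80 == 0:                      # flag clear: short key
        (value,) = struct.unpack_from(">H", buf, offset)
        return value, 2
    (value,) = struct.unpack_from(">I", buf, offset)  # flag set: long key
    return value & 0x7FFFFFFF, 4


# Usage: a small dictionary's ids all stay 2 bytes; only ids >= 2**15 pay for 4.
ids = [0, 42, 32767, 32768, 1_000_000]
stream = b"".join(encode_key(i) for i in ids)
decoded, pos = [], 0
while pos < len(stream):
    value, width = decode_key(stream, pos)
    decoded.append(value)
    pos += width
print(decoded, len(stream))
```

Because the width is self-describing, the decoder walks a mixed stream without any side table. A dictionary that never exceeds 32768 entries pays no size penalty relative to fixed 2-byte keys.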


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

