Re: Sequential vs. random values - number of pages in B-tree

Поиск
Список
Период
Сортировка
От Daniel Verite
Тема Re: Sequential vs. random values - number of pages in B-tree
Дата
Msg-id 5d6320bd-0268-4e8b-9cd1-159299cc03a3@mm
обсуждение исходный текст
Ответ на Re: Sequential vs. random values - number of pages in B-tree  (Francisco Olarte <folarte@peoplecall.com>)
Ответы Re: Sequential vs. random values - number of pages in B-tree
Список pgsql-general
    Francisco Olarte wrote:

> I think there are some pseudo-random number generators which
> can be made to work with any range, but do not recall which ones right
> now.

There's a simple technique that works on top of a Feistel network,
called the cycle-walking cipher. Described for instance at:
http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf

I'm using the opportunity to add a wiki page:
https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range
with sample plgsql code for the [0..10,000,000] range that might be useful
for other cases.

But for the btree fragmentation and final size issue, TBH I don't expect
that constraining the values within that smaller range will make any
difference
in the tests, because it's the dispersion that matters, not the values
themselves.

I mean that, whether the values are well dispersed in the [0..1e7] range or
equally well dispersed in the [0..2**32] range, the probability of a newly
inserted value to compare greater or lower to any previous values of the list
should be the same, so shouldn't the page splits be the same, statistically
speaking?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


В списке pgsql-general по дате отправления:

Предыдущее
От: Thomas Güttler
Дата:
Сообщение: Re: PG vs ElasticSearch for Logs
Следующее
От: John R Pierce
Дата:
Сообщение: Re: PG vs ElasticSearch for Logs