Обсуждение: Synthetic keys and index fillfactor

Поиск
Список
Период
Сортировка

Synthetic keys and index fillfactor

От
Ron
Дата:
(This mostly pertains to recreating a PK on an existing table.)

Is there any reason to have the PK index on an ever-increasing field (for 
example SERIAL, sequence or timestamp fed by clock_timestamp() at time zone 
'UTC') be anything but fillfactor=100?

New records will always be added to the "lower right hand corner" of the 
tree, so having 20% empty space in the rest of the tree would just waste 
space (mainly buffers, but disk space could even start to add up on Very Big 
Tables).

-- 
Born in Arizona, moved to Babylonia.



Re: Synthetic keys and index fillfactor

От
"David G. Johnston"
Дата:
On Thu, Jan 12, 2023 at 3:45 PM Ron <ronljohnsonjr@gmail.com> wrote:

(This mostly pertains to recreating a PK on an existing table.)

Is there any reason to have the PK index on an ever-increasing field (for
example SERIAL, sequence or timestamp fed by clock_timestamp() at time zone
'UTC') be anything but fillfactor=100?

New records will always be added to the "lower right hand corner" of the
tree, so having 20% empty space in the rest of the tree would just waste
space (mainly buffers, but disk space could even start to add up on Very Big
Tables).


Yes, at least that is what I gather from the advice on the CREATE INDEX page.


David J.

Re: Synthetic keys and index fillfactor

От
Gavan Schneider
Дата:

On 13 Jan 2023, at 9:54, David G. Johnston wrote:

On Thu, Jan 12, 2023 at 3:45 PM Ron <ronljohnsonjr@gmail.com> wrote:

(This mostly pertains to recreating a PK on an existing table.)

Is there any reason to have the PK index on an ever-increasing field (for
example SERIAL, sequence or timestamp fed by clock_timestamp() at time
zone
'UTC') be anything but fillfactor=100?

New records will always be added to the "lower right hand corner" of the
tree, so having 20% empty space in the rest of the tree would just waste
space (mainly buffers, but disk space could even start to add up on Very
Big Tables).

Yes, at least that is what I gather from the advice on the CREATE INDEX
page.

https://www.postgresql.org/docs/15/sql-createindex.html

My reading would say otherwise even in the context of monotonic serial additions

Quote —
You should only consider [fillfactor 100] when you are completely sure that the table is static
(i.e. that it will never be affected by either inserts or updates). A fillfactor setting of 100 otherwise
risks harming performance: even a few updates or inserts will cause a sudden flood of page splits.
(My emphasis added)

Checking https://www.postgresql.org/docs/current/btree-implementation.html#67.4.1
My understanding of the “leaf” and “internal” pages of the B-tree structure is that the the binary search gets it effectiveness when starting more or less in the middle of the existing index range and is most efficient when the tree is well balanced (ie., “leaf” values less-than and greater-than in equal numbers to the left and right at each “internal page”). There is a point when the lopsided tree that develops from always adding a bigger index value to the lower right corner will require a “root page split” (aka rebalance?). I am thinking a bit of padding helps make this process more in-place and less like a full index rebuild which is something that would really hurt on a VBTˇ as more items get added.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

Re: Synthetic keys and index fillfactor

От
Ron
Дата:
On 1/12/23 16:54, David G. Johnston wrote:
On Thu, Jan 12, 2023 at 3:45 PM Ron <ronljohnsonjr@gmail.com> wrote:

(This mostly pertains to recreating a PK on an existing table.)

Is there any reason to have the PK index on an ever-increasing field (for
example SERIAL, sequence or timestamp fed by clock_timestamp() at time zone
'UTC') be anything but fillfactor=100?

New records will always be added to the "lower right hand corner" of the
tree, so having 20% empty space in the rest of the tree would just waste
space (mainly buffers, but disk space could even start to add up on Very Big
Tables).


Yes, at least that is what I gather from the advice on the CREATE INDEX page.



"even a few updates or inserts will cause a sudden flood of page splits" shouldn't matter for ever-increasing PK values, since you're always filling that bottom right corner anyway.

This might be a killer if the applications update many indexed fields:
https://www.postgresql.org/docs/15/btree-implementation.html#BTREE-DELETION
"Changing the value of only one column covered by one index during an UPDATE always necessitates a new set of index tuples — one for each and every index on the table."

--
Born in Arizona, moved to Babylonia.

Re: Synthetic keys and index fillfactor

От
"David G. Johnston"
Дата:
On Thu, Jan 12, 2023 at 4:33 PM Gavan Schneider <list.pg.gavan@pendari.org> wrote:

On 13 Jan 2023, at 9:54, David G. Johnston wrote:

On Thu, Jan 12, 2023 at 3:45 PM Ron <ronljohnsonjr@gmail.com> wrote:

(This mostly pertains to recreating a PK on an existing table.)

Is there any reason to have the PK index on an ever-increasing field (for
example SERIAL, sequence or timestamp fed by clock_timestamp() at time
zone
'UTC') be anything but fillfactor=100?

New records will always be added to the "lower right hand corner" of the
tree, so having 20% empty space in the rest of the tree would just waste
space (mainly buffers, but disk space could even start to add up on Very
Big Tables).

Yes, at least that is what I gather from the advice on the CREATE INDEX
page.

https://www.postgresql.org/docs/15/sql-createindex.html

My reading would say otherwise even in the context of monotonic serial additions

Quote —
You should only consider [fillfactor 100] when you are completely sure that the table is static
(i.e. that it will never be affected by either inserts or updates). A fillfactor setting of 100 otherwise
risks harming performance: even a few updates or inserts will cause a sudden flood of page splits.
(My emphasis added)


Yep, and as the question was: is there any reason to use a fillfactor other than 100, the quoted material provides said reasons and thus "Yes" is the correct answer.  We interpreted the same material in the same way, I just let the documentation speak for itself.

I'll admit it seems a bit counter-intuitive, and there really isn't a "why" offered in the user-facing documentation, but for 10% I'm not too worried about it.

David J.

Re: Synthetic keys and index fillfactor

От
Gavan Schneider
Дата:
With respect to “not 100%” for index packing…

On 13 Jan 2023, at 11:17, David G. Johnston wrote:

> I'll admit it seems a bit counter-intuitive, and there really isn't a "why"
> offered in the user-facing documentation, but for 10% I'm not too worried
> about it.
>
I have had the same “why” and this conversation got me thinking…

The resulting “why” is based on what would be needed to maintain the ACID requirement while the index is both in use
andbeing reorganised. Specifically, when one worker is using the index it has its “horizon” (ie., version) of the index
fixed,another worker might add to the data forcing an index rebalance,  this means two, and maybe more, workers are in
thesame index each has their view/version of that one index. This “multiverse” is supported by multiple versions of any
givennode. This can be accommodated in the spare part of the index packing. The spare space allows the index
rebalancingto be written into blocks that may already be buffered. Once the index rework is complete, WAL has been
written,ACID compliance assured then the revised index (with all its freshly written nodes) is declared open for
businessand new workers will get this version for their horizon. Older workers will complete on their version, and the
nodessupporting their view will then out of date,  and the space can return to the block’s free area . The cycle
repeats…

Likely I have many of the details wrong but it’s an explanation that’s good enough for my purposes. :)

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem —
neat,plausible, and wrong. 
— H. L. Mencken, 1920



Re: Synthetic keys and index fillfactor

От
"David G. Johnston"
Дата:
On Thu, Jan 12, 2023 at 9:25 PM Gavan Schneider <list.pg.gavan@pendari.org> wrote:
With respect to “not 100%” for index packing…

On 13 Jan 2023, at 11:17, David G. Johnston wrote:

> I'll admit it seems a bit counter-intuitive, and there really isn't a "why"
> offered in the user-facing documentation, but for 10% I'm not too worried
> about it.
>
I have had the same “why” and this conversation got me thinking…

The resulting “why” is based on what would be needed to maintain the ACID requirement while the index is both in use and being reorganised. Specifically, when one worker is using the index it has its “horizon” (ie., version) of the index fixed, another worker might add to the data forcing an index rebalance,  this means two, and maybe more, workers are in the same index each has their view/version of that one index. This “multiverse” is supported by multiple versions of any given node. This can be accommodated in the spare part of the index packing. The spare space allows the index rebalancing to be written into blocks that may already be buffered. Once the index rework is complete, WAL has been written, ACID compliance assured then the revised index (with all its freshly written nodes) is declared open for business and new workers will get this version for their horizon. Older workers will complete on their version, and the nodes supporting their view will then out of date,  and the space can return to the block’s free area . The cycle repeats…

Likely I have many of the details wrong but it’s an explanation that’s good enough for my purposes. :)


In short, heap pages can benefit from (but do not get by default) free space to optimize MVCC-related work.  Indexes can also benefit from this - and it is also not built into the default - but also have their own non-MVCC related maintenance needs that require space and the 10% of the page default is set aside for this work.

David J.