Re: documentation on HOT

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: documentation on HOT
Дата
Msg-id CAKFQuwZ_-k6ny-tbV-AZT142vBc2eyK6LtMXxQPSCjXFTM95PQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: documentation on HOT  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Ответы Re: documentation on HOT  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-docs
On Fri, Jul 22, 2022 at 8:09 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
On 7/22/22 8:51 AM, Bruce Momjian wrote:
> On Thu, Jul 21, 2022 at 10:02:18PM -0400, Bruce Momjian wrote:
>> On Sun, Feb  6, 2022 at 09:29:56PM -0500, Jonathan Katz wrote:
>>> I agree with Bruce's point that we should have a new section (or
>>> subsection). As I mentioned in my previous post, given HOT involves
>>> indexing, I would suggest putting it there.
>>>
>>> I think that something that follows the general outline of Laurenz's post
>>> would satisfy the user requirements. It explains at a high level what HOT
>>> is, it's advantages, and how it works.
>>
>> Attached is a patch that adds a new HOT section to the storage chapter.
>
> I wasn't happy with the last paragraph so I added some more details.
> URL contents updated too:
>
>       https://momjian.us/tmp/pgsql/storage-hot.html

Thanks! This is great. Probably the most concise and clear explanation I
have seen for HOT, which is exactly what we need for the docs :)

A few suggestions:

s/Fortunately, there is/To help reduce overhead, PostgreSQL has/

s/In summary, h/H/

 > You can increase the change of HOT updates by using non-default table
fillfactor settings.

I think we should expand on this and explain how adjusting "fillfactor"
will affect this. I think that may change the final sentence too.


I think we need to expose the information regarding columns used in predicates here.

"(Here, "indexed column" means any column referenced
at all in an index definition, including for example columns that are
tested in a partial-index predicate but are not stored in the index.)"

I get it is an implementation detail but explaining the name seems like a good thing to do as well:

"Without HOT, every version of a row in an update chain has its own index
entries, even if all indexed columns are the same.  With HOT, a new tuple
placed on the same page and with all indexed columns the same as its
parent row version does not get new index entries.  This means there is
only one index entry for the entire update chain on the heap page.
An index-entry-less tuple is marked with the HEAP_ONLY_TUPLE flag."

Where the last sentence becomes: "Those index-entry-less tuples (yeah, still dislike triple-hypenation...) are thus named "Heap-Only Tuples".

(I've actually incorporated this as I think it should be down below, as a lead-in to the listing of conditions for when the optimization can be used.)

Then maybe "can be removed during select" should be reworded as:

"No longer visible heap-only tuples can be removed during normal
operation, including <command>SELECT</command>s, instead of requiring
periodic vacuum operations."

The original heap entry the index points to cannot be removed. "Old versions of heap-only tuples" vs. "No longer visible heap-only tuples" is probably a style choice.  There are basically three different "versions" in context here though so avoiding "old versions" has some appeal to me.

I'm not a fan of:

"Fortunately, there is an automatic system..."

I'd like to give credit to the fact we engineered a solution to the downsides, so change the lead-in paragraph to the conditions listing to be:

"To mitigate these downsides PostgreSQL implements an optimization whereby sometimes only the heap tuple is created, not the index entry, when performing an update.  In a case of giving things obvious and meaningful names, this is the Heap-Only Tuple (HOT) Optimization.  This optimization is possible when:"

David J.

В списке pgsql-docs по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: Question about role attributes docs
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: documentation on HOT