Re: Row Visibility and Table Access Methods

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: Row Visibility and Table Access Methods
Дата
Msg-id CAFj8pRDqHF8zGgHgh=SqV4uvn8Fn_H=KhFWmRbuoPwHiTv2RVw@mail.gmail.com
обсуждение исходный текст
Ответ на Row Visibility and Table Access Methods  (Simon Riggs <simon@2ndquadrant.com>)
Ответы Re: Row Visibility and Table Access Methods  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Список pgsql-hackers


čt 13. 12. 2018 v 10:23 odesílatel Simon Riggs <simon@2ndquadrant.com> napsal:
Currently, tables provide MVCC access semantics as the only option.

A more complete list of desirable semantics in applications are

* MVCC - provide latest consistent view
* Historical - keep all old row versions so that queries can access data as it used to be
* TTL=Duration - keep committed rows around for at most Duration seconds
* Latest - show only the most current committed row version, at the cost of inconsistency
There might be others

Since I see these options as semantics rather than physical, I think we should separate these operations away from Table Access Methods. This allows those semantics to be implemented in different ways for different storage types.

"Historical" access has been discussed many times, so no need to revisit here. Clearly, it is a very poopular idea, just not easily feasible with the current heap access layer. We might want an option for row_visibility retention. For tables with this option set, we would in later releases allow historical queries according to the SQL Standard.

"TTL" or "Time To Live" -  time-limited access to data is available in many other databases. It is simple to implement and we could easily have this in PG12. Implementation is 1) adding the option, 2) adding a time-component into the visibility check for scan and vacuum. This option implies an option exists to specify row_visibility retention.

"Latest" is similar to the way EvalPlanQual works, allowing UPDATE to see the latest version of a row before update, and similar to the way catalog scans work in that any access to a catalog entry sees the latest row based upon an immediate snapshot, not the one taken at the start of a query. It makes sense to allow this as an explicit table-level option, so any data viewed can see the latest version, just as UPDATEs already do. This also means that juggling bloat and old row versions becomes much less of an issue for very high volume update applications such as booking systems or stock tickers. (Clearly, better table access methods would improve on this further and even they would benefit from this simplification of the main issue around MVCC).
Logic for this type of visibility already exists in PG via HeapTupleSatisfiesSelf(), so we are just exposing what is already there to the user; no need to discuss semantics at length.
Again, patches to implement this option are simple and viable for PG12.

User interface are 2 new table-level options
* row_visibility = MVCC (default), TTL, Latest, Historical
* row_visibility_retention_interval = 'system' (default)
For MVCC, the only valid setting would be system, i.e. current MVCC behavior (but this might be altered by specific storage plugin parameters)
For Latest, the only valid setting would be system
For TTL, the interval to retain data for, setting of 0 is not valid
For Historical, the interval to retain old row versions for, 0 means forever

Implementation summary
1. Add new table-level option for row_visibility and row_visibility_retention_interval
2. Add option to heap_beginscan
3. Add option handling in heap prune
4. Add option to tqual

Thoughts?

looks great

Pavel


--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: tab-completion debug print
Следующее
От: David Fetter
Дата:
Сообщение: Re: tab-completion debug print