Re: Proposal for CSN based snapshots

From Heikki Linnakangas
Subject Re: Proposal for CSN based snapshots
Msg-id 5370D323.1070606@vmware.com
In reply to Re: Proposal for CSN based snapshots  (Rajeev rastogi <rajeev.rastogi@huawei.com>)
Responses Re: Proposal for CSN based snapshots  (Andres Freund <andres@2ndquadrant.com>)
Re: Proposal for CSN based snapshots  (Ants Aasma <ants@cybertec.at>)
Re: Proposal for CSN based snapshots  (Greg Stark <stark@mit.edu>)
Re: Proposal for CSN based snapshots  (Rajeev rastogi <rajeev.rastogi@huawei.com>)
Re: Proposal for CSN based snapshots  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
> We are also planning to implement CSN based snapshot.
> So I am curious to know whether any further development is happening on this.

I started looking into this, and plan to work on this for 9.5. It's a 
big project, so any help is welcome. The design I have in mind is to use 
the LSN of the commit record as the CSN (as Greg Stark suggested).

Some problems and solutions I have been thinking of:

The core of the design is to store the LSN of the commit record in 
pg_clog. Currently, we only store 2 bits per transaction there, 
indicating if the transaction committed or not, but the patch will 
expand it to 64 bits, to store the LSN. To check the visibility of an 
XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and 
compared with the snapshot's LSN.
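
To make the lookup-and-compare step concrete, here is a minimal sketch.
It is not PostgreSQL code: the array toy_clog, both toy_* functions, and
the use of 0 to mean "no commit LSN recorded" are assumptions made for
illustration. It also assumes a commit is visible when its LSN is
strictly below the snapshot's LSN, and it ignores aborts and XID
wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;   /* 32-bit XID, as in PostgreSQL */
typedef uint64_t XLogRecPtr;      /* 64-bit WAL position (LSN) */

/* Assumption: 0 in a slot means "no commit LSN recorded yet". */
#define TOY_NO_COMMIT_LSN ((XLogRecPtr) 0)
#define TOY_CLOG_SLOTS    1024u   /* toy size, one 64-bit slot per XID */

static XLogRecPtr toy_clog[TOY_CLOG_SLOTS];

/* Record the LSN of the commit record for xid. */
static void
toy_clog_set_commit_lsn(TransactionId xid, XLogRecPtr commit_lsn)
{
    assert(xid < TOY_CLOG_SLOTS);
    toy_clog[xid] = commit_lsn;
}

/* Visible iff xid has a commit LSN and it precedes the snapshot's LSN. */
static bool
toy_xid_visible_in_snapshot(TransactionId xid, XLogRecPtr snapshot_lsn)
{
    XLogRecPtr commit_lsn;

    assert(xid < TOY_CLOG_SLOTS);
    commit_lsn = toy_clog[xid];

    if (commit_lsn == TOY_NO_COMMIT_LSN)
        return false;               /* in progress (aborts not modeled) */
    return commit_lsn < snapshot_lsn;
}
```

The point of the sketch is that the check needs only one array read and
one integer comparison, with no list of in-progress XIDs in the
snapshot.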

Currently, before consulting the clog for an XID's status, it is 
necessary to first check if the transaction is still in progress by 
scanning the proc array. To get rid of that requirement, just before 
writing the commit record in the WAL, the backend will mark the clog 
slot with a magic value that says "I'm just about to commit". After 
writing the commit record, it is replaced with the record's actual LSN. 
If a backend sees the magic value in the clog, it will wait for the 
transaction to finish the insertion, and then check again to get the 
real LSN. I'm thinking of just using XactLockTableWait() for that. This 
mechanism makes the insertion of a commit WAL record and updating the 
clog appear atomic to the rest of the system.
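
The three-step protocol above could look roughly like the following
sketch. The slot type, the two sentinel values, and the toy_* function
names are hypothetical. The reader here does not block: it reports that
the caller must wait (for example via XactLockTableWait(), as suggested
above) and then retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical sentinel values; the real encoding is not decided here. */
#define TOY_SLOT_IN_PROGRESS ((XLogRecPtr) 0)
#define TOY_SLOT_COMMITTING  ((XLogRecPtr) UINT64_MAX)  /* "about to commit" */

typedef struct ToyClogSlot
{
    _Atomic XLogRecPtr value;
} ToyClogSlot;

/* Step 1: just before writing the commit record, store the magic value. */
static void
toy_begin_commit(ToyClogSlot *slot)
{
    atomic_store(&slot->value, TOY_SLOT_COMMITTING);
}

/* Step 2 (writing the WAL record) happens between these two calls. */

/* Step 3: replace the magic value with the commit record's LSN. */
static void
toy_finish_commit(ToyClogSlot *slot, XLogRecPtr commit_lsn)
{
    assert(commit_lsn != TOY_SLOT_IN_PROGRESS);
    assert(commit_lsn != TOY_SLOT_COMMITTING);
    atomic_store(&slot->value, commit_lsn);
}

/*
 * Reader side: returns false if the transaction is mid-commit, meaning
 * the caller should wait for it to finish and call again.  Otherwise
 * stores the slot contents in *lsn_out and returns true.
 */
static bool
toy_try_read_commit_lsn(ToyClogSlot *slot, XLogRecPtr *lsn_out)
{
    XLogRecPtr v = atomic_load(&slot->value);

    if (v == TOY_SLOT_COMMITTING)
        return false;
    *lsn_out = v;
    return true;
}
```

A reader never observes a state where the commit record is in the WAL
but the slot still says "in progress"; it sees either the old state, the
magic value, or the final LSN.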

With this mechanism, taking a snapshot is just a matter of reading the 
current WAL insertion point. There is no need to scan the proc array, 
which is good. However, it probably still makes sense to record an xmin 
and an xmax in SnapshotData, for performance reasons. An xmax, in 
particular, will allow us to skip checking the clog for transactions 
that will surely not be visible. We will no longer track the latest 
completed XID or the xmin like we do today, but we can use 
SharedVariableCache->nextXid as a conservative value for xmax, and keep 
a cached global xmin value in shared memory, updated when convenient, 
that can be just copied to the snapshot.
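
As a rough sketch of what snapshot creation reduces to under this
scheme: ToyShared, ToySnapshot, and the toy_* functions are made-up
names standing in for shared memory and SnapshotData, and XIDs are
compared as plain integers, ignoring wraparound. Only the xmax shortcut
is shown; xmin is carried along but not used for any decision here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for the relevant shared-memory state. */
typedef struct ToyShared
{
    XLogRecPtr    wal_insert_pos;      /* current WAL insertion point */
    TransactionId next_xid;            /* next XID to be assigned */
    TransactionId cached_global_xmin;  /* refreshed "when convenient" */
} ToyShared;

typedef struct ToySnapshot
{
    XLogRecPtr    snapshot_lsn;
    TransactionId xmin;
    TransactionId xmax;
} ToySnapshot;

/* Taking a snapshot copies three values; no proc array scan. */
static ToySnapshot
toy_take_snapshot(const ToyShared *shared)
{
    ToySnapshot snap;

    assert(shared->cached_global_xmin <= shared->next_xid);
    snap.snapshot_lsn = shared->wal_insert_pos;
    snap.xmin = shared->cached_global_xmin;
    snap.xmax = shared->next_xid;      /* conservative upper bound */
    return snap;
}

typedef enum ToyPrecheck
{
    TOY_INVISIBLE_NO_LOOKUP,   /* XID assigned after the snapshot */
    TOY_NEEDS_CLOG_LOOKUP      /* must compare commit LSN in the clog */
} ToyPrecheck;

/* XIDs at or beyond xmax cannot be visible, so skip the clog. */
static ToyPrecheck
toy_precheck(const ToySnapshot *snap, TransactionId xid)
{
    if (xid >= snap->xmax)
        return TOY_INVISIBLE_NO_LOOKUP;
    return TOY_NEEDS_CLOG_LOOKUP;
}
```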

In theory, we could use a snapshot LSN as the cutoff-point for 
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but 
that makes me feel uneasy. In any case, I think we'll need a cut-off 
point defined as an XID rather than an LSN for freezing purposes. In 
particular, we need a cut-off XID to determine how far the pg_clog can 
be truncated, and to store in relfrozenxid. So, we will still need the 
concept of a global oldest xmin.

When a snapshot is just an LSN, taking a snapshot can no longer 
calculate an xmin, like we currently do (there will be a snapshot LSN in 
place of an xmin in the proc array). So we will need a new mechanism to 
calculate the global oldest xmin. First, scan the proc array to find the 
oldest still in-progress XID. That XID minus one will become the new 
oldest global xmin once all currently active snapshots have finished. We don't want 
to sleep in GetOldestXmin(), waiting for the snapshots to finish, so we 
should periodically advance a system-wide oldest xmin value, for example 
whenever the WAL writer process wakes up, so that when we need an 
oldest-xmin value, we will always have a fairly recently calculated 
value ready in shared memory.
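
One possible shape for that periodic step is sketched below. All names
are hypothetical. The way it decides that "all currently active
snapshots have finished" is an assumption of this sketch, not part of
the proposal: it remembers the WAL position at which the candidate was
computed and publishes the candidate once no proc holds a snapshot LSN
older than that position. XID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

typedef struct ToyProc
{
    bool          has_xid;        /* running a transaction with an XID? */
    TransactionId xid;
    bool          has_snapshot;   /* holding a snapshot? */
    XLogRecPtr    snapshot_lsn;
} ToyProc;

typedef struct ToyXminState
{
    TransactionId published_xmin;  /* value readers can use right away */
    bool          have_candidate;
    TransactionId candidate_xmin;  /* oldest running XID minus one */
    XLogRecPtr    candidate_lsn;   /* WAL position when it was computed */
} ToyXminState;

/* Called periodically, e.g. from a background process; never sleeps. */
static void
toy_advance_oldest_xmin(ToyXminState *st, const ToyProc *procs,
                        size_t nprocs, TransactionId next_xid,
                        XLogRecPtr current_lsn)
{
    size_t        i;
    TransactionId oldest_running = next_xid;

    assert(next_xid > 0);

    if (st->have_candidate)
    {
        /* A snapshot older than the candidate still exists: try later. */
        for (i = 0; i < nprocs; i++)
            if (procs[i].has_snapshot &&
                procs[i].snapshot_lsn < st->candidate_lsn)
                return;

        if (st->candidate_xmin > st->published_xmin)
            st->published_xmin = st->candidate_xmin;  /* never go back */
        st->have_candidate = false;
    }

    /* Compute the next candidate from the oldest in-progress XID. */
    for (i = 0; i < nprocs; i++)
        if (procs[i].has_xid && procs[i].xid < oldest_running)
            oldest_running = procs[i].xid;

    st->candidate_xmin = oldest_running - 1;
    st->candidate_lsn = current_lsn;
    st->have_candidate = true;
}
```

Callers that need an oldest-xmin value would read published_xmin
directly, so they never wait on running snapshots; the cost is that the
value lags behind by up to one round of this procedure.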

- Heikki


