Re: MVCC overheads

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: MVCC overheads
Дата
Msg-id CAMsr+YFgQ9kXFtgR=FJwD=GYfTmhgdaqL9tcZdudKs4eFjMLOg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: MVCC overheads  (Pete Stevenson <etep.nosnevets@gmail.com>)
Список pgsql-hackers
On 8 July 2016 at 03:50, Pete Stevenson <etep.nosnevets@gmail.com> wrote:
Hi Simon -

Thanks for the note. I think it's fair to say that I didn't provide enough context, so let me try and elaborate on my question.

Please reply in-line in posts to make it easier to follow conversations with multiple people.
  
It is the case that for the database to implement MVCC it must provide consistent read to multiple different versions of data, i.e. depending on the version used at transaction start.

Not necessarily transaction start; for REPEATABLE READ isolation, statement start is sufficient, or even weaker than that.
 
I'm not an expert on postgresql internals, but this must have some cost.

Sure it does. Disk space, efficiency of use of RAM for disk cache, CPU cost of scanning over not-visible tuples, etc.
 
I think the cost related to MVCC guarantees can roughly be categorized as: creating new versions (linking them in)

The way PostgreSQL does that (read the manual) is pretty lightweight. You will have already found the old tuple so setting its xmax is cheap. Writing the new tuple costs much the same as an insert.
 
version checking on read

Yep. In particular, index scans because PostgreSQL doesn't maintain visibility information in indexes. Read up on PostgreSQL's mvcc implementation, index scans, index-only scans, visibility map, etc.
 
garbage collecting old versions

As implemented in PostgreSQL by VACUUM
 
and then there is an additional cost that I am interested in (again not claiming it is unnecessary in any sense) but there is a cost to generating the log.

The write-ahead log is orthogonal to MVCC. You can have MVCC without WAL (or other write durability). You can have write durability without MVCC. The two are almost entirely unrelated.
 
Thanks, by the way, for the warning about lab vs. reality. That's why I'm asking this question here. I want to keep the hypothetical tagged as such, but find defensible and realistic metrics where those exist, i.e. in this instance, we do have a database that can use MVCC. It should be possible to figure out how much work goes into maintaining that property.

MVCC logic is woven deeply thoughout PostgreSQL. I'm not sure how you'd even begin to offload it in any meaningful way, nor if it'd be useful to do so. Presumably you're thinking of some way to tell the storage layer "show me the table as if it has only rows visible to [this xact]" so Pg doesn't have to do any checking at all. But it's not always that simple. See:

- Logical decoding (time travel)
- VACUUM
- EvalPlanQual, re-checks of updated rows
- ...
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Showing parallel status in \df+
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: A Modest Upgrade Proposal