Re: Relation extension scalability

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Relation extension scalability
Дата	30 марта 2015 г. 03:52:10
Msg-id	CA+Tgmoa8nddwpMA4Emn7sVoNrQ883mn8ZJBiqXa8dm3puKn=qQ@mail.gmail.com обсуждение исходный текст
Ответ на	Relation extension scalability (Andres Freund <andres@2ndquadrant.com>)
Ответы	Re: Relation extension scalability (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

On Sun, Mar 29, 2015 at 2:56 PM, Andres Freund <andres@2ndquadrant.com>
> As a quick recap, relation extension basically works like:
> 1) We lock the relation for extension
> 2) ReadBuffer*(P_NEW) is being called, to extend the relation
> 3) smgrnblocks() is used to find the new target block
> 4) We search for a victim buffer (via BufferAlloc()) to put the new
>    block into
> 5) If dirty the victim buffer is cleaned
> 6) The relation is extended using smgrextend()
> 7) The page is initialized
>
> The problems come from 4) and 5) potentially each taking a fair
> while. If the working set mostly fits into shared_buffers 4) can
> requiring iterating over all shared buffers several times to find a
> victim buffer. If the IO subsystem is buys and/or we've hit the kernel's
> dirty limits 5) can take a couple seconds.

Interesting.  I had always assumed the bottleneck was waiting for the
filesystem to extend the relation.

> Secondly I think we could maybe remove the requirement of needing an
> extension lock alltogether. It's primarily required because we're
> worried that somebody else can come along, read the page, and initialize
> it before us. ISTM that could be resolved by *not* writing any data via
> smgrextend()/mdextend(). If we instead only do the write once we've read
> in & locked the page exclusively there's no need for the extension
> lock. We probably still should write out the new page to the OS
> immediately once we've initialized it; to avoid creating sparse files.
>
> The other reason we need the extension lock is that code like
> lazy_scan_heap() and btvacuumscan() that tries to avoid initializing
> pages that are about to be initilized by the extending backend. I think
> we should just remove that code and deal with the problem by retrying in
> the extending backend; that's why I think moving extension to a
> different file might be helpful.

I thought the primary reason we did this is because we wanted to
write-and-fsync the block so that, if we're out of disk space, any
attendant failure will happen before we put data into the block.  Once
we've initialized the block, a subsequent failure to write or fsync it
will be hard to recover from; basically, we won't be able to
checkpoint any more.  If we discover the problem while the block is
still all-zeroes, the transaction that uncovers the problem errors
out, but the system as a whole is still OK.

Or at least, I think.  Maybe I'm misunderstanding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Michael Paquier
Дата: 30 марта 2015 г., 03:49:57
Сообщение: Re: Rounding to even for numeric data type

Следующее

От: Robert Haas
Дата: 30 марта 2015 г., 03:52:29
Сообщение: Re: getting rid of "thread fork emulation" in pgbench?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Relation extension scalability

Предыдущее

Следующее