Re: extending relations more efficiently

From: Andres Freund
Subject: Re: extending relations more efficiently
Date: 2012-05-01 17:42
Msg-id: 201205011742.46203.andres@anarazel.de
In reply to: Re: extending relations more efficiently  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: extending relations more efficiently  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tuesday, May 01, 2012 05:06:11 PM Robert Haas wrote:
> On Tue, May 1, 2012 at 10:31 AM, Andres Freund <andres@anarazel.de> wrote:
> >> efficient than our current method - I'm guessing that it actually
> >> writes the updated metadata back to disk, where write() does not (this
> >> makes one wonder how safe it is to count on write to have the behavior
> >> we need here in the first place).
> > 
> > Currently the write() doesn't need to be crashsafe because it will be
> > repeated on crash-recovery and a checkpoint will fsync the file.
> 
> That's not what I'm worried about.  If the write() succeeds and then a
> subsequent close() on the filehandle reports an ENOSPC condition that
> means the write didn't really write after all, I am concerned that we
> might not handle that cleanly.
Hm. While write() might not get its state to disk, I don't think that can 
imply that the *in memory* state is inconsistent.
POSIX doesn't allow ENOSPC for close() as far as I can see.
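
To illustrate what I mean: with a write()-based extension, an out-of-space 
condition surfaces at write() time, not at close(). A minimal sketch (my 
example, not the actual md.c code; BLCKSZ and the error handling here are 
just for illustration):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

/* extend the file by one zero-filled block at block number nblocks */
static int
extend_by_one_block(int fd, off_t nblocks)
{
    static const char zbuf[BLCKSZ];     /* zero-initialized page image */
    ssize_t     nwritten = pwrite(fd, zbuf, BLCKSZ, nblocks * BLCKSZ);

    if (nwritten != BLCKSZ)
    {
        /* ENOSPC (or a short write) shows up here, at write() time */
        fprintf(stderr, "could not extend: %s\n",
                nwritten < 0 ? strerror(errno) : "short write");
        return -1;
    }
    return 0;
}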

> > I don't really see why it would need to compare in the 8kb case. What
> > reason would there be to further extend in that small increments?
> In previous discussions, the concern has been that holding the
> relation extension lock across a multi-block extension would cause
> latency spikes for both the process doing the extensions and any other
> concurrent processes that need the lock.  Obviously if it were
> possible to extend by 64kB in the same time it takes to extend by 8kB
> that would be awesome, but if it takes eight times longer then things
> don't look so good.
Yes, sure.
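
If somebody wants numbers on that, something like the following quick test 
would show it (assuming posix_fallocate() is the call in question here; the 
filenames and sizes are made up):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define BLCKSZ  8192
#define NBLOCKS 8               /* one 64kB extension */

static double
now_sec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int
main(void)
{
    static const char zbuf[BLCKSZ];
    double      t;
    int         rc, i, fd;

    /* extend by 64kB in one go via posix_fallocate() */
    fd = open("fallocate.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    t = now_sec();
    rc = posix_fallocate(fd, 0, (off_t) NBLOCKS * BLCKSZ);
    if (rc != 0)
        fprintf(stderr, "posix_fallocate: %s\n", strerror(rc));
    printf("posix_fallocate: %.6f s\n", now_sec() - t);
    close(fd);

    /* the same 64kB as eight separate 8kB write()s */
    fd = open("write.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    t = now_sec();
    for (i = 0; i < NBLOCKS; i++)
        if (write(fd, zbuf, BLCKSZ) != BLCKSZ)
            perror("write");
    printf("write loop:      %.6f s\n", now_sec() - t);
    close(fd);
    return 0;
}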

> > There is the question whether this should be done in the background
> > though, so the relation extension lock is never hit in anything
> > time-critical...
> Yeah, although I'm fuzzy on how and whether that can be made to work,
> which is not to say that it can't.
The biggest problem I see is knowing when to trigger the extension of which 
file without scanning files all the time.

Using some limited-size shm-queue of {reltblspc, relfilenode} entries for 
to-be-extended files plus a latch is the first thing I can think of. Every 
time a backend initializes a page with offset % EXTEND_SIZE == 0 it adds 
that table to the queue. The background writer extends the file by 
EXTEND_SIZE * 2 if necessary. If the queue overflows, all files are 
checked. Or the backends just extend the files themselves again...
EXTEND_SIZE should probably scale with the table size, up to 64MB or so...
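
As a rough sketch of what I'm thinking of (all names invented here; a real 
version would live in shared memory and use a spinlock plus the background 
writer's latch rather than pthreads):

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

#define BLCKSZ            8192
#define EXTEND_QUEUE_SIZE 64

typedef struct ExtendQueueEntry
{
    Oid         reltblspc;      /* tablespace of the file to extend */
    Oid         relfilenode;    /* file node of the relation */
} ExtendQueueEntry;

typedef struct ExtendQueue
{
    pthread_mutex_t mutex;      /* stand-in for a spinlock */
    int         nentries;
    bool        overflowed;     /* queue filled up: check all files */
    ExtendQueueEntry entries[EXTEND_QUEUE_SIZE];
} ExtendQueue;

/*
 * Called by a backend after initializing a page whose block number is a
 * multiple of EXTEND_SIZE; afterwards the backend would set the
 * background writer's latch so it extends the file by EXTEND_SIZE * 2.
 */
static bool
extend_queue_push(ExtendQueue *q, Oid tblspc, Oid relnode)
{
    bool        ok = true;

    pthread_mutex_lock(&q->mutex);
    if (q->nentries < EXTEND_QUEUE_SIZE)
    {
        q->entries[q->nentries].reltblspc = tblspc;
        q->entries[q->nentries].relfilenode = relnode;
        q->nentries++;
    }
    else
    {
        q->overflowed = true;
        ok = false;
    }
    pthread_mutex_unlock(&q->mutex);
    return ok;
}

/*
 * Adaptive extension size: grow with the table, capped at 64MB. The
 * 1/8-of-current-size growth factor is a guess on my part.
 */
static int64_t
extend_size_blocks(int64_t current_blocks)
{
    int64_t     target = current_blocks / 8;
    int64_t     cap = (int64_t) 64 * 1024 * 1024 / BLCKSZ;

    if (target < 1)
        target = 1;
    if (target > cap)
        target = cap;
    return target;
}

The overflowed flag is the cheap escape hatch: once the background writer 
sees it set, it does one full scan of the open files and resets it.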

> It might also be interesting to provide a mechanism to pre-extend a
> relation to a certain number of blocks, though if we did that we'd
> have to make sure that autovac got the memo not to truncate those
> pages away again.
Hm. I have to say I don't really see a big need to do this if the size of 
the preallocation adapts to the file size. It sounds like it would add too 
much complication for little benefit.

Andres

