Re: pg_xlog becomes extremely large during CREATE INDEX

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: pg_xlog becomes extremely large during CREATE INDEX
Дата
Msg-id 7536.1084639467@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: pg_xlog becomes extremely large during CREATE INDEX  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Список pgsql-general
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sat, May 15, 2004 at 12:23:18AM -0400, Tom Lane wrote:
>> ... It tries to write all the levels of the btree in
>> parallel, so it is spitting out level-zero pages at a great rate,
>> level-one pages at a lesser rate, etc.  For a large index there could
>> be many btree levels, and pages in the higher levels will be held locked
>> in the shared buffer arena for considerable periods.

> I'm not sure how btree from-scratch-building work, but could it be made
> so that it first build level 0 completely, scanning the heap; then build
> level 1 scanning the level 0, and so on?

I don't think that would be a win; it would require an extra read pass
over nearly all of the index.  The idea here is to reduce the amount of
I/O, not increase it.

> I also wonder why index creation would write XLog information;

Yeah, that was implicit in my comment about not using the buffer
manager, and it was why I was concerned about being able to fsync
the index file.  Without xlog for backup, you'd better be able to
force the index to disk before you commit.

It turns out that btree index build currently writes each page three
times: once to extend the file, once into WAL, and then finally to write
the completed index page.  It is difficult to do any better as long as
you're using the shared buffer manager.  (When extending a file, the
buffer manager wants to allocate disk space before it will give you a
buffer to write into.  This is the right thing for heap insertions and
extending an existing index, because we don't force buffers to disk
before commit.  So if it didn't work that way, you might commit your
transaction before finding out there's no disk space to hold the data.
But we could dispense with that worry for index build if we ensure that
we write and fsync all the pages before commit.)  So I was thinking
about keeping the same index build process, but working in private
storage instead of shared buffers, and writing direct to smgr.  This
would not only eliminate the locking conflict against checkpoints,
but reduce the amount of I/O involved quite a lot.

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Ben
Дата:
Сообщение: Re: one to many
Следующее
От: Alexander Vlasenko
Дата:
Сообщение: unsubscride pqsql-general