Re: MVCC for massively parallel inserts

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: MVCC for massively parallel inserts
Дата
Msg-id 87smit1y79.fsf@stark.dyndns.tv
обсуждение исходный текст
Ответ на Re: MVCC for massively parallel inserts  (Alex Satrapa <alex@lintelsys.com.au>)
Ответы Re: MVCC for massively parallel inserts  ("Joshua D. Drake" <jd@commandprompt.com>)
Список pgsql-general
Alex Satrapa <alex@lintelsys.com.au> writes:

> Properly set up in a RAID-1/0 array, you'll get much better "bandwidth" out of
> those drives. Whether you RAID in software or hardware is up to you and your
> budget - but if you choose hardware, make sure you actually get a hardware RAID
> controller, not one of the cheapies which are just
> multiple-IDE-controllers-with-special-drivers.
>
> We use a 3Ware 7500 (can't remember the model name for the life of me), with 8
> drives in a 4x2 RAID 1/0.

I would agree and if you really need the I/O bandwidth you can go to much
larger stripe sets than even this. The documentation I've seen before
suggested there were benefits up to stripe sets as large as twelve disks
across. That would be 24 drives if you're also doing mirroring.

Ideally separating WAL, index, and heap files is good, but you would have to
experiment to see which works out fastest for a given number of drives.

There are also some alternative approaches that could increase your
throughput. For example, you could have your multiple machines receiving the
data log the data to text files. Then you could copy the text files over to
the database periodically and load the with COPY which is faster than a
database insert.

Also, if it fits your model you could load the data into fresh unindexed
tables and then build a new index. Building a new index is a quicker operation
than handling individual inserts. That would make selects more complex though,
but you perhaps that's not a concern.

> IIRC, if the inserts are done in a transaction, the indexing gets done at the
> end of the batch rather than after each insert.

I believe this is wrong. The whole point of postgres's style of MVCC is that
each transaction can go ahead and do whatever modifications it needs and mark
it with its transaction id, any other transaction simply ignores the data
marked with transaction ids of uncommitted transactions. When commit time
arrives there's very little work to do to do the commit beyond simply marking
the transaction as committed.

--
greg

В списке pgsql-general по дате отправления:

Предыдущее
От: Alex Satrapa
Дата:
Сообщение: Re: MVCC for massively parallel inserts
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: MVCC for massively parallel inserts