Using multi-row technique with COPY
От | Simon Riggs |
---|---|
Тема | Using multi-row technique with COPY |
Дата | |
Msg-id | 1133120695.2906.230.camel@localhost.localdomain обсуждение исходный текст |
Ответы |
Re: Using multi-row technique with COPY
Re: Using multi-row technique with COPY Re: Using multi-row technique with COPY |
Список | pgsql-hackers |
Since we just discussed using a multi-row per buffer lock technique with Seq Scans, it seems appropriate to discuss a similar technique with COPY FROM that I had been mulling over. COPY FROM can read in sufficient rows until it has a whole block worth of data, then get a new block and write it all with one pair of BufferLock calls. This would only be possible, AFAICS, when a table does not have any unique indexes or row triggers. It should be possible to take advantage of this automatically when those requirements are met, without any new options. Just as it was with Seq Scans, this is worth about 10% reduction in CPU for a COPY FROM. The main use case for this is initial table loads or restores from dumps. However, it applies more widely since there is no requirement that the table is empty, nor that it has only just been created within the current transaction. FSM access would need to change slightly to allow for whole-block-only requests to be made for heaps, without damaging the average row length calculation. It might be simpler to ignore FSM entirely? As a further enhancement, I would also return to the NOLOGGING option for COPY. Previously we had said that COPY LOCK was the way to go - taking a full table lock to prevent concurrent inserts to a block from a COPY that didn't write WAL and another backend which wanted to write WAL about that block. With the above suggested all-inserts-at-once optimization, it would no longer be a requirement to lock the table. That means we can continue to take advantage of the ability to run multiple COPY loads into the same table. Avoiding writing WAL will further reduce CPU by about 15% and I/O by about 50%. I would also suggest that pgdump be changed to use the NOLOGGING option by default, with an option to work as previously. Comments? Best Regards, Simon Riggs
В списке pgsql-hackers по дате отправления: