Re: [GENERAL] 7.4Beta

From Andreas Pflug
Subject Re: [GENERAL] 7.4Beta
Date
Msg-id 3F3D2807.2040007@pse-consulting.de
In reply to Re: [GENERAL] 7.4Beta  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-hackers
Dann Corbit wrote:

>Simplification of bulk operations can be very important for customers
>(on the other hand).  For the CONNX tool set, we offer an escape on
>INSERT/SELECT that performs the operation in bulk mode.
>
>There are serious downsides to bulk operations also (such as not being
>logged and therefore existing outside of a transaction).  Therefore,
>they are useful really only in cases where a complete destruction and
>repopulation is called for.  If anything goes haywire, you can't simply
>roll it back.
>
>Yet the speed savings can be enormous (orders of magnitude).
>
>Compared to iteration over a set of prepared inserts, a bulk insert
>(such as using Microsoft's BCP API or Oracle's Direct Path loading) can
>be 100 times faster.  If you are moving gigabytes of data and performing
>a complete refresh, the method to use becomes obvious.
>
>When we go outside of the language bounds, a curly braced escape
>notation is used.  For instance, an insert/select might look like this:
>INSERT INTO <destination_table> SELECT <column_list> FROM <source_table>
>{fn commitcount 1000} {bulkmode}
>The commit count says to use batches of 1000 rows and bulkmode says to
>use the fastest possible insert method.
>
>Imagine (for instance) that PostgreSQL can use an ODBC {or similar} data
>source as a table.  Then, with a syntax such as the above (or SELECT
>INTO etc.), you could very rapidly move data from one system into
>another.
>  
>
When I say "bulk operation" I don't necessarily mean using bulk load or 
anything like that. What I mean is handling large amounts of similar data 
at the same time, which says nothing about transactions or 
logging problems.
Imagine you have 100k rows to load, each having FKs to (hopefully) 
existing rows:
- Step 1: load the rows into a temp table.
- Step 2: identify duplicates, logging and deleting them.
- Step 3: insert all data satisfying the FK constraints.
- Step 4: log all that didn't insert.

These are relatively few statements (not the simplest), which can be 
handled in a safe manner; a rough SQL sketch follows below.
Just an example of how an RDBMS can do the job in a mass-oriented (and 
thus optimizable) way.
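
To make that concrete, here is one way those four steps might look in 
plain SQL. All names (orders, customers, load_errors, the COPY source 
file) are made up purely for illustration, not taken from any real setup:

-- Hypothetical schema: target table "orders" with an FK to "customers",
-- plus an existing "load_errors" table for rejected rows.

-- Step 1: load the raw rows into a temp table, without constraints
CREATE TEMP TABLE orders_load (LIKE orders);
COPY orders_load FROM '/tmp/orders.dat';   -- path is illustrative only

-- Step 2: log rows that already exist in the target, then delete them
INSERT INTO load_errors (order_id, reason)
    SELECT order_id, 'duplicate'
    FROM orders_load
    WHERE order_id IN (SELECT order_id FROM orders);
DELETE FROM orders_load
    WHERE order_id IN (SELECT order_id FROM orders);

-- Step 3: insert only the rows whose FK target (customer) exists
INSERT INTO orders
    SELECT l.*
    FROM orders_load l
    JOIN customers c ON c.customer_id = l.customer_id;

-- Step 4: log everything that could not be inserted (missing FK parent)
INSERT INTO load_errors (order_id, reason)
    SELECT l.order_id, 'missing customer'
    FROM orders_load l
    LEFT JOIN customers c ON c.customer_id = l.customer_id
    WHERE c.customer_id IS NULL;

Each statement works on the whole batch at once, so the planner can 
optimize it, instead of iterating row by row from the client.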

Regards,
Andreas



