Re: autocommit (true/false) for more than 1 million records

From: Kevin Grittner
Subject: Re: autocommit (true/false) for more than 1 million records
Date:
Msg-id: 1409092428.41025.YahooMailNeo@web122306.mail.ne1.yahoo.com
In reply to: Re: autocommit (true/false) for more than 1 million records  (Alex Goncharov <alex.goncharov.usa@gmail.com>)
Responses: Re: autocommit (true/false) for more than 1 million records  (Alex Goncharov <alex.goncharov.usa@gmail.com>)
List: pgsql-performance
Alex Goncharov <alex.goncharov.usa@gmail.com> wrote:

> Suppose I COPY a huge amount of data, e.g. 100 records.
>
> My 99 records are fine for the target, and the 100-th is not --
> it comes with a wrong record format or a target constraint
> violation.
>
> The whole thing is aborted then, and the good 99 records are not
> making it into the target table.

Right.  This is one reason people often batch such copies or check
the data very closely before copying in.
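The batching idea can be sketched as follows; the table and file names here are illustrative, not from the original thread -- the point is that each chunk commits independently, so one bad row aborts only its own batch:

```sql
-- Hypothetical batched load: split the input into chunks beforehand,
-- then commit each chunk in its own transaction.
BEGIN;
COPY big_table FROM '/data/chunk_0001.csv' WITH (FORMAT csv);
COMMIT;  -- these rows survive even if a later chunk fails

BEGIN;
COPY big_table FROM '/data/chunk_0002.csv' WITH (FORMAT csv);
COMMIT;
-- ...repeat per chunk; a failure rolls back only the current chunk.
```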

> My question is: Where have these 99 records been living, on
> the database server, while the 100-th one hasn't come yet, and
> the need to throw the previous data accumulation away has not
> come yet?

They will have been written into the table.  They do not become
visible to any other transaction until and unless the inserting
transaction successfully commits.  These slides may help:

http://momjian.us/main/writings/pgsql/mvcc.pdf
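A minimal two-session sketch of that visibility rule (hypothetical table name, shown for illustration):

```sql
-- Session 1: rows written by COPY exist on disk but are invisible
-- to other transactions until COMMIT.
BEGIN;
COPY big_table FROM '/data/load.csv' WITH (FORMAT csv);
-- Session 2, run concurrently:
--   SELECT count(*) FROM big_table;
--   ...still reports the pre-COPY count.
COMMIT;
-- Only after COMMIT does session 2 see all the copied rows, at once.
```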

> There have to be some limits to the space and/or counts taken by
> the new, uncommitted, data, while the COPY operation is still in
> progress.  What are they?

Primarily disk space for the table.  If you are not taking
advantage of the "unlogged load" optimization, you will have
written Write Ahead Log (WAL) records, too -- which (depending on
your configuration) you may be archiving.  In that case, you may
need to be concerned about the archive space required.  If you have
foreign keys defined for the table, you may get into trouble on the
RAM used to track pending checks for those constraints.  I would
recommend adding any FKs after you are done with the big bulk load.
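That "FKs after the load" recommendation might look like this sketch (table, column, and constraint names are invented for illustration):

```sql
-- Drop the FK (if present) so COPY does not queue a pending check
-- per row, then recreate it after the bulk load in one pass.
ALTER TABLE big_table DROP CONSTRAINT IF EXISTS big_table_parent_fk;

COPY big_table FROM '/data/load.csv' WITH (FORMAT csv);

ALTER TABLE big_table
    ADD CONSTRAINT big_table_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent (id);
```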

PostgreSQL does *not* have a "rollback log" that would impose a limit.

> Say, I am COPYing 100 TB of data and the bad records are close
> to the end of the feed -- how will this all error out?

The rows will all be in the table, but not visible to any other
transaction.  Autovacuum will clean them out in the background, but
if you want to restart your load against an empty table it might be
a good idea to TRUNCATE that table; it will be a lot faster.
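A sketch of that restart path (hypothetical names again); TRUNCATE reclaims the dead rows' space immediately instead of waiting for autovacuum, assuming the table held no other data you want to keep:

```sql
-- Clear out the aborted load's dead rows, then retry with fixed input.
TRUNCATE TABLE big_table;
COPY big_table FROM '/data/load_fixed.csv' WITH (FORMAT csv);
```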

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

