Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

Craig Ringer wrote:
> I'm encountering an odd issue with a bulk import query using PostgreSQL
> 8.3. After a 400,000 row import into a just-truncated table `booking', a
> sequential scan run on the table in the same transaction is incredibly
> slow, taking ~ 166738.047 ms. After a:
>     `COMMIT; BEGIN;'
> the same query runs in 712.615 ms, with almost all the time difference
> being in the sequential scan of the `booking' table [schema at end of post].
>
> The table is populated by a complex pl/pgsql function that draws from
> several other tables to convert data from another app's format.

You must have an exception handler block in that pl/pgsql function. Each time the handler block is entered it implicitly creates a new subtransaction, so you end up with hundreds of thousands of committed subtransactions. For each row in the seq scan, the list of subtransactions is scanned to see if the transaction that inserted the row is part of the current top-level transaction. That's fine for a handful of subtransactions, but it gets really slow with large numbers of them, as you've seen. Since each of the n inserted rows scans a list of up to n subtransactions, it's an O(n^2) operation, so you'll be in even more trouble if the number of rows increases.
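
To illustrate, here is a minimal sketch of the kind of per-row conversion function that triggers this. The names (legacy_booking, import_one_booking, the column list) are made up for the example, not your actual schema:

-- Hypothetical per-row converter: the BEGIN ... EXCEPTION block starts a
-- new subtransaction every time the function is called, so a 400,000 row
-- import leaves ~400,000 committed subtransactions in the top-level
-- transaction.
CREATE OR REPLACE FUNCTION import_one_booking(src legacy_booking)
RETURNS void AS $$
BEGIN
    BEGIN
        INSERT INTO booking (id, customer_id, booked_at)
        VALUES (src.id, src.customer_id, src.booked_at);
    EXCEPTION WHEN unique_violation THEN
        NULL;  -- skip duplicates; the failed insert is undone by rolling
               -- back this block's subtransaction
    END;
END;
$$ LANGUAGE plpgsql;

-- Called once per source row, all within one top-level transaction:
-- SELECT import_one_booking(b) FROM legacy_booking b;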

As a work-around, avoid using exception handlers, process more than one row per function invocation, or COMMIT the transaction before querying, as you did.
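
Under the same made-up schema, a sketch of the second work-around: handle the whole batch in one function invocation, so the exception block is entered once instead of once per row (at the cost of per-row error recovery):

-- Hypothetical batched variant: one subtransaction per batch rather than
-- per row. Note that any error now rolls back the whole batch, not just
-- the offending row.
CREATE OR REPLACE FUNCTION import_all_bookings()
RETURNS bigint AS $$
DECLARE
    n bigint := 0;
BEGIN
    BEGIN
        INSERT INTO booking (id, customer_id, booked_at)
        SELECT b.id, b.customer_id, b.booked_at
        FROM legacy_booking b;
        GET DIAGNOSTICS n = ROW_COUNT;
    EXCEPTION WHEN unique_violation THEN
        RAISE NOTICE 'duplicate key in batch, nothing imported';
    END;
    RETURN n;
END;
$$ LANGUAGE plpgsql;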

For 8.4, it would be nice to improve that. I tested this on my laptop with a similarly-sized table, inserting each row in a pl/pgsql function with an exception handler, and I got very similar run times. According to oprofile, all the time is spent in TransactionIdIsInProgress. I think it would be pretty straightforward to store the committed subtransaction ids in a sorted array instead of a linked list and binary search it, or to use a hash table. That should eliminate this problem, though there are still other places where a large number of subtransactions will hurt performance.
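
To make the idea concrete, here is a schematic sketch in C of the sorted-array-plus-binary-search approach. This is not the actual backend code or data structures; it assumes plain 32-bit XIDs and ignores wraparound and memory-context handling:

/*
 * Sketch only: keep committed subtransaction XIDs sorted so that the
 * "did my own top-level transaction insert this row?" check becomes an
 * O(log n) binary search instead of an O(n) walk over a linked list.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;   /* assumes simple 32-bit XIDs */

typedef struct
{
    TransactionId *xids;    /* kept sorted ascending */
    size_t         nxids;
    size_t         maxxids;
} SubXidArray;

/* Record a newly committed subtransaction, keeping the array sorted. */
static void
subxid_add(SubXidArray *a, TransactionId xid)
{
    size_t      i;

    if (a->nxids == a->maxxids)
    {
        a->maxxids = a->maxxids ? a->maxxids * 2 : 64;
        a->xids = realloc(a->xids, a->maxxids * sizeof(TransactionId));
    }
    /* subxids are assigned in increasing order, so this shift is rarely needed */
    for (i = a->nxids; i > 0 && a->xids[i - 1] > xid; i--)
        a->xids[i] = a->xids[i - 1];
    a->xids[i] = xid;
    a->nxids++;
}

/* Binary search: is this XID one of our own committed subtransactions? */
static bool
subxid_is_mine(const SubXidArray *a, TransactionId xid)
{
    size_t      lo = 0;
    size_t      hi = a->nxids;

    while (lo < hi)
    {
        size_t  mid = lo + (hi - lo) / 2;

        if (a->xids[mid] == xid)
            return true;
        if (a->xids[mid] < xid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}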

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
