Re: Insert performance for large transaction with multiple COPY FROM

From    Horst Dehmer
Subject Re: Insert performance for large transaction with multiple COPY FROM
Date
Msg-id  B1146DE5-749D-4845-AB47-BAA86482AC40@gmail.com
In response to  Insert performance for large transaction with multiple COPY FROM  (Jeff Janes <jeff.janes@gmail.com>)
Responses       Re: Insert performance for large transaction with multiple COPY FROM
List            pgsql-performance
After more testing I have gained some insights:

The foreign key constraints are NOT responsible for the low COPY FROM performance in my case. I forgot about the indexes which are created along with the FK constraints.

Besides the primary key

CONSTRAINT obj_item_loc_pkey PRIMARY KEY (obj_item_id, loc_id, obj_item_loc_ix),

the table OBJ_ITEM_LOC has four additional indexes (let's call them idx_1 through idx_4)

CREATE INDEX idx_1 ON obj_item_loc USING btree (rec_id);
CREATE INDEX idx_2 ON obj_item_loc USING btree (loc_id);
CREATE INDEX idx_3 ON obj_item_loc USING btree (rptd_id);
CREATE INDEX idx_4 ON obj_item_loc USING btree (obj_item_id);

Indexes 2 to 4 are intended to speed up joins between OBJ_ITEM_LOC and
LOC (loc_id), RPTD (rptd_id) and OBJ_ITEM (obj_item_id) respectively (and I'm rather doubtful whether this makes sense at all).

idx_4, together with a simple SELECT in the table's on-insert trigger, slows things down considerably.
With idx_4 and the trigger in place, the rates are:

 44100 rows, 0:00:04.576,   9637 r/s: LOC
  2101 rows, 0:00:00.221,   9506 r/s: OBJ_ITEM
  2101 rows, 0:00:00.278,   7557 r/s: ORG
 94713 rows, 0:00:18.502,   5119 r/s: RPTD
 44100 rows, 0:03:03.437,    240 r/s: OBJ_ITEM_LOC
imported 187115 records in 0:03:27.081 => 903 r/s

pg_statio_user_tables comes up with some big numbers (reads = bad, hits = not so bad?):

   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit 
--------------+----------------+---------------+---------------+--------------
 obj_item_loc |           1262 |       9908013 |          1199 |      1682005
 rptd         |           4434 |        279022 |          1806 |      1270746
 org          |             38 |        191559 |            19 |       201071
 obj_item     |             84 |         92476 |            29 |       104134
 loc          |            768 |         88902 |           597 |       352680
(5 rows)
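For reference, a query along the following lines pulls these numbers from pg_statio_user_tables (the filter and ordering here are just one way to do it, not necessarily what I ran):

-- Heap and index block reads/hits for the tables involved in the load.
-- Column names are those of pg_statio_user_tables; filter and ordering
-- are illustrative only.
SELECT relname,
       heap_blks_read, heap_blks_hit,
       idx_blks_read,  idx_blks_hit
FROM   pg_statio_user_tables
WHERE  relname IN ('obj_item_loc', 'rptd', 'org', 'obj_item', 'loc')
ORDER  BY heap_blks_hit DESC;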

Dropping idx_1, idx_2 and idx_3 at the same time has no significant impact. But taking away only idx_4:

 44100 rows, 0:00:04.558,   9675 r/s: LOC
  2101 rows, 0:00:00.220,   9593 r/s: OBJ_ITEM
  2101 rows, 0:00:00.275,   7640 r/s: ORG
 94713 rows, 0:00:18.407,   5145 r/s: RPTD
 44100 rows, 0:00:11.433,   3857 r/s: OBJ_ITEM_LOC
imported 187115 records in 0:00:34.938 => 5355 r/s
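An obvious way to exploit this would be to drop idx_4 for the duration of the bulk load and rebuild it afterwards; a sketch only (the COPY line is a placeholder for the actual load):

-- Sketch: take the problematic index out of the picture during the load,
-- then rebuild it once. The file path and COPY options are placeholders.
DROP INDEX idx_4;

BEGIN;
COPY obj_item_loc FROM '/path/to/obj_item_loc.dat';
-- ... further COPY FROM statements for the other tables ...
COMMIT;

CREATE INDEX idx_4 ON obj_item_loc USING btree (obj_item_id);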

Hm, not bad at all without idx_4. Now for the SELECT statement in the on-insert trigger:

    SELECT *
    FROM obj_item_loc
    WHERE obj_item_loc.obj_item_id = NEW.obj_item_id
      AND obj_item_loc.loc_id = NEW.loc_id
      AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix
    INTO old;
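For context, the surrounding trigger body looks roughly like the sketch below. Everything apart from the lookup itself (the function and trigger names, the variable name, and what happens when a matching row is found) is hypothetical scaffolding, not the actual code:

-- Hypothetical reconstruction of the on-insert trigger around the SELECT above;
-- names and the duplicate-handling logic are assumptions.
CREATE OR REPLACE FUNCTION obj_item_loc_on_insert() RETURNS trigger AS $$
DECLARE
    existing obj_item_loc%ROWTYPE;   -- plays the role of "old" in the original
BEGIN
    SELECT * INTO existing
    FROM obj_item_loc
    WHERE obj_item_loc.obj_item_id     = NEW.obj_item_id
      AND obj_item_loc.loc_id          = NEW.loc_id
      AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix;

    IF FOUND THEN
        RETURN NULL;   -- assumption: skip rows that already exist
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER obj_item_loc_bi
    BEFORE INSERT ON obj_item_loc
    FOR EACH ROW
    EXECUTE PROCEDURE obj_item_loc_on_insert();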

Executing this query AFTER the bulk insert (and probably after some auto-vacuuming), the query plan looks like this:

explain analyze
select *
from obj_item_loc
where (obj_item_id, loc_id, obj_item_loc_ix) =
      (10903011224100014650, 10903010224100089226, 10900024100000140894);

                                 QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using obj_item_loc_loc_id_idx on obj_item_loc
   (cost=0.00..8.36 rows=1 width=329) (actual time=0.039..0.040 rows=1 loops=1)
   Index Cond: (loc_id = 10903010224100089226::numeric)
   Filter: ((obj_item_id = 10903011224100014650::numeric) AND (obj_item_loc_ix = 10900024100000140894::numeric))
 Total runtime: 0.079 ms

After some head-scratching I realized that obj_item_id references just a meager 2101 distinct values (the 2101 rows of OBJ_ITEM), which probably does not make for a good index candidate. So the query plan makes some sense, I guess.
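To back up that suspicion, the planner's column statistics can be checked directly; a minimal sketch against pg_stats (the schema name is an assumption, and of course the interpretation is mine):

-- n_distinct per column: a small positive value (or a small negative fraction)
-- relative to the 44100 rows means few distinct values, i.e. a not very
-- selective index on that column.
SELECT attname, n_distinct, null_frac, correlation
FROM   pg_stats
WHERE  schemaname = 'public'          -- assumption about the schema
  AND  tablename  = 'obj_item_loc'
  AND  attname IN ('obj_item_id', 'loc_id', 'obj_item_loc_ix');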

Now I have some (more) questions:

1. How do I know which index (if any) is chosen for a SELECT statement inside a trigger during a bulk load transaction? (Or, for that matter, inside a series of recursive plpgsql functions; see the auto_explain sketch below.)
2. The query planner depends on statistics collected by autovacuum/VACUUM ANALYZE, right? Does statistics collection also happen during a lengthy transaction?
3. Is it possible (or even advisable) to run VACUUM ANALYZE inside an ongoing transaction? Let's say: load 10,000 rows of table A, analyze table A, insert the next 10,000 rows, analyze again, and so on.
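Regarding question 1, one approach that might work (a sketch only, I have not tried it against this data set) is the auto_explain contrib module with nested-statement logging, which writes the plans of statements executed inside triggers and plpgsql functions to the server log:

LOAD 'auto_explain';                          -- requires superuser
SET auto_explain.log_min_duration = 0;        -- log every statement's plan; raise to reduce noise
SET auto_explain.log_nested_statements = on;  -- include statements run inside triggers/functions
SET auto_explain.log_analyze = on;            -- include actual times and row counts

-- ... run the COPY FROM bulk load in this session; the chosen plans,
-- including the one for the trigger's SELECT, end up in the server log.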

I'm sorry if this is basic stuff I'm asking here, but especially point 2 is bothering me.

--
Kind regards
Horst Dehmer

On 12.01.2013, at 01:17, Jeff Janes <jeff.janes@gmail.com> wrote:

On Friday, January 11, 2013, Horst Dehmer wrote:

Except - and that's the wall I'm hitting - for one table which yielded just 75 records/second.
The main 'problem' seems to be the FK constraints. Dropping just them restored insert performance for this table to 6k records/s.

It sure sounds like you don't have enough RAM to hold the foreign-key table data needed to check the constraints, so every insert needs one disk revolution to fetch the data.

If you drop the indexes and constraints one at a time until it speeds up, is there a certain one that is the culprit? 

You can look in pg_statio_user_tables to see what tables and indexes have high io being driven by the bulk loading.

Use "top" to see if the server is mostly IO-bound or CPU-bound.

Cheers,

Jeff
