Big question on insert performance/using COPY FROM
От | Morgan Kita |
---|---|
Тема | Big question on insert performance/using COPY FROM |
Дата | |
Msg-id | 08B420FF5BF7BC42A064212C2EB768801C10BE@neutron.verseon.com обсуждение исходный текст |
Ответы |
Re: Big question on insert performance/using COPY FROM
(Richard Huxton <dev@archonet.com>)
|
Список | pgsql-performance |
Hi, I am currently trying to speed up the insertion of bulk loads to my database. I have fiddled with all of the parameters thatI have seen suggested(aka checkpoint_segments, checkpoint_timeout, maintinence_work_mem, and shared buffers) with nosuccess. I even turned off fysnc with no effect so I am pretty sure the biggest problem is that the DB is CPU limited atthe moment because of the rather weak machine that postmaster is running on(Athlon 2400+ xp with 512 RAM), but that willchange in the future so I am trying to get performance increases that don't involve changing the machine at the moment. I am currently inserting into the database through lipqxx's C++ interface. I am using prepared statements that perform regularinserts. I would like to use COPY FROM since I have read so much about its increased performance with respect to INSERT,but I am not sure how to use it in my case. So let me give you an idea on how the tables are laid out. The real DB has more tables, but lets say for the sake of argument I have 3 tables; TB1, TB2, TB3. Lets say that TB1 hasa primary key PK1 and a unique identifier column(type text) UK1 that has an index on it. TB2 then has a PK2, a UK2(typetext) of its own with an index, and a foreign key FK2 that points to TB1's PK1. TB3 has a PK3 and a FK3 that pointsto FK2. TB1 TB2 TB3 -------------- ------------------------------- ---------------------- PK1, UK1 PK2, UK2, FK2(PK1) PK3, FK3(PK2) Now in lipqxx I am parsing an input list of objects that are then written to these tables. Each object may produce one rowin TB1, one row in TB2, and one row in TB3. The UK1 and UK2 indentifiers are used to prevent duplicate entries for TB1and TB2 respectively. I know COPY FROM obeys these unique checks; however, my problem is the FKs. So lets say I try toinsert a row into TB1. If it is unique on UK1 then it inserts a new row with some new primary key int4 identifier and ifit is a duplicate then no insert is done but the already existing row's primary key identifier is returned. This identifier(duplicateor not) is used when populating TB2's row as the FK2 identifier. The row that is to be inserted intoTB2 needs the primary key indentifier from the result of the attempted insert into TB1. Similarily the insert into TB3needs the result of the pk indentifier of the attempted insert into TB2. Once that is done then I move on to parsing thenext object for insertion into the 3 tables. So lets say I want to insert a list of objects using COPY FROM... whats the way to do it? How can I at the very least geta list of the primary keys of TB1(newly inserted rows or from already existings row) returned from the COPY FROM insertinto TB1 so I can use them for the COPY FROM insert into TB2 and so on? Is there a better way to do this? P.S. I am going to setup autovacuum for these bulk loads. My question though is why for bulkloads is VACUUM useful? I understandthat it frees up dead rows as a result of UPDATE and such, but where are the dead rows created from plain INSERTS? Thanks, Morgan
В списке pgsql-performance по дате отправления: