Thread: BUG #5779: Large INSERT transaction with FOREIGN KEY constraints exhausts available RAM
BUG #5779: Large INSERT transaction with FOREIGN KEY constraints exhausts available RAM
From:
"Michel Alexandre Salim"
Date:
The following bug has been logged online:

Bug reference:      5779
Logged by:          Michel Alexandre Salim
Email address:      michel.salim@cs.fau.de
PostgreSQL version: 9.0.1
Operating system:   Ubuntu 10.04
Description:        Large INSERT transaction with FOREIGN KEY constraints exhausts available RAM
Details:

I have a table with close to a billion rows that was initially imported in CSV form; as such, a lot of the columns were not normalized. For speed reasons, I created one temporary table for each column that needs to be normalized, and then did an INSERT with JOINs to populate the new table with the normalized data.

If foreign key constraints are enabled, this runs out of memory (the machine has 4 GB RAM + 6 GB swap, and nothing else is running that requires much RAM; 4 GB of the swap was actually added to try to debug this). With vm.overcommit_memory set to the default 0, the OOM killer consistently kills the process handling the query; with it set to the recommended 2, PostgreSQL itself aborts the query when it cannot allocate more RAM.

I tried changing the fkey constraints to DEFERRABLE, hoping that would consume less RAM, but the same result occurs. Why should the memory usage pattern be different when integrity checks are done as part of the transaction (even when pushed back to the end) versus in a separate transaction?
The dataset I'm using is the Eclipse UDC dataset for 2009 --
http://archive.eclipse.org/technology/phoenix/usagedata/ -- and the problem
is independently described at this other site, with a simpler query:
http://www.jory.info/serendipity/archives/30-PostgreSQL-Out-of-Memory-with-large-INSERT.html

My own invocations were:

udc=# CREATE TABLE udc (
    id INTEGER PRIMARY KEY,
    eclipse_id INTEGER NOT NULL,
    action_type_id INTEGER NOT NULL,
    target_type_id INTEGER,
    source_id INTEGER NOT NULL,
    version_id INTEGER NOT NULL,
    target_id INTEGER,
    tstamp TIMESTAMPTZ NOT NULL,
    FOREIGN KEY (action_type_id) REFERENCES actions (id) DEFERRABLE,
    FOREIGN KEY (target_type_id) REFERENCES class_types (id) DEFERRABLE,
    FOREIGN KEY (source_id) REFERENCES classes (id) DEFERRABLE,
    FOREIGN KEY (version_id) REFERENCES versions (id) DEFERRABLE,
    FOREIGN KEY (target_id) REFERENCES classes (id) DEFERRABLE);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "udc_pkey" for table "udc"
CREATE TABLE
udc=# INSERT INTO udc (id, eclipse_id, action_type_id, target_type_id, source_id, version_id, target_id, tstamp)
    SELECT old_udc.id, old_udc.eclipse_id, old_udc.action_type, newttype.type_id, old_udc.source, newv.v_id, newt.t, old_udc.tstamp
    FROM old_udc NATURAL JOIN newttype NATURAL JOIN newv NATURAL JOIN newt;
ERROR:  out of memory
DETAIL:  Failed on request of size 1048576.
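[A sketch of the usual workaround for this bulk-load pattern, assuming the schema above; the constraint name below is made up for illustration. The idea is to create the table without foreign keys, run the large INSERT, and add the constraints afterwards, which takes the bulk-validation path instead of queuing a per-row check:]

```sql
-- Create udc with the same columns as above but WITHOUT the
-- FOREIGN KEY clauses, then run the big INSERT ... SELECT.
-- Afterwards, add each constraint; ADD CONSTRAINT validates the
-- whole table in one pass rather than per inserted row:
ALTER TABLE udc
    ADD CONSTRAINT udc_action_type_fk
    FOREIGN KEY (action_type_id) REFERENCES actions (id);
-- ...repeat for the other four foreign keys.
```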
On Wed, Dec 1, 2010 at 3:09 AM, Michel Alexandre Salim
<michel.salim@cs.fau.de> wrote:
> I tried changing the fkey constraints to DEFERRABLE hoping that that would
> consume less RAM, but the same result occurs. Why should the memory usage
> pattern be different when integrity checks are done as part of the
> transaction (even when pushed back to the end), and in a separate
> transaction?

I'm just guessing here, but it may be that the trigger queue is what's
filling up all the memory. I'm guessing that a trigger event is getting
queued for each row you INSERT. But when you add the foreign key later, it
does a bulk validation of the entire table instead of validating each
individual row.

If that really is what's going on here, it's a known shortcoming of the
existing implementation which, unfortunately, no one has gotten around to
fixing (partly because it's not entirely obvious what the design should be).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
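[The guess above -- one queued after-trigger event per inserted row, versus a single bulk check when the constraint is added later -- can be illustrated with a toy model. This is plain Python with made-up names, not PostgreSQL internals; it only shows why the first strategy's memory footprint grows with the row count:]

```python
def insert_with_per_row_fk(rows, parent_keys):
    """Toy model of per-row FK enforcement: each INSERT queues an
    after-trigger event that stays in memory until commit."""
    table = []
    trigger_queue = []              # grows by one entry per row
    for fk in rows:
        table.append(fk)
        trigger_queue.append(fk)
    for fk in trigger_queue:        # "commit": fire queued events
        if fk not in parent_keys:
            raise ValueError(f"FK violation: {fk}")
    return table, len(trigger_queue)

def insert_then_add_fk(rows, parent_keys):
    """Toy model of bulk validation (adding the constraint after the
    load): no per-row queue, one set-based check over the table."""
    table = list(rows)
    missing = set(table) - parent_keys
    if missing:
        raise ValueError(f"FK violations: {missing}")
    return table, 0                 # nothing retained per row

parents = set(range(100))
rows = [i % 100 for i in range(1_000_000)]
_, queued = insert_with_per_row_fk(rows, parents)
_, bulk_queued = insert_then_add_fk(rows, parents)
print(queued, bulk_queued)  # the queue scales with row count; bulk check does not
```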