Re: Postgres insert performance and storage requirement compared to Oracle

Поиск
Список
Период
Сортировка
От Mladen Gogala
Тема Re: Postgres insert performance and storage requirement compared to Oracle
Дата
Msg-id 4CC86A3D.8000507@vmsinfo.com
обсуждение исходный текст
Ответ на Re: Postgres insert performance and storage requirement compared to Oracle  (Scott Carey <scott@richrelevance.com>)
Ответы Re: Postgres insert performance and storage requirement compared to Oracle
Список pgsql-performance
On 10/27/2010 1:48 PM, Scott Carey wrote:

It is almost always significantly faster than a direct bulk load into a table.  
* The temp table has no indexes, the final table usually does, bulk operations on indexes are faster than per row operations.
* The final table might require both updates and inserts, doing these in bulk from a temp stage table is far faster than per row.
* You don't even have to commit after the merge from the temp table, and can loop until its all done, then commit -- though this can have table/index bloat implications if doing updates.

Scott, I find this very hard to believe. If you are inserting into a temporary table and then into the target table, you will do 2 inserts instead of just one. What you are telling me is that it is faster for me to drive from NYC to Washington DC by driving first to Miami and then from Miami to DC.

2) This is what I had in mind:

mgogala=# create table a(c1 int);
CREATE TABLE
mgogala=# create temporary table t1(c1 int) on commit delete rows;
CREATE TABLE
mgogala=# begin;
BEGIN
mgogala=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
mgogala=# insert into a select * from t1;
INSERT 0 1000
mgogala=# commit;
COMMIT
mgogala=# select count(*) from a;count
------- 1000
(1 row)

mgogala=# select count(*) from t1;count
-------    0
(1 row)

The table is created with "on commit obliterate rows" option which means 
that there is no need to do "truncate". The "truncate" command is a 
heavy artillery. Truncating a temporary table is like shooting ducks in 
a duck pond, with a howitzer.
???  Test it.  DELETE is slow, truncate is nearly instantaneous for normal tables.  For temp tables its the same thing.  Maybe in Oracle TRUNCATE is a howitzer, in Postgres its lightweight.  

Truncate has specific list of tasks to do:
1)  lock the table in the exclusive mode to prevent concurrent transactions on the table.
2)  Release the file space and update the table headers.
3)  Flush any buffers possibly residing in shared memory.
4)  Repeat the procedures on the indexes.

Of course, in case of the normal table, all of these changes are logged, possibly producing WAL archives. That is still much faster than delete which depends on the number of rows that need to be deleted, but not exactly lightweight, either. In Postgres, truncate recognizes that the table is a temporary table so it makes a few shortcuts, which makes the truncate faster.

1) No need to flush buffers.
2) Locking requirements are much less stringent.
3) No WAL archives are produced.

Temporary tables are completely different beasts in Oracle and Postgres. Yes, you are right, truncate of a temporary table is a big no-no in the Oracle world, especially in the RAC environment. However, I do find "ON COMMIT DELETE ROWS" trick to be more elegant than the truncate. Here is the classic Tom Kyte, on the topic of truncating the temporary tables: http://tinyurl.com/29kph3p

"
NO. truncate is DDL. DDL is expensive. Truncation is something that should be done very infrequently.
 Now, I don't mean "turn your truncates into DELETE's" -- that would  be even worse. I mean -- avoid having
 to truncate or delete every row in the first  place. Use a transaction based temporary table and upon commit, it'll empty itself."

Your loop above requires a commit after every 1000 rows.  What if you require that all rows are seen at once or not at all?  What if you fail part way through?  One big transaction is often a better idea and/or required.  Especially in postgres, with no undo-log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.

I don't contest that. I also prefer to do things in one big transaction, if possible.

-- 
Mladen Gogala 
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions


В списке pgsql-performance по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: Postgres insert performance and storage requirement compared to Oracle
Следующее
От: Jon Nelson
Дата:
Сообщение: Re: temporary tables, indexes, and query plans