Discussion: 8.4 COPY performance regression on Solaris
Hi,
It seems that a COPY of 8M rows into a table on 8.4rc1 takes >30% longer than it does on 8.3.7 on Solaris.
Here are the steps I've taken to reproduce this problem on two different Solaris boxes (Solaris 10 11/06 s10x_u3wos_10 X86 and Solaris 10 8/07 s10x_u4wos_12b X86). I've tried this on a Linux box, and I do not see the problem there.
1. Run the following in a psql client to generate an 8M-row data file.
copy (select generate_series(1,8000000), ('1 second'::interval * generate_series(1,8000000) + '2007-01-01'::timestamp)) to '/export/home/alan/work/pgsql/dump.out' with csv;
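For reference, each line of dump.out pairs the series counter with a timestamp advanced by that many seconds. A minimal Python sketch of the row format the query above produces (the expression mirrors the COPY query; it is an illustration, not part of the original test setup):

```python
from datetime import datetime, timedelta

BASE = datetime(2007, 1, 1)

def row(i: int) -> str:
    # Mirrors: generate_series(i), '1 second'::interval * i + '2007-01-01'::timestamp
    return f"{i},{BASE + timedelta(seconds=i)}"

print(row(1))   # 1,2007-01-01 00:00:01
print(row(60))  # 60,2007-01-01 00:01:00
```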
2. Build 8.3.7 and 8.4rc1 with the following configuration.
./configure --prefix=`pwd`/../pgsql CC=/opt/SUNWspro/bin/cc CFLAGS="-xO3 -xarch=native \
-xspace -W0,-Lt -W2,-Rcond_elim -Xa -xildoff -xc99=none -xCC" --without-readline --with-includes=/opt/csw/include --with-libraries=/opt/csw/lib
3. Run the following on each.
pg_ctl stop -D data -m fast
rm -rf data
initdb -D data
cat postgresql.conf >> data/postgresql.conf
pg_ctl start -l cq.log -D data -w
psql -f ddl.sql postgres
time psql -c "copy t from '/export/home/alan/work/pgsql/dump.out' with csv" postgres
Here are the numbers from several runs I've done.
8.3.7 - Solaris 10 11/06 s10x_u3wos_10 X86
real 0m43.971s
user 0m0.002s
sys 0m0.003s
real 0m44.042s
user 0m0.002s
sys 0m0.003s
real 0m44.828s
user 0m0.002s
sys 0m0.004s
real 0m43.921s
user 0m0.002s
sys 0m0.003s
8.4rc1 - Solaris 10 11/06 s10x_u3wos_10 X86
real 1m0.041s
user 0m0.002s
sys 0m0.003s
real 1m0.258s
user 0m0.002s
sys 0m0.004s
real 1m0.173s
user 0m0.002s
sys 0m0.003s
real 1m0.402s
user 0m0.002s
sys 0m0.003s
real 1m0.767s
user 0m0.002s
sys 0m0.003s
8.3.7 - Solaris 10 8/07 s10x_u4wos_12b X86
real 0m36.242s
user 0m0.002s
sys 0m0.004s
real 0m37.206s
user 0m0.002s
sys 0m0.004s
real 0m38.431s
user 0m0.002s
sys 0m0.004s
real 0m38.885s
user 0m0.002s
sys 0m0.004s
real 0m38.177s
user 0m0.002s
sys 0m0.004s
real 0m38.332s
user 0m0.002s
sys 0m0.004s
real 0m38.401s
user 0m0.002s
sys 0m0.004s
real 0m36.817s
user 0m0.002s
sys 0m0.004s
real 0m39.505s
user 0m0.002s
sys 0m0.004s
real 0m38.871s
user 0m0.002s
sys 0m0.004s
real 0m38.939s
user 0m0.002s
sys 0m0.004s
real 0m38.823s
user 0m0.002s
sys 0m0.004s
real 0m37.955s
user 0m0.002s
sys 0m0.004s
real 0m39.196s
user 0m0.002s
sys 0m0.004s
8.4rc1 - Solaris 10 8/07 s10x_u4wos_12b X86
real 0m50.603s
user 0m0.002s
sys 0m0.004s
real 0m49.945s
user 0m0.002s
sys 0m0.004s
real 0m50.547s
user 0m0.002s
sys 0m0.004s
real 0m50.061s
user 0m0.002s
sys 0m0.004s
real 0m48.151s
user 0m0.002s
sys 0m0.004s
real 0m50.133s
user 0m0.002s
sys 0m0.004s
real 0m50.583s
user 0m0.002s
sys 0m0.004s
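A quick back-of-the-envelope check of the timings above (the times below are representative median values read from the runs, so the percentages are approximate):

```python
# Median 'real' times in seconds, taken from the runs reported above.
runs = {
    "u3wos_10":  {"8.3.7": 44.0, "8.4rc1": 60.3},
    "u4wos_12b": {"8.3.7": 38.4, "8.4rc1": 50.1},
}

for box, t in runs.items():
    slowdown = (t["8.4rc1"] / t["8.3.7"] - 1) * 100
    rate_83 = 8_000_000 / t["8.3.7"]   # rows/s on 8.3.7
    rate_84 = 8_000_000 / t["8.4rc1"]  # rows/s on 8.4rc1
    print(f"{box}: {slowdown:.0f}% slower ({rate_83:,.0f} -> {rate_84:,.0f} rows/s)")
```

This works out to roughly a 37% slowdown on the first box and 30% on the second, consistent with the >30% claim.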
Has anyone else seen this problem?

Thanks, Alan
Alan Li wrote:
> Hi,
>
> It seems that a COPY of 8M rows to a table to 8.4rc1 takes >30% longer
> than it does to 8.3.7 on Solaris.
>
> Here are the steps I've taken to reproduce this problem on two different
> solaris boxes (Solaris 10 11/06 s10x_u3wos_10 X86 and Solaris 10 8/07
> s10x_u4wos_12b X86). I've tried this on a Linux box, and I do not see
> the problem there.

tried that on my box (though I increased the testset size by 10x to get more sensible runtimes) and I can reproduce that on Linux (CentOS 5.3/x86_64, Nehalem Xeon E5530) as well. I get ~450000 rows/s on 8.3 and only ~330000 rows/s on 8.4.

on 8.4 I get 3m59s/4m01s/3m56s runtime and a profile of:

samples  %        symbol name
636302   19.6577  XLogInsert
415510   12.8366  CopyReadLine
225347    6.9618  DoCopy
131143    4.0515  ParseDateTime
122043    3.7703  DecodeNumber
81730     2.5249  DecodeDate
81045     2.5038  DecodeDateTime
80900     2.4993  pg_verify_mbstr_len
80235     2.4787  pg_next_dst_boundary
67571     2.0875  LWLockAcquire
64548     1.9941  heap_insert
64178     1.9827  LWLockRelease
63609     1.9651  PageAddItem
63402     1.9587  heap_form_tuple
56544     1.7468  timestamp_in
48697     1.5044  heap_fill_tuple
45248     1.3979  pg_atoi
42390     1.3096  IsSystemRelation
41287     1.2755  BufferGetBlockNumber
38936     1.2029  ValidateDate
36619     1.1313  ExecStoreTuple
35367     1.0926  DecodeTime

on 8.3.7 I get 2m58s/2m54s/2m55s and a profile of:

samples  %        symbol name
460966   16.2924  XLogInsert
307386   10.8643  CopyReadLine
301745   10.6649  DoCopy
153452    5.4236  pg_next_dst_boundary
119757    4.2327  DecodeNumber
105356    3.7237  heap_formtuple
83456     2.9497  ParseDateTime
83020     2.9343  pg_verify_mbstr_len
72735     2.5708  DecodeDate
70425     2.4891  LWLockAcquire
65820     2.3264  LWLockRelease
61823     2.1851  DecodeDateTime
55895     1.9756  hash_any
51305     1.8133  PageAddItem
47440     1.6767  AllocSetAlloc
47218     1.6689  heap_insert
38912     1.3753  DecodeTime
34871     1.2325  ReadBuffer_common
34519     1.2200  date2j
33093     1.1696  DetermineTimeZoneOffset
31334     1.1075  MemoryContextAllocZero
30951     1.0939  RelationGetBufferForTuple

If I do the same test utilizing WAL bypass the picture changes:

8.3 runtimes: 2m16s/2m14s/2m22s and profile:

samples  %        symbol name
445583   16.7777  CopyReadLine
332772   12.5300  DoCopy
156974    5.9106  pg_next_dst_boundary
131952    4.9684  heap_formtuple
119114    4.4850  DecodeNumber
94340     3.5522  ParseDateTime
81624     3.0734  pg_verify_mbstr_len
75012     2.8245  DecodeDate
74950     2.8221  DecodeDateTime
64467     2.4274  hash_any
62859     2.3669  PageAddItem
62054     2.3365  LWLockAcquire
57209     2.1541  LWLockRelease
45812     1.7250  hash_search_with_hash_value
41530     1.5637  DetermineTimeZoneOffset
40790     1.5359  heap_insert
39694     1.4946  AllocSetAlloc
38855     1.4630  ReadBuffer_common
36056     1.3576  MemoryContextAllocZero
36030     1.3567  DecodeTime
29057     1.0941  UnpinBuffer
28291     1.0653  PinBuffer

8.4 runtime: 2m1s/2m0s/1m59s and profile:

samples  %        symbol name
404775   17.9532  CopyReadLine
208482    9.2469  DoCopy
148898    6.6042  ParseDateTime
118645    5.2623  DecodeNumber
80972     3.5914  DecodeDate
79005     3.5042  pg_verify_mbstr_len
73645     3.2664  PageAddItem
72167     3.2009  DecodeDateTime
65264     2.8947  heap_form_tuple
52680     2.3365  timestamp_in
46264     2.0520  pg_next_dst_boundary
45819     2.0322  ExecStoreTuple
45745     2.0290  heap_fill_tuple
43690     1.9378  heap_insert
38453     1.7055  InputFunctionCall
37050     1.6433  LWLockAcquire
36853     1.6346  BufferGetBlockNumber
36428     1.6157  heap_compute_data_size
33818     1.5000  DetermineTimeZoneOffset
33468     1.4844  DecodeTime
30896     1.3703  tm2timestamp
30888     1.3700  GetCurrentTransactionId

Stefan
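Stefan's rows/s figures can be cross-checked against his reported WAL-logged runtimes (80M rows, since the test set was increased 10x). A quick sanity check, with runtimes transcribed from the message above:

```python
ROWS = 80_000_000  # 10x the original 8M-row test set

def avg_rate(times_s):
    """Average rows/s over the reported wall-clock runtimes."""
    return ROWS / (sum(times_s) / len(times_s))

r83 = avg_rate([178, 174, 175])  # 2m58s, 2m54s, 2m55s on 8.3.7
r84 = avg_rate([239, 241, 236])  # 3m59s, 4m01s, 3m56s on 8.4
print(f"8.3: ~{r83:,.0f} rows/s, 8.4: ~{r84:,.0f} rows/s")
```

This gives roughly 455k rows/s on 8.3 and 335k rows/s on 8.4, matching the quoted ~450000 and ~330000 figures.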
Hi,

Looking at the XLogInsert() from 8.3 and 8.4, the 8.4 version includes a call to RecoveryInProgress() at the top as well as a call to TRACE_POSTGRESQL_XLOG_INSERT(). Could either of those have caused a context switch or cache flush resulting in worse performance?

Cheers,
Ken
Kenneth Marshall <ktm@rice.edu> writes:
> Looking at the XLogInsert() from 8.3 and 8.4, the 8.4
> version includes a call to RecoveryInProgress() at
> the top as well as a call to TRACE_POSTGRESQL_XLOG_INSERT().
> Could either of those have caused a context switch or
> cache flush resulting in worse performance.

Hmm. TRACE_POSTGRESQL_XLOG_INSERT() should be a no-op (or at least, none of the complainants have admitted to building with --enable-dtrace). RecoveryInProgress() should be just a quick test of a local boolean, so it's hard to believe that it costs anything noticeable; but if anyone who is able to reproduce the problem wants to test this theory, try taking out these lines

    /* cross-check on whether we should be here or not */
    if (RecoveryInProgress())
        elog(FATAL, "cannot make new WAL entries during recovery");

which are only a sanity check anyway.

regards, tom lane