B-Tree contention?
От | Alan Li |
---|---|
Тема | B-Tree contention? |
Дата | |
Msg-id | bc34668d0906071830s2961662am68d59cb521fec209@mail.gmail.com обсуждение исходный текст |
Список | pgsql-hackers |
Hello,<br /><br />I have multiple files with that have very similar distributions and I'm seeing contention when concurrentCOPY's are happening against a table with a b-tree index on the timestamp column. Each file look something likethe following:<br /><br /><div style="margin-left: 40px;">~4M rows with timestamp1<br />~4M rows with timestamp2<br />...<br/></div><br />To simulate the problem, I have a single file that has:<br /><br /><div style="margin-left: 40px;">aint, t timestamp<br /> [1-4000000],2007-01-01 00:00:01<br />[1-4000000], 2007-01-01 00:00:02<br /></div><br />Themachine is an eight-core system w/ 16gb of memory. postgresql.conf is configured with:<br /><br /><div style="margin-left:40px;">shared_buffers = 256MB #256MB w/ one COPY, 512MB w/ two COPY's<br /> wal_buffers = 16MB<br/>checkpoint_segments = 100<br /></div><br />When I COPY a single file into the table without an index on the timecolumn, it takes:<br /><br /><div style="margin-left: 40px;">real 0m19.628s<br />user 0m0.001s<br /> sys 0m0.002s<br/></div><br />When I have two COPY's with the same file concurrently into the same table without an index on thetime column, it takes:<br /><br /><div style="margin-left: 40px;">real 0m39.933s<br />user 0m0.001s<br /> sys 0m0.000s<br /><br />real 0m41.294s<br />user 0m0.000s<br />sys 0m0.003s<br /></div><br />So it takes roughlytwice the time and we're not getting any increase in throughput for the parallel COPY's. The top LWLock contentionsshows that it's the WALInsertLock.<br /><br /><div style="margin-left: 40px;">postgres=# select * from lwlocksorder by blk desc limit 20;<br /> pid | lockid | share | write | blk <br />-------+--------+-------+---------+--------<br/> 14065 | 7 | 0 | 8000001 | 311884<br /> 14062 | 7 | 0 | 8000001 | 289561<br /> 14062 | 45 | 0 | 89783 | 1287<br /> 14065 | 45 | 0 | 90104 | 1207<br/> 14062 | 8 | 0 | 652 | 10<br /> 14062 | 17 | 0 | 86469 | 7<br /> 14065 | 370| 0 | 7 | 6<br /> 14062 | 370 | 0 | 251466 | 6<br /> 14065 | 8 | 0 | 660 | 4<br /> 14065 | 29 | 46010 | 5337 | 3<br /> 14065 | 17 | 0 | 86467 | 3<br /> 14062 | 35 | 8830 | 5279 | 1<br /> 14062 | 40 | 6735 | 5139 | 1<br /> 14062 | 37 | 12860 | 5304| 1<br /> 14062 | 29 | 46037 | 5387 | 1<br /> 14065 | 521 | 0 | 1 | 1<br /> 14065| 28 | 6775 | 5236 | 1<br /> 14065 | 37 | 12831 | 5057 | 1<br /> 14062 | 506 | 0 | 2043 | 1<br /> 14065 | 40 | 6763 | 5251 | 1<br />(20 rows)<br /></div><br />When I COPY a single fileinto the table with an index on the time column, it takes:<br /><br /><div style="margin-left: 40px;">real 0m51.486s<br/>user 0m0.000s<br />sys 0m0.002s<br /></div><br />When I have two COPY's with the same file concurrentlyinto the same table with an index on the time column, it takes:<br /><br /><div style="margin-left: 40px;">real 2m4.414s<br />user 0m0.001s<br />sys 0m0.001s<br /><br />real 2m4.428s<br />user 0m0.000s<br />sys 0m0.002s<br /></div><br />So it's taking more than twice the time when the timestamp index was added. The top LWLockcontentions shows that there's a bit more contention on the WALInsertLock, but there's additional contention on twobuffer pages.<br /><br /><div style="margin-left: 40px;">postgres=# select * from lwlocks order by blk desc limit 20;<br/> pid | lockid | share | write | blk <br />-------+--------+----------+----------+--------<br /> 13888| 7 | 0 | 16027716 | 342522<br /> 13889 | 7 | 0 | 16027881 | 339595<br /> 13889 | 400| 4000000 | 4000002 | 72875<br /> 13889 | 388 | 4000001 | 4000000 | 72545<br /> 13888 | 388 | 4000001 | 4000001 | 70554<br /> 13888 | 400 | 4000000 | 4000002 | 68748<br /> 13889 | 45 | 0 | 89402 | 1340<br/> 13888 | 45 | 0 | 89259 | 1290<br /> 13889 | 33 | 12117131 | 6772 | 1102<br /> 13888 | 33 | 12118267 | 7178 | 1054<br /> 13888 | 29 | 4232749 | 7159 | 415<br /> 13889 | 29 | 4233831| 7057 | 353<br /> 13888 | 34 | 4190329 | 6907 | 318<br /> 13889 | 34 | 4191618 | 6870| 292<br /> 13888 | 38 | 4185303 | 7004 | 283<br /> 13889 | 38 | 4186310 | 7084 | 259<br /> 13889 | 922 | 0 | 278 | 86<br /> 13889 | 940 | 0 | 207 | 80<br /> 13889 | 102316| 0 | 346 | 72<br /> 13889 | 854 | 0 | 194 | 70<br />(20 rows)<br /></div><br />The two hotspots seem to represent the first BTree pages that contain the '2007-01-01 00:00:01' and '2007-01-01 00:00:02'keys. I suspect it's due to concurrent _bt_doinsert(), which tries to acquire both a read lock (_bt_search) anda write lock (_bt_findinsertloc) on those first BTree pages. Does this sound like a reasonable explanation for the contention?<br/><br />Btw, this is against a 8.4beta2 build as of yesterday, and I'm pretty sure that this happens in REL8_3_STABLEas well.<br /><br />Thanks, Alan<br />
В списке pgsql-hackers по дате отправления: