Discussion: Performance problem. Could it be related to 8.3-beta4?
I built a new system and installed 8.3-beta4. While testing it I noticed a big performance gap between the production system running 8.2.5 and the new one. The new one, in spite of having much better hardware, is much slower: the same insert query takes 20 minutes on the production system and 2 hours on the new one.

The machines' specs:

Production: Fedora Core 6, postgresql 8.2.5, AMD XP2600, 2 GB mem, two 7200 IDE disks with pg_xlog alone on the second disk.

New: Fedora 8, postgresql 8.3-beta4, Core Duo 2.33 GHz, 4 GB mem, two 7200 SATA disks in RAID 1.

The relevant (IMHO) lines of postgresql.conf:

Production system:

shared_buffers = 32MB              # min 128kB or max_connections*16kB
max_prepared_transactions = 150    # can be 0 or more
work_mem = 32MB                    # min 64kB
maintenance_work_mem = 512MB       # min 1MB
max_stack_depth = 4MB              # min 100kB
vacuum_cost_delay = 1000           # 0-1000 milliseconds
vacuum_cost_limit = 5000           # 0-10000 credits
max_fsm_pages = 1000000            # min max_fsm_relations*16, 6 bytes each
bgwriter_lru_maxpages = 0          # 0-1000 buffers max written/round
bgwriter_all_maxpages = 0          # 0-1000 buffers max written/round
fsync = off                        # turns forced synchronization on or off
#full_page_writes = on             # recover from partial page writes
wal_buffers = 512kB                # min 32kB
checkpoint_segments = 60           # in logfile segments, min 1, 16MB each
checkpoint_timeout = 15min         # range 30s-1h
effective_cache_size = 512MB
geqo_effort = 8                    # range 1-10
#autovacuum = off                  # enable autovacuum subprocess?

New system's postgresql.conf:

shared_buffers = 32MB              # min 128kB or max_connections*16kB
#temp_buffers = 8MB                # min 800kB
max_prepared_transactions = 50     # can be 0 or more
work_mem = 32MB                    # min 64kB
maintenance_work_mem = 768MB       # min 1MB
max_stack_depth = 4MB              # min 100kB
max_fsm_pages = 1000000            # min max_fsm_relations*16, 6 bytes each
vacuum_cost_delay = 1000           # 0-1000 milliseconds
vacuum_cost_limit = 5000           # 0-10000 credits
bgwriter_lru_maxpages = 0          # 0-1000 max buffers written/round
fsync = off                        # turns forced synchronization on or off
full_page_writes = off
wal_buffers = 1024kB               # min 32kB
checkpoint_segments = 60           # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min         # range 30s-1h
effective_cache_size = 768MB
geqo_effort = 8                    # range 1-10

Explain in the production server:

fahstats=# explain
fahstats-# insert into usuarios (
fahstats(# data,
fahstats(# usuario,
fahstats(# pontos,
fahstats(# wus
fahstats(# )
fahstats-# select
fahstats-# (select data_serial from data_serial) as data,
fahstats-# ui.usuario_serial as usuario,
fahstats-# sum(pontos) as pontos,
fahstats-# sum(wus) as wus
fahstats-# from usuarios_temp as ut inner join usuarios_indice as ui
fahstats-# on ut.usuario = ui.usuario_nome and ut.n_time = ui.n_time
fahstats-# group by data, ui.usuario_serial
fahstats-# ;

                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=300078.40..324227.64 rows=878154 width=20)
   ->  HashAggregate  (cost=300078.40..313250.71 rows=878154 width=12)
         InitPlan
           ->  Seq Scan on data_serial  (cost=0.00..31.40 rows=2140 width=4)
         ->  Merge Join  (cost=101968.41..249646.83 rows=5040017 width=12)
               Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = "inner"."?column4?"))
               ->  Index Scan using usuarios_temp_ndx on usuarios_temp ut  (cost=0.00..55060.07 rows=883949 width=26)
               ->  Sort  (cost=101968.41..104163.79 rows=878154 width=22)
                     Sort Key: ui.n_time, (ui.usuario_nome)::text
                     ->  Seq Scan on usuarios_indice ui  (cost=0.00..15276.54 rows=878154 width=22)
(10 rows)

The same explain in the new server:
Subquery Scan "*SELECT*" (cost=313715.35..337863.02 rows=878097 width=20) -> HashAggregate (cost=313715.35..326886.81 rows=878097 width=12) InitPlan -> Seq Scan on data_serial (cost=0.00..34.00 rows=2400 width=4) -> Merge Join (cost=101809.80..257473.94 rows=5620741 width=12) Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = (ui.usuario_nome)::text)) -> Index Scan using usuarios_temp_ndx on usuarios_temp ut (cost=0.00..52883.91 rows=883856 width=23) -> Sort (cost=101809.80..104005.04 rows=878097 width=19) Sort Key: ui.n_time, ui.usuario_nome -> Seq Scan on usuarios_indice ui (cost=0.00..15123.97 rows=878097 width=19) (10 rows) I though the new raid 1 array would have some problem but indeed it is faster: iostat -d -k in the new server: Device:__tps____kB_read/s____kB_wrtn/s____kB_read____kB_wrtn sda_____0.00_________0.01_________0.00_______1105________145 sdb____35.26_______442.22_______992.55___39105956___87771701 md1___264.72_______442.22_______990.33___39105169___87574680 md0_____0.00_________0.01_________0.00________724________108 md2_____0.00_________0.01_________0.00________516__________5 iostat -d -k in the production server: Device:__tps____kB_read/s____kB_wrtn/s____kB_read____kB_wrtn hda_____1.64_________5.89_______328.40___13789188__769151093 hdc____60.32_______194.32_______537.77__455123494_1259522944 dm-0___82.50_________5.86_______328.39___13736257__769116252 dm-1____0.01_________0.02_________0.01______51212______34832 Any idea about what could be wrong or how to investigate it? Regards, Clodoaldo Pinto Neto
Clodoaldo <clodoaldo.pinto.neto@gmail.com> writes:
> The same insert query takes 20 minutes in the production system and 2
> hours in the new one.

Hmph. It's the same plan, so it's not a planner mistake. Could you post EXPLAIN ANALYZE rather than just EXPLAIN for both cases?

Also, it'd be worth watching "vmstat 1" output while the queries are running on each system, to see if there's any obvious discrepancy in behavior at that level.

        regards, tom lane
On Fri, 4 Jan 2008, Clodoaldo wrote:

> I built a new system and installed 8.3-beta4.

How did you get 8.3-beta4? If you used the RPMs available from the PostgreSQL FTP site, those are optimized for finding bugs at the expense of speed. If that's where you got them from, try putting this line in your postgresql.conf and testing 8.3 again:

debug_assertions = false

If things speed up considerably you're suffering from that issue, and may need to build your own copy of 8.3 from source in order to get a fair comparison.

> Production system:
> #autovacuum = off    # enable autovacuum subprocess?
>
> New system's postgresql.conf:

I didn't see anything about autovacuum on the new system. If it's off on 8.2 and on in 8.3 (the default) that might be contributing to your difference.

Also, I hope you don't normally run your production system with fsync=off, as that's kind of scary.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
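Both suspicions are easy to verify from psql before rebuilding anything; a minimal check, since SHOW works for any server setting in both versions:

    show debug_assertions;
    show autovacuum;
    show fsync;

If debug_assertions comes back "on", the build is assert-enabled and any timing comparison against the 8.2.5 server is not a fair one.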
2008/1/4, Tom Lane <tgl@sss.pgh.pa.us>:
> Clodoaldo <clodoaldo.pinto.neto@gmail.com> writes:
> > The same insert query takes 20 minutes in the production system and 2
> > hours in the new one.
>
> Hmph. It's the same plan, so it's not a planner mistake. Could you
> post EXPLAIN ANALYZE rather than just EXPLAIN for both cases?

New server's explain analyze, now with debug_assertions = false as suggested by Greg:

fahstats=> explain analyze
fahstats-> insert into usuarios (
fahstats(> data,
fahstats(> usuario,
fahstats(> pontos,
fahstats(> wus
fahstats(> )
fahstats-> select
fahstats-> (select data_serial from data_serial) as data,
fahstats-> ui.usuario_serial as usuario,
fahstats-> sum(pontos) as pontos,
fahstats-> sum(wus) as wus
fahstats-> from usuarios_temp as ut inner join usuarios_indice as ui
fahstats-> on ut.usuario = ui.usuario_nome and ut.n_time = ui.n_time
fahstats-> group by data, ui.usuario_serial
fahstats-> ;

                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=323715.27..347866.73 rows=878235 width=20) (actual time=11072.973..13236.655 rows=878022 loops=1)
   ->  HashAggregate  (cost=323715.27..336888.79 rows=878235 width=12) (actual time=11072.960..12494.611 rows=878022 loops=1)
         InitPlan
           ->  Seq Scan on data_serial  (cost=0.00..1.01 rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=1)
         ->  Merge Join  (cost=101826.80..263869.07 rows=5984519 width=12) (actual time=4515.658..10039.560 rows=883856 loops=1)
               Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = (ui.usuario_nome)::text))
               ->  Index Scan using usuarios_temp_ndx on usuarios_temp ut  (cost=0.00..52895.92 rows=883856 width=23) (actual time=0.097..2188.894 rows=883856 loops=1)
               ->  Sort  (cost=101826.80..104022.38 rows=878235 width=19) (actual time=4515.538..4846.120 rows=883896 loops=1)
                     Sort Key: ui.n_time, ui.usuario_nome
                     Sort Method: quicksort Memory: 90130kB
                     ->  Seq Scan on usuarios_indice ui  (cost=0.00..15126.35 rows=878235 width=19) (actual time=0.022..300.928 rows=878062 loops=1)
 Trigger for constraint datas: time=21853.261 calls=878022
 Total runtime: 6889463.591 ms
(13 rows)

Time: 6889813.327 ms

Same explain analyze in the production server:

 Subquery Scan "*SELECT*"  (cost=317921.51..342084.63 rows=878659 width=20) (actual time=32184.729..37067.875 rows=878645 loops=1)
   ->  HashAggregate  (cost=317921.51..331101.40 rows=878659 width=12) (actual time=32184.714..34808.398 rows=878645 loops=1)
         InitPlan
           ->  Seq Scan on data_serial  (cost=0.00..1.01 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1)
         ->  Merge Join  (cost=102030.95..261051.62 rows=5686888 width=12) (actual time=12060.037..29542.750 rows=884482 loops=1)
               Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = "inner"."?column4?"))
               ->  Index Scan using usuarios_temp_ndx on usuarios_temp ut  (cost=0.00..55089.16 rows=884482 width=26) (actual time=0.196..8751.167 rows=884482 loops=1)
               ->  Sort  (cost=102030.95..104227.60 rows=878659 width=22) (actual time=12059.789..13205.930 rows=884522 loops=1)
                     Sort Key: ui.n_time, (ui.usuario_nome)::text
                     ->  Seq Scan on usuarios_indice ui  (cost=0.00..15285.59 rows=878659 width=22) (actual time=0.051..1375.554 rows=878685 loops=1)
 Trigger for constraint datas: time=34157.875 calls=878645
 Total runtime: 808956.038 ms
(12 rows)

Time: 809219.070 ms

> Also, it'd be worth watching "vmstat 1" output while the queries are
> running on each system, to see if there's any obvious
> discrepancy in behavior at that level.

# vmstat 1   (new system, while running the insert query)
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id  wa st
 0  8     64  25320  58176  3342492    0    0   156   355   27   88  1  0 88  11  0
 0  8     64  25344  58176  3342492    0    0     0     0  168  212  0  1  0  99  0
 0  8     64  25344  58176  3342492    0    0     0     0  143  185  0  1  0  99  0
 0  8     64  25468  58176  3342492    0    0     0     0  166  189  0  1  0  99  0
 0  8     64  25568  58176  3342492    0    0     0     0  159  210  0  1  0  99  0
 0  8     64  25708  58176  3342492    0    0     0     0  151  185  0  1  0  99  0

Some minutes later in the same system, while still running the insert query:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id  wa st
 0  5     64  25060  58352  3348704    0    0   156   356   27   88  1  0 87  11  0
 0  5     64  24988  58352  3348732    0    0    48     0  113  156  1  1 50  49  0
 0  5     64  24920  58352  3348808    0    0    48     0  116  150  0  0 50  50  0
 0  6     64  24920  58352  3348884    0    0    64     0  146  163  0  0 41  58  0
 0  6     64  24860  58352  3348964    0    0    56     0  136  204  0  0  0 100  0
 0  6     64  24800  58352  3349044    0    0    56     0  128  194  0  0  0  99  0

About 40 minutes later, still running the query:

# vmstat 1   (new system)
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id  wa st
 0  4     64  25700  60492  3334264    0    0   155   358   28   88  1  0 87  12  0
 0  4     64  25672  60492  3334292    0    0    80     0  167  160  0  0  0 100  0
 0  4     64  25672  60492  3334372    0    0    72     0  157  157  0  0  0  99  0
 0  2     64  25544  60492  3334448    0    0    72     0  175  170  0  0  0 100  0
 0  1     64  25224  60500  3334732    0    0   248   100  143  278  0  0 11  88  0
 0  1     64  25024  60500  3334900    0    0   128     8   23  168  0  0 50  50  0
 0  3     64  24956  60500  3335028    0    0   124  3336  264  168  0  0 46  53  0
 0  2     64  24896  60500  3335112    0    0    80     0  180  168  0  0 50  50  0
 0  1     64  24672  60508  3335360    0    0   168    48   35  220  0  0 50  49  0
 0  1     64  24332  60508  3335772    0    0   452   104   76  400  0  0 50  49  0
 0  3     64  25132  60508  3334556    0    0   652  7992  255  534  0  1 36  63  0
 0  3     64  24996  60508  3334688    0    0    64     0  154  126  0  0  0 100  0

# vmstat 1 -p /dev/sdb3   (new system; don't know if this has any relevance)
sdb3          reads   read sectors  writes    requested writes
            1643755       81073154  23523967         188191736
            1643763       81073282  23524094         188192752
            1643783       81073602  23524106         188192848
            1643790       81073714  23524106         188192848
            1643882       81075178  23524144         188193152
            1643933       81075986  23524908         188199264
            1643943       81076138  23525230         188201840

Production system while running the insert query:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo    in   cs us sy id wa st
 1  0   1116  76224  16196  1704500    0    0   228   869     8   11 12  2 72 14  0
 1  0   1116  71264  16196  1704500    0    0     0     0  1009   32 59 41  0  0  0
 1  0   1116  66304  16196  1704500    0    0     0     0  1009   23 56 44  0  0  0
 1  0   1116  60984  16196  1704500    0    0     0     0  1009   27 59 41  0  0  0
 1  0   1116  55904  16204  1704492    0    0     0    12  1011   31 60 40  0  0  0
 1  0   1116  50424  16204  1704500    0    0     0     0  1009   29 66 34  0  0  0
 1  0   1116  50848  16204  1699168    0    0     0     8  1015   37 84 16  0  0  0

A few minutes later:

# vmstat 1   (production system)
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo    in   cs us sy id wa st
 2  1   1108  50832  10452  1670888    0    0   229   869     9   11 12  2 72 14  0
 2  3   1108  52232  10448  1668680    0    0  5820 36556  1478 1094 59 30  0 11  0
 2  7   1108  50908  10460  1670172    0    0  4476  8416  1417 1126 10  8  0 82  0
 0  8   1108  52420  10392  1668368    0    0  6460 10568  1509 1465  8  8  0 84  0
 0  2   1108  50528  10400  1671140    0    0  4304 10188  1355  973  5  8  0 87  0
 0  2   1108  51848  10416  1669616    0    0   552    32  1153  399  0  2  0 98  0

I forgot to say that the RAID 1 is software RAID.

Regards, Clodoaldo Pinto Neto
2008/1/5, Greg Smith <gsmith@gregsmith.com>:
> On Fri, 4 Jan 2008, Clodoaldo wrote:
>
> > I built a new system and installed 8.3-beta4.
>
> How did you get 8.3-beta4?

I built from the source RPM, which I installed on my machine. There is something I forgot to mention: I created a patch to change XLOG_SEG_SIZE and built with it:

-#define XLOG_SEG_SIZE (16*1024*1024)
+#define XLOG_SEG_SIZE (1024*1024*1024)

I would expect it to make no difference, but could it change the performance for the worse? And by that big a gap?

> If you used the RPMs available from the
> PostgreSQL FTP site, those are optimized for finding bugs at the expense
> of speed. If that's where you got them from, try putting this line in
> your postgresql.conf and testing 8.3 again:
>
> debug_assertions = false

I tried it, restarting the server, and it makes no significant difference.

> > Production system:
> > #autovacuum = off    # enable autovacuum subprocess?
> >
> > New system's postgresql.conf:
>
> I didn't see anything about autovacuum on the new system. If it's off on
> 8.2 and on in 8.3 (the default) that might be contributing to your
> difference.

Missed that. New system:

autovacuum = off    # 'on' requires track_counts

> Also, I hope you don't normally run your production system with fsync=off
> as that's kind of scary.

I do. It is a hobby site and I have a good UPS. In four years I haven't had a problem with crashes. If one happens I will just reload from the daily backup.

Regards, Clodoaldo Pinto Neto
Clodoaldo <clodoaldo.pinto.neto@gmail.com> writes:
>> How did you get 8.3-beta4?

> I built from the source RPM, which I installed on my machine. There is
> something I forgot to mention: I created a patch to change
> XLOG_SEG_SIZE and built with it:

> -#define XLOG_SEG_SIZE (16*1024*1024)
> +#define XLOG_SEG_SIZE (1024*1024*1024)

What made you decide that was a good idea??

Please revert that change and see if the performance goes back to normal.

        regards, tom lane
On Sat, 5 Jan 2008, Clodoaldo wrote:

> I created a patch to change XLOG_SEG_SIZE and built with it:
> -#define XLOG_SEG_SIZE (16*1024*1024)
> +#define XLOG_SEG_SIZE (1024*1024*1024)

People sometimes get a small improvement setting this to 2-4 times larger than the default when running a large server (one with lots of disks where WAL issues are a bottleneck). Making it 64X as large as you're doing here is so gigantic I have no idea how much it degrades performance during a segment change or checkpoint; it's certainly not good.

Just so you understand what you're playing with here a bit better: the standard WAL segment is 16MB. You're making it 1GB instead. Every time your system needs a new WAL segment, it first creates that file and writes a 0 to every byte in it; only then is the segment ready for use. So on your server, whenever a new WAL segment is required, the entire system grinds to a halt while the database server writes 1GB worth of data before that segment is ready. Will that slow things down to take 10X as long? It sure is possible.

Since you've got checkpoint_segments at 60, this also means that eventually your database server will have (2*60+1)*1GB = 121GB worth of disk space used just by the WAL segments--that's on top of whatever is used by the database.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
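Greg's ceiling is easy to verify with a line of arithmetic; here it is as a throwaway SQL calculation of the (2*checkpoint_segments + 1) bound from his message, for both segment sizes:

    select (2*60 + 1) * 16 as max_wal_mb,  -- 1936 MB (~1.9 GB) with the default 16MB segments
           (2*60 + 1) * 1  as max_wal_gb;  -- 121 GB with the patched 1GB segments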
On Jan 5, 2008 1:15 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Sat, 5 Jan 2008, Clodoaldo wrote:
>
> > I created a patch to change XLOG_SEG_SIZE and built with it:
> > -#define XLOG_SEG_SIZE (16*1024*1024)
> > +#define XLOG_SEG_SIZE (1024*1024*1024)
>
> People sometimes get a small improvement setting this to 2-4 times larger
> than the default when running a large server (one with lots of disks where
> WAL issues are a bottleneck). Making it 64X as large as you're doing here
> is so gigantic I have no idea how much it degrades performance during a
> segment change or checkpoint; it's certainly not good.
>
> Just so you understand what you're playing with here a bit better: the
> standard WAL segment is 16MB. You're making it 1GB instead. Every time
> your system needs a new WAL segment, it first creates that file and writes
> a 0 to every byte in it; only then is the segment ready for use. So on
> your server, whenever a new WAL segment is required, the entire system
> grinds to a halt while the database server writes 1GB worth of data before
> that segment is ready. Will that slow things down to take 10X as long?
> It sure is possible.

Actually, even more importantly, the OP should strive to make the 8.3b4 environment as much like the 8.2.5 environment as possible, then benchmark it, before setting off on experiments in performance enhancements. He needs to set a baseline before messing about with any of the settings.
2008/1/5, Tom Lane <tgl@sss.pgh.pa.us>:
> Clodoaldo <clodoaldo.pinto.neto@gmail.com> writes:
> >> How did you get 8.3-beta4?
>
> > I built from the source RPM, which I installed on my machine. There is
> > something I forgot to mention: I created a patch to change
> > XLOG_SEG_SIZE and built with it:
> > -#define XLOG_SEG_SIZE (16*1024*1024)
> > +#define XLOG_SEG_SIZE (1024*1024*1024)
>
> What made you decide that was a good idea??
>
> Please revert that change and see if the performance goes back to
> normal.

I discovered I had a faulty RAID; one of the drives was not working:

$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md2 : active raid1 sda1[0] sdb1[1]
      224768 blocks [2/2] [UU]
md0 : active raid0 sda2[0] sdb2[1]
      4497920 blocks 256k chunks
md1 : active raid1 sdb3[1]
      310094592 blocks [2/1] [_U]

After some web reading I managed to make the RAID functional:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md2 : active raid1 sda1[0] sdb1[1]
      224768 blocks [2/2] [UU]
md0 : active raid0 sda2[0] sdb2[1]
      4497920 blocks 256k chunks
md1 : active raid1 sda3[0] sdb3[1]
      310094592 blocks [2/2] [UU]

Just to recap: the production server completes the insert query in 800 sec. The new server with the faulty RAID and 1GB xlog_seg_size did it in 6,889 sec. With the fixed RAID and 1GB xlog_seg_size the time dropped to 1,380 sec.

Then I rebuilt and reinstalled postgresql with xlog_seg_size set to the default 16MB and did initdb. Now the time is 7,642 sec.

I'm lost. It looks like 1GB xlog_seg_size is indeed faster than 16MB, but either way it is slower than the production server, which uses the default xlog_seg_size.

Regards, Clodoaldo Pinto Neto
On Jan 6, 2008 5:06 AM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
>
> Then I rebuilt and reinstalled postgresql with xlog_seg_size set
> to the default 16MB and did initdb. Now the time is 7,642 sec.
>
> I'm lost. It looks like 1GB xlog_seg_size is indeed faster than 16MB,
> but either way it is slower than the production server, which uses the
> default xlog_seg_size.

How fast was it the second time you ran it?
2008/1/6, Scott Marlowe <scott.marlowe@gmail.com>:
> On Jan 6, 2008 5:06 AM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> >
> > Then I rebuilt and reinstalled postgresql with xlog_seg_size set
> > to the default 16MB and did initdb. Now the time is 7,642 sec.
> >
> > I'm lost. It looks like 1GB xlog_seg_size is indeed faster than 16MB,
> > but either way it is slower than the production server, which uses the
> > default xlog_seg_size.
>
> How fast was it the second time you ran it?

You mean the new server with 16MB xlog_seg_size? Yes, I ran it twice and both runs took about the same time. In all tests (all configurations) I did an analyze before running, so I think that is the real time.

Regards, Clodoaldo Pinto Neto
On Jan 6, 2008 1:46 PM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> 2008/1/6, Scott Marlowe <scott.marlowe@gmail.com>:
> > On Jan 6, 2008 5:06 AM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> > >
> > > Then I rebuilt and reinstalled postgresql with xlog_seg_size set
> > > to the default 16MB and did initdb. Now the time is 7,642 sec.
> > >
> > > I'm lost. It looks like 1GB xlog_seg_size is indeed faster than 16MB,
> > > but either way it is slower than the production server, which uses
> > > the default xlog_seg_size.
> >
> > How fast was it the second time you ran it?
>
> You mean the new server with 16MB xlog_seg_size? Yes, I ran it twice
> and both runs took about the same time. In all tests (all
> configurations) I did an analyze before running, so I think that is
> the real time.

OK, to eliminate the chance that it's sick hardware, I would suggest installing 8.2.5 with exactly the same settings and build (as much as possible) and see how that works. If it's still slow, I would suspect the hardware is making the difference and investigate that first. Once you get even performance from 8.2.5 on both sets of hardware, then you can make a valid comparison with 8.3b4. Unless you've already done that... then I don't have a clue what to do...
2008/1/7, Scott Marlowe <scott.marlowe@gmail.com>:
> On Jan 6, 2008 1:46 PM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> > 2008/1/6, Scott Marlowe <scott.marlowe@gmail.com>:
> > > On Jan 6, 2008 5:06 AM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> > > >
> > > > Then I rebuilt and reinstalled postgresql with xlog_seg_size set
> > > > to the default 16MB and did initdb. Now the time is 7,642 sec.
> > > >
> > > > I'm lost. It looks like 1GB xlog_seg_size is indeed faster than 16MB,
> > > > but either way it is slower than the production server, which uses
> > > > the default xlog_seg_size.
> > >
> > > How fast was it the second time you ran it?
> >
> > You mean the new server with 16MB xlog_seg_size? Yes, I ran it twice
> > and both runs took about the same time. In all tests (all
> > configurations) I did an analyze before running, so I think that is
> > the real time.
>
> OK, to eliminate the chance that it's sick hardware, I would suggest
> installing 8.2.5 with exactly the same settings and build (as much as
> possible) and see how that works. If it's still slow, I would suspect
> the hardware is making the difference and investigate that first.
> Once you get even performance from 8.2.5 on both sets of hardware,
> then you can make a valid comparison with 8.3b4. Unless you've already
> done that... then I don't have a clue what to do...

I just did it: built and installed 8.2.5, copied the postgresql.conf from production, issued an analyze, and ran the insert query twice:

fahstats=> insert into usuarios (
fahstats(> data,
fahstats(> usuario,
fahstats(> pontos,
fahstats(> wus
fahstats(> )
fahstats-> select
fahstats-> (select data_serial from data_serial) as data,
fahstats-> ui.usuario_serial as usuario,
fahstats-> sum(pontos) as pontos,
fahstats-> sum(wus) as wus
fahstats-> from usuarios_temp as ut inner join usuarios_indice as ui
fahstats-> on ut.usuario = ui.usuario_nome and ut.n_time = ui.n_time
fahstats-> group by data, ui.usuario_serial
fahstats-> ;
INSERT 0 880479
Time: 384991.682 ms

The second time it ran in 403 sec, half the production time.

Regards, Clodoaldo Pinto Neto
On Mon, 7 Jan 2008, Clodoaldo wrote:

> I just did it: built and installed 8.2.5, copied the postgresql.conf
> from production, issued an analyze, and ran the insert query twice:
> The second time it ran in 403 sec, half the production time.

OK, you're getting close now. What you should do now is run your query on 8.2.5 with EXPLAIN ANALYZE (the sample you gave before had just EXPLAIN), run it again on that same server with 8.3, then post the two plans. Now that it's a fair comparison, looking at the differences between the two should give an idea of the cause.

The only thing I noticed in your original explain plans that was different between the two was:

8.2:
  Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = "inner"."?column4?"))

8.3:
  Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = (ui.usuario_nome)::text))

I don't read enough plans to know if that's impacting things, but seeing the ANALYZE output that includes actual times in addition to estimated costs will help sort out the real issue here.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes:
> The only thing I noticed in your original explain plans that was different
> between the two was:
> 8.2:
>   Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = "inner"."?column4?"))
> 8.3:
>   Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = (ui.usuario_nome)::text))

I believe that is just a cosmetic change: the EXPLAIN code in 8.2 and before tended to punt (and print "?columnN?") for any but the simplest references to subplan node outputs, whereas 8.3 will (in theory) get it right all the time. With no other visible differences, there's no reason to think these plans are different.

        regards, tom lane
2008/1/7, Greg Smith <gsmith@gregsmith.com>:
> On Mon, 7 Jan 2008, Clodoaldo wrote:
>
> > I just did it: built and installed 8.2.5, copied the postgresql.conf
> > from production, issued an analyze, and ran the insert query twice:
> > The second time it ran in 403 sec, half the production time.
>
> OK, you're getting close now. What you should do now is run your query on
> 8.2.5 with EXPLAIN ANALYZE (the sample you gave before had just EXPLAIN),
> run it again on that same server with 8.3, then post the two plans. Now
> that it's a fair comparison, looking at the differences between the two
> should give an idea of the cause.

The following two queries both ran on this server:

Fedora 8, Core Duo 2.33 GHz, 4 GB mem, two 7200 SATA disks in RAID 1.

$ uname -a
Linux s1 2.6.23.9-85.fc8 #1 SMP Fri Dec 7 15:49:36 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

Insert query with 8.2.5, default xlog_seg_size:

fahstats=> explain analyze
fahstats-> insert into usuarios (
fahstats(> data,
fahstats(> usuario,
fahstats(> pontos,
fahstats(> wus
fahstats(> )
fahstats-> select
fahstats-> (select data_serial from data_serial) as data,
fahstats-> ui.usuario_serial as usuario,
fahstats-> sum(pontos) as pontos,
fahstats-> sum(wus) as wus
fahstats-> from usuarios_temp as ut inner join usuarios_indice as ui
fahstats-> on ut.usuario = ui.usuario_nome and ut.n_time = ui.n_time
fahstats-> group by data, ui.usuario_serial
fahstats-> ;

                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=326089.49..350310.28 rows=880756 width=20) (actual time=11444.566..13114.365 rows=880691 loops=1)
   ->  HashAggregate  (cost=326089.49..339300.83 rows=880756 width=12) (actual time=11444.554..12438.188 rows=880691 loops=1)
         InitPlan
           ->  Seq Scan on data_serial  (cost=0.00..1.01 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
         ->  Merge Join  (cost=102546.09..267675.46 rows=5841302 width=12) (actual time=5173.428..10674.007 rows=886533 loops=1)
               Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = "inner"."?column4?"))
               ->  Index Scan using usuarios_temp_ndx on usuarios_temp ut  (cost=0.00..58476.33 rows=886533 width=26) (actual time=0.093..2493.622 rows=886533 loops=1)
               ->  Sort  (cost=102546.09..104747.98 rows=880756 width=22) (actual time=5173.315..5470.835 rows=886573 loops=1)
                     Sort Key: ui.n_time, (ui.usuario_nome)::text
                     ->  Seq Scan on usuarios_indice ui  (cost=0.00..15578.56 rows=880756 width=22) (actual time=0.023..364.002 rows=880731 loops=1)
 Trigger for constraint datas: time=14231.240 calls=880691
 Total runtime: 356862.302 ms
(12 rows)

Time: 357750.531 ms

Same insert query with 8.3-beta4, default xlog_seg_size:

                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=316145.48..340289.33 rows=877958 width=20) (actual time=10650.036..12997.377 rows=877895 loops=1)
   ->  HashAggregate  (cost=316145.48..329314.85 rows=877958 width=12) (actual time=10650.023..12193.890 rows=877895 loops=1)
         InitPlan
           ->  Seq Scan on data_serial  (cost=0.00..1.01 rows=1 width=4) (actual time=0.009..0.010 rows=1 loops=1)
         ->  Merge Join  (cost=101792.68..259032.28 rows=5711219 width=12) (actual time=4299.239..9645.146 rows=883729 loops=1)
               Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = (ui.usuario_nome)::text))
               ->  Index Scan using usuarios_temp_ndx on usuarios_temp ut  (cost=0.00..52880.46 rows=883729 width=23) (actual time=0.097..2164.406 rows=883729 loops=1)
               ->  Sort  (cost=101792.68..103987.58 rows=877958 width=19) (actual time=4299.116..4604.372 rows=883769 loops=1)
                     Sort Key: ui.n_time, ui.usuario_nome
                     Sort Method: quicksort Memory: 90120kB
                     ->  Seq Scan on usuarios_indice ui  (cost=0.00..15121.58 rows=877958 width=19) (actual time=0.028..297.058 rows=877935 loops=1)
 Trigger for constraint datas: time=33179.197 calls=877895
 Total runtime: 9546878.520 ms
(13 rows)

Time: 9547801.116 ms

Regards, Clodoaldo Pinto Neto
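What stands out in the 8.3 run is that the plan itself finishes in about 13 seconds and the FK trigger accounts for only 33 seconds, so nearly all of the 9,547-second total is spent somewhere else, presumably writing the new rows and maintaining the indexes. A way to split those costs apart, offered only as a diagnostic sketch (usuarios_scratch is a made-up scratch table, not part of the original schema, and step 2 really does insert into usuarios, so run it on a copy of the database):

    \timing
    -- 1. Cost of computing the result set alone; no target table, indexes
    --    or FK trigger involved.
    create temp table usuarios_scratch as
    select (select data_serial from data_serial) as data,
           ui.usuario_serial as usuario,
           sum(pontos) as pontos,
           sum(wus) as wus
    from usuarios_temp as ut inner join usuarios_indice as ui
        on ut.usuario = ui.usuario_nome and ut.n_time = ui.n_time
    group by data, ui.usuario_serial;

    -- 2. Cost of loading the precomputed rows: heap writes, index
    --    maintenance and the FK trigger.
    insert into usuarios (data, usuario, pontos, wus)
    select data, usuario, pontos, wus from usuarios_scratch;

Comparing the two timings on 8.2.5 and 8.3-beta4 would show whether the regression is in computing the rows or in storing them.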
Clodoaldo wrote:

8.2:
> Trigger for constraint datas: time=14231.240 calls=880691
> Total runtime: 356862.302 ms
> (12 rows)
>
> Time: 357750.531 ms

8.3:
> Trigger for constraint datas: time=33179.197 calls=877895
> Total runtime: 9546878.520 ms
> (13 rows)

So what's this constraint and why is it taking longer on the 8.3 server?

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2008/1/8, Alvaro Herrera <alvherre@commandprompt.com>:
> Clodoaldo wrote:
>
> 8.2:
> > Trigger for constraint datas: time=14231.240 calls=880691
> > Total runtime: 356862.302 ms
>
> 8.3:
> > Trigger for constraint datas: time=33179.197 calls=877895
> > Total runtime: 9546878.520 ms
>
> So what's this constraint and why is it taking longer on the 8.3 server?

fahstats=> \d+ usuarios
            Table "public.usuarios"
 Column  |  Type   | Modifiers | Description
---------+---------+-----------+-------------
 usuario | integer | not null  |
 data    | integer | not null  |
 wus     | integer | not null  |
 pontos  | real    | not null  |
Indexes:
    "ndx_usuarios_data" btree (data) CLUSTER
    "ndx_usuarios_usuario" btree (usuario)
Foreign-key constraints:
    "datas" FOREIGN KEY (data) REFERENCES datas(data_serial) ON UPDATE RESTRICT ON DELETE RESTRICT
Has OIDs: no

fahstats=> \d+ datas
                                        Table "public.datas"
   Column    |           Type           |                          Modifiers                           | Description
-------------+--------------------------+--------------------------------------------------------------+-------------
 data_serial | integer                  | not null default nextval('datas_data_serial_seq'::regclass) |
 data        | timestamp with time zone | not null                                                     |
 have_data   | boolean                  | not null default true                                        |
Indexes:
    "pk_data" PRIMARY KEY, btree (data)
    "data_ndx" UNIQUE, btree (data_serial) CLUSTER
Has OIDs: no

Regards, Clodoaldo Pinto Neto
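To answer Alvaro's "why is it taking longer" more directly, the trigger's share of the runtime can be measured by removing it; a diagnostic sketch only, best run on a copy of the database since it leaves the table unprotected in between (and re-adding the FK rechecks every existing row, which itself takes a while):

    alter table usuarios drop constraint datas;

    -- repeat the timed insert here; whatever slowdown remains
    -- is not the FK trigger's doing

    alter table usuarios add constraint datas
        foreign key (data) references datas(data_serial)
        on update restrict on delete restrict;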
2008/1/8, Clodoaldo <clodoaldo.pinto.neto@gmail.com>:
> 2008/1/8, Alvaro Herrera <alvherre@commandprompt.com>:
> > Clodoaldo wrote:
> >
> > 8.2:
> > > Trigger for constraint datas: time=14231.240 calls=880691
> > > Total runtime: 356862.302 ms
> >
> > 8.3:
> > > Trigger for constraint datas: time=33179.197 calls=877895
> > > Total runtime: 9546878.520 ms
> >
> > So what's this constraint and why is it taking longer on the 8.3 server?
>
> fahstats=> \d+ usuarios
>             Table "public.usuarios"
>  Column  |  Type   | Modifiers | Description
> ---------+---------+-----------+-------------
>  usuario | integer | not null  |
>  data    | integer | not null  |
>  wus     | integer | not null  |
>  pontos  | real    | not null  |
> Indexes:
>     "ndx_usuarios_data" btree (data) CLUSTER
>     "ndx_usuarios_usuario" btree (usuario)
> Foreign-key constraints:
>     "datas" FOREIGN KEY (data) REFERENCES datas(data_serial) ON UPDATE RESTRICT ON DELETE RESTRICT
> Has OIDs: no
>
> fahstats=> \d+ datas
>                                         Table "public.datas"
>    Column    |           Type           |                          Modifiers                           | Description
> -------------+--------------------------+--------------------------------------------------------------+-------------
>  data_serial | integer                  | not null default nextval('datas_data_serial_seq'::regclass) |
>  data        | timestamp with time zone | not null                                                     |
>  have_data   | boolean                  | not null default true                                        |
> Indexes:
>     "pk_data" PRIMARY KEY, btree (data)
>     "data_ndx" UNIQUE, btree (data_serial) CLUSTER
> Has OIDs: no

Can someone reproduce or understand this? If so, is there a chance to have it fixed in RC2?

I see there is a post by Tom Lane on the hackers list about mergejoins but I don't understand if it is the same issue.

Regards, Clodoaldo Pinto Neto
If possible, can you send a data dump of these tables: usuarios_temp and usuarios_indice?

Thanks,
Gokul.