Discussion: Poor performance on seq scan
Hello,

I have a big table called products. Table size: 1123MB. Toast table size:
32MB. Indexes size: 380MB. I try to do a query like this:

select id,name from products where name like '%Mug%';

Yes, I know that tsearch2 is better for this, but please read on. The above
query gives this plan:

Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40)
  Filter: (name ~~ '%Mug%'::text)

When I use this with explain analyze:

"Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40) (actual time=878.873..38300.588 rows=72567 loops=1)"
"  Filter: (name ~~ '%Mug%'::text)"
"Total runtime: 38339.026 ms"

Meanwhile, "iostat 5" gives something like this:

 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
   0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0
   0   12 125.66 128 15.75 125.26 128 15.68  10  0 85  6  0
   0   12 124.66 129 15.67 124.39 129 15.64  12  0 85  3  0
   0   12 117.13 121 13.87 117.95 121 13.96  12  0 84  5  0
   0   12 104.84 118 12.05 105.84 118 12.19  10  0 87  2  0

130 transfers per second with a 12-15MB/sec transfer speed. (FreeBSD 6.1
with two SATA150 drives in gmirror RAID1.)

I made another test. I created a file with the identifiers and names of the
products:

psql# \o products.txt
psql# select id,name from product;

Then I can search using grep:

grep "Mug" products.txt | cut -f1 -d\|

There is a huge difference. This command runs within 0.5 seconds. That is
at least 76 times faster than the seq scan. It is the same if I vacuum,
backup and restore the database. I thought that the table is stored in one
file, and that the seq scan would actually be faster than grepping the
file. Can you please tell me what I am doing wrong? I'm not sure if I can
increase the performance of a seq scan by adjusting the values in
postgresql.conf. I do not like the idea of exporting the product table
periodically into a txt file and searching it with grep. :-)

Another question: I have a btree index on product(name). It contains all
product names and the identifiers of the products. Wouldn't it be easier to
seq scan the index instead of seq scanning the table? The index is only
66MB, the table is 1123MB.

I'm new to this list, and I also only recently started to tune PostgreSQL,
so please forgive me if this is a dumb question.

Regards,

Laszlo
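For reference, the export-and-grep comparison above can be reproduced on a synthetic pipe-delimited file (the real products.txt came from psql's \o; the three rows below are made up for illustration):

```shell
# Simulating the grep test on a synthetic export; in the thread the file
# was produced with psql's \o and "select id,name from product".
printf '1|Coffee Mug\n2|Tea Pot\n3|Travel Mug\n' > products.txt
grep "Mug" products.txt | cut -f1 -d'|'
# prints:
# 1
# 3
rm -f products.txt
```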
Laszlo Nagy wrote:
> I made another test. I create a file with the identifiers and names of
> the products:
>
> psql#\o products.txt
> psql#select id,name from product;
>
> Then I can search using grep:
>
> grep "Mug" products.txt | cut -f1 -d\|
>
> There is a huge difference. This command runs within 0.5 seconds. That
> is, at least 76 times faster than the seq scan. It is the same if I
> vacuum, backup and restore the database. I thought that the table is
> stored in one file, and the seq scan will be actually faster than
> grepping the file. Can you please tell me what am I doing wrong? I'm
> not sure if I can increase the performance of a seq scan by adjusting
> the values in postgresql.conf. I do not like the idea of exporting the
> product table periodically into a txt file, and search with grep. :-)

Are there any other columns besides id and name in the table? How big is
products.txt compared to the heap file?

> Another question: I have a btree index on product(name). It contains
> all product names and the identifiers of the products. Wouldn't it be
> easier to seq scan the index instead of seq scan the table? The index
> is only 66MB, the table is 1123MB.

Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
depends on how many matches there are. If you scan the index and then
fetch the matching rows from the heap, you're doing random I/O to the
heap. That becomes slower than scanning the heap sequentially if you're
going to get more than a few hits.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Laszlo,

> Meanwhile, "iostat 5" gives something like this:
>
>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>    1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
>    0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0

This is your problem. Do the following and report the results here:

Take the number of GB of memory you have (say 2 for 2GB) and multiply it
by 250,000. This is the number of 8KB pages that add up to twice your
RAM. Let's say you have 2GB - the result is 500,000.

Use that number to do the following test on your database directory:

  time bash -c "dd if=/dev/zero of=/<dbdir>/bigfile bs=8k count=<number_from_above> && sync"

Then do this:

  time bash -c "dd if=/<dbdir>/bigfile of=/dev/null bs=8k"

> I made another test. I create a file with the identifiers and
> names of the products:
>
> psql#\o products.txt
> psql#select id,name from product;
>
> Then I can search using grep:
>
> grep "Mug" products.txt | cut -f1 -d\|
>
> There is a huge difference. This command runs within 0.5
> seconds. That is, at least 76 times faster than the seq scan.

The file probably fits in the I/O cache. Your disks will at most go
between 60-80MB/s, or 5-7 times faster than what you see now. RAID 1
serving one query will only deliver one disk's worth of bandwidth.

- Luke
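A scaled-down, runnable sketch of this write/read test (an ~8MB file instead of twice RAM, so it only demonstrates the mechanics; the real numbers only mean something when the file is too big to cache):

```shell
# Scaled-down version of the dd throughput test described above. For the
# real test, count should be <GB of RAM> * 250000 so the file is twice
# the size of RAM and cannot be served from the OS page cache.
count=1000                                    # 1000 * 8KB = ~8MB, demo only
dd if=/dev/zero of=bigfile bs=8k count=$count 2>/dev/null && sync
dd if=bigfile of=/dev/null bs=8k 2>/dev/null
wc -c < bigfile                               # 8192000 bytes round-tripped
rm -f bigfile
```

Wrapping each dd in `time` as Luke suggests, and dividing bytes by seconds, gives the sequential write and read bandwidth respectively.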
Luke Lonergan wrote:
> Laszlo,
>
>> Meanwhile, "iostat 5" gives something like this:
>>
>>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>>    1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
>>    0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0
>
> This is your problem. Do the following and report the results here:
>
> Take the number of GB of memory you have (say 2 for 2GB), multiply it by
> 250000. This is the number of 8KB pages you can fit in twice your ram.
> Let's say you have 2GB - the result is 500,000.
>
> Use that number to do the following test on your database directory:
> time bash -c "dd if=/dev/zero of=/<dbdir>/bigfile bs=8k
> count=<number_from_above> && sync"

I have 1GB RAM. The data directory is in /usr/local/pgsql/data. The root
of this fs is /usr.

time sh -c "dd if=/dev/zero of=/usr/test/bigfile bs=8k count=250000 && sync"

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 48.030627 secs (42639460 bytes/sec)
0.178u 8.912s 0:48.31 18.7%     9+96k 37+15701io 0pf+0w

> Then do this:
> time bash -c "dd if=/<dbdir>/bigfile of=/dev/null bs=8k"

time sh -c "dd if=/usr/test/bigfile of=/dev/null bs=8k"

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 145.293473 secs (14095609 bytes/sec)
0.110u 5.857s 2:25.31 4.1%      10+99k 32923+0io 0pf+0w

At this point I thought another process was doing I/O, so I retried:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 116.395211 secs (17595226 bytes/sec)
0.137u 5.658s 1:56.51 4.9%      10+103k 29082+0io 0pf+1w

and again:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 120.198224 secs (17038521 bytes/sec)
0.063u 5.780s 2:00.21 4.8%      10+98k 29776+0io 0pf+0w

This is a mirrored disk with two SATA drives. In theory, writing should
be slower than reading. Is this a hardware problem? Or is it that "sync"
did not do the sync?

Laszlo
Heikki Linnakangas wrote:
> Is there any other columns besides id and name in the table? How big
> is products.txt compared to the heap file?

Yes, there are many other columns. products.txt is only 59MB, close to
the size of the index (66MB).

>> Another question: I have a btree index on product(name). It contains
>> all product names and the identifiers of the products. Wouldn't it be
>> easier to seq scan the index instead of seq scan the table? The index
>> is only 66MB, the table is 1123MB.
>
> Probably, but PostgreSQL doesn't know how to do that. Even if it did,
> it depends on how many matches there is. If you scan the index and
> then fetch the matching rows from the heap, you're doing random I/O to
> the heap. That becomes slower than scanning the heap sequentially if
> you're going to get more than a few hits.

I have 700 000 rows in the table, and usually there are fewer than 500
hits. So probably using a "seq index scan" would be faster. :-) Now I
also tried this:

create table test(id int8 not null primary key, name text);
insert into test select id,name from product;

And then:

zeusd1=> explain analyze select id,name from test where name like '%Tiffany%';
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..26559.62 rows=79 width=40) (actual time=36.595..890.903 rows=117 loops=1)
   Filter: (name ~~ '%Tiffany%'::text)
 Total runtime: 891.063 ms
(3 rows)

But this might be coming from the disk cache. Thank you for your
comments. We are making progress.

Laszlo
Laszlo Nagy <gandalf 'at' designaproduct.biz> writes:

> This is a mirrored disk with two SATA disks. In theory, writing should
> be slower than reading. Is this a hardware problem? Or is it that
> "sync" did not do the sync?

SATA disks are supposed to be capable of lying to pg's fsync (pg asking
the kernel to synchronize a write and waiting until it is finished). The
same can probably happen to the "sync" command.

--
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/
Laszlo Nagy <gandalf 'at' designaproduct.biz> writes:

>> Probably, but PostgreSQL doesn't know how to do that. Even if it
>> did, it depends on how many matches there is. If you scan the index
>> and then fetch the matching rows from the heap, you're doing random
>> I/O to the heap. That becomes slower than scanning the heap
>> sequentially if you're going to get more than a few hits.

> I have 700 000 rows in the table, and usually there are less than 500
> hits. So probably using a "seq index scan" would be faster. :-)

You can confirm this idea by temporarily disabling sequential scans.
Have a look at this chapter:

http://www.postgresql.org/docs/7.4/interactive/runtime-config.html#RUNTIME-CONFIG-QUERY

--
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/
Guillaume Cottenceau wrote:
> Laszlo Nagy <gandalf 'at' designaproduct.biz> writes:
>
>>> Probably, but PostgreSQL doesn't know how to do that. Even if it
>>> did, it depends on how many matches there is. If you scan the index
>>> and then fetch the matching rows from the heap, you're doing random
>>> I/O to the heap. That becomes slower than scanning the heap
>>> sequentially if you're going to get more than a few hits.
>>
>> I have 700 000 rows in the table, and usually there are less than 500
>> hits. So probably using a "seq index scan" would be faster. :-)
>
> You can confirm this idea by temporarily disabling sequential
> scans. Have a look at this chapter:

I don't think it will do a "seq index scan" as Laszlo envisions in any
case. PostgreSQL cannot do "fetch index tuple and apply %Mug% to it; if
it matches, fetch heap tuple". Even if you disable sequential scans,
it's still going to fetch every heap tuple to see if it matches "%Mug%".
It's just going to do it in index order, which is slower than a seq scan.

BTW: in addition to setting enable_seqscan=false, you probably have to
add a dummy where-clause like "name > ''" to force the index scan.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
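Putting that suggestion together, a hypothetical session to force the index path might look like this (table and column names taken from the thread; timings will depend entirely on the data, and as noted the heap still has to be visited for every candidate row):

```sql
-- Sketch of forcing the index scan, per the suggestion above; this only
-- changes the access path, it does not avoid the heap fetches.
SET enable_seqscan = false;

EXPLAIN ANALYZE
SELECT id, name
FROM product
WHERE name LIKE '%Mug%'
  AND name > '';        -- dummy clause so the btree index is applicable

RESET enable_seqscan;
```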
Heikki Linnakangas wrote:
> Guillaume Cottenceau wrote:
>> Laszlo Nagy <gandalf 'at' designaproduct.biz> writes:
>>>> Probably, but PostgreSQL doesn't know how to do that. [...]
>>> I have 700 000 rows in the table, and usually there are less than 500
>>> hits. So probably using a "seq index scan" would be faster. :-)
>> You can confirm this idea by temporarily disabling sequential
>> scans. Have a look at this chapter:
>
> I don't think it will anyway do a "seq index scan" as Laszlo envisions.
> PostgreSQL cannot do "fetch index tuple and apply %Mug% to it. If it
> matches, fetch heap tuple". Even if you disable sequential scans, it's
> still going to fetch every heap tuple to see if it matches "%Mug%". It's
> just going to do it in index order, which is slower than a seq scan.

Are you saying that an indexscan "Filter" only acts after getting the
heap tuple? If that's the case, then there's room for optimization
here, namely if the affected column is part of the index key, then we
could do the filtering before fetching the heap tuple.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Are you saying that an indexscan "Filter" only acts after getting the
> heap tuple? If that's the case, then there's room for optimization
> here, namely if the affected column is part of the index key, then we
> could do the filtering before fetching the heap tuple.

That's right. Yes, there's definitely room for optimization.

In general, it seems we should detach the index scan and heap fetch
more. Perhaps make them two different nodes, like the bitmap index scan
and bitmap heap scan. It would allow us to do the above. It's also
going to be necessary if we ever get to implement index-only scans.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Are you saying that an indexscan "Filter" only acts after getting the
> heap tuple?

Correct.

> If that's the case, then there's room for optimization
> here, namely if the affected column is part of the index key, then we
> could do the filtering before fetching the heap tuple.

Only if the index is capable of disgorging the original value of the
indexed column, a fact not in evidence in general (counterexample:
polygons indexed by their bounding boxes in an r-tree). But yeah, it's
interesting to think about applying filters at the index fetch step for
index types that can hand back full values. This has been discussed
before --- I think we had gotten as far as speculating about doing joins
with just index values. See eg here:

http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php

A lot of the low-level concerns have already been dealt with in order to
support bitmap indexscans, but applying non-indexable conditions before
fetching from the heap is still not done.

                        regards, tom lane
Laszlo,

You can ignore tuning Postgres and trying to use indexes; your problem
is a bad hardware / OS configuration. The disks you are using should
read 4-5 times faster than they are doing. Look to the SATA chipset
driver in your FreeBSD config - perhaps upgrading your kernel would
help.

Still, the most you should expect is a 5-6 times faster query than
before. The data in your table is slightly larger than RAM. When you
took it out of the DBMS it was smaller than RAM, so it fit in the I/O
cache.

With a text scan query you are stuck with a seq scan unless you use a
text index like tsearch. Buy more disks and a RAID controller and use
RAID5 or RAID10.

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From:    Laszlo Nagy [mailto:gandalf@designaproduct.biz]
Sent:    Tuesday, September 12, 2006 08:16 AM Eastern Standard Time
To:      Luke Lonergan; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Poor performance on seq scan
Laszlo Nagy <gandalf@designaproduct.biz> writes:
> Meanwhile, "iostat 5" gives something like this:

>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>    1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
>    0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0
>    0   12 125.66 128 15.75 125.26 128 15.68  10  0 85  6  0
>    0   12 124.66 129 15.67 124.39 129 15.64  12  0 85  3  0
>    0   12 117.13 121 13.87 117.95 121 13.96  12  0 84  5  0
>    0   12 104.84 118 12.05 105.84 118 12.19  10  0 87  2  0

Why is that showing 85+ percent *system* CPU time?? I could believe a
lot of idle CPU if the query is I/O bound, or a lot of user time if PG
was being a hog about doing the ~~ comparisons (not too unlikely BTW).
But if the kernel is eating all the CPU, there's something very wrong,
and I don't think it's Postgres' fault.

                        regards, tom lane
Tom Lane wrote:
> Only if the index is capable of disgorging the original value of the
> indexed column, a fact not in evidence in general (counterexample:
> polygons indexed by their bounding boxes in an r-tree). But yeah,
> it's interesting to think about applying filters at the index fetch
> step for index types that can hand back full values. This has been
> discussed before --- I think we had gotten as far as speculating about
> doing joins with just index values. See eg here:
> http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php
> A lot of the low-level concerns have already been dealt with in order
> to support bitmap indexscans, but applying non-indexable conditions
> before fetching from the heap is still not done.

To overcome this problem, I created a smaller "shadow" table:

CREATE TABLE product_search
(
  id int8 NOT NULL,
  name_desc text,
  CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
select id, name || ' ' || coalesce(description,'') from product;

Obviously, this is almost like an index, but I need to maintain it
manually. I'm able to search with:

zeusd1=> explain analyze select id from product_search where name_desc like '%Mug%';
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on product_search  (cost=0.00..54693.34 rows=36487 width=8) (actual time=20.036..2541.971 rows=91399 loops=1)
   Filter: (name_desc ~~ '%Mug%'::text)
 Total runtime: 2581.272 ms
(3 rows)

The total runtime remains below 3 sec in all cases.
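The manual maintenance of such a shadow table could be automated with triggers. A rough, untested sketch against the schema described in the thread (it assumes only name and description feed the search text, and that product_search already exists as above):

```sql
-- Hypothetical trigger to keep product_search in sync with product,
-- so the shadow table does not have to be rebuilt by hand.
CREATE OR REPLACE FUNCTION product_search_sync() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        DELETE FROM product_search WHERE id = OLD.id;
        RETURN OLD;
    END IF;
    -- INSERT or UPDATE: replace the row for this product
    DELETE FROM product_search WHERE id = NEW.id;
    INSERT INTO product_search (id, name_desc)
        VALUES (NEW.id, NEW.name || ' ' || coalesce(NEW.description, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_product_search_sync
    AFTER INSERT OR UPDATE OR DELETE ON product
    FOR EACH ROW EXECUTE PROCEDURE product_search_sync();
```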
Of course I still need to join the main table to the result:

explain analyze select s.id,p.name from product_search s
inner join product p on (p.id = s.id)
where s.name_desc like '%Tiffany%';

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..55042.84 rows=58 width=40) (actual time=164.437..3982.610 rows=117 loops=1)
   ->  Seq Scan on product_search s  (cost=0.00..54693.34 rows=58 width=8) (actual time=103.651..2717.914 rows=117 loops=1)
         Filter: (name_desc ~~ '%Tiffany%'::text)
   ->  Index Scan using pk_product_id on product p  (cost=0.00..6.01 rows=1 width=40) (actual time=10.793..10.796 rows=1 loops=117)
         Index Cond: (p.id = "outer".id)
 Total runtime: 4007.283 ms
(6 rows)

Took 4 seconds. Awesome! With the original table, it used to be one or
two minutes!

Now you can ask why I am not using tsearch2 for this. Here is the
answer:

CREATE TABLE product_search
(
  id int8 NOT NULL,
  ts_name_desc tsvector,
  CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
select id, to_tsvector(name || ' ' || coalesce(description,'')) from product;

CREATE INDEX idx_product_search_ts_name_desc
  ON product_search USING gist (ts_name_desc);

VACUUM product_search;

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('mug');
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912 width=8) (actual time=954.669..13112.009 rows=91434 loops=1)
   Filter: (ts_name_desc @@ '''mug'''::tsquery)
   ->  Bitmap Index Scan on idx_product_search_ts_name_desc  (cost=0.00..25.19 rows=912 width=0) (actual time=932.455..932.455 rows=91436 loops=1)
         Index Cond: (ts_name_desc @@ '''mug'''::tsquery)
 Total runtime: 13155.724 ms
(5 rows)

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('tiffany');
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912 width=8) (actual time=13151.725..13639.112 rows=76 loops=1)
   Filter: (ts_name_desc @@ '''tiffani'''::tsquery)
   ->  Bitmap Index Scan on idx_product_search_ts_name_desc  (cost=0.00..25.19 rows=912 width=0) (actual time=13123.705..13123.705 rows=81 loops=1)
         Index Cond: (ts_name_desc @@ '''tiffani'''::tsquery)
 Total runtime: 13639.478 ms
(5 rows)

At least 13 seconds, and the main table is not joined yet. Can anybody
explain to me why the seq scan is faster than the bitmap index scan? In
the last example only 81 rows were returned, but it took more than 13
seconds. :( Even if the whole table could be cached into memory (which
isn't the case), the bitmap index should be much faster. Probably there
is a big problem with my schema, but I cannot find it. What am I doing
wrong?

Thanks,

Laszlo
Tom Lane wrote:
> Why is that showing 85+ percent *system* CPU time?? I could believe a
> lot of idle CPU if the query is I/O bound, or a lot of user time if PG
> was being a hog about doing the ~~ comparisons (not too unlikely BTW).

I'm sorry, this was really confusing. I don't know what it was -
probably a background system process, started from cron (?). I retried
the same query and I got this:

zeusd1=> explain analyze select id,name from product where name like '%Mug%';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on product  (cost=0.00..206891.34 rows=36487 width=40) (actual time=17.188..44585.176 rows=91399 loops=1)
   Filter: (name ~~ '%Mug%'::text)
 Total runtime: 44631.150 ms
(3 rows)

      tty             ad4              ad6             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   62 115.25 143 16.06 116.03 143 16.17   3  0  9  3 85
   0   62 122.11 144 17.12 121.78 144 17.07   6  0  3  2 89
   0   62 126.18 158 19.45 125.86 157 19.28   5  0 11  6 79
   0   62 126.41 131 16.13 127.52 132 16.39   5  0  9  6 80
   0   62 127.80 159 19.81 126.89 158 19.55   5  0  9  0 86
   0   62 125.29 165 20.15 126.26 165 20.30   5  0 14  2 80
   0   62 127.22 164 20.32 126.74 165 20.37   5  0  9  0 86
   0   62 121.34 150 17.75 120.76 149 17.54   1  0 13  3 82
   0   62 121.40 143 16.92 120.33 144 16.89   5  0 11  3 82
   0   62 127.38 154 19.12 127.17 154 19.09   8  0  8  5 80
   0   62 126.88 129 15.95 127.00 128 15.84   5  0  9  5 82
   0   62 118.48 121 13.97 119.28 121 14.06   6  0 17  3 74
   0   62 127.23 146 18.10 126.79 146 18.04   9  0 20  2 70
   0   62 127.27 153 18.98 128.00 154 19.21   5  0 17  0 79
   0   62 127.02 130 16.09 126.28 130 16.00  10  0 16  3 70
   0   62 123.17 125 15.00 122.40 125 14.91   5  0 14  2 80
   0   62 112.37 130 14.24 112.62 130 14.27   0  0 14  3 83
   0   62 115.83 138 15.58 113.97 138 15.33   3  0 18  0 79

A bit better transfer rate, but nothing serious.

Regards,

Laszlo
Laszlo,

On 9/12/06 10:01 AM, "Laszlo Nagy" <gandalf@designaproduct.biz> wrote:
> zeusd1=> explain analyze select id from product_search where name_desc
> like '%Mug%';
>  Seq Scan on product_search  (cost=0.00..54693.34 rows=36487 width=8)
>  (actual time=20.036..2541.971 rows=91399 loops=1)
>    Filter: (name_desc ~~ '%Mug%'::text)
>  Total runtime: 2581.272 ms
> (3 rows)
>
> The total runtime remains below 3 sec in all cases.

By creating a table with only the name field you are searching, you have
just reduced the size of the rows so that they fit in memory. That is
why your query runs faster.

If your searched data doesn't grow, this is fine. If it does, you will
need to fix your disk drive OS problem.

- Luke
Laszlo Nagy <gandalf@designaproduct.biz> writes:
> Tom Lane wrote:
>> Why is that showing 85+ percent *system* CPU time??

> I'm sorry, this was really confusing. I don't know what it was -
> probably a background system process, started from cron (?). I retried
> the same query and I got this:
> [ around 80% idle CPU, 10% system, < 10% user ]

OK, so then the thing really is I/O bound, and Luke is barking up the
right tree. The system CPU percentage still seems high though. I wonder
if there is a software aspect to your I/O speed woes ... could the thing
be doing PIO instead of DMA for instance?

                        regards, tom lane
>>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>>    1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
>>    0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0
>>    0   12 125.66 128 15.75 125.26 128 15.68  10  0 85  6  0
>>    0   12 124.66 129 15.67 124.39 129 15.64  12  0 85  3  0
>>    0   12 117.13 121 13.87 117.95 121 13.96  12  0 84  5  0
>>    0   12 104.84 118 12.05 105.84 118 12.19  10  0 87  2  0
>
> Why is that showing 85+ percent *system* CPU time?? I could believe a
> lot of idle CPU if the query is I/O bound, or a lot of user time if PG
> was being a hog about doing the ~~ comparisons (not too unlikely BTW).
> But if the kernel is eating all the CPU, there's something very wrong,
> and I don't think it's Postgres' fault.

There IS a bug for SATA disk drives in some versions of the Linux
kernel. On a lark I ran some of the I/O tests in this thread, and much
to my surprise discovered my write speed was 6 MB/sec ... ouch! On an
identical machine with a different kernel, the write speed was 54
MB/sec. A couple of hours of research turned up this:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=168363

The fix for me was to edit /boot/grub/grub.conf, like this:

kernel /vmlinuz-2.6.12-1.1381_FC3 ro root=LABEL=/ rhgb quiet \
    ramdisk_size=12000000 ide0=noprobe ide1=noprobe

Notice the "ideX=noprobe". Instant fix -- after reboot the disk write
speed jumped to what I expected.

Craig
Craig A. James wrote:
> There IS a bug for SATA disk drives in some versions of the Linux
> kernel. On a lark I ran some of the I/O tests in this thread, and
> much to my surprise discovered my write speed was 6 MB/sec ... ouch!
> On an identical machine, different kernel, the write speed was 54 MB/sec.

My disks are running in SATA150 mode, whatever that means.

I'm using FreeBSD, and not just because it dynamically alters the
priority of long running processes. :-)

Laszlo
Laszlo,

On 9/12/06 2:49 PM, "Laszlo Nagy" <gandalf@designaproduct.biz> wrote:
> I'm using FreeBSD, and not just because it dynamically alters the
> priority of long running processes. :-)

Understood. Linux and FreeBSD often share some driver technology.

I have had extremely bad performance historically with onboard SATA
chipsets on Linux. The one exception has been with the Intel based
chipsets (not the CPU, the I/O chipset). It is very likely that you are
having problems with the driver for the chipset.

Are you running RAID1 in hardware? If so, turn it off and see what the
performance is. The onboard hardware RAID is worse than useless, it
actually slows the I/O down.

If you want RAID with onboard chipsets, use software RAID, or buy an
adapter from 3Ware or Areca for $200.

- Luke
Laszlo Nagy wrote:
> Craig A. James wrote:
>> There IS a bug for SATA disk drives in some versions of the Linux
>> kernel. [...]
>
> My disks are running in SATA150 mode. Whatever it means.
>
> I'm using FreeBSD, and not just because it dynamically alters the
> priority of long running processes. :-)

I dunno if this has been suggested, but try changing the sysctl
vfs.read_max. The default is 8 and results in horrible RAID performance
(having said that, I'm not sure if RAID1 is affected, or only striped
RAID levels...). Anyway, try 16 or 32 and see if your seq IO rate
improves at all (though the underlying problem does look like a poor
SATA chipset/driver combination).

I also found that building your ufs2 filesystems with 32K blocks and 4K
fragments improved sequential performance considerably (even for 8K
reads).

Cheers

Mark
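Collected as commands, the tuning suggested above would look roughly like this (FreeBSD-specific, run as root; shown only as a sketch, since the newfs step rebuilds the filesystem and destroys its data, and the device name is just an example):

```sh
# Sketch of the FreeBSD tuning suggested above (values from this thread,
# not verified on this hardware).
sysctl vfs.read_max=32                       # default is 8; try 16 or 32
# Rebuild with 32K blocks / 4K fragments - DESTROYS data on the partition:
# newfs -b 32768 -f 4096 /dev/mirror/gm0s1d  # device name is an example
```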
> I have had extremely bad performance historically with onboard SATA
> chipsets on Linux. The one exception has been with the Intel based
> chipsets (not the CPU, the I/O chipset).

This board has an Intel chipset. I cannot remember the exact type but it
was not in the low end category. dmesg says:

<Intel ICH7 SATA300 controller>
kernel: ad4: 152626MB <SAMSUNG HD160JJ ZM100-33> at ata2-master SATA150
kernel: ad4: 152627MB <SAMSUNG HD160JJ ZM100-33> at ata3-master SATA150

> It is very likely that you are having problems with the driver for the
> chipset.
>
> Are you running RAID1 in hardware? If so, turn it off and see what the
> performance is. The onboard hardware RAID is worse than useless, it
> actually slows the I/O down.

I'm using software RAID, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
   Mediasize: 160040803328 (149G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad4
   Mediasize: 160040803840 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 1153981856
2. Name: ad6
   Mediasize: 160041885696 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3520427571

I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not reach better disk read performance.

Thank you for your suggestions. Looks like I need to buy SCSI disks.

Regards,

Laszlo
On 13-Sep-06, at 6:16 AM, Laszlo Nagy wrote:

> This board has Intel chipset. I cannot remember the exact type but
> it was not in the low end category.
> [...]
> I tried to do:
>
> #sysctl vfs.read_max=32
> vfs.read_max: 6 -> 32
>
> but I could not reach better disk read performance.
>
> Thank you for your suggestions. Looks like I need to buy SCSI disks.

Well before you go do that, try the Areca SATA RAID card.
Laszlo,

>> Thank you for your suggestions. Looks like I need to buy SCSI disks.
>
> Well before you go do that try the areca SATA raid card

Yes, by all means spend $200 and buy the Areca or 3Ware RAID card - it's
a simple switch out of the cables and you should be golden.

Again - you should only expect an increase in performance of 4-6 times
over what you are getting now unless you increase the number of disks.

- Luke
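A quick back-of-the-envelope check, using the table size and iostat figures posted earlier in the thread (the arithmetic is illustrative only, not a measurement):

```python
# Rough seq scan timing from the figures posted in this thread.
table_mb = 1123        # size of the product table's heap
throughput_mb_s = 15   # sequential read rate observed in iostat

scan_s = table_mb / throughput_mb_s
print(f"full scan from disk: ~{scan_s:.0f} s")   # ~75 s worst case (38 s was
                                                 # observed, so part of the
                                                 # table was already cached)

# What a 4-6x I/O improvement from a proper RAID card would buy:
for speedup in (4, 6):
    print(f"{speedup}x faster I/O: ~{scan_s / speedup:.0f} s")
```

Even at 6x, a scan of the whole heap stays in the ~12 s range, which is why adding spindles (or avoiding the full scan entirely) matters more than the controller alone.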
pgsql-performance-owner@postgresql.org wrote:

> This board has an Intel chipset. I cannot remember the exact type but it
> was not in the low end category.
> dmesg says:
>
> <Intel ICH7 SATA300 controller>
> kernel: ad4: 152626MB <SAMSUNG HD160JJ ZM100-33> at ata2-master SATA150
> kernel: ad6: 152627MB <SAMSUNG HD160JJ ZM100-33> at ata3-master SATA150

There have been reported problems with ICH7 on FreeBSD mailing lists,
though I can't find any that affect performance.

> Components: 2
> Balance: round-robin
> Slice: 4096

See if changing the balance algorithm to "split" and the slice size to
8192 or more, while keeping vfs.read_max at 16 or more, helps your
performance. (e.g. gmirror configure -b split -s 8192 gm0)

Also, how is your file system mounted? (What does the output from 'mount' say?)
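To see why "split" can help a single sequential reader where "round-robin" often does not, here is a toy cost model (my sketch, not gmirror's actual scheduler; the 8 ms seek penalty and 60 MB/s per-disk rate are assumed figures for illustration):

```python
# Toy cost model: gmirror "round-robin" vs "split" for ONE sequential reader.
# All figures are assumptions for illustration, not measurements.

SEEK_MS = 8.0               # assumed penalty when a disk's next chunk
                            # is not adjacent to the previous one
MS_PER_MB = 1000.0 / 60.0   # assumed 60 MB/s per-disk streaming rate

FILE_MB = 100.0
REQ_MB = 0.125              # 128 KB read requests from the OS

def disk_time(chunk_mb, n_chunks, contiguous):
    """Time for one disk to serve n_chunks reads of chunk_mb each."""
    t = chunk_mb * n_chunks * MS_PER_MB
    if not contiguous:
        # the disk must skip past every chunk the mirror partner served
        t += SEEK_MS * n_chunks
    return t

n_reqs = int(FILE_MB / REQ_MB)          # 800 requests

# round-robin: each disk serves every other request -> non-contiguous reads
rr = disk_time(REQ_MB, n_reqs // 2, contiguous=False)

# split with a large slice: each disk streams one contiguous half of the file
sp = disk_time(FILE_MB / 2, 1, contiguous=True)

print(f"round-robin: ~{rr:.0f} ms")
print(f"split:       ~{sp:.0f} ms")
```

In this model the round-robin seeks dominate the streaming time, which is consistent with the observation that the mirror delivered roughly single-disk throughput.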
On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote:
> Laszlo Nagy wrote:
> > I made another test. I created a file with the identifiers and names
> > of the products:
> >
> > psql#\o products.txt
> > psql#select id,name from product;
> >
> > Then I can search using grep:
> >
> > grep "Mug" products.txt | cut -f1 -d\|
> >
> > There is a huge difference. This command runs within 0.5 seconds. That
> > is, at least 76 times faster than the seq scan. It is the same if I
> > vacuum, backup and restore the database. I thought that the table is
> > stored in one file, and the seq scan would actually be faster than
> > grepping the file. Can you please tell me what I am doing wrong? I'm
> > not sure if I can increase the performance of a seq scan by adjusting
> > the values in postgresql.conf. I do not like the idea of exporting the
> > product table periodically into a txt file and searching with grep. :-)
>
> Are there any other columns besides id and name in the table? How big is
> products.txt compared to the heap file?
>
> > Another question: I have a btree index on product(name). It contains
> > all product names and the identifiers of the products. Wouldn't it be
> > easier to seq scan the index instead of seq scanning the table? The
> > index is only 66MB, the table is 1123MB.
>
> Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
> depends on how many matches there are. If you scan the index and then
> fetch the matching rows from the heap, you're doing random I/O to the
> heap. That becomes slower than scanning the heap sequentially if you're
> going to get more than a few hits.

Why match rows from the heap if ALL required data are in the index itself?
Why look at the heap at all?

This is the same performance problem in PostgreSQL I noticed when doing
some "SELECT count(*)" queries.
Look at this:

explain analyze select count(*) from transakcja where data > '2005-09-09'
and miesiac >= (9 + 2005 * 12) and kwota < 50;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=601557.86..601557.87 rows=1 width=0) (actual time=26733.479..26733.484 rows=1 loops=1)
   ->  Bitmap Heap Scan on transakcja  (cost=154878.00..596928.23 rows=1851852 width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1)
         Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision))
         Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone)
         ->  Bitmap Index Scan on idx_transakcja_miesiac_kwota  (cost=0.00..154878.00 rows=5555556 width=0) (actual time=9919.967..9919.967 rows=1690402 loops=1)
               Index Cond: ((miesiac >= 24069) AND (kwota < 50::double precision))
 Total runtime: 26733.980 ms
(7 rows)

The actual time retrieving tuples from the index is less than 10 seconds,
but the system executes a needless heap scan that takes an additional 16
seconds.

Best regards,
Peter
Hi, Piotr,

Piotr Kołaczkowski wrote:
> Why match rows from the heap if ALL required data are in the index itself?
> Why look at the heap at all?

Because the index does not contain any transaction information, so it has
to look at the heap to find out which of the rows are current.

This is one of the more debated points in PostgreSQL's MVCC implementation.

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
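To make the point concrete, here is a toy model (nothing like PostgreSQL's actual structures; names and numbers are invented for illustration): the btree index maps key to heap position only, while the visibility columns xmin/xmax live solely in the heap tuple, so every index match still costs a heap visit just to decide whether the row is visible to the current transaction.

```python
# Toy illustration of why an index scan must visit the heap:
# visibility info (xmin/xmax) is stored only in heap tuples.
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    xmin: int             # transaction that inserted the row
    xmax: Optional[int]   # transaction that deleted it (None = still live)
    name: str

heap = [
    HeapTuple(xmin=100, xmax=None, name="Coffee Mug"),
    HeapTuple(xmin=100, xmax=200,  name="Beer Mug"),    # deleted by xact 200
    HeapTuple(xmin=300, xmax=None, name="Travel Mug"),  # inserted after our snapshot
]

# The btree index stores only (key, heap position) - no xmin/xmax:
index = [(t.name, i) for i, t in enumerate(heap)]

def visible(t: HeapTuple, snapshot_xid: int) -> bool:
    """Row is visible if inserted before our snapshot and not deleted yet."""
    return t.xmin <= snapshot_xid and (t.xmax is None or t.xmax > snapshot_xid)

# Index scan for '%Mug%' as seen by transaction 150: every index match
# still needs a heap fetch purely for the visibility check.
snapshot = 150
result = [key for key, pos in index
          if "Mug" in key and visible(heap[pos], snapshot)]
print(result)   # ['Coffee Mug', 'Beer Mug']
```

"Travel Mug" is in the index, but the index alone cannot know that transaction 150 must not see it; only the heap tuple carries that information.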
Because there is no MVCC information in the index.

cug

2006/9/12, Piotr Kołaczkowski <P.Kolaczkowski@elka.pw.edu.pl>:
> [...]
>
> Why match rows from the heap if ALL required data are in the index itself?
> Why look at the heap at all?

--
PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006
http://www.bignerdranch.com/news/2006-08-21.shtml