Обсуждение: Performance of a view
Hello all, I have a fairly complex query whose performance problem I have isolated to a fairly small subset. The pertinent parts of the table structure are as follows: //------------------------- tbl_claim claim_id integer SERIAL PRIMARY KEY; claimnum varchar(32); //------------------------- tbl_invoice invoice_id integer SERIAL PRIMARY KEY; claim_id integer integer; invoicedate timestamp; active integer; //------------------------- there is an index on claimnum, and the claim_id in tbl_invoice is a foreign key which references tbl_claim I have a view which is defined as follows: //------------------------- SELECT tbl_claim.claim_id, count(tbl_invoice.invoice_id) AS count, min(tbl_invoice.invoicedate) AS invoicedate FROM tbl_claim LEFT JOIN tbl_invoice ON tbl_claim.claim_id = tbl_invoice.claim_id AND tbl_invoice.active = 1 GROUP BY tbl_claim.claim_id; //------------------------- If I runn the following: EXPLAIN ANALYZE SELECT tbl_claim.claim_id FROM tbl_claim WHERE claimnum = 'L1J8823'; I get: Index Scan using idx_claim_claimnum on tbl_claim (cost=0.00..10.01 rows=2 width=4) (actual time=0.079..0.088 rows=2 loops=1) Index Cond: ((claimnum)::text = 'L1J8823'::text) Total runtime: 0.123 ms If I run: EXPLAIN ANALYZE SELECT tbl_claim.claim_id FROM tbl_claim INNER JOIN vw_claiminvoicecount ON tbl_claim.claim_id = vw_claiminvoicecount.claim_id WHERE tbl_claim.claim_id = 217778; I get: Nested Loop (cost=17.21..25.50 rows=4 width=4) (actual time=0.069..0.076 rows=1 loops=1) -> Index Scan using tbl_claim_pkey on tbl_claim (cost=0.00..8.21 rows=2 width=4) (actual time=0.020..0.021 rows=1 loops=1) Index Cond: (claim_id = 217778) -> Materialize (cost=17.21..17.23 rows=2 width=4) (actual time=0.044..0.047 rows=1 loops=1) -> Subquery Scan vw_claiminvoicecount (cost=0.00..17.21 rows=2 width=4) (actual time=0.041..0.043 rows=1 loops=1) -> GroupAggregate (cost=0.00..17.19 rows=2 width=16) (actual time=0.039..0.040 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..17.17 rows=2 width=16) (actual time=0.024..0.030 rows=1 loops=1) -> Index Scan using tbl_claim_pkey on tbl_claim (cost=0.00..8.21 rows=2 width=4) (actual time=0.005..0.007 rows=1 loops=1) Index Cond: (217778 = claim_id) -> Index Scan using idx_tbl_invoice_claim_id on tbl_invoice (cost=0.00..4.39 rows=7 width=16) (actual time=0.014..0.018 rows=1 loops=1) Index Cond: ("outer".claim_id = tbl_invoice.claim_id) Filter: (active = 1) Total runtime: 0.232 ms However, if I run: EXPLAIN ANALYZE SELECT tbl_claim.claim_id FROM tbl_claim INNER JOIN vw_claiminvoicecount ON tbl_claim.claim_id = vw_claiminvoicecount.claim_id WHERE tbl_claim.claimnum = 'L1J8823'; I get: Merge Join (cost=60015.93..69488.39 rows=3 width=4) (actual time=4605.711..4605.762 rows=2 loops=1) Merge Cond: ("outer".claim_id = "inner".claim_id) -> Subquery Scan vw_claiminvoicecount (cost=60005.91..68940.54 rows=215119 width=4) (actual time=3074.520..4491.423 rows=157215 loops=1) -> GroupAggregate (cost=60005.91..66789.35 rows=215119 width=16) (actual time=3074.515..4265.315 rows=157215 loops=1) -> Merge Left Join (cost=60005.91..64100.37 rows=215119 width=16) (actual time=3074.493..3845.516 rows=162280 loops=1) Merge Cond: ("outer".claim_id = "inner".claim_id) -> Sort (cost=29403.35..29941.15 rows=215119 width=4) (actual time=1253.372..1392.089 rows=157216 loops=1) Sort Key: tbl_claim.claim_id -> Seq Scan on tbl_claim (cost=0.00..7775.19 rows=215119 width=4) (actual time=0.031..336.606 rows=215119 loops=1) -> Sort (cost=30602.56..31146.52 rows=217582 width=16) (actual time=1821.075..1967.639 rows=151988 loops=1) Sort Key: tbl_invoice.claim_id -> Seq Scan on tbl_invoice (cost=0.00..6967.61 rows=217582 width=16) (actual time=0.066..507.189 rows=219530 loops=1) Filter: (active = 1) -> Sort (cost=10.02..10.03 rows=2 width=4) (actual time=0.144..0.145 rows=2 loops=1) Sort Key: tbl_claim.claim_id -> Index Scan using idx_claim_claimnum on tbl_claim (cost=0.00..10.01 rows=2 width=4) (actual time=0.120..0.127 rows=2 loops=1) Index Cond: ((claimnum)::text = 'L1J8823'::text) Total runtime: 4620.653 ms I roughly understand what is happening...in the first query, the dataset is being knocked down to one row, then somehow the view is being constructed using only that subset of the claim table. In the second query, the view is being constructed from the entire dataset which is hundreds of thousands of rows, and thus is much slower. My question is how would I go about obtaining the behavior from the faster query in the slower query? I have switched the order of the tables, and tried many different permutations of the query, but no matter what I do, it seems that unless I specifically hard-code a claim_id filter on the claim_id, I am forced to run through every record. Thoughts?
Hi I have example: rnum := 0; FOR r IN EXECUTE ''SELECT * FROM ''||table_name||'' WHERE var = ''||var LOOP IF rnum = 0 THEN -- how could I get the count of records here ??? -- I tried GET DIAGNOSTICS rcount = ROW_COUNT; -- BUT it returns ZERO = 0 :( END IF; rnum := rnum + 1; END LOOP; -- here I will get count of records in "rnum", but I need in LOOP's -- first step Many thanx from Zet
On Tue, Nov 15, 2005 at 05:56:27PM +0300, Zet wrote: > Hi > > I have example: <snip> > -- here I will get count of records in "rnum", but I need in LOOP's > -- first step You can't, because by that stage the query hasn't finished running yet. What you need to do is put the output for the query in a temporary table. Then GET DIAGNOSTICS will have the rowcount. Then you can loop over that table. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Вложения
On Nov 14, 2005, at 7:40 PM, John McCawley wrote: > I have a view which is defined as follows: > > //------------------------- > SELECT tbl_claim.claim_id, count(tbl_invoice.invoice_id) AS count, > min(tbl_invoice.invoicedate) AS invoicedate > FROM tbl_claim > LEFT JOIN tbl_invoice ON tbl_claim.claim_id = > tbl_invoice.claim_id AND tbl_invoice.active = 1 > GROUP BY tbl_claim.claim_id; > //------------------------- <snip> > I roughly understand what is happening...in the first query, the > dataset is being knocked down to one row, then somehow the view is > being constructed using only that subset of the claim table. In > the second query, the view is being constructed from the entire > dataset which is hundreds of thousands of rows, and thus is much > slower. > > My question is how would I go about obtaining the behavior from the > faster query in the slower query? I have switched the order of the > tables, and tried many different permutations of the query, but no > matter what I do, it seems that unless I specifically hard-code a > claim_id filter on the claim_id, I am forced to run through every > record. > > Thoughts? I'd be curious to see what would happen if you added claimnum as a field in your view. I don't have a complete understanding of the postgres internals in terms of how it is able to push outer clauses down in to its views, but I think it might be able to optimize in that fashion if it is able to add a WHERE clause internally to the view, which it can't do in the case of claimnum since it doesn't exist in the view. -- Thomas F. O'Connell Database Architecture and Programming Co-Founder Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-469-5150 615-469-5151 (fax)
> I'd be curious to see what would happen if you added claimnum as a > field in your view. I don't have a complete understanding of the > postgres internals in terms of how it is able to push outer clauses > down in to its views, but I think it might be able to optimize in > that fashion if it is able to add a WHERE clause internally to the > view, which it can't do in the case of claimnum since it doesn't > exist in the view. I added the claimnum and this actually slowed it down a bit because of the additional group by, however I then changed my where clause to filter on the view's claimnum rather than tbl_claim's claimnum, and I got the results I wanted. It seems to me that in the future I should always construct my views such that my WHERE clauses end up on the view and not on any tables that they join with. The only problem with this is that very often I don't know in advance what fields the client will want to search by, and now I'll end up with two steps instead of one (modify my code AND modify the view), however the speed increase is an order of magnatude and well worth it. Thanks! John