Обсуждение: Wrong plan sequential scan instead of an index one
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, take a look at those plans: test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE pvcp in (select id from l_pvcp where value ilike '%pi%'); QUERY PLAN - --------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=154279.01..154279.01 rows=1 width=8) (actual time=4010.094..4010.096 rows=1 loops=1) -> Hash IN Join (cost=2.22..153835.49 rows=177404 width=8) (actual time=2.908..4001.814 rows=7801 loops=1) Hash Cond: ("outer".pvcp = "inner".id) -> Seq Scan on t_oa_2_00_card (cost=0.00..147670.82 rows=877682 width=12) (actual time=0.030..2904.522 rows=877682loops=1) -> Hash (cost=2.17..2.17 rows=19 width=4) (actual time=0.093..0.093 rows=1 loops=1) -> Seq Scan on l_pvcp (cost=0.00..2.17 rows=19 width=4) (actual time=0.066..0.081 rows=1 loops=1) Filter: (value ~~* '%pi%'::text) Total runtime: 4010.413 ms (8 rows) test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE pvcp in (select id from l_pvcp where value ilike 'pi'); QUERY PLAN - ---------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=93540.82..93540.83 rows=1 width=8) (actual time=55.333..55.334 rows=1 loops=1) -> Nested Loop (cost=84.60..93447.44 rows=37348 width=8) (actual time=2.730..46.770 rows=7801 loops=1) -> HashAggregate (cost=2.18..2.22 rows=4 width=4) (actual time=0.089..0.092 rows=1 loops=1) -> Seq Scan on l_pvcp (cost=0.00..2.17 rows=4 width=4) (actual time=0.065..0.081 rows=1 loops=1) Filter: (value ~~* 'pi'::text) -> Bitmap Heap Scan on t_oa_2_00_card (cost=82.42..23216.95 rows=11548 width=12) (actual time=2.633..29.566 rows=7801loops=1) Recheck Cond: (t_oa_2_00_card.pvcp = "outer".id) -> Bitmap Index Scan on i3_t_oa_2_00_card (cost=0.00..82.42 rows=11548 width=0) (actual time=2.050..2.050rows=7801 loops=1) Index Cond: (t_oa_2_00_card.pvcp = "outer".id) Total runtime: 55.454 ms (10 rows) Isn't too much choose a sequential scan due to 19 estimated rows when with 4 estimated does a correct index scan ? Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGDNlB7UpzwH2SGd4RAjY8AJ9yrIaQe297m3Lh7+ZVM4i9hoqlYQCeJFGL z00RLwJ5yR/7bOT2TVx+JVA= =1lOI -----END PGP SIGNATURE-----
> Hi all, > take a look at those plans: Try changing random_page_cost from the default 4 to 2 in postgresql.conf: random_page_cost = 2 The default in postgresql is somewhat conservative. This setting indicates for postgresql how fast your disks are, the lower the faster. Could this setting be changed to 2 as default rather than 4? regards Claus
Gaetano Mendola wrote: > -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > > Hi all, take a look at those plans: > > > test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE > pvcp in (select id from l_pvcp where value ilike '%pi%'); > -> Hash IN Join (cost=2.22..153835.49 rows=177404 width=8) (actual > time=2.908..4001.814 rows=7801 loops=1) Hash Cond: ("outer".pvcp = > "inner".id) > Isn't too much choose a sequential scan due to 19 estimated rows when > with 4 estimated does a correct index scan ? I don't think it's the matches on l_pvcp that's the problem, it's the fact that it thinks its getting 177404 rows matching the IN. Now, why 19 rows from the subquery should produce such a large estimate in the outer query I'm not sure. Any strange distribution of values on pvcp? -- Richard Huxton Archonet Ltd
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Claus Guttesen wrote: >> Hi all, >> take a look at those plans: > > Try changing random_page_cost from the default 4 to 2 in postgresql.conf: > > random_page_cost = 2 > > The default in postgresql is somewhat conservative. This setting > indicates for postgresql how fast your disks are, the lower the > faster. > > Could this setting be changed to 2 as default rather than 4? I have tuned that number already at 2.5, lowering it to 2 doesn't change the plan. Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGDOGa7UpzwH2SGd4RAjvaAKDAbz/vxwyOBPCILGpw8rBSvTFMtACfRPBe yMge0RFfww0ef7xrGBLal7o= =k+RM -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Richard Huxton wrote: > Gaetano Mendola wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> >> Hi all, take a look at those plans: >> >> >> test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE >> pvcp in (select id from l_pvcp where value ilike '%pi%'); > >> -> Hash IN Join (cost=2.22..153835.49 rows=177404 width=8) (actual >> time=2.908..4001.814 rows=7801 loops=1) Hash Cond: ("outer".pvcp = >> "inner".id) > >> Isn't too much choose a sequential scan due to 19 estimated rows when >> with 4 estimated does a correct index scan ? > > I don't think it's the matches on l_pvcp that's the problem, it's the > fact that it thinks its getting 177404 rows matching the IN. > > Now, why 19 rows from the subquery should produce such a large estimate > in the outer query I'm not sure. Any strange distribution of values on > pvcp? I don't know what do you mean for strange, this is the distribution: test=# select count(*) from t_oa_2_00_card; count - -------- 877682 (1 row) test=# select count(*), pvcp from t_oa_2_00_card group by pvcp; count | pvcp - -------+------ 13 | 2 | 94 57 | 93 250 | 90 8158 | 89 4535 | 88 3170 | 87 13711 | 86 5442 | 85 2058 | 84 44 | 83 1 | 82 4 | 80 1 | 79 14851 | 78 12149 | 77 149 | 76 9 | 75 4 | 74 2 | 73 5 | 72 28856 | 71 12847 | 70 8183 | 69 11246 | 68 9232 | 67 14433 | 66 13970 | 65 3616 | 64 2996 | 63 7801 | 62 3329 | 61 949 | 60 35168 | 59 18752 | 58 1719 | 57 1031 | 56 1585 | 55 2125 | 54 9007 | 53 22060 | 52 2800 | 51 5629 | 50 16970 | 49 8254 | 48 11448 | 47 20253 | 46 3637 | 45 13876 | 44 19002 | 43 17940 | 42 5022 | 41 24478 | 40 2374 | 39 4885 | 38 3779 | 37 3532 | 36 11783 | 35 15843 | 34 14546 | 33 29171 | 32 5048 | 31 13411 | 30 6746 | 29 375 | 28 9244 | 27 10577 | 26 36096 | 25 3827 | 24 29497 | 23 20362 | 22 8068 | 21 2936 | 20 661 | 19 8224 | 18 3016 | 17 7731 | 16 8792 | 15 4486 | 14 3 | 13 6859 | 12 4576 | 11 13377 | 10 14578 | 9 6991 | 8 52714 | 7 6477 | 6 11445 | 5 24690 | 4 10522 | 3 2917 | 2 34694 | 1 (92 rows) I think that estimate is something like: 877682 / 92 * 19 Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGDONZ7UpzwH2SGd4RAhs3AKCYWgyn3vkzDvhWl/tF1TRs/nDT7QCeJDZu k9hQ0WBS1cFHcCjIs3jca0Y= =RIDE -----END PGP SIGNATURE-----
Gaetano Mendola wrote: > > Richard Huxton wrote: >> >> Now, why 19 rows from the subquery should produce such a large estimate >> in the outer query I'm not sure. Any strange distribution of values on >> pvcp? > > I don't know what do you mean for strange, this is the distribution: > > test=# select count(*) from t_oa_2_00_card; > count > - -------- > 877682 > (1 row) > > test=# select count(*), pvcp from t_oa_2_00_card group by pvcp; > count | pvcp > - -------+------ > (92 rows) > > > I think that estimate is something like: 877682 / 92 * 19 So if you actually had 19 matches for '%pi%' it might be a sensible plan then. I'm afraid I don't know of any way to improve PG's prediction on how many matches you'll get for a substring pattern though. -- Richard Huxton Archonet Ltd
I don't know about postgres, but in oracle it could be better to write: SELECT COUNT(distinct c.id) FROM t_oa_2_00_card c,l_pvcp l WHERE l.value ilike '%pi%' and c.pvcp=l.id; or SELECT COUNT(c.id) FROM t_oa_2_00_card c, (select distinct id from l_pvcp where value ilike '%pi%') l WHERE c.pvcp=l.id; depending how many rows, what kind of rows, ... are in l_pvcp table. having index in t_oa_2_00_card.pvcp can slow queries in oracle. Ismo On Fri, 30 Mar 2007, Gaetano Mendola wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Richard Huxton wrote: > > Gaetano Mendola wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > >> > >> Hi all, take a look at those plans: > >> > >> > >> test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE > >> pvcp in (select id from l_pvcp where value ilike '%pi%'); > > > >> -> Hash IN Join (cost=2.22..153835.49 rows=177404 width=8) (actual > >> time=2.908..4001.814 rows=7801 loops=1) Hash Cond: ("outer".pvcp = > >> "inner".id) > > > >> Isn't too much choose a sequential scan due to 19 estimated rows when > >> with 4 estimated does a correct index scan ? > > > > I don't think it's the matches on l_pvcp that's the problem, it's the > > fact that it thinks its getting 177404 rows matching the IN. > > > > Now, why 19 rows from the subquery should produce such a large estimate > > in the outer query I'm not sure. Any strange distribution of values on > > pvcp? > > I don't know what do you mean for strange, this is the distribution: > > test=# select count(*) from t_oa_2_00_card; > count > - -------- > 877682 > (1 row) > > test=# select count(*), pvcp from t_oa_2_00_card group by pvcp; > count | pvcp > - -------+------ > 13 | > 2 | 94 > 57 | 93 > 250 | 90 > 8158 | 89 > 4535 | 88 > 3170 | 87 > 13711 | 86 > 5442 | 85 > 2058 | 84 > 44 | 83 > 1 | 82 > 4 | 80 > 1 | 79 > 14851 | 78 > 12149 | 77 > 149 | 76 > 9 | 75 > 4 | 74 > 2 | 73 > 5 | 72 > 28856 | 71 > 12847 | 70 > 8183 | 69 > 11246 | 68 > 9232 | 67 > 14433 | 66 > 13970 | 65 > 3616 | 64 > 2996 | 63 > 7801 | 62 > 3329 | 61 > 949 | 60 > 35168 | 59 > 18752 | 58 > 1719 | 57 > 1031 | 56 > 1585 | 55 > 2125 | 54 > 9007 | 53 > 22060 | 52 > 2800 | 51 > 5629 | 50 > 16970 | 49 > 8254 | 48 > 11448 | 47 > 20253 | 46 > 3637 | 45 > 13876 | 44 > 19002 | 43 > 17940 | 42 > 5022 | 41 > 24478 | 40 > 2374 | 39 > 4885 | 38 > 3779 | 37 > 3532 | 36 > 11783 | 35 > 15843 | 34 > 14546 | 33 > 29171 | 32 > 5048 | 31 > 13411 | 30 > 6746 | 29 > 375 | 28 > 9244 | 27 > 10577 | 26 > 36096 | 25 > 3827 | 24 > 29497 | 23 > 20362 | 22 > 8068 | 21 > 2936 | 20 > 661 | 19 > 8224 | 18 > 3016 | 17 > 7731 | 16 > 8792 | 15 > 4486 | 14 > 3 | 13 > 6859 | 12 > 4576 | 11 > 13377 | 10 > 14578 | 9 > 6991 | 8 > 52714 | 7 > 6477 | 6 > 11445 | 5 > 24690 | 4 > 10522 | 3 > 2917 | 2 > 34694 | 1 > (92 rows) > > > I think that estimate is something like: 877682 / 92 * 19 > > > Regards > Gaetano Mendola > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.5 (MingW32) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFGDONZ7UpzwH2SGd4RAhs3AKCYWgyn3vkzDvhWl/tF1TRs/nDT7QCeJDZu > k9hQ0WBS1cFHcCjIs3jca0Y= > =RIDE > -----END PGP SIGNATURE----- > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
On Fri, Mar 30, 2007 at 12:08:26PM +0200, Gaetano Mendola wrote: > Claus Guttesen wrote: > > Try changing random_page_cost from the default 4 to 2 in postgresql.conf: > > > > random_page_cost = 2 > > I have tuned that number already at 2.5, lowering it to 2 doesn't change > the plan. The following 19-fold overestimate is influencing the rest of the plan: -> Seq Scan on l_pvcp (cost=0.00..2.17 rows=19 width=4) (actual time=0.066..0.081 rows=1 loops=1) Filter: (value ~~* '%pi%'::text) Have you tried increasing the statistics target on l_pvcp.value? I ran your queries against canned data in 8.2.3 and better statistics resulted in more accurate row count estimates for this and other parts of the plan. I don't recall if estimates for non-leading-character matches in earlier versions can benefit from better statistics. -- Michael Fuhr
On Fri, Mar 30, 2007 at 04:46:11AM -0600, Michael Fuhr wrote: > Have you tried increasing the statistics target on l_pvcp.value? > I ran your queries against canned data in 8.2.3 and better statistics > resulted in more accurate row count estimates for this and other > parts of the plan. I don't recall if estimates for non-leading-character > matches in earlier versions can benefit from better statistics. This might work only in 8.2. I see the following in the Release Notes: * Improve the optimizer's selectivity estimates for LIKE, ILIKE, and regular expression operations (Tom) -- Michael Fuhr
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Michael Fuhr wrote: > On Fri, Mar 30, 2007 at 12:08:26PM +0200, Gaetano Mendola wrote: >> Claus Guttesen wrote: >>> Try changing random_page_cost from the default 4 to 2 in postgresql.conf: >>> >>> random_page_cost = 2 >> I have tuned that number already at 2.5, lowering it to 2 doesn't change >> the plan. > > The following 19-fold overestimate is influencing the rest of the > plan: > > -> Seq Scan on l_pvcp (cost=0.00..2.17 rows=19 width=4) (actual time=0.066..0.081 rows=1 loops=1) > Filter: (value ~~* '%pi%'::text) > > Have you tried increasing the statistics target on l_pvcp.value? > I ran your queries against canned data in 8.2.3 and better statistics > resulted in more accurate row count estimates for this and other > parts of the plan. I don't recall if estimates for non-leading-character > matches in earlier versions can benefit from better statistics. > test=# alter table l_pvcp alter column value set statistics 1000; ALTER TABLE test=# analyze l_pvcp; ANALYZE test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE pvcp in (select id from l_pvcp where value ilike '%pi%'); QUERY PLAN - --------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=154321.83..154321.84 rows=1 width=8) (actual time=4948.627..4948.628 rows=1 loops=1) -> Hash IN Join (cost=2.22..153877.08 rows=177898 width=8) (actual time=2.262..4940.395 rows=7801 loops=1) Hash Cond: ("outer".pvcp = "inner".id) -> Seq Scan on t_oa_2_00_card (cost=0.00..147695.25 rows=880125 width=12) (actual time=0.040..3850.074 rows=877682loops=1) -> Hash (cost=2.17..2.17 rows=19 width=4) (actual time=0.073..0.073 rows=1 loops=1) -> Seq Scan on l_pvcp (cost=0.00..2.17 rows=19 width=4) (actual time=0.052..0.067 rows=1 loops=1) Filter: (value ~~* '%pi%'::text) Total runtime: 4948.717 ms (8 rows) and nothing changed. Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGDPVS7UpzwH2SGd4RAp+DAJ9Z5HdDcKx9rOQDbm+uAdb8uEc8OgCgjGmM Z351j5icCHT4yMOLEu3ZcJY= =CY1c -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Michael Fuhr wrote: > On Fri, Mar 30, 2007 at 04:46:11AM -0600, Michael Fuhr wrote: >> Have you tried increasing the statistics target on l_pvcp.value? >> I ran your queries against canned data in 8.2.3 and better statistics >> resulted in more accurate row count estimates for this and other >> parts of the plan. I don't recall if estimates for non-leading-character >> matches in earlier versions can benefit from better statistics. > > This might work only in 8.2. I see the following in the Release Notes: > > * Improve the optimizer's selectivity estimates for LIKE, ILIKE, > and regular expression operations (Tom) I will try same select on a 8.2 ( that one was a 8.1 ) and I'll let you know. Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGDPXk7UpzwH2SGd4RAsQcAKCs5sh3mYuE2TMdbtdxxgSOs989JACglT1H 44s1hJZJ5upBzIPwLigoxa4= =Aas2 -----END PGP SIGNATURE-----
Gaetano Mendola wrote: > > The match 19 for '%pi%' is estimated, the real matches are: > > test=# select id from l_pvcp where value ilike '%pi%'; > id > - ---- > 62 > (1 row) > > > test=# select id from l_pvcp where value ilike 'pi'; > id > - ---- > 62 > (1 row) > > so one row in both cases, that's why I expect for both same plan. Ah, but it's got no way of knowing what matches you'll get for '%anything%'. There's no easy way to get statistics for matching substrings. -- Richard Huxton Archonet Ltd
Gaetano Mendola wrote: > Michael Fuhr wrote: >> On Fri, Mar 30, 2007 at 04:46:11AM -0600, Michael Fuhr wrote: >>> Have you tried increasing the statistics target on l_pvcp.value? >>> I ran your queries against canned data in 8.2.3 and better statistics >>> resulted in more accurate row count estimates for this and other >>> parts of the plan. I don't recall if estimates for non-leading-character >>> matches in earlier versions can benefit from better statistics. >> This might work only in 8.2. I see the following in the Release Notes: >> >> * Improve the optimizer's selectivity estimates for LIKE, ILIKE, >> and regular expression operations (Tom) > > > I will try same select on a 8.2 ( that one was a 8.1 ) and I'll let you > know. You will also need to set statistics for the column to at least 100 to trigger the improved selectivity estimate if memory serves. Not enough time to check the code, but Tom could better advise. Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com
Richard Huxton <dev@archonet.com> writes: > Ah, but it's got no way of knowing what matches you'll get for > '%anything%'. There's no easy way to get statistics for matching substrings. 8.2 actually tries the match on the most-common-values list, if said list is big enough (I think the threshold is stats target = 100). Not sure if that will help here, though. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane wrote: > Richard Huxton <dev@archonet.com> writes: >> Ah, but it's got no way of knowing what matches you'll get for >> '%anything%'. There's no easy way to get statistics for matching substrings. > > 8.2 actually tries the match on the most-common-values list, if said > list is big enough (I think the threshold is stats target = 100). > Not sure if that will help here, though. I didn't change the stats target and I obtain on a 8.2 engine the result I was expecting. test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE pvcp in (select id from l_pvcp where value ilike '%pi%'); QUERY PLAN - --------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=163228.76..163228.77 rows=1 width=8) (actual time=23.398..23.398 rows=1 loops=1) -> Nested Loop (cost=74.71..163020.31 rows=83380 width=8) (actual time=2.237..18.580 rows=7801 loops=1) -> HashAggregate (cost=2.22..2.41 rows=19 width=4) (actual time=0.043..0.045 rows=1 loops=1) -> Seq Scan on l_pvcp (cost=0.00..2.17 rows=19 width=4) (actual time=0.028..0.037 rows=1 loops=1) Filter: (value ~~* '%pi%'::text) -> Bitmap Heap Scan on t_oa_2_00_card (cost=72.49..8525.04 rows=4388 width=12) (actual time=2.188..9.204 rows=7801loops=1) Recheck Cond: (t_oa_2_00_card.pvcp = l_pvcp.id) -> Bitmap Index Scan on i3_t_oa_2_00_card (cost=0.00..71.39 rows=4388 width=0) (actual time=1.768..1.768rows=7801 loops=1) Index Cond: (t_oa_2_00_card.pvcp = l_pvcp.id) Total runtime: 23.503 ms (10 rows) test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE pvcp in (select id from l_pvcp where value ilike 'pi'); QUERY PLAN - --------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=38343.44..38343.45 rows=1 width=8) (actual time=23.386..23.387 rows=1 loops=1) -> Nested Loop (cost=76.52..38299.55 rows=17554 width=8) (actual time=2.246..18.576 rows=7801 loops=1) -> HashAggregate (cost=2.18..2.22 rows=4 width=4) (actual time=0.041..0.043 rows=1 loops=1) -> Seq Scan on l_pvcp (cost=0.00..2.17 rows=4 width=4) (actual time=0.026..0.035 rows=1 loops=1) Filter: (value ~~* 'pi'::text) -> Bitmap Heap Scan on t_oa_2_00_card (cost=74.33..9519.48 rows=4388 width=12) (actual time=2.198..9.161 rows=7801loops=1) Recheck Cond: (t_oa_2_00_card.pvcp = l_pvcp.id) -> Bitmap Index Scan on i3_t_oa_2_00_card (cost=0.00..73.24 rows=4388 width=0) (actual time=1.779..1.779rows=7801 loops=1) Index Cond: (t_oa_2_00_card.pvcp = l_pvcp.id) Total runtime: 23.491 ms (10 rows) I had to lower the random_page_cost = 2.5 in order to avoid the sequential scan on the big table t_oa_2_00_card. this is a +1 to update our engines to a 8.2. Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGEN237UpzwH2SGd4RAo9yAJ9K7bTa5eEUjvPjk/OcAMgt+AncmQCfbkBH FlomqoY1ASv3TDkd9L5hgG4= =ZLS8 -----END PGP SIGNATURE-----