Обсуждение: Planner Row Estimate with Function
I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the restof the query planning. The result of the function is roughly unique - there are a handful with multiple entries - butthe planner is estimating 227,745 rows. I re-ran ANALYZE on the table and the results did not change. Any suggestionson how to get more accurate planner result? Function definition: CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ BEGIN RETURN SUBSTRING($1 FROM 3 FOR 13); END; $_$ LANGUAGE plpgsql IMMUTABLE; Explain output: # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) = lpad(CAST('57729970'AS text), 13, '0'); QUERY PLAN ------------------------------------------------------------------------------------------- Index Scan using idx_event_card_id on event (cost=0.25..468642.89 rows=227745 width=104) Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text) Statistics: # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM pg_statsWHERE tablename = 'idx_event_card_id'; null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation | histogram_bounds -----------+-----------+------------+------------------+-------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.0935673 | 17 | -1 | | | 0.672617 | {0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760} Thanks. Michael
Hello 2009/12/29 Michael Fork <mfork00@yahoo.com>: > I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the restof the query planning. The result of the function is roughly unique - there are a handful with multiple entries - butthe planner is estimating 227,745 rows. I re-ran ANALYZE on the table and the results did not change. Any suggestionson how to get more accurate planner result? > > > Function definition: > > CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ > BEGIN > RETURN SUBSTRING($1 FROM 3 FOR 13); > END; > $_$ LANGUAGE plpgsql IMMUTABLE; > try CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ SELECT SUBSTRING($1 FROM 3 FOR 13); $_$ LANGUAGE sql; regards Pavel Stehule > Explain output: > > # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) = lpad(CAST('57729970'AS text), 13, '0'); > QUERY PLAN > ------------------------------------------------------------------------------------------- > Index Scan using idx_event_card_id on event (cost=0.25..468642.89 rows=227745 width=104) > Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text) > > Statistics: > > # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM pg_statsWHERE tablename = 'idx_event_card_id'; > null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation | histogram_bounds > -----------+-----------+------------+------------------+-------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------- > 0.0935673 | 17 | -1 | | | 0.672617 | {0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760} > Thanks. > > > Michael > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Pavel, Thanks for the suggestion but unfortunately the planner estimate was not really affected: QUERY PLAN ------------------------------------------------------------------------------------------- Index Scan using idx_event_card_id on event (cost=0.00..401311.59 rows=223890 width=103) Index Cond: ("substring"(reference_code, 3, 13) = '0000057729970'::text) (2 rows) Thanks. Michael ----- Original Message ---- From: Pavel Stehule <pavel.stehule@gmail.com> To: Michael Fork <mfork00@yahoo.com> Cc: pgsql-general@postgresql.org Sent: Tue, December 29, 2009 12:18:52 PM Subject: Re: [GENERAL] Planner Row Estimate with Function Hello 2009/12/29 Michael Fork <mfork00@yahoo.com>: > I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the restof the query planning. The result of the function is roughly unique - there are a handful with multiple entries - butthe planner is estimating 227,745 rows. I re-ran ANALYZE on the table and the results did not change. Any suggestionson how to get more accurate planner result? > > > Function definition: > > CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ > BEGIN > RETURN SUBSTRING($1 FROM 3 FOR 13); > END; > $_$ LANGUAGE plpgsql IMMUTABLE; > try CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ SELECT SUBSTRING($1 FROM 3 FOR 13); $_$ LANGUAGE sql; regards Pavel Stehule > Explain output: > > # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) = lpad(CAST('57729970'AS text), 13, '0'); > QUERY PLAN > ------------------------------------------------------------------------------------------- > Index Scan using idx_event_card_id on event (cost=0.25..468642.89 rows=227745 width=104) > Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text) > > Statistics: > > # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM pg_statsWHERE tablename = 'idx_event_card_id'; > null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation | histogram_bounds > -----------+-----------+------------+------------------+-------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------- > 0.0935673 | 17 | -1 | | | 0.672617 | {0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760} > Thanks. > > > Michael > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
2009/12/29 Michael Fork <mfork00@yahoo.com>: > Pavel, > > Thanks for the suggestion but unfortunately the planner estimate was not really affected: any string estimation are not exact. you can use following dirty trick: http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Using_IMMUTABLE_functions_as_hints_for_the_optimizer Regards Pavel > > QUERY PLAN > ------------------------------------------------------------------------------------------- > Index Scan using idx_event_card_id on event (cost=0.00..401311.59 rows=223890 width=103) > Index Cond: ("substring"(reference_code, 3, 13) = '0000057729970'::text) > (2 rows) > Thanks. > > > Michael > > > > ----- Original Message ---- > From: Pavel Stehule <pavel.stehule@gmail.com> > To: Michael Fork <mfork00@yahoo.com> > Cc: pgsql-general@postgresql.org > Sent: Tue, December 29, 2009 12:18:52 PM > Subject: Re: [GENERAL] Planner Row Estimate with Function > > Hello > > 2009/12/29 Michael Fork <mfork00@yahoo.com>: >> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the restof the query planning. The result of the function is roughly unique - there are a handful with multiple entries - butthe planner is estimating 227,745 rows. I re-ran ANALYZE on the table and the results did not change. Any suggestionson how to get more accurate planner result? >> >> >> Function definition: >> >> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ >> BEGIN >> RETURN SUBSTRING($1 FROM 3 FOR 13); >> END; >> $_$ LANGUAGE plpgsql IMMUTABLE; >> > > try > CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ > SELECT SUBSTRING($1 FROM 3 FOR 13); > $_$ LANGUAGE sql; > > regards > Pavel Stehule > >> Explain output: >> >> # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) = lpad(CAST('57729970'AS text), 13, '0'); >> QUERY PLAN >> ------------------------------------------------------------------------------------------- >> Index Scan using idx_event_card_id on event (cost=0.25..468642.89 rows=227745 width=104) >> Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text) >> >> Statistics: >> >> # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM pg_statsWHERE tablename = 'idx_event_card_id'; >> null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation | histogram_bounds >> -----------+-----------+------------+------------------+-------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------- >> 0.0935673 | 17 | -1 | | | 0.672617 | {0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760} >> Thanks. >> >> >> Michael >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >> > >
Michael Fork wrote on 29.12.2009 18:08: > I have an index scan on a custom function that is returning a wildly > incorrect row estimate that is throwing off the rest of the query > planning. The result of the function is roughly unique - there are a > handful with multiple entries - but the planner is estimating 227,745 > rows. I re-ran ANALYZE on the table and the results did not change. > Any suggestions on how to get more accurate planner result? You can add the ROWS nnn option to your create statement to give the planner a hint about the number of rows: http://www.postgresql.org/docs/current/static/sql-createfunction.html Check out the /ROWS result_rows/ part. Thomas
Michael Fork <mfork00@yahoo.com> writes: > I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the restof the query planning. The result of the function is roughly unique - there are a handful with multiple entries - butthe planner is estimating 227,745 rows. I re-ran ANALYZE on the table and the results did not change. Any suggestionson how to get more accurate planner result? What PG version is this exactly? Also, what happened to the type='CREDIT' condition in your query? Is that a partial index? regards, tom lane
springboard_v2=# SELECT version(); version -------------------------------------------------------------------------------------------------- PostgreSQL 8.3.7 on amd64-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD] (1 row) Yes, this is partial index. I should have included the index definition earlier: # CREATE INDEX CONCURRENTLY idx_event_card_id ON trail.event(parsecardidfromreferencecode(reference_code)) WHERE type = 'CREDIT';Thanks. Michael ----- Original Message ---- From: Tom Lane <tgl@sss.pgh.pa.us> To: Michael Fork <mfork00@yahoo.com> Cc: pgsql-general@postgresql.org Sent: Tue, December 29, 2009 3:43:06 PM Subject: Re: [GENERAL] Planner Row Estimate with Function Michael Fork <mfork00@yahoo.com> writes: > I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the restof the query planning. The result of the function is roughly unique - there are a handful with multiple entries - butthe planner is estimating 227,745 rows. I re-ran ANALYZE on the table and the results did not change. Any suggestionson how to get more accurate planner result? What PG version is this exactly? Also, what happened to the type='CREDIT' condition in your query? Is that a partial index? regards, tom lane
I tried the trick below and the planner estimate was roughly the same: springboard_v2=# explain select * from trail.event where type='CREDIT' and lpad(CAST('57729970' AS text), 13, '0') = ANY(parsecardidfromreferencecode(reference_code)); QUERY PLAN --------------------------------------------------------------------------------------------- Bitmap Heap Scan on event (cost=1669366.06..15120311.84 rows=2178778 width=103) Recheck Cond: (type = 'CREDIT'::text) Filter: ('0000057729970'::text = ANY (parsecardidfromreferencecode(reference_code))) -> Bitmap Index Scan on idx_event_card_id (cost=0.00..1668821.37 rows=44565021 width=0) Thanks. Michael ----- Original Message ---- From: Pavel Stehule <pavel.stehule@gmail.com> To: Michael Fork <mfork00@yahoo.com> Cc: pgsql-general@postgresql.org Sent: Tue, December 29, 2009 1:16:10 PM Subject: Re: [GENERAL] Planner Row Estimate with Function 2009/12/29 Michael Fork <mfork00@yahoo.com>: > Pavel, > > Thanks for the suggestion but unfortunately the planner estimate was not really affected: any string estimation are not exact. you can use following dirty trick: http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Using_IMMUTABLE_functions_as_hints_for_the_optimizer Regards Pavel > > QUERY PLAN > ------------------------------------------------------------------------------------------- > Index Scan using idx_event_card_id on event (cost=0.00..401311.59 rows=223890 width=103) > Index Cond: ("substring"(reference_code, 3, 13) = '0000057729970'::text) > (2 rows) > Thanks. > > > Michael > > > > ----- Original Message ---- > From: Pavel Stehule <pavel.stehule@gmail.com> > To: Michael Fork <mfork00@yahoo.com> > Cc: pgsql-general@postgresql.org > Sent: Tue, December 29, 2009 12:18:52 PM > Subject: Re: [GENERAL] Planner Row Estimate with Function > > Hello > > 2009/12/29 Michael Fork <mfork00@yahoo.com>: >> I have an index scan on a custom function that is returning a wildly incorrect row estimate that is throwing off the restof the query planning. The result of the function is roughly unique - there are a handful with multiple entries - butthe planner is estimating 227,745 rows. I re-ran ANALYZE on the table and the results did not change. Any suggestionson how to get more accurate planner result? >> >> >> Function definition: >> >> CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ >> BEGIN >> RETURN SUBSTRING($1 FROM 3 FOR 13); >> END; >> $_$ LANGUAGE plpgsql IMMUTABLE; >> > > try > CREATE FUNCTION parsecardidfromreferencecode(text) RETURNS text AS $_$ > SELECT SUBSTRING($1 FROM 3 FOR 13); > $_$ LANGUAGE sql; > > regards > Pavel Stehule > >> Explain output: >> >> # explain select * from trail.event where type='CREDIT' and parsecardidfromreferencecode(reference_code) = lpad(CAST('57729970'AS text), 13, '0'); >> QUERY PLAN >> ------------------------------------------------------------------------------------------- >> Index Scan using idx_event_card_id on event (cost=0.25..468642.89 rows=227745 width=104) >> Index Cond: (parsecardidfromreferencecode(reference_code) = '0000057729970'::text) >> >> Statistics: >> >> # SELECT null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, correlation, histogram_bounds FROM pg_statsWHERE tablename = 'idx_event_card_id'; >> null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | correlation | histogram_bounds >> -----------+-----------+------------+------------------+-------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------- >> 0.0935673 | 17 | -1 | | | 0.672617 | {0000007932138,0000029448430,0000033432203,0000037841145,0000042171261,0000046399858,0000050938838,0000055122354,0003139385377,1220821134582,1261876426760} >> Thanks. >> >> >> Michael >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >> > >
Michael Fork <mfork00@yahoo.com> writes: >> Also, what happened to the type='CREDIT' condition in your query? Is >> that a partial index? > Yes, this is partial index. I should have included the index definition earlier: Ah. The optimizer is ignoring the index stats on the grounds that they are not representative of the whole table. I'm not entirely sure offhand whether it would be safe to use them anyway if the index predicate is known to match the query --- it seems a bit risky but on the other hand it'd probably be better than having no stats at all. What you can do to get a better estimate is to create a non-partial index on parsecardidfromreferencecode(reference_code). It's fairly likely that if you do that, maintaining the partial index as well is not worth your time; but that's a decision you'd have to make. regards, tom lane
That solved it. Thanks! Michael ----- Original Message ---- From: Tom Lane <tgl@sss.pgh.pa.us> To: Michael Fork <mfork00@yahoo.com> Cc: pgsql-general@postgresql.org Sent: Tue, December 29, 2009 11:19:42 PM Subject: Re: [GENERAL] Planner Row Estimate with Function Michael Fork <mfork00@yahoo.com> writes: >> Also, what happened to the type='CREDIT' condition in your query? Is >> that a partial index? > Yes, this is partial index. I should have included the index definition earlier: Ah. The optimizer is ignoring the index stats on the grounds that they are not representative of the whole table. I'm not entirely sure offhand whether it would be safe to use them anyway if the index predicate is known to match the query --- it seems a bit risky but on the other hand it'd probably be better than having no stats at all. What you can do to get a better estimate is to create a non-partial index on parsecardidfromreferencecode(reference_code). It's fairly likely that if you do that, maintaining the partial index as well is not worth your time; but that's a decision you'd have to make. regards, tom lane