planner and worst case scenario
От | Joseph Shraibman |
---|---|
Тема | planner and worst case scenario |
Дата | |
Msg-id | cbvvju$16u2$1@news.hub.org обсуждение исходный текст |
Список | pgsql-performance |
Here is my query, that returns one row: SELECT f1, f2,(SELECT dfield FROM d WHERE d.ukey = f1) FROM m WHERE status IN(2) AND jid IN(17674) ORDER BY pkey DESC LIMIT 25 OFFSET 0; Here was the really bad plan chosen. This didn't come back for a long while and had to be cancelled: QUERY PLAN ------------------------------------------------------------------------------------------------------ Limit (cost=0.00..10493.05 rows=25 width=118) -> Index Scan Backward using m_pkey on m (cost=0.00..1883712.97 rows=4488 width=118) Filter: ((status = 2) AND (jid = 17674)) SubPlan -> Index Scan using d_pkey on d (cost=0.00..3.83 rows=1 width=24) Index Cond: (ukey = $0) (6 rows) After an ANALYZE the plan was much better: QUERY PLAN ------------------------------------------------------------------------------------------------------ Limit (cost=22060.13..22060.19 rows=25 width=119) -> Sort (cost=22060.13..22067.61 rows=2993 width=119) Sort Key: serial -> Index Scan using m_jid_uid_key on m (cost=0.00..21887.32 rows=2993 width=119) Index Cond: (jid = 17674) Filter: (status = 2) SubPlan -> Index Scan using d_pkey on d (cost=0.00..3.83 rows=1 width=24) Index Cond: (ukey = $0) (9 rows) The thing is since there was only 1 row in the (very big) table with that jid, the ANALYZE didn't include that row in the stats table, so I'm figuring there was a small random change that made it choose the better query. Doing: ALTER TABLE m ALTER jid SET STATISTICS 1000; produce a much more accurate row guess: QUERY PLAN ------------------------------------------------------------------------------------------------------ Limit (cost=2909.65..2909.71 rows=25 width=115) -> Sort (cost=2909.65..2910.64 rows=395 width=115) Sort Key: serial -> Index Scan using m_jid_uid_key on m (cost=0.00..2892.61 rows=395 width=115) Index Cond: (jbid = 17674) Filter: (status = 2) SubPlan -> Index Scan using d_pkey on d (cost=0.00..3.83 rows=1 width=24) Index Cond: (userkey = $0) (9 rows) It seems the problem is that the pg planner goes for the job with the lowest projected time, but ignores the worst case scenario. I think the odds of this problem happening again are lower since the SET STATISTICS, but I don't know what triggered the really bad plan in the first place. Did pg think that because so many rows would match the limit would be filled up soon, so that a more accurate and lower assumption would cause it to choose the better plan?
В списке pgsql-performance по дате отправления:
Предыдущее
От: "P.A.M. van Dam "Дата:
Сообщение: Re: How can one see what queries are running withing a