Обсуждение: [BUGS] BUG #14780: PostgreSQL 9.6 selects a wrong plan during aggregationagainst timestamp columns
[BUGS] BUG #14780: PostgreSQL 9.6 selects a wrong plan during aggregationagainst timestamp columns
От
sogawa@yandex.ru
Дата:
The following bug has been logged on the website: Bug reference: 14780 Logged by: sogawa-sps Email address: sogawa@yandex.ru PostgreSQL version: 9.6.2 Operating system: Windows 10 Description: PostgreSQL 9.6 selects a wrong plan during aggregation against indexed timestamp columns while it's ok for other types. Given: table “log” that has three columns: user_id, day, hours. user_id character varying(36) COLLATE pg_catalog."default" NOT NULL, day timestamp without time zone, hours doubleprecision All columns have indexes. The issue is that aggregation against the 'day' field works extremely slow because makes a full scan filtering the entries that doesn’t relate to user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' select min(day) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' [ { "Execution Time": 146502.05, "Planning Time": 0.893, "Plan": { "Startup Cost": 789.02, "Actual Rows": 1, "Plans": [ { "Startup Cost": 0.44, "Actual Rows":1, "Plans": [ { "Index Cond": "(log.day IS NOT NULL)", "StartupCost": 0.44, "Scan Direction": "Forward", "Plan Width": 8, "RowsRemoved by Index Recheck": 0, "Actual Rows": 1, "Node Type": "Index Scan", "Total Cost": 1395792.54, "Plan Rows": 1770, "Relation Name": "log", "Alias": "log", "Parallel Aware": false, "Actual Total Time": 146502.015, "Output": [ "log.day" ], "Parent Relationship": "Outer", "Actual Startup Time": 146502.015, "Schema": "public", "Filter": "((log.user_id)::text= 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)", "Actual Loops": 1, "Rows Removed by Filter":12665610, "Index Name": "index_log_day" } ], "Node Type": "Limit", "Plan Rows": 1, "Parallel Aware": false, "Actual Total Time": 146502.016, "Output": [ "log.day" ], "Parent Relationship": "InitPlan", "ActualStartup Time": 146502.016, "Plan Width": 8, "Subplan Name": "InitPlan 1 (returns $0)", "Actual Loops": 1, "Total Cost": 789.02 } ], "Node Type": "Result", "Plan Rows": 1, "Parallel Aware": false, "Actual Total Time": 146502.019, "Output": [ "$0" ], "Actual Startup Time": 146502.019, "Plan Width": 8, "Actual Loops": 1, "TotalCost": 789.03 }, "Triggers": [] } ] However the almost similar query but for the double type has a correct. select min(hours) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' Server selects entries for user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' first and then aggregates among them what is correct [ { "Execution Time": 5.989, "Planning Time": 1.186, "Plan": { "Partial Mode": "Simple", "Startup Cost": 6842.66, "Actual Rows": 1, "Plans": [ { "Startup Cost":66.28, "Plan Width": 8, "Rows Removed by Index Recheck": 0, "Actual Rows": 745, "Plans": [ { "Startup Cost": 0, "Plan Width": 0, "Actual Rows": 745, "Node Type": "Bitmap Index Scan", "Index Cond": "((log.user_id)::text= 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)", "Plan Rows": 1770, "Parallel Aware": false, "Actual Total Time": 0.25, "Parent Relationship": "Outer", "ActualStartup Time": 0.25, "Total Cost": 65.84, "Actual Loops": 1, "IndexName": "index_log_user_id" } ], "Recheck Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)", "Exact Heap Blocks": 742, "Node Type": "Bitmap HeapScan", "Plan Rows": 1770, "Relation Name": "log", "Alias": "log", "ParallelAware": false, "Actual Total Time": 5.793, "Output": [ "day", "hours", "user_id" ], "Lossy Heap Blocks": 0, "Parent Relationship":"Outer", "Actual Startup Time": 0.357, "Total Cost": 6838.23, "Actual Loops":1, "Schema": "public" } ], "Node Type": "Aggregate", "Strategy": "Plain", "Plan Rows": 1, "Parallel Aware": false, "Actual Total Time": 5.946, "Output": [ "min(hours)" ], "Actual Startup Time": 5.946, "Plan Width": 8, "Actual Loops":1, "Total Cost": 6842.67 }, "Triggers": [] } ] Optimizer have to select correct plan for the timestamp fields like it does for double. WA: Rewrite query into: select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On 16 August 2017 at 02:47, <sogawa@yandex.ru> wrote: > > select min(day) from log where user_id = > 'ab056f5a-390b-41d7-ba56-897c14b679bf' This is a classic database optimization problem that is difficult to always get right. Whether to use an index on day to find the lowest values and scan until you find the specified user_id or to scan all the records for that user_id to find the minimum day will depend on the number of records each user_id has and how they're distributed across the days. If you have some users with many records then using an index on user_id can perform terribly when those ids are referenced. The classic solution is to have an index on <user_id, day> and then the database can look up the correct value in a single index probe. These kinds of problems are better addressed to pgsql-general -- greg -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs