9.5: Memory-bounded HashAgg
From | Jeff Davis
---|---
Subject | 9.5: Memory-bounded HashAgg
Date |
Msg-id | 1407706010.6623.16.camel@jeff-desktop
Replies | Re: 9.5: Memory-bounded HashAgg
List | pgsql-hackers
This patch requires the Memory Accounting patch, or something similar, to track memory usage.

The attached patch enables hashagg to spill to disk, which means that hashagg will contain itself to work_mem even if the planner badly misestimates the cardinality.

This is a well-known concept; there's even a Berkeley homework assignment floating around to implement it -- in postgres 7.2, no less. I didn't take the exact same approach as the homework assignment suggests, but it's not much different, either. My apologies if some classes are still using this as a homework assignment, but postgres needs to eventually have an answer to this problem.

Included is a GUC, "enable_hashagg_disk" (default on), which allows the planner to choose hashagg even if it doesn't expect the hash table to fit in memory. If it's off, and the planner misestimates the cardinality, hashagg will still use the disk to contain itself to work_mem.

One situation that might surprise the user is if work_mem is set too low, and the user is *relying* on a misestimate to pick hashagg. With this patch, it would end up going to disk, which might be significantly slower. The solution for the user is to increase work_mem.

Rough Design:

Change the hash aggregate algorithm to accept a generic "work item", which consists of an input file as well as some other bookkeeping information.

Initially prime the algorithm by adding a single work item where the file is NULL, indicating that it should read from the outer plan.

If memory is exhausted during execution of a work item, then continue to allow existing groups to be aggregated, but do not allow new groups to be created in the hash table. Tuples representing new groups are saved in an output partition file referenced in the work item that is currently being executed.

When the work item is done, emit any groups in the hash table, clear the hash table, and turn each output partition file into a new work item.

Each time through, at least some groups are able to stay in the hash table, so eventually none will need to be saved in output partitions, no new work items will be created, and the algorithm will terminate. This is true even if the number of output partitions is always one. (A standalone sketch of this loop follows below.)

Open items:

* costing
* EXPLAIN details for disk usage
* choose number of partitions intelligently
* performance testing

Initial tests indicate that it can be competitive with sort+groupagg when the disk is involved, but more testing is required.

Feedback welcome.

Regards,
Jeff Davis
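[Editor's note: the following is a minimal, hypothetical C sketch of the work-item loop described above, not code from the patch itself. All names (WorkItem, lookup_group, MAX_GROUPS_IN_MEM, NUM_PARTITIONS) are invented for illustration; group keys are ints, the only aggregate is COUNT(*), a small array cap stands in for work_mem, and partitions are tmpfile()s.]

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_GROUPS_IN_MEM 4     /* stand-in for work_mem */
#define NUM_PARTITIONS    2     /* number of output partition files */

typedef struct { int key; long count; } Group;

typedef struct WorkItem {
    FILE *input;                /* NULL means: read from the outer plan */
    struct WorkItem *next;
} WorkItem;

/* Find the group for a key, admitting a new group only while there is
 * room; returns NULL when "memory" is exhausted and the key is new. */
static Group *lookup_group(Group *table, int *ngroups, int key)
{
    for (int i = 0; i < *ngroups; i++)
        if (table[i].key == key)
            return &table[i];
    if (*ngroups >= MAX_GROUPS_IN_MEM)
        return NULL;
    table[*ngroups] = (Group){ key, 0 };
    return &table[(*ngroups)++];
}

int main(void)
{
    /* Simulated outer plan: a stream of group keys. */
    int outer[] = {1, 2, 3, 4, 5, 6, 1, 2, 7, 8, 5, 6, 9, 1};
    int nouter = sizeof(outer) / sizeof(outer[0]);
    int outer_pos = 0;

    /* Prime the algorithm with a single work item whose input is NULL. */
    WorkItem *queue = calloc(1, sizeof(WorkItem));

    while (queue != NULL)
    {
        WorkItem *item = queue;
        queue = item->next;

        Group table[MAX_GROUPS_IN_MEM];
        int   ngroups = 0;
        FILE *parts[NUM_PARTITIONS] = {0};

        for (;;)
        {
            int key;
            /* Pull the next tuple from the outer plan or the spill file. */
            if (item->input == NULL) {
                if (outer_pos >= nouter) break;
                key = outer[outer_pos++];
            } else {
                if (fread(&key, sizeof(key), 1, item->input) != 1) break;
            }

            Group *g = lookup_group(table, &ngroups, key);
            if (g != NULL) {
                g->count++;     /* existing (or newly admitted) group */
            } else {
                /* Memory exhausted: spill the tuple to an output partition. */
                int p = ((unsigned) key) % NUM_PARTITIONS;
                if (parts[p] == NULL)
                    parts[p] = tmpfile();
                fwrite(&key, sizeof(key), 1, parts[p]);
            }
        }

        /* Work item done: emit the groups that stayed in memory ... */
        for (int i = 0; i < ngroups; i++)
            printf("group %d: count %ld\n", table[i].key, table[i].count);

        /* ... and turn each output partition file into a new work item. */
        for (int p = 0; p < NUM_PARTITIONS; p++) {
            if (parts[p] != NULL) {
                rewind(parts[p]);
                WorkItem *w = calloc(1, sizeof(WorkItem));
                w->input = parts[p];
                w->next = queue;
                queue = w;
            }
        }

        if (item->input != NULL)
            fclose(item->input);
        free(item);
    }
    return 0;
}
```

Each pass admits at least MAX_GROUPS_IN_MEM groups into the table, so every spill file covers strictly fewer distinct keys than its parent work item and the queue eventually drains, mirroring the termination argument in the design above.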