Re: Introducing coarse grain parallelism by postgres_fdw.
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Introducing coarse grain parallelism by postgres_fdw. |
Date | |
Msg-id | 20140808.122410.237073172.horiguchi.kyotaro@lab.ntt.co.jp |
In reply to | Re: Introducing coarse grain parallelism by postgres_fdw. (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: Introducing coarse grain parallelism by postgres_fdw.
(Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
|
List | pgsql-hackers |
Hi, thank you for the comment.

> Hi Kyotaro,
> I looked at the patches and felt that the approach taken here is too
> intrusive, considering that the feature is only for foreign scans.

I agree with you on the premise that it is only for foreign scans, but I regard it as an example of parallel execution planning.

> There are quite a few members added to the generic Path, Plan structures,
> whose use is induced only through foreign scans. Each path now stores
> two sets of costs, one with parallelism and one without. The parallel
> values will make sense only when there is a foreign scan, which uses
> parallelism, in the plan tree. So, those costs are maintained unnecessarily
> or the memory for those members is wasted in most of the cases, where the
> tables involved are not foreign. Also, not many foreign tables will be able
> to use the parallelism, e.g. file_fdw. Although, that's my opinion; I would
> like to hear from others.

I intended to discuss what estimation and planning for parallel execution (not limited to foreign scans) would be like. A background worker would be able to take on executing some portion of the path tree in 'parallel'. The postgres_fdw part of this patch is simply one case of planning for parallel execution. As you see, though, all it does is choose whether to go parallel for a path that was constructed without regard to parallel execution; considering the possible alternative paths for parallel execution would cost too much. Limiting this discussion to parallel scans, the overall gain from multiple simultaneous scans distributed across the path/plan tree cannot be known until costing has been done up to the root node (more precisely, up to their common parent).
This patch foolishly passes the parallel cost up to the root node bucket-brigade style, but there should be a smarter way to shortcut it: for example, simply pick out the parallelizable nodes by scanning the completed path/plan tree, calculate the cost they could probably eliminate, and then subtract that from, or compare it to, the total (non-parallel) cost. This might be more acceptable to everyone than the current implementation.

> Instead, an FDW which can use parallelism can add two paths one with and
> one without parallelism with appropriate costs and let the logic choosing
> the cheapest path take care of the actual choice. In fact, I thought,
> parallelism would be always faster than the non-parallel one, except when
> the foreign server is too much loaded. But we won't be able to check that
> anyway. Can you point out a case where the parallelism may not win over
> serial execution?

Parallel execution always wins against serial execution if it can be launched at no extra cost. In practice, though, it consumes extra resources, so I thought parallel execution should be curbed when the gain is small. That is the purpose of the two GUCs added by this patch and of what choose_parallel_scans() does, although not in an automated way. The overloading issue is not confined to parallel execution, but it will surely be more severe there, since it is less visible to users and less controllable by them. In the end, however, it would have to be left to manual configuration anyway.

> BTW, the name parallelism seems to be misleading here. All, it will be able
> to do is fire the queries (or data fetch requests) asynchronously. So, we
> might want to change the naming appropriately.

That is right regarding what exactly I did to postgres_fdw. But unless all intermediate tuples from child execution nodes running in parallel are allowed to pile up in memory without restriction, I suppose every 'parallel' execution will be a kind of this 'asynchronous' startup/fetching.
As for postgres_fdw, it would look more like 'parallel' (and perhaps be more efficient) if it processed queries using libpq's single-row mode instead of a cursor, although similar processing takes place beneath the system calls even in that case.

Well, I will try to make a version that does not include parallel costs in the plan/path structs and that uses single-row mode for postgres_fdw. I hope it leads somewhere.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
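
The shortcut proposed in the message (scan the finished plan tree, pick out the parallelizable nodes, estimate the cost they could eliminate, and compare it with the total non-parallel cost) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the patch: `ToyPlan` is a hypothetical stand-in for PostgreSQL's `Plan` struct, the overlap model (simultaneous scans cost as much as the slowest one, so the saving is their sum minus their maximum) is an assumed simplification, and `min_gain_ratio` is a hypothetical threshold standing in for the patch's GUCs, whose names are not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified plan node; NOT PostgreSQL's Plan struct. */
typedef struct ToyPlan
{
    double          own_cost;       /* cost of this node, excluding children */
    int             parallelizable; /* nonzero if it can be started asynchronously */
    struct ToyPlan *left;
    struct ToyPlan *right;
} ToyPlan;

/* Running totals over the parallelizable nodes found in the tree. */
typedef struct ParallelStats
{
    double sum;    /* summed cost of all parallelizable nodes */
    double max;    /* cost of the most expensive one */
    int    count;  /* how many were found */
} ParallelStats;

/* Total non-parallel cost: every node's own cost is simply added up. */
double
serial_cost(const ToyPlan *p)
{
    if (p == NULL)
        return 0.0;
    return p->own_cost + serial_cost(p->left) + serial_cost(p->right);
}

/* Walk the completed tree once and collect the parallelizable nodes. */
void
collect_parallel(const ToyPlan *p, ParallelStats *st)
{
    if (p == NULL)
        return;
    if (p->parallelizable)
    {
        st->sum += p->own_cost;
        if (p->own_cost > st->max)
            st->max = p->own_cost;
        st->count++;
    }
    collect_parallel(p->left, st);
    collect_parallel(p->right, st);
}

/*
 * Assumed overlap model: all parallelizable nodes run simultaneously,
 * so together they cost as much as the slowest one. The saving is
 * therefore sum - max, and zero when fewer than two nodes can overlap.
 */
double
eliminable_cost(const ToyPlan *root)
{
    ParallelStats st = {0.0, 0.0, 0};

    collect_parallel(root, &st);
    return (st.count >= 2) ? st.sum - st.max : 0.0;
}

/*
 * Go parallel only if the estimated saving is at least min_gain_ratio
 * of the serial total (hypothetical knob, in the spirit of curbing
 * parallel execution when the gain is small).
 */
int
should_go_parallel(const ToyPlan *root, double min_gain_ratio)
{
    double total = serial_cost(root);

    assert(min_gain_ratio >= 0.0);
    if (total <= 0.0)
        return 0;
    return eliminable_cost(root) / total >= min_gain_ratio;
}
```

For a join node costing 10 over two parallelizable scans costing 100 and 60, the serial total is 170 and the estimated saving is 60, so the plan would go parallel with a threshold of 0.25 but not with 0.5. The point of the design is that nothing has to be stored in the generic path/plan structs: the estimate comes from a single pass over the finished tree.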