Re: Parallel Seq Scan
From | Robert Haas |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CA+TgmoZJcsOxN+46ppEDeigN-E0f5WxUH0F7H-M6YLLF1A7KFg@mail.gmail.com |
In reply to | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Fri, Apr 24, 2015 at 8:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> - InitializeParallelWorkers() still mixes together general parallel
>> executor concerns with concerns specific to parallel sequential scan
>> (e.g. EstimatePartialSeqScanSpace).
>
> Here we are doing 2 things: the first one is for the planned statement
> and the second one is node-specific, which in this case is the parallel
> heap scan descriptor.  So if I understand correctly, you want us to
> remove the second one and have a recursive function to achieve the same.

Right.

>> - It's hard to believe this is right:
>>
>> +    if (parallelstmt->inst_options)
>> +        receiver = None_Receiver;
>>
>> Really?  Flush the tuples if there are *any instrumentation options
>> whatsoever*?  At the very least, that doesn't look too future-proof,
>> but I'm suspicious that it's outright incorrect.
>
> The instrumentation info is for the EXPLAIN statement, where we don't
> need tuples, and it is set the same way for it as well; refer to
> ExplainOnePlan().  What makes you feel this is incorrect?

Well, for one thing, it's going to completely invalidate the result of
EXPLAIN.  I mean, consider this:

Hash Join
-> Parallel Seq Scan
-> Hash
   -> Seq Scan

If you have the workers throw away the rows from the parallel seq scan
instead of sending them back to the master, the master won't join
those rows against the other table.  And then the "actual" row counts,
timing, etc. will all be totally wrong.  Worse, if the user is
EXPLAIN-ing a SELECT INTO command, the results will be totally wrong.

I don't think you can use ExplainOnePlan() as precedent for the theory
that explain_options != 0 means discard everything, because that
function does not do that.  It bases the decision to throw away the
output on the fact that EXPLAIN was used, and throws it away unless an
IntoClause was also specified.  It does this even if instrument_options
== 0.  Meanwhile, auto_explain does NOT throw away the output even if
instrument_options != 0, nor should it!
But even if none of that were an issue, throwing away part of the
results from an internal plan tree is not the same thing as throwing
away the final result stream, and is dead wrong.

>> - I think ParallelStmt probably shouldn't be defined in parsenodes.h.
>> That file is included in a lot of places, and adding all of those
>> extra #includes there doesn't seem like a good idea for modularity
>> reasons even if you don't care about partial rebuilds.  Something that
>> includes a shm_mq obviously isn't a "parse" node in any meaningful
>> sense anyway.
>
> How about tcop/tcopprot.h?

The comment at the top of that file is "prototypes for postgres.c".
Generally, unless there is some reason to do otherwise, the prototypes
for a .c file in src/backend go in a .h file with the same name in
src/include.  I don't see why we should do differently here.
ParallelStmt should be defined and used in a file living in
src/backend/executor, and the header should have the same name and go
in src/include/executor.

>> - I don't think you need both setup cost and startup cost.  Starting
>> up more workers isn't particularly more expensive than starting up
>> fewer of them, because most of the overhead is in waiting for them to
>> actually start, and if the number of workers is reasonable, they'll
>> all be doing that in parallel with each other.  I suggest removing
>> parallel_startup_cost and keeping parallel_setup_cost.
>
> There is some work (like creation of shm queues, launching of workers)
> which is done in proportion to the number of workers during setup time.
> I have kept 2 parameters to distinguish such work.  I think you have a
> point that the start of some or all workers could be parallel, but I
> feel that is still work proportional to the number of workers.  For
> future parallel operations also, such a parameter could be useful where
> we need to set up IPC between workers or some other stuff where work is
> proportional to the number of workers.
That's technically true, but the incremental work involved in
supporting a new worker is extremely small compared with worker
startup times.  I'm guessing that the setup cost is going to be on the
order of hundreds of thousands or millions, and the startup cost is
going to be on the order of tens or ones.  Unless you can present some
contrary evidence, I think we should rip it out.

And I actually hope you *can't* present some contrary evidence.
Because if you can, then that might mean that we need to cost every
possible path from 0 up to N workers and let the costing machinery
decide which one is better.  If you can't, then we can cost the
non-parallel path and the maximally-parallel path and be done.  And
that would be much better, because it will be faster.  Remember, just
because we cost a bunch of parallel paths doesn't mean that any of
them will actually be chosen.  We need to avoid generating too much
additional planner work in cases where we don't end up deciding on
parallelism anyway.

>> - In cost_funnel(), I don't think it's right to divide the run cost
>> by nWorkers + 1.  Suppose we've got a plan that looks like this:
>>
>> Funnel
>> -> Hash Join
>>    -> Partial Seq Scan on a
>>    -> Hash
>>       -> Seq Scan on b
>>
>> The sequential scan on b is going to get executed once per worker,
>> whereas the effort for the sequential scan on a is going to be
>> divided over all the workers.  So the right way to cost this is as
>> follows:
>>
>> (a) The cost of the partial sequential scan on a is equal to the cost
>> of a regular sequential scan, plus a little bit of overhead to
>> account for communication via the ParallelHeapScanDesc, divided by
>> the number of workers + 1.
>> (b) The cost of the remaining nodes under the funnel works normally.
>> (c) The cost of the funnel is equal to the cost of the hash join plus
>> number of tuples multiplied by per-tuple communication overhead plus
>> a large fixed overhead reflecting the time it takes the workers to
>> start.
>>
>
> IIUC, the change for this would be to remove the change related to
> run cost (divide the run cost by nWorkers + 1) from cost_funnel
> and make a similar change, as suggested by point (a), in the cost
> calculation of the partial sequential scan.

Right.

> As of now, we don't do anything which can move the Funnel node on top
> of a hash join, so I am not sure if you are expecting any extra
> handling as part of point (b) or (c).

But we will want to do that in the future, so we should set up the
costing correctly now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company