Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions
From: Robert Haas
Subject: Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions
Msg-id: CA+TgmoaqqsrTj8BH5XDE3zRQoH0nP-8LQYw8kEfF9UMLXm96LA@mail.gmail.com
In reply to: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions (Rafia Sabih <rafia.sabih@enterprisedb.com>)
Responses: Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions (Rafia Sabih <rafia.sabih@enterprisedb.com>)
List: pgsql-hackers
On Wed, Feb 22, 2017 at 10:25 AM, Rafia Sabih <rafia.sabih@enterprisedb.com> wrote:
> 1. Allow parallelism for queries in PL functions by passing
> CURSOR_OPT_PARALLEL_OK instead of 0 to exec_prepare_plan called from
> exec_stmt_execsql or exec_stmt_dynexecute. Similarly, pass
> CURSOR_OPT_PARALLEL_OK instead of 0 to SPI_execute and exec_run_select
> called from exec_stmt_dynexecute. CURSOR_OPT_PARALLEL_OK is passed to
> these functions after checking if the statement is not trigger, since
> in that case using parallelism may not be efficient.
>
> 2. In ExecutePlan there is an additional check to see if the query is
> coming from SQL or PL functions and is having a parallel plan. In that
> case we ignore the check of numberTuples since it is always one for
> these functions and existing checks restrict parallelism for these
> cases. Though, I understand this may not be the most elegant way to do
> this, and would be pleased to know some better alternative.

I think I see the problem that you're trying to solve, but I agree that this doesn't seem all that elegant. The reason we have that numberTuples check is that we're afraid we might be in a context like the extended-query protocol, where the caller can ask for 1 tuple and then later ask for another tuple. That won't work, because once we shut down the workers we can't reliably generate the rest of the query results.

However, I think it would probably work fine to let somebody ask for less than the full number of tuples if it's certain that they won't later ask for any more. So maybe what we ought to do is allow CURSOR_OPT_PARALLEL_OK to be set any time we know that ExecutorRun() will be called for the QueryDesc at most once, rather than (as at present) only where we know it will be executed only once with a tuple-count of zero.
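The "at most once" rule above can be illustrated with a toy model (all names here are invented for illustration, not PostgreSQL source): a plan backed by parallel workers can serve any single run, even one with a row limit, but cannot be resumed after its workers have been shut down.

```c
#include <stdbool.h>

/* Toy stand-in for a QueryDesc whose plan may use parallel workers. */
struct toy_query
{
    bool parallel;      /* plan uses background workers           */
    bool workers_done;  /* workers were shut down after one run   */
    int  rows_total;    /* rows the plan can produce in total     */
    int  rows_sent;     /* rows already returned to the caller    */
};

/*
 * Run the query, returning up to 'count' rows (0 = no limit).
 * Returns the number of rows produced, or -1 if the caller tries
 * to resume a parallel query whose workers are already gone --
 * the failure mode the numberTuples check guards against.
 */
static int
toy_run(struct toy_query *q, int count)
{
    if (q->parallel && q->workers_done)
        return -1;              /* cannot reliably resume */

    int remaining = q->rows_total - q->rows_sent;
    int n = (count == 0 || count > remaining) ? remaining : count;

    q->rows_sent += n;

    if (q->parallel)
        q->workers_done = true; /* workers shut down at end of run */
    return n;
}
```

A single limited fetch works fine even on the parallel plan; only the second fetch fails, which is why it would be safe to allow parallelism whenever the caller promises not to come back.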
Then we could change things in ExecutePlan so that it doesn't disable parallel query when the tuple-count is non-zero, but does take an extra argument "bool execute_only_once", and it disables parallel execution if that is not true. Also, if ExecutorRun() is called a second time for the same QueryDesc when execute_only_once is specified as true, it should elog(ERROR, ...). Then exec_execute_message(), for example, can pass that argument as false when the tuple-count is non-zero, but other places that are going to fetch a limited number of rows could pass it as true even though they also pass a row-count. I'm not sure if that's exactly right, but something along those lines seems like it should work.

I think that a final patch for this functionality should involve adding CURSOR_OPT_PARALLEL_OK to appropriate places in each PL, plus maybe some infrastructure changes like the ones mentioned above. Maybe it can be divided into two patches, one to make the infrastructure changes and a second to add CURSOR_OPT_PARALLEL_OK to more places.

+ /* Allow parallelism if the query is read-only */
+ if(read_only)
+     plan.cursor_options = CURSOR_OPT_PARALLEL_OK;
+ else
+     plan.cursor_options = 0;

I don't think this can ever be right. If the only thing we are worried about is the query being read-only, then we could just pass CURSOR_OPT_PARALLEL_OK everywhere and planner.c would figure it out without any help from us. But that's not the problem. The problem is that the PL may be using a function like SPI_prepare_cursor(), where it's going to later use SPI_cursor_fetch() or similar. Parallel query can't handle cursor operations. Whatever changes we make to spi.c should involve passing CURSOR_OPT_PARALLEL_OK everywhere that we can be sure there will be no cursor operations, and not anywhere that we might have cursor operations. Cursor operations - or more specifically anything that might try to suspend execution of the query and resume later - are the problem.
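The execute_only_once contract proposed above might be sketched roughly like this (a hedged toy model with invented names, not a patch against the real executor): a non-zero tuple count no longer disables parallelism when the caller promises a single run, and breaking that promise is an error.

```c
#include <stdbool.h>

/* Minimal stand-in for the relevant QueryDesc state. */
struct toy_querydesc
{
    bool parallel_ok;   /* CURSOR_OPT_PARALLEL_OK was set          */
    bool already_run;   /* this QueryDesc has been executed once   */
};

/*
 * Sketch of the proposed ExecutorRun() contract.  Returns 1 if the
 * run may use parallel execution, 0 if it must run serially, and -1
 * where the real code would elog(ERROR): a second call for the same
 * QueryDesc after execute_only_once was promised.
 */
static int
toy_executor_run(struct toy_querydesc *qd, long count,
                 bool execute_only_once)
{
    if (execute_only_once && qd->already_run)
        return -1;              /* would elog(ERROR, ...) */

    /*
     * Today parallelism requires count == 0; under the proposal a
     * bounded fetch is also fine as long as the caller will not
     * come back for more rows.
     */
    bool use_parallel = qd->parallel_ok &&
        (count == 0 || execute_only_once);

    qd->already_run = true;
    return use_parallel ? 1 : 0;
}
```

In this model, a caller like exec_execute_message() with a non-zero row count passes execute_only_once as false and gets serial execution but may resume, while a PL fetching a bounded number of rows exactly once passes true and keeps parallelism.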
Things that will already cause the tests in standard_planner() to disable parallelism don't need to be rechecked elsewhere:

    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        dynamic_shared_memory_type != DSM_IMPL_NONE &&
        parse->commandType == CMD_SELECT &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker() &&
        !IsolationIsSerializable())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company