Re: Parallel Seq Scan

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Parallel Seq Scan
Дата	4 апреля 2015 г. 12:19:33
Msg-id	CAApHDvp2STf0=pQfpq+e7WA4QdYmpFM5qu_YtUpE7R0jLnH82Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Parallel Seq Scan (Andres Freund <andres@2ndquadrant.com>)
Ответы	Re: Parallel Seq Scan Re: Parallel Seq Scan
Список	pgsql-hackers

Дерево обсуждения

So I've just finished reading the impressive 244 emails (so far) about Parallel Seq scan, and I've had a quick skim over the latest patch.

Its quite exciting to think that one day we'll have parallel query in PostgreSQL, but I have to say, that I think that there's a major point about the proposed implementation that seems to have gotten forgotten about, which I can't help but think won't get that far off the ground unless more thought goes into it.

On 11 February 2015 at 09:56, Andres Freund <andres@2ndquadrant.com> wrote:

I think we're getting to the point where having a unique mapping from
the plan to the execution tree is proving to be rather limiting
anyway. Check for example discussion about join removal. But even for
current code, showing only the custom plans for the first five EXPLAIN
EXECUTEs is pretty nasty (Try explain that to somebody that doesn't know
pg internals. Their looks are worth gold and can kill you at the same
time) and should be done differently.

Going over the previous emails in this thread I see that it has been a long time since anyone discussed anything around how we might decide at planning time how many workers should be used for the query, and from the emails I don't recall anyone proposing a good idea about how this might be done, and I for one can't see how this is at all possible to do at planning time.

I think that the planner should know nothing of parallel query at all, and the planner quite possibly should go completely unmodified for this patch. One major problem I can see is that, given a query such as:

SELECT * FROM million_row_product_table WHERE category = 'ELECTRONICS';

Where we have a non-unique index on category, some plans which may be considered might be:

1. Index scan on the category index to get all rows matching 'ELECTRONICS'

2. Sequence scan on the table, filter matching rows.

3. Parallel plan which performs a series of partial sequence scans pulling out all matching rows.

I really think that if we end choosing things like plan 3, when plan 2 was thrown out because of its cost, then we'll end up consuming more CPU and I/O than we can possibly justify using. The environmentalist in me screams that this is wrong. What if we kicked off 128 worker process on some high-end hardware to do this? I certainly wouldn't want to pay the power bill. I understand there's costing built in to perhaps stop this, but I still think it's wrong headed, and we need to still choose the fastest non-parallel plan and only consider parallelising that later.

Instead what I think should happen is:

The following link has been seen before on this thread, but I'll post it again:

http://docs.oracle.com/cd/A57673_01/DOC/server/doc/A48506/pqoconce.htm

There's one key sentence in there that should not be ignored:

"It is important to note that the query is parallelized dynamically at execution time."

"dynamically at execution time"... I think this also needs to happen in PostgreSQL. If we attempt to do this parallel stuff at plan time, and we happen to plan at some quiet period, or perhaps worse, some application's start-up process happens to PREPARE a load of queries when the database is nice and quite, then quite possibly we'll end up with some highly parallel queries. Then perhaps come the time these queries are actually executed the server is very busy... Things will fall apart quite quickly due to the masses of IPC and context switches that would be going on.

I completely understand that this parallel query stuff is all quite new to us all and we're likely still trying to nail down the correct infrastructure for it to work well, so this is why I'm proposing that the planner should know nothing of parallel query, instead I think it should work more along the lines of:

* Planner should be completely oblivious to what parallel query is.

* Before executor startup the plan is passed to a function which decides if we should parallelise it, and does so if the plan meets the correct requirements. This should likely have a very fast exit path such as:

if root node's cost < parallel_query_cost_threshold

return; /* the query is not expensive enough to attempt to make parallel */

The above check will allow us to have an almost zero overhead for small low cost queries.

This function would likely also have some sort of logic in order to determine if the server has enough spare resource at the current point in time to allow queries to be parallelised (Likely this is not too important to nail this down for a first implementation).

* The plan should then be completely traversed node by node to determine which nodes can be made parallel. This would likely require an interface function to each node which returns true or false, depending on if it's safe to parallelise. For seq scan this could be a simple test to see if we're scanning a temp table.

* Before any changes are made to the plan, a complete copy of it should be made.

* Funnel nodes could then be injected below the last node in each branch which supports parallelism. If more than one branch exists with parallel enabled nodes, then it should be up to this function to determine, based on cost, which nodes will benefit the most from the additional workers. Certain other node types would need something else below the Funnel node, e.g Partial aggregation would need a new node below the Funnel to complete the aggregation.

* The first parallel enabled nodes should be passed off to the worker processes for execution.

So I quite strongly agree with Andres' comment above that we really need to move away from this 1:1 assumption about the relationship between plan nodes and executor nodes. Tom did mention some possible reasons here in his response to my INNER JOIN removals patch -> http://www.postgresql.org/message-id/32139.1427667410@sss.pgh.pa.us

Tom wrote:

"What you're doing here violates the rule that planstate trees have a one-to-one relationship to plan trees. EXPLAIN used to iterate over those trees in lockstep, and there probably still is code that does similar things (in third-party modules if not core), so I don't think we should abandon that principle."

So perhaps this needs analysis. If it's not possible, then perhaps the parallel nodes could be inserted at the end of planning, providing the executor could be coded in such a way that the parallel plan can still work with 0 worker processes. Unfortunately it seems that transitions through nodes that don't do anything is not free, so with this method there would be a slowdown of parallel enabled plans when they're executed without any worker processes.

Also here -> https://technet.microsoft.com/en-us/library/ms178065%28v=sql.105%29.aspx

There's some text that says:

"The SQL Server query optimizer does not use a parallel execution plan for a query if any one of the following conditions is true:"

"* A serial execution plan is considered faster than any possible parallel execution plan for the particular query."

I'm finding it a bit hard to get a true meaning from that, but if I'm not mistaken it means that the serial plan will be preferred over a parallel plan, as if the parallel plan does not get allocated any workers at execution time, then we don't want to be left with a slow plan...

Apologies if any of this has been discussed any already designed around, I just didn't see anything in the emails to indicate that it has.

Regards

David Rowley

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Andrew Gierth
Дата: 04 апреля 2015 г., 10:16:09
Сообщение: Re: Abbreviated keys for Numeric

Следующее

От: Amit Kapila
Дата: 04 апреля 2015 г., 16:16:00
Сообщение: Re: TABLESAMPLE patch

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel Seq Scan

Предыдущее

Следующее