Re: Parallel copy

From	Heikki Linnakangas
Subject	Re: Parallel copy
Date	31 October 2020, 01:09:32
Msg-id	029c7797-8526-ec37-7444-c2a8d28cc82c@iki.fi
In reply to	Re: Parallel copy (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Parallel copy (Tomas Vondra <tomas.vondra@2ndquadrant.com>); RE: Parallel copy ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
List	pgsql-hackers


On 30/10/2020 22:56, Tomas Vondra wrote:
> I agree this design looks simpler. I'm a bit worried about serializing
> the parsing like this, though. It's true the current approach (where the
> first phase of parsing happens in the leader) has a similar issue, but I
> think it would be easier to improve that in that design.
> 
> My plan was to parallelize the parsing roughly like this:
> 
> 1) split the input buffer into smaller chunks
> 
> 2) let workers scan the buffers and record positions of interesting
> characters (delimiters, quotes, ...) and pass it back to the leader
> 
> 3) use the information to actually parse the input data (we only need to
> look at the interesting characters, skipping large parts of data)
> 
> 4) pass the parsed chunks to workers, just like in the current patch
> 
> 
> But maybe something like that would be possible even with the approach
> you propose - we could have a special parse phase for processing each
> buffer, where any worker could look for the special characters, record
> the positions in a bitmap next to the buffer. So the whole sequence of
> states would look something like this:
> 
>       EMPTY
>       FILLED
>       PARSED
>       READY
>       PROCESSING

I think it's even simpler than that. You don't need to communicate the 
"interesting positions" between processes, if the same worker takes care 
of the chunk through all states from FILLED to DONE.
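
To illustrate the ownership idea, here is a minimal Python sketch, not code from the patch. The state names come from the quoted list plus the DONE state mentioned above; `Chunk`, `ChunkQueue`, `claim` and `run_chunk` are hypothetical names, the lock stands in for whatever shared-memory synchronization a real implementation would use, and the position scan is only a placeholder. The point is that the positions stay on the chunk's owner and are never handed to another process.

```python
import threading
from enum import Enum, auto


class ChunkState(Enum):
    # Names from the quoted list, plus DONE from the reply; ordering is assumed.
    EMPTY = auto()
    FILLED = auto()
    PARSED = auto()
    READY = auto()
    PROCESSING = auto()
    DONE = auto()


class Chunk:
    def __init__(self, data: bytes):
        self.data = data
        self.state = ChunkState.FILLED
        self.owner = None        # worker id once claimed
        self.interesting = None  # positions, private to the owning worker


class ChunkQueue:
    """Hypothetical shared queue; the lock makes each claim exclusive."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._lock = threading.Lock()

    def claim(self, worker_id):
        # Hand an unowned FILLED chunk to exactly one worker, or return None.
        with self._lock:
            for chunk in self._chunks:
                if chunk.state is ChunkState.FILLED and chunk.owner is None:
                    chunk.owner = worker_id
                    return chunk
        return None


def run_chunk(chunk):
    """Walk one chunk from FILLED to DONE inside a single worker."""
    visited = [chunk.state]
    # Placeholder scan: delimiters, quotes and newlines (an assumed set).
    chunk.interesting = [i for i, b in enumerate(chunk.data) if b in b',"\n']
    for state in (ChunkState.PARSED, ChunkState.READY,
                  ChunkState.PROCESSING, ChunkState.DONE):
        chunk.state = state
        visited.append(state)
    return visited
```

Because `claim` gives each chunk to one worker only, nothing in `run_chunk` has to be published to other workers between states.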

You can build the bitmap of interesting positions immediately in FILLED 
state, independently of all previous blocks. Once you've built the 
bitmap, you need to wait for the information on where the first line 
starts, but presumably finding the interesting positions is the 
expensive part.
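
A rough Python sketch of that two-phase split, under stated assumptions: the set of interesting bytes (comma, double quote, newline) is a guess at CSV defaults, backslash escapes are ignored, and `build_bitmap`, `set_bits` and `split_lines` are hypothetical names. Phase one looks only at the chunk itself; phase two needs the one piece of outside information, the offset where the first line starts.

```python
INTERESTING = b',"\n'   # assumed set: delimiter, quote, newline
QUOTE = ord('"')
NEWLINE = ord('\n')


def build_bitmap(chunk: bytes) -> int:
    """Phase 1: mark interesting byte positions; needs no other chunk."""
    bitmap = 0
    for i, byte in enumerate(chunk):
        if byte in INTERESTING:
            bitmap |= 1 << i
    return bitmap


def set_bits(bitmap: int):
    """Yield the indexes of set bits, lowest first."""
    while bitmap:
        low = bitmap & -bitmap          # isolate the lowest set bit
        yield low.bit_length() - 1
        bitmap ^= low


def split_lines(chunk: bytes, bitmap: int, first_line_start: int):
    """Phase 2: split into lines once the first line's start is known.

    Only the marked positions are visited. Returns the complete lines
    and the offset of the unfinished tail that continues in the next chunk.
    """
    lines = []
    start = first_line_start
    in_quote = False                    # a line start is outside any quote
    for pos in set_bits(bitmap):
        if pos < first_line_start:
            continue                    # part of a line begun in an earlier chunk
        byte = chunk[pos]
        if byte == QUOTE:
            in_quote = not in_quote
        elif byte == NEWLINE and not in_quote:
            lines.append(chunk[start:pos])
            start = pos + 1
    return lines, start
```

For example, with `chunk = b'tail\n1,"a\nb"\n2,c\n3'` and a first line starting at offset 5, `split_lines` yields the two complete lines `1,"a\nb"` and `2,c`, and reports 17 as the start of the tail that the next chunk's owner has to finish.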

> Of course, the question is whether parsing really is sufficiently
> expensive for this to be worth it.

Yeah, I don't think it's worth it. Splitting the lines is pretty fast; I 
think it will be many years before that becomes a bottleneck. But 
if it turns out I'm wrong and we need to implement that, the path is 
pretty straightforward.

- Heikki
