Re: Parallel copy

From Robert Haas
Subject Re: Parallel copy
Date
Msg-id CA+TgmoZw+F3y+oaxEsHEZBxdL1x1KAJ7pRMNgCqX0WjmjGNLrA@mail.gmail.com
In response to Re: Parallel copy  (Andres Freund <andres@anarazel.de>)
Responses Re: Parallel copy
List pgsql-hackers
On Thu, Apr 9, 2020 at 2:55 PM Andres Freund <andres@anarazel.de> wrote:
> I'm fairly certain that we do *not* want to distribute input data between processes on a single tuple basis. Probably
> not even below a few hundred kb. If there's any sort of natural clustering in the loaded data - extremely common, think
> timestamps - splitting on a granular basis will make indexing much more expensive. And have a lot more contention.

That's a fair point. I think the solution ought to be that once any
process starts finding line endings, it continues until it's grabbed
at least a certain amount of data for itself. Then it stops and lets
some other process grab a chunk of data.
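
To make the idea concrete, here is a rough sketch of that hand-off, written in Python rather than backend C purely for brevity. Everything in it is an assumption for illustration: the names LineChunker, claim, run_workers and min_bytes are hypothetical, threads stand in for worker processes, a lock stands in for whatever shared-memory coordination a real patch would use, and line endings are plain "\n" with no CSV quoting.

```python
import threading


class LineChunker:
    """Hands out line-aligned chunks of `data`.

    Hypothetical helper for illustration only; not PostgreSQL code.
    Each chunk is at least `min_bytes` long, except possibly the last.
    """

    def __init__(self, data: bytes, min_bytes: int):
        self.data = data
        self.min_bytes = max(1, min_bytes)  # a chunk must make progress
        self.pos = 0                        # first byte nobody has claimed yet
        self.lock = threading.Lock()        # only one scanner at a time

    def claim(self):
        """Return (start, end) of the next chunk, or None at end of input."""
        with self.lock:
            start = end = self.pos
            n = len(self.data)
            # Keep finding line endings until enough data has been grabbed.
            while end < n and end - start < self.min_bytes:
                nl = self.data.find(b"\n", end)
                end = n if nl == -1 else nl + 1
            self.pos = end
            return (start, end) if end > start else None


def run_workers(data: bytes, min_bytes: int, n_workers: int):
    """Let n_workers threads claim chunks; return the chunks each one got."""
    chunker = LineChunker(data, min_bytes)
    per_worker = [[] for _ in range(n_workers)]

    def worker(i):
        while True:
            chunk = chunker.claim()
            if chunk is None:
                break
            # A real worker would parse and insert the claimed lines here.
            per_worker[i].append(chunk)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return per_worker
```

The scanning happens under the lock and the tuple processing would happen outside it, so a larger min_bytes means fewer hand-offs and more adjacent rows staying with the same worker.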

Or are you arguing that there should be only one process that's
allowed to find line endings for the entire duration of the load?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


