Re: Loading 500m json files to database

From: Peter J. Holzer
Subject: Re: Loading 500m json files to database
Date:
Msg-id: 20200324092734.GA14300@hjp.at
In response to: Re: Loading 500m json files to database  (pinker <pinker@onet.eu>)
List: pgsql-general
On 2020-03-23 17:18:45 -0700, pinker wrote:
> Christopher Browne-3 wrote
> > Well, you're paying for a lot of overhead in that, as you're
> > establishing a psql command, connecting to a database, spawning a
> > backend process, starting a transaction, committing a transaction,
> > closing the backend process, disconnecting from the database, and
> > cleaning up after the launching of the psql command.  And you're
> > doing that 500 million times.
> >
> > The one thing I left off that was the loading of a single tuple into
> > json_parts.
[...]
> > Reducing that overhead is the single most important thing you can do.
>
> Yes, I was thinking about that, but I have no idea how to do it right now.

Do you know any programming language (Python, Perl, ...)? You'll
probably get a huge gain from writing a script that just opens the
connection once and then inserts each file.
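
For example, a minimal Python sketch using psycopg2. Only the
json_parts table name comes from this thread; the column name, the
connection string, and the directory path are placeholders:

    # Minimal sketch: one connection, one transaction, one INSERT per file.
    # Assumes json_parts has a single jsonb column, here called "js".
    import glob
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")   # placeholder connection string
    with conn:                               # commit once, at the end
        with conn.cursor() as cur:
            for path in glob.glob("/path/to/files/*.json"):
                with open(path) as f:
                    cur.execute("INSERT INTO json_parts (js) VALUES (%s)",
                                (f.read(),))
    conn.close()

Committing once (or every few thousand rows) instead of once per file
is where most of the win comes from.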

COPY is usually faster still by a fair amount, but since you have to
read the data for each row from a different file (and, if I understood
you correctly, a remote one at that), the additional speedup is
probably not that great in this case.
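
If you do want to try COPY from a script, something like the following
might work (again psycopg2, same placeholder names as above; joining
everything into one in-memory buffer is only for illustration, with
500 million files you would feed COPY in chunks):

    # Rough sketch of the COPY variant. json.dumps re-serializes each
    # document onto a single line; doubling backslashes is then the only
    # escaping COPY's text format still needs.
    import glob
    import io
    import json
    import psycopg2

    def rows():
        for path in glob.glob("/path/to/files/*.json"):
            with open(path) as f:
                doc = json.load(f)              # also validates the JSON
            yield json.dumps(doc).replace("\\", "\\\\") + "\n"

    conn = psycopg2.connect("dbname=mydb")      # placeholder
    with conn, conn.cursor() as cur:
        # Illustration only: a real loader would stream in chunks
        # instead of joining everything in memory.
        buf = io.StringIO("".join(rows()))
        cur.copy_expert("COPY json_parts (js) FROM STDIN", buf)
    conn.close()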

Splitting the work into batches and executing several batches in
parallel probably helps.
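
A hedged sketch of that, again in Python; the pool size and batch size
are guesses you would have to tune, and each worker opens its own
connection:

    # Sketch: split the file list into batches, load batches in parallel.
    import glob
    from multiprocessing import Pool
    import psycopg2

    def load_batch(paths):
        conn = psycopg2.connect("dbname=mydb")  # one connection per worker
        with conn, conn.cursor() as cur:        # one transaction per batch
            for path in paths:
                with open(path) as f:
                    cur.execute("INSERT INTO json_parts (js) VALUES (%s)",
                                (f.read(),))
        conn.close()

    if __name__ == "__main__":
        files = glob.glob("/path/to/files/*.json")
        batches = [files[i:i + 10000] for i in range(0, len(files), 10000)]
        with Pool(8) as pool:                   # 8 workers: a guess to tune
            pool.map(load_batch, batches)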

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

