Re: Loading 500m json files to database

From Ertan Küçükoğlu
Subject Re: Loading 500m json files to database
Date
Msg-id 7B61A925-5F3D-421C-949B-77B1EE026168@1nar.com.tr
In reply to Loading 500m json files to database  (pinker <pinker@onet.eu>)
Responses Re: Loading 500m json files to database
Re: Loading 500m json files to database
List pgsql-general
> On 23 Mar 2020, at 13:20, pinker <pinker@onet.eu> wrote:
>
> Hi, do you maybe have an idea how to make the loading process faster?
>
> I have 500 million JSON files (1 JSON document per file) that I need to load
> into the db.
> My test set is "only" 1 million files.
>
> What I've come up with so far is:
>
> time for i in datafiles/*; do
>  psql -c "\copy json_parts(json_data) FROM '$i'" &
> done
>
> which is the fastest so far, but it's not what I expected. Loading 1m of data
> takes me ~3h, so loading 500 times more is just unacceptable.
>
> Some facts:
> * the target db is in the cloud, so there is no option to do tricks like
> turning fsync off
> * the version is Postgres 11
> * I can spin up a huge Postgres instance if necessary in terms of CPU/RAM
> * I already tried hash partitioning (writing to 10 different tables instead
> of 1)
>
>
> Any ideas?
Hello,

I may not be knowledgeable enough to answer your question.

However, if possible, you might consider using a local physical computer to do all of the loading, and afterwards
do a backup/restore onto the cloud system.
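
For the local load itself, pushing all files through a single COPY stream should be much faster than starting one psql process per file as in your loop. The following is only a rough, untested sketch of that idea; it assumes each file contains exactly one JSON document, that jq is available, and it reuses the json_parts(json_data) table and datafiles/ directory from your example:

# Untested sketch: one COPY stream instead of one psql call per file.
# jq -c prints each input document compactly on a single line;
# sed doubles backslashes so the line survives COPY's text format.
find datafiles -type f -print0 \
  | xargs -0 cat \
  | jq -c . \
  | sed 's/\\/\\\\/g' \
  | psql -c "\copy json_parts(json_data) FROM STDIN"

You could also split the file list into a few chunks and run several such pipelines in parallel.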

A compressed backup will generate far less internet traffic than direct data inserts.
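
Something along these lines, for example; this is just a sketch, and the database names, host and user below are placeholders rather than anything from your setup:

# Placeholders only: "localdb", "clouddb", "cloud.example.com", "appuser".
pg_dump -Fc -t json_parts -f json_parts.dump localdb
pg_restore -h cloud.example.com -U appuser -d clouddb -j 4 json_parts.dump

The custom format (-Fc) is compressed by default, and -j lets pg_restore load in parallel on the target.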

Moreover, on the local instance you can use the additional tricks you mentioned, such as turning fsync off.
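
On a throwaway local instance those settings are easy to flip; just to illustrate (the data directory path is a placeholder, and none of this should ever touch the cloud target):

# Disposable local instance only, never the cloud target.
psql -c "ALTER SYSTEM SET fsync = off"
psql -c "ALTER SYSTEM SET full_page_writes = off"
psql -c "ALTER SYSTEM SET synchronous_commit = off"
pg_ctl restart -D /path/to/local/data   # placeholder path; restart applies the settings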

Thanks & regards,
Ertan




