Re: Loading 500m json files to database

From: Adrian Klaver
Subject: Re: Loading 500m json files to database
Date:
Msg-id: bfe3f14a-37b7-2047-20bf-8ece013ae470@aklaver.com
In response to: Loading 500m json files to database (pinker <pinker@onet.eu>)
Responses: Re: Loading 500m json files to database
List: pgsql-general
On 3/23/20 3:24 AM, pinker wrote:
> Hi, do you maybe have an idea how to make the loading process faster?
> 
> I have 500 million JSON files (1 JSON document per file) that I need to
> load into the db.
> My test set is "only" 1 million files.
> 
> What I have come up with so far is:
> 
> time for i in datafiles/*; do
>    psql -c "\copy json_parts(json_data) FROM '$i'" &
> done
> 
> which is the fastest so far, but still not what I expected. Loading 1M
> files takes me ~3h, so loading 500 times more is just unacceptable.

Aggregating the JSON files as others have suggested would help greatly.
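
For example, something along these lines (an untested sketch; it assumes 
each file holds exactly one valid JSON document, and it uses jq, which is 
not part of your current setup, to compact each document onto a single 
line) streams all the files through one COPY instead of forking a psql 
process per file:

   find datafiles -type f -print0 |
     xargs -0 cat |            # concatenate all JSON documents
     jq -c . |                 # emit one compact document per line
     sed 's/\\/\\\\/g' |       # COPY text format: double every backslash
     psql -c "\copy json_parts(json_data) FROM STDIN"

Since jq escapes control characters inside strings, each output line is a 
complete document, and doubling the backslashes keeps COPY's text-format 
escaping from mangling them. That turns a million psql invocations into 
one session; splitting the file list across a handful of such pipelines 
running in parallel should scale far better than one backend per file.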

Knowing what is happening in json_parts() might help folks provide 
further tips.
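
For reference, psql's \d meta-command will show the table's columns, 
indexes, and any triggers that might be slowing each insert down:

   \d json_parts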


> 
> some facts:
> * the target db is in the cloud, so there is no option for tricks like
> turning fsync off
> * Postgres version 11
> * I can spin up a huge Postgres instance if necessary in terms of CPU/RAM
> * I already tried hash partitioning (writing to 10 different tables
> instead of 1)
> 
> 
> Any ideas?
> 
> 
> 
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


