Re: Loading 500m json files to database

From: Reid Thompson
Subject: Re: Loading 500m json files to database
Date:
Msg-id: 157720d0e07960130e2eef7282a3118f071e7f29.camel@omnicell.com
In reply to: Loading 500m json files to database  (pinker <pinker@onet.eu>)
List: pgsql-general
On Mon, 2020-03-23 at 03:24 -0700, pinker wrote:
> Hi, do you maybe have an idea how to make the loading process faster?
> 
> I have 500 million JSON files (one JSON document per file) that I need to
> load into the db.
> My test set is "only" 1 million files.
> 
> What I came up with now is:
> 
> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM $i"&
> done
> 
> which is the fastest so far, but it's not what I expected. Loading 1M files
> takes me ~3h, so loading 500 times more is just unacceptable.
> 
> Some facts:
> * the target db is in the cloud, so there is no option for tricks like
> turning fsync off
> * PostgreSQL version 11
> * I can spin up a huge Postgres instance in terms of CPU/RAM if necessary
> * I already tried hash partitioning (writing to 10 different tables instead
> of 1)
> 
> Any ideas?


https://www.gnu.org/software/parallel/ 
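
A minimal sketch of how GNU parallel could be applied to the loop quoted above
(an illustration, not something given in the thread): instead of backgrounding
one psql process per file, emit one command line per file and let parallel keep
a bounded number of workers busy. The table json_parts(json_data) and the
datafiles/ directory come from the quoted loop; the job count of 8 is an
arbitrary assumption, and the sketch assumes filenames without spaces or shell
metacharacters.

# Generate one psql command per file and run them at most 8 at a time.
for f in datafiles/*; do
  printf 'psql -c "\\copy json_parts(json_data) FROM %s"\n' "$f"
done | parallel -j 8

When no command template is given, GNU parallel executes each input line as a
shell command, so this keeps at most 8 concurrent connections instead of the
unbounded number of background processes the original loop can spawn; raising
-j beyond the available CPU/IO capacity is unlikely to help.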



