Re: Loading 500m json files to database

From David G. Johnston
Subject Re: Loading 500m json files to database
Date
Msg-id CAKFQuwbhc02XWAJ9DNyao5M1EUf=ECxMQX0U+RftgAeDJNmKTw@mail.gmail.com
In reply to Loading 500m json files to database  (pinker <pinker@onet.eu>)
Responses Re: Loading 500m json files to database
List pgsql-general
On Mon, Mar 23, 2020 at 3:24 AM pinker <pinker@onet.eu> wrote:
time for i in datafiles/*; do
  psql -c "\copy json_parts(json_data) FROM $i"&
done

I don't know whether the approach below is faster, but it does avoid spinning up a connection for each file.

#bash, linux
    function append_each_split_file_to_etl_load_script() {
        for filetoload in ./*; do
            ronumber="$(basename "$filetoload")"
            # only process files since subdirs can be present
            if [[ -f "$filetoload" ]]; then
                echo ""
                # Emits:  \set invoice `cat './FILENAME'`
                # psql's backtick substitution reads the file into the variable.
                echo "\set invoice"' `cat '"'""$filetoload""'"'`'
                # Emits one VALUES row:  , ('RONUMBER',:'invoice')
                # :'invoice' interpolates the variable as a quoted SQL literal.
                echo ", ('$ronumber',:'invoice')"
            fi >> "$PSQLSCRIPT"
        done

        echo ""  >> "$PSQLSCRIPT"
        echo ";" >> "$PSQLSCRIPT"
        echo ""  >> "$PSQLSCRIPT"
    }
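
The fragments the function emits only become a valid statement once an INSERT ... VALUES header has been written to the script first; that surrounding code isn't shown in the post. As a minimal sketch (the table and column names json_parts, ronumber, json_data are assumptions, and the throwaway placeholder row exists only so every appended ", (...)" fragment parses), a header writer could look like:

#bash, linux (sketch; table/column names are assumptions)
    function start_etl_load_script() {
        # Write the INSERT header that the appended ", (...)" fragments attach to.
        # The placeholder row is there only so later rows can all start with a comma.
        {
            echo "INSERT INTO json_parts (ronumber, json_data) VALUES"
            echo "('placeholder', '{}')"
        } > "$PSQLSCRIPT"
    }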

There is a bit of other related code that is needed (for my specific usage), but this is the core of it. Use psql variables to capture the contents of each file into a variable and then just perform a normal insert (specifically, a VALUES (...), (...) variant). Since you can intermix psql meta-commands and SQL, you basically output a bloody long script - which has memory issues at scale, but you can divide and conquer - and then run "psql --file bloody_long_script_part_1_of_100000.psql".
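
As a rough sketch of that divide-and-conquer step (the datafiles/part_* batch layout, the part numbering, and the start_etl_load_script helper sketched above are all assumptions, not part of the original code), a driver could generate and run one script per batch:

#bash, linux (sketch; directory layout and helper names are assumptions)
    part=0
    for chunk in datafiles/part_*/; do
        part=$((part + 1))
        PSQLSCRIPT="$PWD/bloody_long_script_part_${part}.psql"
        start_etl_load_script
        (
            # The generated \set lines cat files by relative path, so generate
            # the fragments and run psql from inside the batch directory.
            cd "$chunk" || exit 1
            append_each_split_file_to_etl_load_script
            psql --single-transaction --file "$PSQLSCRIPT"
        )
    done

Each psql run then opens exactly one connection and issues one multi-row INSERT per batch.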

David J.

In the pgsql-general list, by date of posting:

Previous
From: Adrian Klaver
Date:
Message: Re: Loading 500m json files to database
Next
From: Rob Sargent
Date:
Message: Re: Loading 500m json files to database