Re: Insert into on conflict, data size upto 3 billion records

From Ron
Subject Re: Insert into on conflict, data size upto 3 billion records
Date
Msg-id 2a876340-e227-c96a-1d79-f8a74aa45e4a@gmail.com
Whole thread Raw
In response to Insert into on conflict, data size upto 3 billion records  (Karthik Kumar Kondamudi <karthikkumar09@gmail.com>)
Responses Re: Insert into on conflict, data size upto 3 billion records  (Karthik K <kar6308@gmail.com>)
List pgsql-general
On 2/12/21 12:46 PM, Karthik Kumar Kondamudi wrote:
Hi, 

I'm looking for suggestions on how I can improve the performance of the merge statement below. We have a batch process that loads data into the _batch tables using Postgres, and the task is to update the main target tables if the record exists, else insert it. Sometimes these batch tables can grow to 5 billion records. Here is the current scenario:

target_table_main has 700,070,247 records and is hash-partitioned into 50 chunks, with an index on logical_ts. The batch table has 2,715,020,546 records, close to 3 billion, so I'm dealing with a huge set of data and want to do this in the most efficient way.
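(The actual statement isn't shown in the post; an upsert of the shape described typically looks roughly like the following, where the column names, the conflict target, and the _batch table name are assumptions, not taken from the thread:)

-- Illustrative only; real column names and the conflict target are not in the post.
INSERT INTO target_table_main (id, logical_ts, payload)
SELECT id, logical_ts, payload
FROM   target_table_main_batch
ON CONFLICT (id) DO UPDATE
   SET logical_ts = EXCLUDED.logical_ts,
       payload    = EXCLUDED.payload;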

Many times, I have drastically sped up batch processing by (1) partitioning on the same field as the index, and (2) pre-sorting the input data by that field.

That way, you get excellent "locality of data" (meaning lots of writes to the same hot bits of cache, which later get asynchronously flushed to disk).  Unfortunately for your situation, the purpose of hash partitioning is to reduce locality of data.  (Sometimes that's useful, but not when processing batches.)
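A rough sketch of that approach (all names, types, and range boundaries below are assumptions, not taken from the thread):

-- Hypothetical sketch: range-partition the target on the same column the
-- batch input will be sorted by, instead of hash-partitioning it.
CREATE TABLE target_table_main (
    id          bigint      NOT NULL,
    logical_ts  timestamptz NOT NULL,
    payload     text,
    PRIMARY KEY (id, logical_ts)   -- on a partitioned table the PK must include the partition key
) PARTITION BY RANGE (logical_ts);

CREATE TABLE target_table_main_2021q1 PARTITION OF target_table_main
    FOR VALUES FROM ('2021-01-01') TO ('2021-04-01');
-- ...one partition per quarter, and so on.

-- Feed the upsert in partition-key order so successive writes land in the
-- same hot index and heap pages:
INSERT INTO target_table_main (id, logical_ts, payload)
SELECT id, logical_ts, payload
FROM   target_table_main_batch
ORDER  BY logical_ts
ON CONFLICT (id, logical_ts) DO UPDATE
    SET payload = EXCLUDED.payload;

With the input sorted on logical_ts, each pass works through the partitions sequentially instead of scattering writes across all 50 hash chunks.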

--
Angular momentum makes the world go 'round.

In the pgsql-general list, by message date:

Previous
From: Rob Sargent
Date:
Message: certs in connection string
Next
From: "Seamus Abshere"
Date:
Message: Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]