Re: Most efficient way to insert without duplicates

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Re: Most efficient way to insert without duplicates
Дата	18 апреля 2013 г. 00:07:30
Msg-id	CAMkU=1zNL+7=HhiTx1O_2arDRws63sWZr0SC1ByggF5fN1WZJA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Most efficient way to insert without duplicates (François Beausoleil <francois@teksol.info>)
Список	pgsql-general

Дерево обсуждения

On Wed, Apr 17, 2013 at 1:19 PM, François Beausoleil <francois@teksol.info> wrote:

Le 2013-04-17 à 14:15, Jeff Janes a écrit :

It looks like 12% of the time is being spent figuring out what rows to insert, and 88% actually doing the insertions.

So I think that index maintenance is killing you. You could try adding a sort to your select so that rows are inserted in index order, or inserting in batches in which the batches are partitioned by service_id (which is almost the same thing as sorting, since service_id is the lead column)

This analysis is based on your example, which inserted 7 million rows. But I just noticed you also said you only have a few thousands rows to insert per day. So if you make your example better match your use case, perhaps that analysis would no longer hold.

In that case, partitioning the original table by service_id % N would help, since the index would be much smaller, right?

Probably not. If you partition the table but do not change your loading method, then the relevant thing would be the sum of the index sizes over all partitions, which would be about the same as now.

On the other hand, if you change the method to load the data in batches, you don't need to partition the table, you just need to align the batches with the index order. You could use partitioning as a way to do that, but it is just as easy (or easier) to do so without partitioning.

Once you solve the index maintenance problem, partitioning might help solve the select part of the deduplication problem, though. You would only need to check against existing data for the partition into which you already know the current batch is going to be loaded.

Also, the constraint_exclusion code is usually not smart enough to deal with constraints that use the modulus (unless the modulus itself appears in the where clause). You would have to use range partitioning instead.

Cheers,

Jeff

В списке pgsql-general по дате отправления:

Предыдущее

От: Mike Roest
Дата: 17 апреля 2013 г., 23:45:59
Сообщение: Re: Fetching Server configured port from C Module

Следующее

От: Carlos Mennens
Дата: 18 апреля 2013 г., 06:06:23
Сообщение: Inherit Superuser Role Help

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Most efficient way to insert without duplicates

Предыдущее

Следующее