Re: Import large data set into a table and resolve duplicates?

From: Francisco Olarte
Subject: Re: Import large data set into a table and resolve duplicates?
Date:
Msg-id: CA+bJJbytzU2qerqmibSj4jTGcGJtQUvyg-Stw+8NC6QYSqEP1w@mail.gmail.com
Whole thread Raw
In response to: Re: Import large data set into a table and resolve duplicates?  (Eugene Dzhurinsky <jdevelop@gmail.com>)
List: pgsql-general
Hi Eugene:

On Sun, Feb 15, 2015 at 6:36 PM, Eugene Dzhurinsky <jdevelop@gmail.com> wrote:
​...​
 
Since the "dictionary" already has an index on the "series", it seems that
patch_data doesn't need to have any index here.
​....
At this point "patch_data" needs to get an index on "already_exists = false",
which seems to be cheap.

As I told you before, do not focus on the indexes too much. When you do bulk updates like this, they tend to be much slower than a proper sort.

The reason is locality of reference. When you do things with sorts, you make two or three nicely ordered passes over the data, using full pages. When you use indexes, you spend a lot of time walking index structures and switching between index reads and data reads, over and over (they are cached, but you still have to switch between them). Also, with your kind of data, indexes on series are going to be big, so less cache is available for the data.


As I said before, it depends on your data anyway. With current machines, what I would do for this problem is just write a program (Perl seems adequate for this), copy the dictionary into client memory, and read the patch in one pass, spitting out the result file and inserting the needed lines along the way. It should fit in 1 GB without problems, which is not much by today's standards.
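A minimal sketch of that approach, assuming a dictionary(id, series) table and a patch file with one series per line; the table name, column names, file names and connection details are illustrative, not taken from this thread:

#!/usr/bin/perl
# Load the whole dictionary into client memory, then stream the patch file,
# inserting only the series that are not yet in the dictionary.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'password',
                       { AutoCommit => 0, RaiseError => 1 });

# 1. One pass over the dictionary: series => id.
my %dict;
my $sel = $dbh->prepare('SELECT series, id FROM dictionary');
$sel->execute;
while (my ($series, $id) = $sel->fetchrow_array) {
    $dict{$series} = $id;
}

# 2. One pass over the patch: known series are resolved in memory,
#    unknown ones are inserted and their new ids remembered.
my $ins = $dbh->prepare(
    'INSERT INTO dictionary (series) VALUES (?) RETURNING id');

open my $patch, '<', 'patch.txt'  or die "patch.txt: $!";
open my $out,   '>', 'result.txt' or die "result.txt: $!";
while (my $series = <$patch>) {
    chomp $series;
    unless (exists $dict{$series}) {
        $ins->execute($series);
        ($dict{$series}) = $ins->fetchrow_array;
    }
    print {$out} "$dict{$series}\t$series\n";
}
close $patch;
close $out;
$dbh->commit;

Duplicate resolution then costs one hash lookup per input line, and the database only ever sees the genuinely new rows.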

Regards.
Francisco Olarte.


