Creating table with data from a join

Поиск

Список

Период

Сортировка

От	Igor Stassiy
Тема	Creating table with data from a join
Дата	14 июля 2015 г. 12:12:38
Msg-id	CAKVOjezLRzeSPAZFho21aXQiYCfDBfj5wXRKKkeqRi95BPQu9g@mail.gmail.com обсуждение исходный текст
Ответы	Re: Creating table with data from a join (David Rowley <david.rowley@2ndquadrant.com>) Re: Creating table with data from a join (Julien Rouhaud <julien.rouhaud@dalibo.com>) Re: Creating table with data from a join (Marc Mamin <M.Mamin@intershop.de>)
Список	pgsql-general

Дерево обсуждения

Hello,

I am benchmarking different ways of putting data into table on table creation:

1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;

2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;

3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |

parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";

(the parallel command is available as part of parallel deb package in Ubuntu for example, it splits the stdin by newline character and feeds it to the corresponding command)

Both tables a and b have ~16M records and one of the columns in a is geometry (ranging from several KB in size to several MB). Columns in b are mostly integers.

The machine that I am running these commands on has the following parameters:

default_statistics_target = 50 # pgtune wizard 2012-06-06 maintenance_work_mem = 1GB # pgtune wizard 2012-06-06 constraint_exclusion = on # pgtune wizard 2012-06-06 checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06 effective_cache_size = 48GB # pgtune wizard 2012-06-06

work_mem = 80MB # pgtune wizard 2012-06-06

wal_buffers = 8MB # pgtune wizard 2012-06-06

checkpoint_segments = 16 # pgtune wizard 2012-06-06

shared_buffers = 16GB # pgtune wizard 2012-06-06

max_connections = 400 # pgtune wizard 2012-06-06

One would expect the 3rd option to be faster than 1 and 2, however 2 outperforms both by a large margin (sometimes x2). This is especially surprising taking into account that COPY doesn't acquire a global lock on the table, only a RowExclusiveLock

(according to http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us)

So is option 2 a winner by design? Could you please suggest other alternatives to try (if there are any)? And what might be the reason that 3 is not outperforming the other 2?

Thank you,

Igor

В списке pgsql-general по дате отправления:

Предыдущее

От: Venkata Balaji N
Дата: 14 июля 2015 г., 04:06:39
Сообщение: Re: Disconnected but query still running

Следующее

От: Shujie Shang
Дата: 14 июля 2015 г., 12:59:46
Сообщение: Re: Index Only Scan vs Cache

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Creating table with data from a join

Предыдущее

Следующее