Re: Efficient Way to Merge Two Large Tables

From Joshua Rubin
Subject Re: Efficient Way to Merge Two Large Tables
Date
Msg-id AANLkTildDEt9k4yPn9aUsME3VYSE3I8WvsTqtJY5W-SB@mail.gmail.com
In response to Re: Efficient Way to Merge Two Large Tables  (Julian Mehnle <julian@mehnle.net>)
Responses Re: Efficient Way to Merge Two Large Tables  (Joshua Rubin <jrubin@esoft.com>)
List pgsql-general
Hi Julian,

Sorry for the slow response. I think I will need to chop up the query somehow, but have not yet found an efficient way to do that. row_id is the primary key in both tables, so that might work.

Here is the EXPLAIN:
urls_jrubin_merged=# EXPLAIN UPDATE table1 SET row_id = table2.row_id FROM table2 WHERE table1.row_id = table2.row_id;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Merge Join  (cost=57257969.62..12983795937.97 rows=4308749788074 width=121)
   Merge Cond: (table2.row_id = table1.row_id)
   ->  Sort  (cost=15885110.79..16029412.85 rows=288604128 width=8)
         Sort Key: table2.row_id
         ->  Seq Scan on table2  (cost=0.00..2137231.26 rows=288604128 width=8)
   ->  Materialize  (cost=41372858.83..42105903.14 rows=293217725 width=121)
         ->  Sort  (cost=41372858.83..41519467.69 rows=293217725 width=121)
               Sort Key: table1.row_id
                ->  Seq Scan on table1  (cost=0.00..5922587.45 rows=293217725 width=121)
(9 rows)
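
For reference, the full sequence we're after is roughly the following; the integer type below is only a placeholder for whatever table2.new_column actually is:

-- Add the column to table1 up front (type is a placeholder)
ALTER TABLE table1 ADD COLUMN new_column integer;

-- ... backfill new_column from table2 (the UPDATE under discussion) ...

-- Once the backfill is done, index the new column and drop table2
CREATE INDEX table1_new_column_idx ON table1 (new_column);
DROP TABLE table2;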


Thanks,
--
Joshua Rubin


On Tue, Jul 13, 2010 at 5:08 PM, Julian Mehnle <julian@mehnle.net> wrote:
Joshua Rubin wrote:

> I have two tables each with nearly 300M rows. There is a 1:1
> relationship between the two tables and they are almost always joined
> together in queries. The first table has many columns, the second has
> a foreign key to the primary key of the first table and one more
> column. It is expected that for every row in table1, there is a
> corresponding row in table2. We would like to just add the one column
> to the first table and drop the second table to allow us to index this
> extra column.
>
> This query would work after adding the column to the first table:
> UPDATE table1 SET new_column = table2.new_column FROM table2 WHERE
> table1.row_id = table2.row_id;
>
> However, this will take much too long; I have not successfully
> completed it on our staging server after running it for 3+ days.

Can you get the query plan (EXPLAIN) of the update query?  My guess is the
join cost scales superlinearly.

You might be able to chop this up into smaller UPDATEs by limiting the
rows to be updated in each round by the primary key.

E.g.:

 UPDATE table1 SET new_column = table2.new_column FROM table2
   WHERE
     table1.row_id = table2.row_id AND
     table1.row_id >= 0e6 AND table1.row_id < 1e6 AND
     table2.row_id >= 0e6 AND table2.row_id < 1e6;

for a moving row_id window.

This has helped me in the past with a similar scenario (where both tables
were partitioned by the PK, but it would presumably still work in the
unpartitioned case).
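
If the window bookkeeping gets tedious, a loop along these lines should walk the whole key range automatically (a rough sketch, assuming PostgreSQL 9.0+ for DO and an integer row_id; the batch size is arbitrary):

 DO $$
 DECLARE
     batch  bigint := 1000000;   -- window size; tune to taste
     max_id bigint;
     lo     bigint := 0;
 BEGIN
     SELECT max(row_id) INTO max_id FROM table1;
     WHILE lo <= max_id LOOP
         UPDATE table1 SET new_column = table2.new_column FROM table2
           WHERE
             table1.row_id = table2.row_id AND
             table1.row_id >= lo AND table1.row_id < lo + batch AND
             table2.row_id >= lo AND table2.row_id < lo + batch;
         lo := lo + batch;
     END LOOP;
 END;
 $$;

Note that a DO block runs in a single transaction, so this avoids the huge sort/merge join but still produces one big transaction; if you want to commit (and let autovacuum clean up) between windows, drive the same loop from a client script instead.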

-Julian
