Update Perrformance with nested loop vs. merge-join

Поиск

Список

Период

Сортировка

От	Nathan Boley
Тема	Update Perrformance with nested loop vs. merge-join
Дата	6 декабря 2006 г. 19:25:30
Msg-id	f943525a0612051701j48b9b37s8b38062366ae4b64@mail.gmail.com обсуждение исходный текст
Ответы	Re: Update Perrformance with nested loop vs. merge-join (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-general

Дерево обсуждения

Hi,

I just created a new column on a relatively large table (~ 2 mil rows over 400 mb in tablesize) and am trying to populate it from another much smaller table by using an update query .
The purpose of the query to set a bool flag if the user for the action on table1 is the last user, as determined from table 2.

The large table has a two column primary key and one of the columns is the primary key on the smaller table.

My original update query is as follows:

update table1
    set lastuser = (table1.user = table2.lastuser)::bool from table2
    where table1.pkeycolumn1 = table2.pkey

which has an explain output of

Merge Join (cost=883182.60..1024622.03 rows=6873573 width=89)
Merge Cond: ("outer".pkeycolumn1 = "inner".pkey)
-> Sort (cost=93387.50..94711.01 rows=529404 width=18)
        Sort Key: table2.pkey
        -> Seq Scan on table2 (cost=0.00..15037.04 rows=529404 width=18)
-> Sort (cost=789795.10..795605.39 rows=2324116 width=81)
        Sort Key: table1.pkeycolumn1
        -> Seq Scan on table1 (cost=0.00..71900.16 rows=2324116 width=81)

This seems like it *should* be the best query to me, but when I try and run it it takes forever.
However, when I add a subselect clause to the end to force the planner to use a nested loop
select on table2, the query runs MUCH faster (1 hour instead of > 10, I never let it finish)

Here is the query:

update table1
    set lastuser = (table1.user = table2.lastuser)::bool from table2
    where table1.pkeycolumn1 = table2.pkey
    and table1.pkeycolumn1 is in ( select pkey from table3 limit 1000000)

where table1.pkeycolumn1 is a foreign key in table3.pkey. Also, note that the num of rows in table 3 is *much* less than 1000000.
Also, the limit clause is necessary to force the planner into using the nested loop. (Aside: Is there a better way to do this inside of a query?)
Here is the explain for the above query:

Hash Join (cost=13863.09..109298.79 rows=51388 width=89)
Hash Cond: ("outer".pkeycolumn1 = "inner".pkey)
-> Seq Scan on table1 (cost=0.00..71900.16 rows=2324116 width=81)
-> Hash (cost=13854.99..13854.99 rows=1621 width=26)
        -> Nested Loop (cost=8205.72..13854.99 rows=1621 width=26)
              -> HashAggregate (cost=8205.72..8207.72 rows=200 width=8)
                    -> Limit (cost=0.00..5891.43 rows=185143 width=8)
                          -> Seq Scan on table3 (cost=0.00..5891.43 rows=185143 width=8)
              -> Index Scan using table2-pkey-index" on table2 (cost= 0.00..28.14 rows=8 width=18)
                    Index Cond: ("outer".pkey = table2.pkey)

My only theory up to this point is that it has something to do with how the tablespace is being cached. I notice that when I use plan 1, my
computer goes through long periods of io and with bursts of processor activity every minute or so. When I use plan 2, the io sits at about 90%
of my total resource usage while my normal processor usage sits at about 10%. Maybe it keeps trying to cache and resort table2? I don't really
have any idea, but that is my only guess.

If anyone knows why this may be happenning, I would really appreciate it.

Thanks,
Nathan

В списке pgsql-general по дате отправления:

Предыдущее

От: km
Дата: 06 декабря 2006 г., 19:16:48
Сообщение: dynamic SQL - variable substitution in plpgsql

Следующее

От: "Schwenker, Stephen"
Дата: 06 декабря 2006 г., 19:25:31
Сообщение: Re: HELP: Urgent, Vacuum problem

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Update Perrformance with nested loop vs. merge-join

Предыдущее

Следующее