Trying to figure out why these queries are so slow

Поиск
Список
Период
Сортировка
От Tim Uckun
Тема Trying to figure out why these queries are so slow
Дата
Msg-id AANLkTim-BRRttUjWBDQ1_3gXztp_8RhYkhK7HBbmL6wp@mail.gmail.com
обсуждение исходный текст
Ответы Re: Trying to figure out why these queries are so slow  (Thomas Guettler <hv@tbz-pariv.de>)
Re: Trying to figure out why these queries are so slow  (Thom Brown <thom@linux.com>)
Re: Trying to figure out why these queries are so slow  (John R Pierce <pierce@hogranch.com>)
Список pgsql-general
I have two tables. Table C has about 300K records in it. Table E has
about a million records in it.  Today I tried to run this query.

update C
             set result_count = X.result_count
            from C
            inner join (select  c_id, count(c_id) as result_count
                          from E
                          where c_id is not null
                          group by c_id) as  X
            on C.id = X.c_id

All the fields mentioned are indexed. In the case of Table C it's the
primary key. In the case table E it's just an index (non unique).

I let this query run for about three hours before I cancelled it.

Next I did a CREATE TABLE of the sub query thinking the subquery was
the problem I extracted the results into a separate table and in that
table made the c_id field the primary key. That table only had about
80 thousand records. The creation of the table took just a couple of
minutes. Perfectly reasonable.

I tried to run the same thing again but this time joining the newly
created table instead of the subquery. I killed that query after an
hour.

Then I tried to be clever about it so I added a

Where id in (select id from C  where result_count =0 limit 10) into
the where clause. I figure this would run fast because it's only
updating ten records. The result_count field is indexed so that part
should run very quick and the where ID IN should be quick because it's
looking up ten items in the PKEY right?  Well that didn't go very well
either. I killed that query after about 15 minutes.

In the end (once again) I wrote a ruby script to pull up the records
from table E in small chunks and pull the corresponding records
records from C in small chunks and update them myself. In other words
I wrote a ruby script to do the join and the update in chunks.

I am simply baffled.  What am I doing wrong.   Is the database not
tuned? Do I not have enough RAM? Is there some parameter I need to
tweak someplace?

There was nothing else going on in the database when this query was
running. I shut off all applications touching the database.

The database machine is a linode host. It's got two gigs of RAM  I
realize that this is not the ideal host for a database but it's what I
have to work with.

kernel.shmmax=536870912 , kernel.shmall=2097152,  shared_buffers =
128MB effective_cache_size = 256MB log_min_duration_statement = 1000
max_connections = 100

This was a case of doing an update on about 80,000 records using an
inner join on two tables on using the primary keys of both records.
Why would this query take more than a few seconds to run?

В списке pgsql-general по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: streaming replication question
Следующее
От: Tim Uckun
Дата:
Сообщение: Re: Trying to figure out why these queries are so slow