Re: about multiprocessingmassdata

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: about multiprocessingmassdata
Дата
Msg-id 4F7C7827.7080705@fuzzy.cz
обсуждение исходный текст
Ответ на about multiprocessingmassdata  (superman0920 <superman0920@gmail.com>)
Список pgsql-performance
On 4.4.2012 17:52, superman0920 wrote:
> Hi list!
>
> i have a table which has 8500000 rows records. i write a java program to
> update these records.
> i use 100 threads to update the records. For example, thread-1 update
> 1~85000 records; thread-2 update 85001~170000 and so on.
> The update sql's aim is remove the space in the column and it is simple:
> update poi set py=replace(py,' ','') where id=?;

That's a very naive approach. It's very likely each thread will do an
index scan for each update (to evaluate the 'id=?' condition. And that's
going to cost you much more than you gain because index scans are quite
CPU and I/O intensive.

Simply update the whole table by

   UPDATE poi SET py = replace(py, ' ','');

Have you actually tried how this performs or did you guess 'it's
definitely going to be very slow so I'll use multiple threads to make
that faster'?

If you really need to parallelize this, you need to do that differently
- e.g. use 'ctid' to skip to update a whole page like this:

   UPDATE poi SET py = replace(py, ' ','')
    WHERE ctid >= '(n,0)'::tid AND ctid < '(n+1,0)'::tid AND;

where 'n' ranges between 0 and number of pages the table (e.g. in pg_class).

But try the simple UPDATE first, my guess is it's going to be much
faster than you expect.

Tomas

В списке pgsql-performance по дате отправления:

Предыдущее
От: Scott Marlowe
Дата:
Сообщение: Re: H800 + md1200 Performance problem
Следующее
От: Scott Marlowe
Дата:
Сообщение: Re: H800 + md1200 Performance problem