Re: Search then Delete Performance

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Search then Delete Performance
Дата
Msg-id 24498.1284520504@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Search then Delete Performance  (Michael Hull <mikehulluk@googlemail.com>)
Список pgsql-general
Michael Hull <mikehulluk@googlemail.com> writes:
> I am fairly new to practical databases, but I am trying out the c
> interface to postgres and am wondering how to improve performance. I
> am a researcher, and I am trying to perform a large parameter sweep.
> Since this will involve a couple of thousand simulations, I have a
> process that manages which simulations have been done, and which still
> need to be done, so I can run it easily on a cluster.

There's basically no way for the performance of that loop to not suck.
You're incurring two round trips to the server per jobid change,
plus query parse/plan times; so even though the underlying table
manipulations are relatively simple, there's just too much overhead.

One way to fix this problem is to push the procedural logic into a
stored procedure that runs on the server, so that your app sends
one query, gets back one rowset of its newly assigned jobs, and
all the database update side-effects are done by the procedure.
That's only a small conceptual leap from where you are, but does require
getting up to speed on plpgsql or one of the other procedural languages.

The more radical way to fix it is to write the whole thing as one
SQL command.  This requires thinking about your problem as an operation
on a set of rows, rather than an iteration, so it can be a pretty big
conceptual jump for database novices.  I'm not sure it's actually
possible to do it given the specific table organization you've adopted
--- if you need to both delete rows in unassignedjobs and insert rows
in assignedjobs, there's no way to do it in one SQL operation.  But if
you're not yet wedded to that representation, you should consider having
just one table and implementing the state change as an update to a
status column instead of moving the data to a different table.  Then you
could probably implement the operation in a single UPDATE ... RETURNING
command.

Also, this looks suspiciously like a priority queue, which means you're
basically reinventing the wheel.  People have sweated the details on
this type of thing before, and come up with very clever solutions that
perform well even with multiple clients concurrently trying to obtain
job assignments.  Simple solutions tend to slow way down or even
deadlock under those conditions :-(.  Check the Postgres list archives.

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Arjen Nienhuis
Дата:
Сообщение: Re: Search then Delete Performance
Следующее
От: John R Pierce
Дата:
Сообщение: Re: Search then Delete Performance