Re: Postgres refusing to use >1 core

Поиск

Список

Период

Сортировка

От	Craig Ringer
Тема	Re: Postgres refusing to use >1 core
Дата	10 мая 2011 г. 04:02:11
Msg-id	4DC8B8AB.9070502@postnewspapers.com.au обсуждение исходный текст
Ответ на	Re: Postgres refusing to use >1 core (Aren Cambre <aren@arencambre.com>)
Список	pgsql-performance

Дерево обсуждения

On 10/05/11 10:40, Aren Cambre wrote:
>     how are you reading through the table? if you are using OFFSET, you
>     owe me a steak dinner.
>
>
> Nope. :-)
>
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.

Looking at that code, I can't help but wonder why you're not doing it
server side in batches. In general, it's really inefficient to use this
pattern:

rows = runquery("select * from something");
for (item in rows) {
  // do something with item
}

Adding implicit parallelism within the loop won't help you much if
client-side CPU use isn't your limitation. If each computation done on
"item" is very expensive in client-side CPU this pattern makes sense,
but otherwise should be avoided in favour of grabbing big chunks of rows
and processing them all at once in batch SQL statements that let the
database plan I/O efficiently.

Even if you're going to rely on client-side looping - say, because of
complex or library-based computation that must be done for each record -
you must ensure that EACH THREAD HAS ITS OWN CONNECTION, whether that's
a new connection established manually or one grabbed from an appropriate
pool. Your code below shows no evidence of that at all; you're probably
sharing one connection between all the threads, achieving no real
parallelism whatsoever.

Try limiting your parallel invocation to 4 threads (since that's number
of cores you have) and making sure each has its own connection. In your
case, that probably means having a new Geocoder instance grab a
connection from a pool that contains at least 5 connections (one per
Geocoder, plus the main connection).

It also looks - though I don't know C# and npgsql so I can't be sure -
like you're passing some kind of query result object to the Geocoder.
Avoid that, because they might be using the connection to progressively
read data behind the scenes in which case you might land up having
locking issues, accidentally serializing your parallel work on the
single main connection, etc. Instead, retrieve the contents of the
IDataRecord (whatever that is) and pass that to the new Geocoder
instance, so the new Geocoder has *absolutely* *no* *link* to the
arrestsConnection and cannot possibly depend on it accidentally.

Even better, use a server-side work queue implementation like pgq, and
have each worker use its private connection to ask the server for the
next record to process when it's done with the previous one, so you
don't need a co-ordinating queue thread in your client side at all. You
can also optionally make your client workers independent processes
rather than threads that way, which simplifies debugging and resource
management.

--
Craig Ringer

В списке pgsql-performance по дате отправления:

Предыдущее

От: Shaun Thomas
Дата: 10 мая 2011 г., 03:14:58
Сообщение: Re: Benchmarking a large server

Следующее

От: Didik Prasetyo
Дата: 10 мая 2011 г., 04:32:17
Сообщение: partition query on multiple cores

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Postgres refusing to use >1 core

Предыдущее

Следующее