Re: Postgres refusing to use >1 core

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Postgres refusing to use >1 core
Дата
Msg-id 4DC8B8AB.9070502@postnewspapers.com.au
обсуждение исходный текст
Ответ на Re: Postgres refusing to use >1 core  (Aren Cambre <aren@arencambre.com>)
Список pgsql-performance
On 10/05/11 10:40, Aren Cambre wrote:
>     how are you reading through the table? if you are using OFFSET, you
>     owe me a steak dinner.
>
>
> Nope. :-)
>
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.

Looking at that code, I can't help but wonder why you're not doing it
server side in batches. In general, it's really inefficient to use this
pattern:

rows = runquery("select * from something");
for (item in rows) {
  // do something with item
}

Adding implicit parallelism within the loop won't help you much if
client-side CPU use isn't your limitation. If each computation done on
"item" is very expensive in client-side CPU this pattern makes sense,
but otherwise should be avoided in favour of grabbing big chunks of rows
and processing them all at once in batch SQL statements that let the
database plan I/O efficiently.

Even if you're going to rely on client-side looping - say, because of
complex or library-based computation that must be done for each record -
you must ensure that EACH THREAD HAS ITS OWN CONNECTION, whether that's
a new connection established manually or one grabbed from an appropriate
pool. Your code below shows no evidence of that at all; you're probably
sharing one connection between all the threads, achieving no real
parallelism whatsoever.

Try limiting your parallel invocation to 4 threads (since that's number
of cores you have) and making sure each has its own connection. In your
case, that probably means having a new Geocoder instance grab a
connection from a pool that contains at least 5 connections (one per
Geocoder, plus the main connection).

It also looks - though I don't know C# and npgsql so I can't be sure -
like you're passing some kind of query result object to the Geocoder.
Avoid that, because they might be using the connection to progressively
read data behind the scenes in which case you might land up having
locking issues, accidentally serializing your parallel work on the
single main connection, etc. Instead, retrieve the contents of the
IDataRecord (whatever that is) and pass that to the new Geocoder
instance, so the new Geocoder has *absolutely* *no* *link* to the
arrestsConnection and cannot possibly depend on it accidentally.

Even better, use a server-side work queue implementation like pgq, and
have each worker use its private connection to ask the server for the
next record to process when it's done with the previous one, so you
don't need a co-ordinating queue thread in your client side at all. You
can also optionally make your client workers independent processes
rather than threads that way, which simplifies debugging and resource
management.

--
Craig Ringer

В списке pgsql-performance по дате отправления:

Предыдущее
От: Shaun Thomas
Дата:
Сообщение: Re: Benchmarking a large server
Следующее
От: Didik Prasetyo
Дата:
Сообщение: partition query on multiple cores