Re: Parallell Optimizer

From: Gavin Flower
Subject: Re: Parallell Optimizer
Date
Msg-id 51B781AB.7060205@archidevsys.co.nz
In reply to: Re: Parallell Optimizer  (Hannu Krosing <hannu@2ndQuadrant.com>)
List: pgsql-hackers
On 11/06/13 19:24, Hannu Krosing wrote:
On 06/10/2013 10:37 PM, Fred&Dani&Pandora&Aquiles wrote:
Hi,
 
>> I asked a while ago in this group about the possibility of implementing a
>> parallel planner in a multithreaded way, and the replies were that the
>> proposed approach couldn't be implemented, because Postgres is not
>> thread-safe. With the new Background Worker Processes feature, would such
>> an implementation be possible?


Well, there are versions of genetic algorithms that use the concept of islands, in which the populations evolve in parallel on different islands, with interaction allowed between the islands, and so on. I'm working on an algorithm based on multiagent systems. At present (I mean in H2), the agents are threads; there are a few locks related to the agents' solutions, and a few locks for the best current solution in the environment where the agents are 'running'. The agents can exchange messages with a purpose. The environment is shared by all the agents, and they use it to get information from other agents (the current solution, for example), to try to update the best current solution, and so on.
If you do this as an academic exercise, then I'd recommend thinking in "messages" only.

Separate out the message delivery entirely from your core design.

This makes the whole concept much simpler and more generic.

Message delivery can be made almost instantaneous in the case of threads,
or can take from a few tens of microseconds to several seconds
between different physical nodes.

Which speed is "fast enough" depends entirely on your query: for a query
running 5 hours on a single CPU and 5 minutes on a cluster, a message
delay of 50 ms is entirely acceptable.
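
To make the "messages only" idea concrete, here is a minimal sketch in Python. It is not taken from H2 or from PostgreSQL; every name in it (Message, Transport, InProcessTransport, Agent, run_round) is hypothetical. The agents only ever call send() and receive(), so the in-memory queues used here could in principle be swapped for sockets between physical nodes without touching the agent logic.

```python
import queue
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Message:
    sender: str
    cost: float  # cost of the sender's best plan so far (hypothetical payload)


class Transport(ABC):
    """How messages move. Agents never look behind this interface."""

    @abstractmethod
    def send(self, recipient: str, msg: Message) -> None:
        ...

    @abstractmethod
    def receive(self, recipient: str) -> List[Message]:
        ...


class InProcessTransport(Transport):
    """Delivery through in-memory queues, as threads in one process would use."""

    def __init__(self, names: Iterable[str]) -> None:
        self._boxes: Dict[str, queue.Queue] = {n: queue.Queue() for n in names}

    def send(self, recipient: str, msg: Message) -> None:
        self._boxes[recipient].put(msg)

    def receive(self, recipient: str) -> List[Message]:
        # Drain everything currently waiting for this recipient, without blocking.
        box = self._boxes[recipient]
        msgs: List[Message] = []
        while True:
            try:
                msgs.append(box.get_nowait())
            except queue.Empty:
                return msgs


class Agent:
    """Core logic: knows about messages, not about how they are delivered."""

    def __init__(self, name: str, transport: Transport, best_cost: float) -> None:
        self.name = name
        self.transport = transport
        self.best_cost = best_cost

    def broadcast(self, peers: Iterable[str]) -> None:
        for peer in peers:
            if peer != self.name:
                self.transport.send(peer, Message(self.name, self.best_cost))

    def absorb(self) -> None:
        # Adopt the cheapest cost heard from any peer.
        for msg in self.transport.receive(self.name):
            self.best_cost = min(self.best_cost, msg.cost)


def run_round(agents: List[Agent]) -> Dict[str, float]:
    """One exchange: everyone broadcasts, then everyone reads their mailbox."""
    names = [a.name for a in agents]
    for a in agents:
        a.broadcast(names)
    for a in agents:
        a.absorb()
    return {a.name: a.best_cost for a in agents}
```

A slower transport (say, one that goes over the network) would only need to implement the same two methods; whether its delay is tolerable is then a property of the query, not of the agent code.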
-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
I suspect (from my position of almost total ignorance of this area!) that once a generic method works independently of how closely coupled the different parallel parts are, a later optimisation could be added that depends on how the parts are related. So running on a multi-core chip could use a different communication system from running across multiple geographically dispersed computers. Though in practice, I suspect that the most common use case would involve many processor chips in the same 'box' (even if said box was distributed across a large room!).

Anyhow, I think that separating out how to effectively parallelise Postgres from how the parts communicate is a Good Thing (TM). Though knowing Grim Reality, it is bound to be more complicated in Reality! :-(  The useful size of the work given to each parallel unit obviously does relate to the communication overhead.

Possibly the biggest challenge will be in devising a planning methodology that can efficiently decide on an appropriate parallel strategy. Maybe a keyword to tell the planner that you know this is a very big query and you don't mind it taking a long time to come up with a decent plan? The planner would need to know details of the processing unit topology, communication overheads, and possibly other details to make a really effective plan in the distributed case.
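
As a toy illustration of the kind of trade-off such a planner would have to weigh, here is a rough sketch in Python. It is purely an assumption on my part and not how the Postgres planner works: the Topology class, the estimated_runtime() formula (ideal speed-up plus a fixed delay per message) and the best_plan() search are all hypothetical, and a real cost model would need many more variables.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Topology:
    name: str
    max_workers: int
    message_delay_s: float  # assumed delay per message, in seconds


def estimated_runtime(serial_s: float, workers: int,
                      messages_per_worker: int, topo: Topology) -> float:
    """Assumed model: perfectly divisible work plus a delay for each message."""
    compute = serial_s / workers
    # A single worker has nobody to talk to, so it pays no communication cost.
    communication = messages_per_worker * topo.message_delay_s if workers > 1 else 0.0
    return compute + communication


def best_plan(serial_s: float, messages_per_worker: int,
              topologies: Iterable[Topology]) -> Tuple[float, int, str]:
    """Try every worker count on every topology; return (runtime, workers, name)."""
    candidates = []
    for topo in topologies:
        for workers in range(1, topo.max_workers + 1):
            runtime = estimated_runtime(serial_s, workers, messages_per_worker, topo)
            candidates.append((runtime, workers, topo.name))
    return min(candidates)


if __name__ == "__main__":
    # 5 hours serial and a 50 ms delay come from the discussion above;
    # the worker counts, the 1 microsecond delay and the 1000 messages are made up.
    multicore = Topology("multicore", max_workers=8, message_delay_s=1e-6)
    cluster = Topology("cluster", max_workers=60, message_delay_s=0.05)
    print(best_plan(5 * 3600.0, 1000, [multicore, cluster]))
    print(best_plan(2.0, 1000, [multicore, cluster]))
```

Under these made-up numbers the 5-hour query is best spread across the whole cluster, while a 2-second query is better kept on the local cores, because the message delays would swamp the work itself. That is the relation between the size of the work units and the communication overhead in miniature.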

My mind boggles, just thinking of the number of different variables that might be required to create an 'optimal' plan for parallel processing in a distributed system!


Cheers,
Gavin

