Re: Parallel query execution
From | Stephen Frost
Subject | Re: Parallel query execution
Date |
Msg-id | 20130115231557.GB16126@tamriel.snowman.net
In reply to | Re: Parallel query execution (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Responses | Re: Parallel query execution
 | Re: Parallel query execution
List | pgsql-hackers
* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:
> How about being aware of multiple spindles - so if the requested
> data covers multiple spindles, then data could be extracted in
> parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this approach to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better, and be simpler for us, if it was handled in the backend.

> On large multiple processor machines, there are different blocks of
> memory that might be accessed at different speeds depending on the
> processor. Possibly a mechanism could be used to split a transaction
> over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly deployed on first. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over which threads run on which processors, with access to which memory, on small-NUMA (x86-based) systems.

> Once a selection of rows has been made, then if there is a lot of
> reformatting going on, then could this be done in parallel? I can
> think of 2 very simplistic strategies: (A) use a different
> processor core for each column, or (B) farm out sets of rows to
> different cores. I am sure in reality there are more subtleties,
> and aspects of both strategies will be used in a hybrid fashion
> along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this.
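The "poor-man's parallelism" over partitions described above can be sketched very simply: distribute the partitions across a fixed set of workers so each worker scans a disjoint subset, ideally landing on different spindles or tablespaces. This is a minimal illustrative sketch (the function and partition names are hypothetical, not from PostgreSQL):

```python
def assign_partitions(partitions, n_workers):
    """Round-robin a list of partition names across n_workers buckets.

    Each worker then scans only its own bucket, so with partitions
    placed on distinct tablespaces/spindles the scans can proceed in
    parallel on independent I/O channels.
    """
    buckets = [[] for _ in range(n_workers)]
    for i, part in enumerate(partitions):
        buckets[i % n_workers].append(part)
    return buckets

# Example: five partitions split across two workers.
work = assign_partitions(["p0", "p1", "p2", "p3", "p4"], 2)
# work[0] scans p0, p2, p4; work[1] scans p1, p3.
```

In a client-side implementation each bucket would drive one connection issuing per-partition queries; doing this in the backend would remove that duplication, as the post argues.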
I would think we'd do two things: parallelize based on partitioning, and parallelize seqscans across the individual heap files, which are already split on a per-1G boundary. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation, but I could see advantages in matching up with what the kernel thinks are independent files.

> I expect that before any parallel algorithm is invoked, then some
> sort of threshold needs to be exceeded to make it worth while.

Certainly. That'd need to be included in the optimization model to support this.

Thanks,

Stephen
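The two ideas above, splitting a seqscan along the existing 1 GB heap-segment boundaries and only doing so past a size threshold, can be sketched together. The threshold constant here is a made-up illustration, not an actual PostgreSQL setting:

```python
SEGMENT_BYTES = 1 << 30            # PostgreSQL splits heap files at 1 GB
PARALLEL_THRESHOLD = 4 * SEGMENT_BYTES  # hypothetical cutoff for going parallel

def plan_segment_scan(rel_bytes):
    """Return a list of (start, end) byte ranges to scan.

    Below the threshold, return one serial range (parallelism isn't
    worth the setup cost). Above it, return one range per 1 GB
    segment, matching what the kernel sees as independent files.
    """
    if rel_bytes < PARALLEL_THRESHOLD:
        return [(0, rel_bytes)]
    ranges = []
    start = 0
    while start < rel_bytes:
        end = min(start + SEGMENT_BYTES, rel_bytes)
        ranges.append((start, end))
        start = end
    return ranges
```

A real cost model would of course weigh rows, selectivity, and worker startup cost rather than raw bytes; this only shows where the threshold check and the per-segment split would sit.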