Google Summer of Code 2013
From | Akansha Singh |
---|---|
Subject | Google Summer of Code 2013 |
Date | |
Msg-id | 8d6a48cb-cc2d-4940-8037-8814477c000c@default |
List | pgsql-students |
Hi,

I guess parallel query won't be fruitful, or most probably it might already have been implemented...

> There's certainly a lot of groundwork to do, and I do share the concern
> that the project will have to deal with a lot of dirty work (e.g. when
> transferring data between the processes). But couldn't it be a useful
> part of the discussion? A one-off implementation of parallelized hash
> table building and/or usage?

No, I don't see that as particularly relevant to the discussion around how to parallelize queries. There are already a ton of examples of parallel hash table building and various other independent pieces (parallel sort, parallel aggregation, etc.). What is needed for parallel query processing in PG is to figure out what we mean by it and how to actually implement it. Following that would be making the query planner and optimizer aware of it; last would be picking a specific parallelized implementation of each node and writing it (and that, really, is the 'easy' part in all of this...).

> I don't expect a committable patch at the end, but rather something that
> "works" and may be used as a basis for improvements and to build the
> actual groundwork.

I don't think it'd really help us get any farther with parallel query execution. To be honest, I'd be a bit surprised if this hasn't been done already and patches posted to the list in the past.

> I think that depends on the workload type. For example, for databases
> handling DWH-like queries, parallel hash aggregate is going to be a
> major improvement.

DWH is what I deal with day-in and day-out, and I certainly agree that parallelizing hash builds would be wonderful, but that doesn't mean that a patch which implements it without any consideration for the rest of the challenges around parallel query execution will actually move us, as a project, any closer to getting it.
In fact, I'd expect most DWH implementations to do what we've done already: massive partitioning and parallelizing through multiple client connections.

> Karel mentioned he's currently working on his bachelor's thesis, which is
> about hash tables too. That's another reason why he proposed this topic.

That's wonderful. I'd love to hear about some ways to improve our hashing system (I even proposed one modification a few days ago that I'd like to see tested more). I believe that costing around hashing needs to be improved too. Parallel-anything is a 'sexy' project, but unless it's focused on answering the hard questions about how to do parallel work efficiently while maintaining scalability and portability, it's not moving us forward.

regards,
Akansha Singh