Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
От | Peter Geoghegan |
---|---|
Тема | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
Дата | |
Msg-id | CAH2-Wzn7wLGmD6wUNm_G3foGTtZnW=5meb7FsNKZpGUHLcdhcg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Rushabh Lathia <rushabh.lathia@gmail.com>) |
Ответы |
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
(Rushabh Lathia <rushabh.lathia@gmail.com>)
|
Список | pgsql-hackers |
On Tue, Sep 19, 2017 at 3:21 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > As per the earlier discussion in the thread, I did experiment using > BufFileSet interface from parallel-hash-v18.patchset. I took the reference > of parallel-hash other patches to understand the BufFileSet APIs, and > incorporate the changes to parallel create index. > > In order to achieve the same: > > - Applied 0007-Remove-BufFile-s-isTemp-flag.patch and > 0008-Add-BufFileSet-for-sharing-temporary-files-between-b.patch from the > parallel-hash-v18.patchset. > - Removed the buffile.c/logtap.c/fd.c changes from the parallel CREATE > INDEX v10 patch. > - incorporate the BufFileSet API to the parallel tuple sort for CREATE > INDEX. > - Changes into few existing functions as well as added few to support the > BufFileSet changes. I'm glad that somebody is working on this. (Someone closer to the more general work on shared/parallel BufFile infrastructure than I am.) I do have some quick feedback, and I hope to be able to provide that to both you and Thomas, as needed to see this one through. I'm not going to get into the tricky details around resource management just yet. I'll start with some simpler questions, to get a general sense of the plan here. I gather that you're at least aware that your v11 of the patch doesn't preserve randomAccess support for parallel sorts, because you didn't include my 0002-* testing GUCs patch, which was specifically designed to make various randomAccess stuff testable. I also figured this to be true because I noticed this FIXME among (otherwise unchanged) tuplesort code: > +static void > +leader_takeover_tapes(Tuplesortstate *state) > +{ > + Sharedsort *shared = state->shared; > + int nLaunched = state->nLaunched; > + int j; > + > + Assert(LEADER(state)); > + Assert(nLaunched >= 1); > + Assert(nLaunched == shared->workersFinished); > + > + /* > + * Create the tapeset from worker tapes, including a leader-owned tape at > + * the end. Parallel workers are far more expensive than logical tapes, > + * so the number of tapes allocated here should never be excessive. FIXME > + */ > + inittapestate(state, nLaunched + 1); > + state->tapeset = LogicalTapeSetCreate(nLaunched + 1, shared->tapes, > + state->fileset, state->worker); It's not surprising to me that you do not yet have this part working, because much of my design was about changing as little as possible above the BufFile interface, in order for tuplesort.c (and logtape.c) code like this to "just work" as if it was the serial case. It doesn't look like you've added the kind of BufFile multiplexing code that I expected to see in logtape.c. This is needed to compensate for the code removed from fd.c and buffile.c. Perhaps it would help me to go look at Thomas' latest parallel hash join patch -- did it gain some kind of transparent multiplexing ability that you actually (want to) use here? Though randomAccess isn't used by CREATE INDEX in general, and so not supporting randomAccess within tuplesort.c for parallel callers doesn't matter as far as this CREATE INDEX user-visible feature is concerned, I still believe that randomAccess is important (IIRC, Robert thought so too). Specifically, it seems like a good idea to have randomAccess support, both on general principle (why should the parallel case be different?), and because having it now will probably enable future enhancements to logtape.c. Enhancements that have it manage parallel sorts based on partitioning/distribution/bucketing [1]. I'm pretty sure that partitioning-based parallel sort is going to become very important in the future, especially for parallel GroupAggregate. The leader needs to truly own the tapes it reclaims from workers in order for all of this to work. Questions on where you're going with randomAccess support: 1. Is randomAccess support a goal for you here at all? 2. If so, is preserving eager recycling of temp file space during randomAccess (materializing a final output tape within the leader) another goal for you here? Do we need to preserve that property of serial external sorts, too, so that it remains true that logtape.c ensures that "the total space usage is essentially just the actual data volume, plus insignificant bookkeeping and start/stop overhead"? (I'm quoting from master's logtape.c header comments.) 3. Any ideas on next steps in support of those 2 goals? What problems do you foresee, if any? > CREATE INDEX serial_idx ON parallel_sort_test (randint); > > Without patch: > > Time: 3430054.220 ms (57:10.054) > > With patch (max_parallel_workers_maintenance = 8): > > Time: 1163445.271 ms (19:23.445) This looks very similar to my v10. While I will need to follow up on this, to make sure, it seems likely that this patch has exactly the same performance characteristics as v10. Thanks [1] https://wiki.postgresql.org/wiki/Parallel_External_Sort#Partitioning_for_parallelism_.28parallel_external_sort_beyond_CREATE_INDEX.29 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Andres FreundДата:
Сообщение: Re: [HACKERS] src/test/subscription/t/002_types.pl hanging onparticular environment