Re: Suggestion for concurrent index creation using a single full scan operation
От | Tim Kane |
---|---|
Тема | Re: Suggestion for concurrent index creation using a single full scan operation |
Дата | |
Msg-id | CADVWZZJge1BiaBiVV1RAOKA=ygA7Uz_9wyu8WGgm0=nxNraCsQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Suggestion for concurrent index creation using a single full scan operation (Noah Misch <noah@leadboat.com>) |
Список | pgsql-hackers |
<div dir="ltr"><div class="gmail_default" style="font-family:times new roman,serif;font-size:small">Wow.. thanks guys, reallyappreciate the detailed analysis.</div><div class="gmail_default" style="font-family:times new roman,serif;font-size:small"><br/></div><div class="gmail_default" style="font-family:times new roman,serif;font-size:small">Tim</div></div><divclass="gmail_extra"><br /><br /><div class="gmail_quote">On Wed, Jul 24,2013 at 4:08 AM, Noah Misch <span dir="ltr"><<a href="mailto:noah@leadboat.com" target="_blank">noah@leadboat.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px#ccc solid;padding-left:1ex"><div class="im">On Tue, Jul 23, 2013 at 01:06:26PM +0100, Tim Kane wrote:<br/> > I haven't given this a lot of thought, but it struck me that when<br /> > rebuilding tables (be it fora restore process, or some other operational<br /> > activity) - there is more often than not a need to build an indexor two,<br /> > sometimes many indexes, against the same relation.<br /> ><br /> > It strikes me that in orderto build just one index, we probably need to<br /> > perform a full table scan (in a lot of cases). If we are building<br/> > multiple indexes sequentially against that same table, then we're probably<br /> > performing multiplesequential scans in succession, once for each index.<br /><br /></div>Check.<br /><div class="im"><br /> > Couldwe architect a mechanism that allowed multiple index creation<br /> > statements to execute concurrently, with allof their inputs fed directly<br /> > from a single sequential scan against the full relation?<br /> ><br /> >From a language construct point of view, this may not be trivial to<br /> > implement for raw/interactive SQL - butpossibly this is a candidate for<br /> > the custom format restore?<br /><br /></div>As Greg Stark mentioned, pg_restorecan already issue index build commands in<br /> parallel. Where applicable, that's probably superior to havingone backend<br /> build multiple indexes during a single heap scan. Index builds are<br /> CPU-intensive, and thepg_restore approach takes advantage of additional CPU<br /> cores in addition to possibly saving I/O.<br /><br /> However,the pg_restore method is not applicable if you want CREATE INDEX<br /> CONCURRENTLY, and it's not applicable forimplicit index building such as<br /> happens for ALTER TABLE rewrites and for VACUUM FULL. Backend-managed<br /> concurrentindex builds could shine there.<br /><div class="im"><br /> > I presume this would substantially increase thememory overhead required to<br /> > build those indexes, though the performance gains may be advantageous.<br /><br/></div>The multi-index-build should respect maintenance_work_mem overall. Avoiding<br /> cases where that makes concurrentbuilds slower than sequential builds is a<br /> key challenge for such a project:<br /><br /> - If the index buildseach fit in maintenance_work_mem when run sequentially<br /> and some spill to disk when run concurrently, expectconcurrency to lose.<br /> - If the heap is small enough to stay in cache from one index build to the<br /> next,performing the builds concurrently is probably a wash or a loss.<br /> - Concurrency should help when a wide-row tablelarge enough to exhaust OS<br /> cache has narrow indexes that all fit in maintenance_work_mem. I don't know<br /> whether concurrency would help for a huge-table scenario where the indexes<br /> do overspill maintenance_work_mem. You would have N indexes worth of<br /> external merge files competing for disk bandwidth; that couldcancel out<br /> heap I/O savings.<br /><br /> Overall, it's easy to end up with a loss. We could punt by havingan<br /> index_build_concurrency GUC, much like pg_restore relies on the user to<br /> discover a good "-j" value. But if finding cases where concurrency helps is<br /> too hard, leaving the GUC at one would become the standard advice.<br/><div class="im"><br /> > Apologies in advance if this is not the correct forum for suggestions..<br /><br/></div>It's the right forum.<br /><br /> Thanks,<br /> nm<br /><span class="HOEnZb"><font color="#888888"><br /> --<br/> Noah Misch<br /> EnterpriseDB <a href="http://www.enterprisedb.com" target="_blank">http://www.enterprisedb.com</a><br/></font></span></blockquote></div><br /></div>
В списке pgsql-hackers по дате отправления: