Re: Suggestion for concurrent index creation using a single full scan operation

Поиск
Список
Период
Сортировка
От Tim Kane
Тема Re: Suggestion for concurrent index creation using a single full scan operation
Дата
Msg-id CADVWZZJge1BiaBiVV1RAOKA=ygA7Uz_9wyu8WGgm0=nxNraCsQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Suggestion for concurrent index creation using a single full scan operation  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
<div dir="ltr"><div class="gmail_default" style="font-family:times new roman,serif;font-size:small">Wow.. thanks guys,
reallyappreciate the detailed analysis.</div><div class="gmail_default" style="font-family:times new
roman,serif;font-size:small"><br/></div><div class="gmail_default" style="font-family:times new
roman,serif;font-size:small">Tim</div></div><divclass="gmail_extra"><br /><br /><div class="gmail_quote">On Wed, Jul
24,2013 at 4:08 AM, Noah Misch <span dir="ltr"><<a href="mailto:noah@leadboat.com"
target="_blank">noah@leadboat.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><div class="im">On Tue, Jul 23, 2013 at 01:06:26PM +0100, Tim Kane
wrote:<br/> > I haven't given this a lot of thought, but it struck me that when<br /> > rebuilding tables (be it
fora restore process, or some other operational<br /> > activity) - there is more often than not a need to build an
indexor two,<br /> > sometimes many indexes, against the same relation.<br /> ><br /> > It strikes me that in
orderto build just one index, we probably need to<br /> > perform a full table scan (in a lot of cases).   If we are
building<br/> > multiple indexes sequentially against that same table, then we're probably<br /> > performing
multiplesequential scans in succession, once for each index.<br /><br /></div>Check.<br /><div class="im"><br /> >
Couldwe architect a mechanism that allowed multiple index creation<br /> > statements to execute concurrently, with
allof their inputs fed directly<br /> > from a single sequential scan against the full relation?<br /> ><br />
>From a language construct point of view, this may not be trivial to<br /> > implement for raw/interactive SQL -
butpossibly this is a candidate for<br /> > the custom format restore?<br /><br /></div>As Greg Stark mentioned,
pg_restorecan already issue index build commands in<br /> parallel.  Where applicable, that's probably superior to
havingone backend<br /> build multiple indexes during a single heap scan.  Index builds are<br /> CPU-intensive, and
thepg_restore approach takes advantage of additional CPU<br /> cores in addition to possibly saving I/O.<br /><br />
However,the pg_restore method is not applicable if you want CREATE INDEX<br /> CONCURRENTLY, and it's not applicable
forimplicit index building such as<br /> happens for ALTER TABLE rewrites and for VACUUM FULL.  Backend-managed<br />
concurrentindex builds could shine there.<br /><div class="im"><br /> > I presume this would substantially increase
thememory overhead required to<br /> > build those indexes, though the performance gains may be advantageous.<br
/><br/></div>The multi-index-build should respect maintenance_work_mem overall.  Avoiding<br /> cases where that makes
concurrentbuilds slower than sequential builds is a<br /> key challenge for such a project:<br /><br /> - If the index
buildseach fit in maintenance_work_mem when run sequentially<br />   and some spill to disk when run concurrently,
expectconcurrency to lose.<br /> - If the heap is small enough to stay in cache from one index build to the<br />  
next,performing the builds concurrently is probably a wash or a loss.<br /> - Concurrency should help when a wide-row
tablelarge enough to exhaust OS<br />   cache has narrow indexes that all fit in maintenance_work_mem.  I don't know<br
/>  whether concurrency would help for a huge-table scenario where the indexes<br />   do overspill
maintenance_work_mem. You would have N indexes worth of<br />   external merge files competing for disk bandwidth; that
couldcancel out<br />   heap I/O savings.<br /><br /> Overall, it's easy to end up with a loss.  We could punt by
havingan<br /> index_build_concurrency GUC, much like pg_restore relies on the user to<br /> discover a good "-j"
value. But if finding cases where concurrency helps is<br /> too hard, leaving the GUC at one would become the standard
advice.<br/><div class="im"><br /> > Apologies in advance if this is not the correct forum for suggestions..<br
/><br/></div>It's the right forum.<br /><br /> Thanks,<br /> nm<br /><span class="HOEnZb"><font color="#888888"><br />
--<br/> Noah Misch<br /> EnterpriseDB                                 <a href="http://www.enterprisedb.com"
target="_blank">http://www.enterprisedb.com</a><br/></font></span></blockquote></div><br /></div> 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [bug fix] PITR corrupts the database cluster
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [bug fix] PITR corrupts the database cluster