Re: CONCURRENT INDEXing again (was: Must be owner to truncate?)
От | Tom Lane |
---|---|
Тема | Re: CONCURRENT INDEXing again (was: Must be owner to truncate?) |
Дата | |
Msg-id | 19968.1121185209@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | CONCURRENT INDEXing again (was: Must be owner to truncate?) (Hannu Krosing <hannu@skype.net>) |
Ответы |
Re: CONCURRENT INDEXing again (was: Must be owner to
(Hannu Krosing <hannu@tm.ee>)
|
Список | pgsql-hackers |
Hannu Krosing <hannu@skype.net> writes: > I try to state my reworked idea of concurrent indexing in a more clear > way: > The index build in done 2 transactions, need one new type of lock and a > new system column in pg_class to tell planner not to use an incomplete > index. This similar to vacuum in thet ot needs its own transactions and > is not rollbackable. "Not rollbackable" is certainly "not acceptable"... what if there's a crash partway through? VACUUM FULL is designed so that there isn't anything special that needs to be done to clean up after it if it fails partway through, but this seems to be untrue of your proposal. You'd have committed-created but useless indexes that have to be got rid of somehow. > To be sure that we cover all the tuples in range <= MAX_CTID_1, and no > new tuples are stored there as the result of INSERT or UPDATE, we need a > new type of lock (lets call it "TupleStoreRangeLock"), which prevents > new tuples to be placed below MAX_CTID_1 and which is aquired before > starting the initial build. Checking for such a lock will put a nontrivial distributed cost on every insert and update, whether the facility is in use or not. > After the initial build of index for tuples below MAX_CTID_1 is > finished, it is made visible to the rest of the system by committing the > transaction, but marking the index as "incomplete" (probably a new > column in pg_class is needed for that), so that it will not be used by > planner, but all new inerts/updates will see and use it. I can see how this might work for a new index build, but it doesn't work for REINDEX --- you get to have other transactions inserting into either the old or the new index, not both. You could possibly make it work by creating a complete new index with its own OID, and then swapping that out for the old index at completion --- but see below. > After that we need to wait for all other running transactions to > complete, so we can be sure that all other backends know about the new > index. That could be a pretty long time ... and presumably the index build is pretty long, too, or we'd not be bothering with all this mechanism. All the while, tuples are getting inserted into highly nonoptimal pages far from the start of the table. Doesn't this idea conflict rather badly with your desire expressed nearby to force tuples to be inserted near the front of the table? > As the final step we need to scan all tuples in range ( MAX_CTID_1 to > MAX_CTID_2 ) and insert their corresponding index entries into the new > index. If the entry already exists for exact same ctid, that is ok. I don't think that's as easy as it sounds; at the very least it requires mechanism comparable to the unique-index checking code, which we don't have for any index type except btree. Also, in an index that's actually highly non-unique, that mechanism is *expensive* --- you may have to scan many pages of identically-keyed entries to see if any of them match the target ctid ... all the while holding a lock that prevents anyone else from inserting on the same starting page. What's more, the pseudo uniqueness check has to be done on both sides --- else index build might decide the entry's not there, insert it, only to have the original tuple inserter come along right after and insert again. So this is a second major operational mode that has to affect everybody in the system, not only index build. I'm not sure whether there are race conditions associated with getting in and out of this mode, but it wouldn't surprise me. > After reaching MAX_CTID_2, the index is ready for use by planner as well > and can be marked as "complete" in pg_class. In case of REINDEX, the new > index can be made to replace the old one at this point. AFAICS, the "replace" bit requires exclusive lock to make sure that no one is in the midst of using the old index. This means that you have a situation where you need to upgrade your table lock at the very end of the operation --- which means the whole thing is prone to failing at the very end because of deadlock. regards, tom lane
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Bruce MomjianДата:
Сообщение: Re: [PATCHES] thousands comma numeric formatting in psql