Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
| From | Heikki Linnakangas |
|---|---|
| Subject | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
| Date | |
| Msg-id | 4b368c17-a614-401b-a335-398450249c47@iki.fi |
| In reply to | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements (Hannu Krosing <hannuk@google.com>) |
| List | pgsql-hackers |
Didn't know which part of this thread to quote and reply to, so I'll comment on the whole thing. This is a mix of a summary of the ideas already discussed, and a new proposal.

Firstly, I think the STIR approach is the right approach at the high level. I don't like the logical decoding idea, for the reasons Matthias and Mikhail already mentioned. Maybe there's some synergy with REPACK, but it feels different enough that I doubt it. Let's focus on the STIR approach.

Summary of CIC as it is today
-----------------------------

To recap, the CIC approach at a very high level is:

1. Build the index, while backends are modifying the table concurrently.
2. Retail-insert all the tuples that we missed in step 1.

A lot of logic and coordination goes into determining what was missed in step 1. Currently, it involves snapshots, waiting for concurrent transactions to finish, and re-scanning the index and the table. The STIR idea is to maintain a little data structure on the side where we collect items that are inserted between steps 1 and 2, to avoid re-scanning the table.

Shmem struct
------------

One high-level observation: we're using the catalog for inter-process communication, with the indisready and indisvalid flags, and now with STIR by having a special, ephemeral index AM. That feels unnecessarily difficult.

I propose that we introduce a little shared memory struct to keep track of in-progress CONCURRENTLY index builds. In the first transaction that inserts the catalog entry with indisready=false, also create a shmem struct. In that struct, we can store information about what state the build is in, and whether insertions should go to the STIR or to the real index.

Avoid one wait-for-all-transactions step using the shmem struct
---------------------------------------------------------------

As one small incremental improvement, we could use the shmem struct to avoid one of the "wait for all transactions" steps in the current implementation.
In validate_index(), after we mark the index as indisready, we have to wait for all transactions to finish, to ensure that all subsequent insertions have seen the indisready=true change. We could avoid that by setting a flag in the shmem struct instead, so that all backends would instantly see that the flag has flipped.

Improved STIR approach
----------------------

Here's another proposal using the STIR approach. It's a little different from the patches so far:

- Instead of having an ephemeral index AM, I'm imagining that index_insert() has access to the shmem struct, knows about the STIR, and can redirect insertions to it.

- I want to avoid re-scanning the index as well as the heap. To accomplish that, track more precisely which tuples are already in the index and which are not, by storing XID cutoffs in the shmem struct.

The proposal:

1. Insert the catalog entry with indisvalid = false and indisready = false. Commit the transaction.

2. Wait for all transactions to finish.

   - Now we know that all subsequently-started transactions will see the index and will take it into account when deciding HOT chains. (No changes to the current implementation so far.)

   - All subsequently-started transactions will now also check the shmem struct for the status of the index build, in index_insert(). We'll use the shmem struct to coordinate the later steps.

3. Atomically do the following:

   3.1. Take snapshot A.
   3.2. Store the snapshot's xmax in the shmem struct where all concurrent backends can see it. Let's call this "cutoff A".

   After this step, whenever a backend inserts a new tuple, it will append its TID to the STIR if the transaction's XID >= cutoff A. (No insertions to the actual index yet.)

4. Build the index using snapshot A. It will include all tuples visible or in-progress according to the snapshot.

5. Atomically do the following:

   5.1. Take snapshot B.
   5.2. Store the snapshot's xmax in the shmem struct. We'll call this "cutoff B".
   From now on, backends insert all tuples >= cutoff B directly to the index. Tuples between A and B continue to be appended to the STIR.

6. Wait for all transactions < B to finish. At this stage:

   - All tuples < A are in the index. They were included in the bulk ambuild.
   - All tuples between A and B are in the STIR.
   - All tuples >= B are inserted to the index by the backends.

7. Retail-insert all the tuples from the STIR to the index.

Snapshot refreshing
-------------------

The above proposal doesn't directly accomplish the original goal of advancing the global xmin horizon: you still need two long-lived snapshots. It does, however, make CIC faster, by eliminating the full index scan and table scan in the validate_index() stage. That already helps a little.

I believe it can be extended to also advance the xmin horizon:

- In step 4, while we are building the index, we can periodically get a new snapshot, update the cutoff in the shmem struct, and drain the STIR of the tuples that are already in it.

- In step 7, we can take a new snapshot as often as we like. The snapshot is only used to evaluate expressions.

- Heikki