Re: BUG #17245: Index corruption involving deduplicated entries
От | Kamigishi Rei |
---|---|
Тема | Re: BUG #17245: Index corruption involving deduplicated entries |
Дата | |
Msg-id | 9076c0bd-e238-5f8d-1117-7f5549e35c33@koumakan.jp обсуждение исходный текст |
Ответ на | Re: BUG #17245: Index corruption involving deduplicated entries (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: BUG #17245: Index corruption involving deduplicated entries
(Peter Geoghegan <pg@bowt.ie>)
|
Список | pgsql-bugs |
On 26.10.2021 23:36, Peter Geoghegan wrote: > On Mon, Oct 25, 2021 at 2:29 PM Peter Geoghegan <pg@bowt.ie> wrote: > A few more questions for you, if you don't mind: > > 1. Can you tell me which tables the 4 known-corrupt indexes > (page_redirect_namespace_len, page_len, transcode_key_idx, and > page_main_title) are defined against? page and transcode. > 2. Can you show me the definition of the table or tables, using \d from psql? azurlane_wiki=> \d mediawiki.page Table "mediawiki.page" Column | Type | Collation | Nullable | Default --------------------+--------------------------+-----------+----------+------------------------------------------------- page_id | integer | | not null | nextval('mediawiki.page_page_id_seq'::regclass) page_namespace | integer | | not null | page_title | text | | not null | page_restrictions | text | | | page_is_redirect | smallint | | not null | 0 page_is_new | smallint | | not null | 0 page_random | double precision | | not null | random() page_touched | timestamp with time zone | | not null | page_links_updated | timestamp with time zone | | | page_latest | integer | | not null | page_len | integer | | not null | page_content_model | text | | | page_lang | text | | | titlevector | tsvector | | | Indexes: "page_pkey" PRIMARY KEY, btree (page_id) "name_title" UNIQUE, btree (page_namespace, page_title) "page_len" btree (page_len) "page_main_title" btree (page_title text_pattern_ops) WHERE page_namespace = 0 "page_mediawiki_title" btree (page_title text_pattern_ops) WHERE page_namespace = 8 "page_project_title" btree (page_title text_pattern_ops) WHERE page_namespace = 4 "page_random" btree (page_random) "page_redirect_namespace_len" btree (page_is_redirect, page_namespace, page_len) "page_talk_title" btree (page_title text_pattern_ops) WHERE page_namespace = 1 "page_user_title" btree (page_title text_pattern_ops) WHERE page_namespace = 2 "page_utalk_title" btree (page_title text_pattern_ops) WHERE page_namespace = 3 "ts2_page_title" gist (titlevector) Referenced by: TABLE "mediawiki.cu_changes" CONSTRAINT "cu_changes_cuc_page_id_fkey" FOREIGN KEY (cuc_page_id) REFERENCES mediawiki.page(page_id) ON DELETE SET NULL TABLE "mediawiki.revision" CONSTRAINT "revision_rev_page_fkey" FOREIGN KEY (rev_page) REFERENCES mediawiki.page(page_id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED Triggers: page_deleted AFTER DELETE ON mediawiki.page FOR EACH ROW EXECUTE FUNCTION mediawiki.page_deleted() ts2_page_title BEFORE INSERT OR UPDATE ON mediawiki.page FOR EACH ROW EXECUTE FUNCTION mediawiki.ts2_page_title() azurlane_wiki=> \d mediawiki.transcode Table "mediawiki.transcode" Column | Type | Collation | Nullable | Default --------------------------+--------------------------+-----------+----------+----------------------------------------------------------- transcode_id | integer | | not null | nextval('mediawiki.transcode_transcode_id_seq'::regclass) transcode_image_name | character varying(255) | | not null | transcode_key | character varying(48) | | not null | transcode_error | text | | not null | transcode_time_addjob | timestamp with time zone | | | transcode_time_startwork | timestamp with time zone | | | transcode_time_success | timestamp with time zone | | | transcode_time_error | timestamp with time zone | | | transcode_final_bitrate | integer | | not null | Indexes: "transcode_key_idx" btree (transcode_key) "transcode_name_key" UNIQUE, btree (transcode_image_name, transcode_key) "transcode_time_inx" btree (transcode_time_addjob, transcode_time_startwork, transcode_time_success, transcode_time_error) > 3. Do you notice any consistent pattern here? For example, are foreign > keys involved? It does not look like it: [xiatian@freebsd ~]$ pg_dump -U azurlane_wiki -d azurlane_wiki -s | grep -i FOREIGN ADD CONSTRAINT account_credentials_acd_user_fkey FOREIGN KEY (acd_user) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL; ADD CONSTRAINT account_requests_acr_user_fkey FOREIGN KEY (acr_user) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL; ADD CONSTRAINT cu_changes_cuc_page_id_fkey FOREIGN KEY (cuc_page_id) REFERENCES mediawiki.page(page_id) ON DELETE SET NULL; ADD CONSTRAINT cu_changes_cuc_user_fkey FOREIGN KEY (cuc_user) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL; ADD CONSTRAINT cu_log_cul_target_id_fkey FOREIGN KEY (cul_target_id) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL; ADD CONSTRAINT cu_log_cul_user_fkey FOREIGN KEY (cul_user) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL; ADD CONSTRAINT revision_rev_page_fkey FOREIGN KEY (rev_page) REFERENCES mediawiki.page(page_id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED; ADD CONSTRAINT user_newtalk_user_id_fkey FOREIGN KEY (user_id) REFERENCES mediawiki.mwuser(user_id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED; > 4. How do the table or tables tend to get updated? Regarding page: New rows are added rarely, usually once a week or two, in small batches (separate transactions). The page_title field is almost never updated (pages are moved extremely rarely). Updates to other fields, like page_len, happen multiple times daily, during page edits, and sometimes happen in large batches but as separate transactions. Regarding transcode: I assume inserts happen with each file upload, often in large batches (separate transactions). Updates happen whenever a transcode job runs, which is usually also in batches. I can check the wiki's recent changes and upload log for more precise data. > Also: > 5. Do you use CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY much?> I assume not, `find . -name "*.php" | xargs grep -i "CONCURRENTLY"` against MediaWiki includes/ gives no SQL results, source code comments only. > If my hypothesis is true, then we might expect problems to not even go > away following a REINDEX. Or maybe they would go away sometimes, but > not other times. REINDEX *seems* to have helped in my case (tested with both page_main_title and the three that showed up in pg_amcheck output). I am considering writing a script that would SELECT COUNT(*) for each page_title to detect the next possible page_main_title index corruption. -- K. R.
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Herman verschootenДата:
Сообщение: Re: ERROR: posting list tuple with 20 items cannot be split at offset 168