Re: BUG #17245: Index corruption involving deduplicated entries

Поиск
Список
Период
Сортировка
От Kamigishi Rei
Тема Re: BUG #17245: Index corruption involving deduplicated entries
Дата
Msg-id 9076c0bd-e238-5f8d-1117-7f5549e35c33@koumakan.jp
обсуждение исходный текст
Ответ на Re: BUG #17245: Index corruption involving deduplicated entries  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: BUG #17245: Index corruption involving deduplicated entries  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-bugs
On 26.10.2021 23:36, Peter Geoghegan wrote:
> On Mon, Oct 25, 2021 at 2:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> A few more questions for you, if you don't mind:
> 
> 1. Can you tell me which tables the 4 known-corrupt indexes
> (page_redirect_namespace_len, page_len, transcode_key_idx, and
> page_main_title) are defined against?

page and transcode.

> 2. Can you show me the definition of the table or tables, using \d from psql?

azurlane_wiki=> \d mediawiki.page
                                                  Table "mediawiki.page"
        Column       |           Type           | Collation | Nullable | 
                     Default

--------------------+--------------------------+-----------+----------+-------------------------------------------------
  page_id            | integer                  |           | not null | 
nextval('mediawiki.page_page_id_seq'::regclass)
  page_namespace     | integer                  |           | not null |
  page_title         | text                     |           | not null |
  page_restrictions  | text                     |           |          |
  page_is_redirect   | smallint                 |           | not null | 0
  page_is_new        | smallint                 |           | not null | 0
  page_random        | double precision         |           | not null | 
random()
  page_touched       | timestamp with time zone |           | not null |
  page_links_updated | timestamp with time zone |           |          |
  page_latest        | integer                  |           | not null |
  page_len           | integer                  |           | not null |
  page_content_model | text                     |           |          |
  page_lang          | text                     |           |          |
  titlevector        | tsvector                 |           |          |
Indexes:
     "page_pkey" PRIMARY KEY, btree (page_id)
     "name_title" UNIQUE, btree (page_namespace, page_title)
     "page_len" btree (page_len)
     "page_main_title" btree (page_title text_pattern_ops) WHERE 
page_namespace = 0
     "page_mediawiki_title" btree (page_title text_pattern_ops) WHERE 
page_namespace = 8
     "page_project_title" btree (page_title text_pattern_ops) WHERE 
page_namespace = 4
     "page_random" btree (page_random)
     "page_redirect_namespace_len" btree (page_is_redirect, 
page_namespace, page_len)
     "page_talk_title" btree (page_title text_pattern_ops) WHERE 
page_namespace = 1
     "page_user_title" btree (page_title text_pattern_ops) WHERE 
page_namespace = 2
     "page_utalk_title" btree (page_title text_pattern_ops) WHERE 
page_namespace = 3
     "ts2_page_title" gist (titlevector)
Referenced by:
     TABLE "mediawiki.cu_changes" CONSTRAINT 
"cu_changes_cuc_page_id_fkey" FOREIGN KEY (cuc_page_id) REFERENCES 
mediawiki.page(page_id) ON DELETE SET NULL
     TABLE "mediawiki.revision" CONSTRAINT "revision_rev_page_fkey" 
FOREIGN KEY (rev_page) REFERENCES mediawiki.page(page_id) ON DELETE 
CASCADE DEFERRABLE INITIALLY DEFERRED
Triggers:
     page_deleted AFTER DELETE ON mediawiki.page FOR EACH ROW EXECUTE 
FUNCTION mediawiki.page_deleted()
     ts2_page_title BEFORE INSERT OR UPDATE ON mediawiki.page FOR EACH 
ROW EXECUTE FUNCTION mediawiki.ts2_page_title()

azurlane_wiki=> \d mediawiki.transcode
                                                       Table 
"mediawiki.transcode"
           Column          |           Type           | Collation | 
Nullable |                          Default

--------------------------+--------------------------+-----------+----------+-----------------------------------------------------------
  transcode_id             | integer                  |           | not 
null | nextval('mediawiki.transcode_transcode_id_seq'::regclass)
  transcode_image_name     | character varying(255)   |           | not 
null |
  transcode_key            | character varying(48)    |           | not 
null |
  transcode_error          | text                     |           | not 
null |
  transcode_time_addjob    | timestamp with time zone |           | 
      |
  transcode_time_startwork | timestamp with time zone |           | 
      |
  transcode_time_success   | timestamp with time zone |           | 
      |
  transcode_time_error     | timestamp with time zone |           | 
      |
  transcode_final_bitrate  | integer                  |           | not 
null |
Indexes:
     "transcode_key_idx" btree (transcode_key)
     "transcode_name_key" UNIQUE, btree (transcode_image_name, 
transcode_key)
     "transcode_time_inx" btree (transcode_time_addjob, 
transcode_time_startwork, transcode_time_success, transcode_time_error)

> 3. Do you notice any consistent pattern here? For example, are foreign
> keys involved?

It does not look like it:

[xiatian@freebsd ~]$ pg_dump -U azurlane_wiki -d azurlane_wiki -s | grep 
-i FOREIGN
     ADD CONSTRAINT account_credentials_acd_user_fkey FOREIGN KEY 
(acd_user) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL;
     ADD CONSTRAINT account_requests_acr_user_fkey FOREIGN KEY 
(acr_user) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL;
     ADD CONSTRAINT cu_changes_cuc_page_id_fkey FOREIGN KEY 
(cuc_page_id) REFERENCES mediawiki.page(page_id) ON DELETE SET NULL;
     ADD CONSTRAINT cu_changes_cuc_user_fkey FOREIGN KEY (cuc_user) 
REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL;
     ADD CONSTRAINT cu_log_cul_target_id_fkey FOREIGN KEY 
(cul_target_id) REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL;
     ADD CONSTRAINT cu_log_cul_user_fkey FOREIGN KEY (cul_user) 
REFERENCES mediawiki.mwuser(user_id) ON DELETE SET NULL;
     ADD CONSTRAINT revision_rev_page_fkey FOREIGN KEY (rev_page) 
REFERENCES mediawiki.page(page_id) ON DELETE CASCADE DEFERRABLE 
INITIALLY DEFERRED;
     ADD CONSTRAINT user_newtalk_user_id_fkey FOREIGN KEY (user_id) 
REFERENCES mediawiki.mwuser(user_id) ON DELETE CASCADE DEFERRABLE 
INITIALLY DEFERRED;

> 4. How do the table or tables tend to get updated?

Regarding page:
New rows are added rarely, usually once a week or two, in small batches 
(separate transactions).
The page_title field is almost never updated (pages are moved extremely 
rarely).
Updates to other fields, like page_len, happen multiple times daily, 
during page edits, and sometimes happen in large batches but as separate 
transactions.

Regarding transcode:
I assume inserts happen with each file upload, often in large batches 
(separate transactions). Updates happen whenever a transcode job runs, 
which is usually also in batches.

I can check the wiki's recent changes and upload log for more precise data.

 > Also:
 > 5. Do you use CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY much?>

I assume not, `find . -name "*.php" | xargs grep -i "CONCURRENTLY"` 
against MediaWiki includes/ gives no SQL results, source code comments only.

> If my hypothesis is true, then we might expect problems to not even go
> away following a REINDEX. Or maybe they would go away sometimes, but
> not other times.

REINDEX *seems* to have helped in my case (tested with both 
page_main_title and the three that showed up in pg_amcheck output). I am 
considering writing a script that would SELECT COUNT(*) for each 
page_title to detect the next possible page_main_title index corruption.

-- 
K. R.



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Herman verschooten
Дата:
Сообщение: Re: ERROR: posting list tuple with 20 items cannot be split at offset 168
Следующее
От: Semab Tariq
Дата:
Сообщение: Re: BUG #17248: Installation fails...