Re: pt_toast table seems to be

From Rui DeSousa
Subject Re: pt_toast table seems to be
Date
Msg-id 4A05FCB9-1095-41A0-8F6E-1A92D8DB63ED@icloud.com
In response to pt_toast table seems to be  (Jorge Daniel <elgaita@hotmail.com>)
List pgsql-admin
Correct. The following parameter would have to change; otherwise, a system crash will leave the database in a corrupted state that is unrecoverable.

rbd_cache_max_dirty = 25165824

To 

rbd_cache_max_dirty = 0
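
For reference, in ceph.conf this would look roughly like the following on the client side (just a sketch; the section name and the rest of your existing cache settings stay as they are):

[client]
rbd cache = true
rbd cache writethrough until flush = true
rbd cache max dirty = 0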


On Jan 4, 2018, at 10:15 AM, Jorge Daniel <elgaita@hotmail.com> wrote:

Hi Rui: 

We're using Jewel, and these are the current parameters for rbd caching:

rbd_op_threads = 1
rbd_op_thread_timeout = 60
rbd_non_blocking_aio = true
rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_cache_size = 33554432
rbd_cache_max_dirty = 25165824
rbd_cache_target_dirty = 16777216
rbd_cache_max_dirty_age = 1
rbd_cache_max_dirty_object = 0
rbd_cache_block_writes_upfront = false
rbd_concurrent_management_ops = 10
rbd_balance_snap_reads = false
rbd_localize_snap_reads = false
rbd_balance_parent_reads = false
rbd_localize_parent_reads = true
rbd_readahead_trigger_requests = 10
rbd_readahead_max_bytes = 524288
rbd_readahead_disable_after_bytes = 52428800
rbd_clone_copy_on_read = false
rbd_blacklist_on_break_lock = true
rbd_blacklist_expire_seconds = 0
rbd_request_timed_out_seconds = 30
rbd_skip_partial_discard = false
rbd_enable_alloc_hint = true
rbd_tracing = false
rbd_validate_pool = true
rbd_validate_names = true
rbd_mirroring_resync_after_disconnect = false
rbd_auto_exclusive_lock_until_manual_request = true
rbd_default_format = 2
rbd_default_order = 22
rbd_default_stripe_count = 0
rbd_default_stripe_unit = 0
rbd_default_features = 1
rbd_default_map_options =
rbd_journal_order = 24
rbd_journal_splay_width = 4
rbd_journal_commit_age = 5
rbd_journal_object_flush_interval = 0
rbd_journal_object_flush_bytes = 0
rbd_journal_object_flush_age = 0
rbd_journal_pool =
rbd_journal_max_payload_bytes = 16384
rbd_journal_max_concurrent_object_sets = 0
rbd_mirror_journal_commit_age = 5
rbd_mirror_journal_poll_age = 5
rbd_mirror_journal_max_fetch_bytes = 32768
rbd_mirror_sync_point_update_age = 30
rbd_mirror_concurrent_image_syncs = 5

Are those the ones you were referring to?

Jorge Daniel Fernandez 



From: Rui DeSousa <rui.desousa@icloud.com>
Sent: Wednesday, January 3, 2018 6:42 PM
To: Jorge Daniel
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: pt_toast table seems to be
 

Interesting setup. What are your Ceph settings? Are you using caching, and is it set to write-back or write-through?

If you're using caching, it should be write-through with the max dirty set to zero.

rbd cache max dirty to 0





On Jan 3, 2018, at 3:38 PM, Jorge Daniel <elgaita@hotmail.com> wrote:

We're using Ceph, with ext4 volumes where our PGDATA resides; it's mounted like this:
/dev/rbd5 on /pg_data type ext4 (rw,relatime,discard,stripe=1024,data=ordered)

Jorge Daniel Fernandez  



From: Rui DeSousa <rui.desousa@icloud.com>
Sent: Wednesday, January 3, 2018 5:20 PM
To: Jorge Daniel
Subject: Re: pt_toast table seems to be
 
I'm not a docker expert, nor have I ever run Postgres in a docker container; but my thought is that docker is not handling sync() true to form, i.e. it is buffering for performance improvements instead. If it buffers the write but then crashes before the data is actually saved, it will lead to corruption. I've seen similar issues with RAID controllers that buffer the sync() call but eventually fail to actually save the data for whatever reason: buffer overrun, system crash, etc.

How is the Postgres data stored in the docker container? What file system is it using? Is it using a docker data volume, a directory mount, or a storage plugin?

Doing a quick Google search, it does seem that docker has its own storage model, which introduces its own COW semantics, plugins, etc.; although I can't find anything on whether it guarantees sync() calls.
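
One way to sanity-check this (a sketch, not a definitive test) is to run pg_test_fsync, which ships with Postgres, against the same filesystem the data directory lives on; the path below is just a placeholder:

# run inside the container, against a file on the Postgres data volume
# ($PGDATA is assumed to point at the data directory)
pg_test_fsync -f "$PGDATA"/pg_test_fsync.out

Implausibly fast fsync numbers there usually mean the sync is being absorbed by a cache somewhere below Postgres.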

On Jan 3, 2018, at 2:43 PM, Jorge Daniel <elgaita@hotmail.com> wrote:

Hi Rui, every time auto-analyze ran against that table, after a long time running, the OOM killer acted up and killed the autovacuum; in some cases the engine went into recovery, and in other cases it crashed the whole container. We're pretty sure those last ones are what led to this issue.

We were able to reproduce the OOM killing, and our findings led us to a strange Docker setting for SHMMAX:

Linux 434d18e30d83 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
postgres@434d18e30d83:~$ cat /proc/sys/kernel/shmmax
18446744073692774399

After we fixed that number, we were able to run the ANALYZE and VACUUM to the end without any OOM killing.
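
For reference, this is roughly how the limit can be checked and adjusted (the value below is only an example, not the one we settled on):

# inside the container: check the current limit
cat /proc/sys/kernel/shmmax
# kernel.shmmax is IPC-namespaced, so it can be set per container at start time
docker run --sysctl kernel.shmmax=8589934592 ...
# or, on the host / in a privileged container:
sysctl -w kernel.shmmax=8589934592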

This is related to a Docker bug report:

BUG REPORT INFORMATION: My host is Ubuntu 15.04 (kernel 3.19.0-28-generic), docker version: Client version: 1.5.0, Client API version: 1.17, Go version (client): go1.3.3, Git commit (client): a8a31ef, OS...

I don't have that syslog now to examine whether the sync() worked properly or not.



Jorge Daniel Fernandez 



From: Rui DeSousa <rui.desousa@icloud.com>
Sent: Wednesday, January 3, 2018 4:23 PM
To: Jorge Daniel
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: pt_toast table seems to be
 
Do you mean the Postgres instance is killed or the docker instance is killed?  For OOM, what’s the memory configuration and how much swap is allocated?

I would look at your disk subsystem from the perspective of Postgres; when it issues a sync(), does your setup honor it? If your docker instance is crashing, then it seems like sync() might not be honored, which would lead to the corruption issues you're describing.

On Jan 3, 2018, at 10:25 AM, Jorge Daniel <elgaita@hotmail.com> wrote:

Hi guys
This is my first post in the community, so my apologies in advance about the formalities.


In a production db, and after several OOM-killer events on the Postgres running in a docker container (now fixed), we are left with a kind of corrupted pg_toast 😐:


2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:ERROR:  unexpected chunk number 0 (expected 1) for toast value 17143928 in pg_toast_77809
2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:STATEMENT:  SELECT "id", "needs_pumping", "needs_lucene", "surveybatchid", "ip_address", "is_open_proxy", "is_partial", "cookies", "social_media_source", "social_media_review_id", "social_media_pull_id", "social_media_pull_id_long", "source_config_unitid", "source_config_source_id", "updated", "external_id_printable_coder", "unique_reusable_id", "invite_errmsg", "reminder_errmsg", "last_seen_pagename", "last_submitted_pagename", "has_content_fields", "logid", "alertid", "language_name", "e_status", "e_reminder_status", "e_lastupdated", "e_creationdate", "e_sampleddate", "e_responsedate", "e_invitationdate", "reminderdate", "e_delivereddate", "e_remindeddate", "e_accepteddate", "e_initialfinishdate", "e_ta_completed_date", "e_expirationdate", "e_survey_method", "e_survey_source", "e_survey_type", "parse_status", "tagging_attempts", "e_optout", "e_bounce_category", "feed_fileid", "feed_file_recordid", "startdate", "e_lastname", "e_firstname", "e_address", "e_address2", "e_city", "e_state", "e_postalcode", "e_phone", "e_email", "is_mobile", "is_mobile_first", "is_mobile_finished", "is_cookie_confirmation_needed", "exclude_from_problem_tracker", "is_invitation_error", "page_number_seen_last", "page_number_seen_highest", "pages_submitted", "pages_validation_failed", "last_submit_date", "user_agent", "jsonb_fields", "thrift", "episodeid", "e_unitid", "e_committed_survey_specid", "customer_veonid", "xmin" FROM "survey_prior"  WHERE survey_veon.surveyid = ANY($1) 

The usual fix is:

REINDEX TABLE pg_toast_77809; 

After this we can select * from survey_prior with no problem, and we can dump the entire table if we wish.

Then the jobs resume and we're happy until we hit the same error again. This occurs randomly; we suspect an UPDATE on a certain row is creating the corruption.

We can't reproduce the ERROR because we can't do a full scan of the table while the error is still alive (the on-call guys apply the fix immediately to resume production).

So how can we discover the bad rows and expose them, or how do we check the consistency of the pg_toast table?
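
One approach we are considering (a rough sketch; it assumes "id" can be used as the key for survey_prior and that a slow full scan is acceptable) is to force detoasting row by row and log the ids that error out:

DO $$
DECLARE
  r record;
  dummy text;
BEGIN
  FOR r IN SELECT id FROM survey_prior LOOP
    BEGIN
      -- casting the whole row to text forces every TOASTed column to be detoasted
      SELECT t::text INTO dummy FROM survey_prior t WHERE t.id = r.id;
    EXCEPTION WHEN OTHERS THEN
      RAISE NOTICE 'bad row: id = %, error: %', r.id, SQLERRM;
    END;
  END LOOP;
END
$$;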

We need this evidence to ask for a maintenance window and a complete dump/restore that will wipe out this problem.

Thx in advance 

Jorge Daniel Fernandez 


