Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

Поиск

Список

Период

Сортировка

От	Mathieu Fenniak
Тема	Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables
Дата	13 мая 2017 г. 02:19:10
Msg-id	CAHoiPjyvnhB4WXuo7LUfWFeVocoJ1ZpB9FeqA6keamQu2_td8g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables (Andres Freund <andres@anarazel.de>)
Ответы	Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables
Список	pgsql-general

Дерево обсуждения

Hi Tom, Andres,

I've taken your patches, Tom, and applied them to a test deployment of my actual application.

The most accessible way I have to reproduce this issue is to run a maintenance task that we typically run during a software deployment, which will remove some tenant schemas from our database system. I ran that task with and without the patches, and monitored how long logical replication stalled. This test was on a database with 22 schemas containing 400 tables each (~8800 tables).

As a baseline, PostgreSQL 9.5.6 stalled completely for 21m 13s

PostgreSQL 9.5.6 + all three patches stalled completely for 4m 11s.

Definitely a fantastic improvement. :-)

This test scenario (performing the maintenance task) is unfortunately very similar to the synthetic test... but it's the most consistent manner I've found to reproduce this stall. I've observed the replication stall in production at somewhat arbitrary times, so I'm not sure if there may be different performance profiles in more production-like usage.

Here's a CPU performance sampling from the 4m period w/ the patched PostgreSQL. This is from the `perf` tool on Linux, rather than the earlier samples from my development machine on OSX. `perf report`: https://gist.github.com/mfenniak/9a3484c426a497f3903051d9fdf8b221 Raw perf data (49kb bzip2): https://www.dropbox.com/s/3jfxtg9kjzjztmp/linux-pg-walsender-patched.perf.data.bz2?dl=0

Mathieu

On Wed, May 10, 2017 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote:

Hi Tom, Mathieu,

On 2017-05-10 17:02:11 -0400, Tom Lane wrote:
> Mathieu Fenniak <mathieu.fenniak@replicon.com> writes:
> > Andres, it seems like the problem is independent of having large data
> > manipulations mixed with schema changes. The below test case demonstrates
> > it with just schema changes.
>
> > Tom, I've tested your patch, and it seems to have a positive impact for
> > sure. I've documented a test case to reproduce this issue (below), and
> > your avoid-search-in-catcache-invalidate.patch reduces the test case time
> > from 63 seconds per run to 27 seconds per run.
>
> Thanks for the test case.

Indeed!

> + 59.28% 59.05% 150349 postmaster postgres [.] hash_seq_search
> + 8.40% 8.36% 21250 postmaster postgres [.] CallSyscacheCallbacks
> + 6.37% 6.34% 16115 postmaster postgres [.] LocalExecuteInvalidationMessage
> + 5.69% 5.67% 14418 postmaster postgres [.] CatCacheInvalidate
> + 3.14% 3.12% 7942 postmaster postgres [.] SysCacheInvalidate
> + 1.72% 1.71% 4354 postmaster postgres [.] ReorderBufferCommit
> + 1.37% 1.33% 3512 postmaster postgres [.] hash_search_with_hash_value
> + 1.15% 1.15% 2913 postmaster postgres [.] InvalidateCatalogSnapshot
>
> I looked at the hash_seq_search time a bit more, and realized that
> actually there's a pretty easy fix for that, which is to reduce
> the initial size of RelfilenodeMapHash from 1024 entries to 64.

Heh, that's not a bad idea. Given the table resizes automatically, there
seems little reason not to go there.

> You could quibble about where to set that exactly, but 1024 is
> just way too many --- in your example, there are never more than
> 5 entries in the hash, despite the presence of 10000 tables in
> the database.

That's likely because there's no DML.

> We're at a point of diminishing returns here; I think any further
> improvement would require reducing the number of invalidation calls,
> as Andres was speculating about upthread. Still, this shows that
> it doesn't take very much work to get a 10X improvement in the
> overhead associated with inval activities. We've never seen this
> overhead stick out quite this much before, and maybe logical replication
> will always be an atypical workload, but I think this may be worth
> committing even if Andres does managed to cut the number of calls.

I've seen these callsites prominently in profiles not using logical
decoding. Never quite as big as here, but still. Temp table heavy
workloads IIRC are one of the easy way to trigger it.

> It would be interesting to see how much these patches help for your real
> use-case, as opposed to this toy example. Assuming that the results are
> positive, I would advocate for back-patching these changes as far as 9.4
> where logical decoding came in.

+1.

> BTW, I also noticed that we're getting scarily close to exceeding
> MAX_SYSCACHE_CALLBACKS. There are 30-some calls to
> CacheRegisterSyscacheCallback in our code, and while I think not all of
> them can be reached in a single process, we demonstrably get as high as 21
> registered callbacks in some regression test runs. That's not leaving a
> lot of daylight for add-on modules. The second patch attached includes
> increasing MAX_SYSCACHE_CALLBACKS from 32 to 64. I think we'd be well
> advised to apply and back-patch that, even if we don't use the rest of
> the patch.

+1 - I'd advocate for doing so all the way.

Greetings,

Andres Freund

В списке pgsql-general по дате отправления:

Предыдущее

От: Adrian Klaver
Дата: 12 мая 2017 г., 23:57:50
Сообщение: Re: [GENERAL] EnterpriseDB installed PostgreSQL 9.6 vs. REPMGR. Round2 - compilation issues.

Следующее

От: Justin Pryzby
Дата: 13 мая 2017 г., 02:22:52
Сообщение: Re: [GENERAL] Partitioning and Table Inheritance

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

Предыдущее

Следующее