Re: Postgres is not able to handle more than 4k tables!?

From	Konstantin Knizhnik
Subject	Re: Postgres is not able to handle more than 4k tables!?
Date
Msg-id	8eb5d49d-4ac0-a6f2-2a07-91f7418f87be@postgrespro.ru
In reply to	Re: Postgres is not able to handle more than 4k tables!?  (Stephen Frost <sfrost@snowman.net>)
Responses	Re: Postgres is not able to handle more than 4k tables!?  (Nikolay Samokhvalov <samokhvalov@gmail.com>)
	Re: Postgres is not able to handle more than 4k tables!?  (Alexander Korotkov <aekorotkov@gmail.com>)
	Re: Postgres is not able to handle more than 4k tables!?  (Stephen Frost <sfrost@snowman.net>)
List	pgsql-hackers
Hi Stephen,

Thank you for supporting the opinion that this is a problem not only of the
client system design (I agree it is not a good idea to have thousands of
tables and thousands of active backends) but also of Postgres.

We have done more investigation and found one more problem in Postgres
causing the "invalidation storm".
There were some long-lived transactions which prevented autovacuum from
doing its work and removing dead tuples.
So autovacuum was started again and again, and each time it made no
progress but updated statistics and therefore sent invalidation messages.
autovacuum_naptime was set to 30 seconds, so every 30 seconds autovacuum
processed a huge number of tables and generated a large number of
invalidation messages, which quite soon overflowed the invalidation
message buffers of backends performing long OLAP queries.
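For readers less familiar with the sinval mechanism, the failure mode can be sketched as a bounded message queue whose overflow forces a slow reader into a full cache reset. This is a deliberately simplified model, not the actual sinval code; all names and the queue size are invented:

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 64            /* stands in for the real buffer capacity */

/* Simplified model of a backend's invalidation-message queue. */
typedef struct InvalQueue {
    uint64_t write_pos;          /* total messages ever enqueued        */
    uint64_t read_pos;           /* how far this backend has read       */
    bool     reset_needed;       /* overflow => full cache reset needed */
} InvalQueue;

/* Writer side: e.g. autovacuum publishing a statistics update. */
void inval_enqueue(InvalQueue *q)
{
    q->write_pos++;
    /* If the reader lags by more than the buffer can hold, its unread
     * messages are lost and it must rebuild its caches from scratch. */
    if (q->write_pos - q->read_pos > QUEUE_SIZE)
        q->reset_needed = true;
}

/* Reader side: returns true if individual messages could be applied,
 * false if the backend must discard and rebuild its caches -- the
 * expensive path behind the "invalidation storm". */
bool inval_catch_up(InvalQueue *q)
{
    q->read_pos = q->write_pos;
    if (q->reset_needed) {
        q->reset_needed = false;
        return false;
    }
    return true;
}
```

A backend stuck in a long OLAP query plays the slow reader here: it cannot call the catch-up path often enough, so a burst of autovacuum-generated messages overflows its queue.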

This makes me think about two possible optimizations:

1. Provide separate invalidation messages for relation metadata and its
statistics, so that a statistics update does not invalidate the relation
cache.
The main problem with this proposal is that pg_class contains the relpages
and reltuples columns, which are conceptually part of relation statistics
but are stored in the relation cache. If relation statistics are updated,
these fields are most likely changed as well, so we have to invalidate the
relation cache entry in any case.
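The idea (and its caveat) can be sketched as follows. This is not Postgres code; the enum, struct, and function names are all invented for illustration. A statistics-only message keeps the cache entry alive but still has to refresh the two pg_class-derived fields in place:

```c
#include <stdbool.h>

/* Hypothetical split of invalidation messages into two kinds. */
typedef enum InvalKind {
    INVAL_REL_METADATA,   /* schema change: drop the whole cache entry */
    INVAL_REL_STATISTICS  /* analyze/vacuum: patch statistics in place */
} InvalKind;

/* Simplified stand-in for a relation cache entry. */
typedef struct RelCacheEntry {
    bool   valid;
    double relpages;
    double reltuples;
} RelCacheEntry;

void apply_inval(RelCacheEntry *entry, InvalKind kind,
                 double new_relpages, double new_reltuples)
{
    if (kind == INVAL_REL_METADATA) {
        entry->valid = false;          /* full rebuild on next access */
        return;
    }
    /* Statistics-only message: keep the entry, but refresh the fields
     * that live in the cached pg_class tuple so they do not go stale. */
    entry->relpages  = new_relpages;
    entry->reltuples = new_reltuples;
}
```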

2. Remember in the relation info the XID of the oldest active transaction
at the moment of the last autovacuum.
At the next autovacuum iteration, first compare this stored XID with the
XID of the current oldest active transaction, and skip vacuuming the
relation if the XID has not changed.
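A minimal sketch of the check (illustrative names, and ignoring XID wraparound, which a real implementation would have to handle): if the oldest-active-XID horizon has not advanced since the last pass, no additional tuples can have become removable, so the table can be skipped.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical per-relation bookkeeping for proposal 2. */
typedef struct RelVacuumInfo {
    TransactionId last_av_oldest_xid;  /* horizon seen at last autovacuum */
} RelVacuumInfo;

/* Decide whether autovacuum should process this relation at all. */
bool should_vacuum(RelVacuumInfo *rel, TransactionId cur_oldest_xid)
{
    if (cur_oldest_xid == rel->last_av_oldest_xid)
        return false;                  /* horizon unchanged: skip */
    rel->last_av_oldest_xid = cur_oldest_xid;  /* remember for next pass */
    return true;
}
```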

Thoughts?

>> So, that's really the core of your problem.  We don't promise that
>> you can run several thousand backends at once.  Usually it's recommended
>> that you stick a connection pooler in front of a server with (at most)
>> a few hundred backends.
> Sure, but that doesn't mean things should completely fall over when we
> do get up to larger numbers of backends, which is definitely pretty
> common in larger systems.  I'm pretty sure we all agree that using a
> connection pooler is recommended, but if there's things we can do to
> make the system work at least a bit better when folks do use lots of
> connections, provided we don't materially damage other cases, that's
> probably worthwhile.

I also think that Postgres performance should degrade gracefully as the
number of active backends increases. Actually, further investigation of
this particular case showed that the large number of database connections
was itself caused by ... Postgres slowdown.
During normal operation the number of active backends is a few hundred.
But the "invalidation storm" caused queries to hang, so the user
application had to open more and more new connections to perform the
required actions.
Yes, this may not be the best application behavior in this case; at least
it should first terminate the current connection using
pg_terminate_backend. I just want to note that the large number of
backends was not the core of the problem.

>
> Making them GUCs does seem like it's a few steps too far... but it'd be
> nice if we could arrange to have values that don't result in the system
> falling over with large numbers of backends and large numbers of tables.
> To get a lot of backends, you'd have to set max_connections up pretty
> high to begin with- perhaps we should contemplate allowing these values
> to vary based on what max_connections is set to?

I think that the optimal number of lock partitions should depend not on
the number of connections but on the number of available CPU cores, and
thus on the expected level of concurrency.
It is hard to propose a portable way to obtain this number.
This is why I think a GUC is the better solution.
Certainly I realize that it is a very dangerous parameter which should be
changed with special care, and not only because of MAX_SIMUL_LWLOCKS.
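To make the trade-off concrete: today the partition count is a compile-time constant, so the hash-to-partition mapping folds into the code at build time. A GUC would make it a runtime variable, as sketched below (a sketch only, with invented names and an illustrative upper bound; the count must stay a power of two for the cheap mask, and it is constrained because some code paths hold every partition lock simultaneously):

```c
#include <stdint.h>

static int num_lock_partitions = 16;   /* default in stock Postgres */

/* Hypothetical GUC check hook: accept only powers of two in a sane
 * range.  The upper bound here is illustrative; in real Postgres it
 * would be constrained by MAX_SIMUL_LWLOCKS, since the deadlock
 * detector holds all partition locks at once. */
int set_num_lock_partitions(int n)
{
    if (n < 1 || n > 4096 || (n & (n - 1)) != 0)
        return -1;                     /* reject: not a power of two */
    num_lock_partitions = n;
    return 0;
}

/* Map a lock-tag hash to its partition with a mask, as the real code
 * does with the compile-time constant NUM_LOCK_PARTITIONS. */
uint32_t lock_hash_partition(uint32_t hashcode)
{
    return hashcode & (uint32_t)(num_lock_partitions - 1);
}
```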

There are a few places in Postgres where it tries to lock all partitions
(the deadlock detector, logical replication, ...).
If there are thousands of partitions, such locking becomes too expensive,
and we get yet another popular Postgres problem: a "deadlock detection
storm". Due to high contention between backends, a lock cannot be obtained
within deadlock_timeout, which triggers deadlock detection. Simultaneous
deadlock detection performed by all backends (each of which tries to take
ALL partition locks) paralyzes the system (TPS falls to 0).
A proposed patch for this problem was also rejected (once again, the
problem can be reproduced only on a powerful server with a large number
of cores).


