Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

From: David Fetter
Subject: Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)
Date:
Msg-id: 20170814181705.GB3471@fetter.org
In reply to: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Sun, Aug 13, 2017 at 05:56:56PM -0700, Andres Freund wrote:
> Hi,
> 
> Since we're getting a bit into the weeds of a different topic, and since
> I think it's an interesting feature, I'm detaching this into a separate
> thread.
> 
> On 2017-08-12 23:37:27 -0400, Tom Lane wrote:
> > >> On 2017-08-12 22:52:57 -0400, Robert Haas wrote:
> > >>> I think it'd be pretty interesting to look at replacing parts of the
> > >>> stats collector machinery with something DHT-based.
> > > On Sat, Aug 12, 2017 at 11:30 PM, Andres Freund <andres@anarazel.de> wrote:
> > >> That seems to involve a lot more than this though, given that currently
> > >> the stats collector data doesn't entirely have to be in memory. I've
> > >> seen sites with a lot of databases with quite some per-database stats
> > >> data. Don't think we can just require that to be in memory :(
> >
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > Hmm.  I'm not sure it wouldn't end up being *less* memory.  Don't we
> > > end up caching 1 copy of it per backend, at least for the database to
> > > which that backend is connected?  Accessing a shared copy would avoid
> > > that sort of thing.
> >
> > Yeah ... the collector itself has got all that in memory anyway.
> > We do need to think about synchronization issues if we make that
> > memory globally available, but I find it hard to see how that would
> > lead to more memory consumption overall than what happens now.
> 
> You both are obviously right.  Another point of potential concern could
> be that we'd pretty much fully rely on dsm/dht's being available, for
> the server to function correctly. Are we ok with that? Right now
> postgres still works perfectly well, leaving parallelism aside, with
> dynamic_shared_memory_type = none.
> 
> 
> What are your thoughts about how to actually implement this? It seems
> we'd have to do something like:
> 
> 1) Keep the current per-backend & per-transaction state in each
>    backend. That allows both to throw away the information and avoids
>    increasing contention quite noticeably.
> 
> 2) Some plain shared memory with metadata.  A set of shared hashtables
>    for per database, per relation contents.
> 
> 3) Individual database/relation entries are either individual atomics
>    (we don't rely on consistency anyway), or seqcount (like
>    st_changecount) based.
> 
> 4) Instead of sending stats at transaction end, copy them into a
>    "pending" entry.  Nontransactional contents can be moved to
>    the pending entry more frequently.
> 
> 5) Occasionally, try to flush the pending array into the global hash.
>    The lookup in the table would be protected by something
>    LWLockConditionalAcquire() based, to avoid blocking - don't want to
>    introduce chokepoints due to commonly used tables and such.  Updating
>    the actual stats can happen without the partition locks being held.
> 
> I think there are two other relevant points here:
> 
> a) It'd be quite useful to avoid needing a whole cluster's stats in
>    memory. Even if $subject would save memory, I'm hesitant committing
>    to something requiring all stats to be in memory forever. As a first
>    step it seems reasonable to e.g. not require state for all databases
>    to be in memory. The first time per-database stats are required, it
>    could be "paged in". Could even be more aggressive and do that on a
>    per-table level and just have smaller placeholder entries for
>    non-accessed tables, but that seems more work.
> 
>    On the other hand, autovacuum is likely going to make that approach
>    useless anyway, given it's probably going to access otherwise unneeded
>    stats regularly.
> 
> b) I think our tendency to dump all stats whenever we crash isn't really
>    tenable, given how autovacuum etc are tied to them. We should think
>    about ways to avoid that if we're going to do a major rewrite of the
>    stats stuff, which this certainly sounds like.
> 
> 
> If there weren't HS to worry about, these two points kinda sound like
> the data should be persisted into an actual table, rather than some
> weird other storage format. But HS seems to make that untenable.
> 
> Greetings,
> 
> Andres Freund

As I recall, David Gould (hi!) had run across a case where there were
thousands of tables and the stats file became a pretty serious
bottleneck.  There might even be a design, or code, to address it.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


