Re: Global temporary tables

From: Craig Ringer
Subject: Re: Global temporary tables
Date:
Msg-id: CAMsr+YGnJg9OBwG8RU6k1wm64QuPwwtyZkG2D0vsME=r8krCAQ@mail.gmail.com
In response to: Global temporary tables (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: Global temporary tables (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers


On Wed, 31 Jul 2019 at 23:05, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> The current Postgres implementation of temporary tables causes a number
> of problems:
>
> 1. Catalog bloat: if a client creates and deletes too many temporary
> tables, autovacuum gets stuck on the catalog.

This also upsets logical decoding a little - AFAICS it still has to treat transactions that use temporary tables as catalog-modifying transactions, tracking them in its historic catalog snapshots and doing extra cache flushes, etc., when decoding them.

This will become even more important as we work to support eager/optimistic output plugin processing of in-progress transactions. We'd have to switch snapshots more, and that can get quite expensive, so using temp tables could really hurt performance. Or we'd have to serialize on catalog-changing transactions, in which case using temp tables would negate the benefits of optimistic streaming of in-progress transactions.
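
To make the catalog cost concrete, here's a minimal illustration (assuming a stock PostgreSQL instance; the table name is just an example):

    CREATE TEMP TABLE scratch (id int);

    -- The temp table gets ordinary rows in the shared, WAL-logged catalogs,
    -- which is why decoding must treat the transaction as catalog-modifying:
    SELECT relname, relpersistence, relnamespace::regnamespace
    FROM pg_class
    WHERE relname = 'scratch';
    -- relpersistence = 't' marks it as temporary, but the pg_class row is a
    -- normal catalog tuple, created and later deleted for every temp table.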
 
> 3. It is not possible to use temporary tables on a replica.

For physical replicas, yes.
 
> A hot standby configuration is frequently used to run OLAP queries on a
> replica, and the results of such queries are usually saved in temporary
> tables. Right now that is not possible (except for a hackish solution
> that stores the results in file_fdw).

Right. Because we cannot modify pg_class, pg_attribute, etc., even though we could reasonably enough write to local-only relfilenodes on a replica if we didn't have to change the WAL-logged catalog tables.
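
For example, on a hot standby (a quick sketch; the exact error text may vary by version):

    CREATE TEMP TABLE olap_result (x int);
    -- ERROR:  cannot execute CREATE TABLE in a read-only transaction

The table's data would be backend-local and unlogged, but its pg_class and pg_attribute entries wouldn't be, so the whole statement is rejected.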

I've seen some hacks suggested around this where we'd have an unlogged fork of each of the needed catalog tables, allowing replicas to write temp table info to them. We'd scan both the logged and unlogged forks when doing relcache management, etc. But there are plenty of ugly issues with this. We'd have to reserve OID ranges for them, which is ugly; to make it backwards-compatible those reservations would probably have to take the form of some kind of placeholder entry in the real pg_class. And it gets ickier from there. It hardly seems worth it when we should probably just implement global temp tables instead.
  
> 5. Inefficient memory usage and possible memory overflow: each backend
> maintains its own local buffers for working with temporary tables.

Is there any reason that would change with global temp tables? We'd still be creating a backend-local relfilenode for each backend that actually writes to the temp table, and I don't see how it'd be useful or practical to keep those in shared_buffers.

Using local buffers has big advantages too. It saves shared_buffers space for data where there's actually some possibility of getting cache hits, or where we can benefit from lazy/async writeback and write combining. I wouldn't want to keep temp data there if I had the option.

If you're concerned about the memory use of backend local temp buffers, or about how we account for and limit those, that's worth looking into. But I don't think it'd be something that should be affected by global-temp vs backend-local-temp tables.
 
> The default size of temporary buffers is 8MB. That seems too small for
> modern servers with hundreds of gigabytes of RAM, and it causes extra
> copying of data between the OS cache and local buffers. But if there
> are thousands of backends, each executing queries with temporary
> tables, the total amount of memory used for temporary buffers can
> exceed several tens of gigabytes.

Right. But what solution do you propose for this? Putting that in shared_buffers will do nothing except deprive shared_buffers of space that can be used for other more useful things. A server-wide temp buffer would add IPC and locking overheads and AFAICS little benefit. One of the big appeals of temp tables is that we don't need any of that.
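
To put rough numbers on the quoted concern (a back-of-envelope sketch, not a recommendation):

    -- temp_buffers is per backend, and can only be raised before the
    -- session first touches a temporary table:
    SET temp_buffers = '256MB';   -- up from the 8MB default

    -- Worst case with thousands of busy backends:
    --   2000 backends * 256MB = ~500GB,
    -- while even the default is 2000 * 8MB = ~16GB.
    CREATE TEMP TABLE big_scratch AS
        SELECT g FROM generate_series(1, 1000000) g;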

If you want to improve server-wide temp buffer memory accounting and management, that makes sense. I can see it being useful to have things like a server-wide DSM/DSA pool of temp buffers that backends borrow from and return to based on memory pressure, on a roughly LRU basis. But I can also see how that'd be complex and hard to get right. It'd also be prone to priority-inversion problems, where an idle/inactive backend must be woken up to release memory or release locks, depriving an actively executing backend of runtime. And it'd be as likely to create inefficiencies with copying and eviction as to solve them, since backends could easily end up taking turns kicking each other out of memory and re-reading their own data.

Personally, I don't think this is something that should be tackled as part of the work on global temp tables.

 
> 6. A connection pooler cannot reschedule a session that has created
> temporary tables to some other backend, because its data is stored in
> local buffers.

Yeah, if you're using transaction-associative pooling. That's just part of a more general problem, though: there are piles of related issues with temp tables, session GUCs, session advisory locks, and more; see the examples just below.
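
For instance, all of the following are per-backend state that a transaction-level pooler can't transparently move to another server connection (stock features, nothing exotic):

    SET work_mem = '1GB';            -- session GUC
    SELECT pg_advisory_lock(42);     -- session-level advisory lock
    CREATE TEMP TABLE t (x int);     -- backend-local storage and buffers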

I don't see how global temp tables will do you the slightest bit of good here, as the data in them will still be backend-local. If it isn't, then you should just be using unlogged tables.
 
> The definition of this table (metadata) is shared by all backends, but
> the data is private to the backend. After session termination the data
> is obviously lost.

+1, that's what a global temp table should be, and IIRC it's pretty much how the SQL standard specifies temp tables.
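
For reference, the standard's syntax looks roughly like this (not what PostgreSQL implements today; it currently accepts GLOBAL only as a deprecated noise word on CREATE TEMP TABLE):

    CREATE GLOBAL TEMPORARY TABLE session_scratch (
        id   int,
        data text
    ) ON COMMIT PRESERVE ROWS;
    -- Standard semantics: the definition is shared across sessions and
    -- persists; the rows are private to each session and are gone at
    -- session end.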

I suspect I'm overlooking some complexities here, because to me it seems like we could implement these fairly simply. A new relkind would identify it as a global temp table and the relfilenode would be 0. Same for indexes on temp tables. We'd extend the relfilenode mapper to support a backend-local non-persistent relfilenode map that's used to track temp table and index relfilenodes. If no relfilenode is defined for the table, the mapper would allocate one. We already happily create missing relfilenodes on write so we don't even have to pre-create the actual file. We'd register the relfilenode as a tempfile and use existing tempfile cleanup mechanisms, and we'd use the temp tablespace to store it.
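
The mapped catalogs already behave a bit like this today: pg_class itself stores relfilenode = 0 and the relation mapper resolves the real file, which you can see on any stock instance:

    SELECT relname, relfilenode,
           pg_relation_filenode(oid) AS mapped_filenode
    FROM pg_class
    WHERE relname = 'pg_class';
    -- relfilenode is 0 in the catalog row; pg_relation_filenode() consults
    -- the relation mapper. The idea above just adds a backend-local,
    -- non-persistent map for global temp tables and their indexes.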

I must be missing something important because it doesn't seem hard.

> Global temporary tables are accessed through shared buffers (to solve
> problem 2).

I'm far from convinced of the wisdom or necessity of that, but I haven't spent as much time digging into this problem as you have.
 
> The drawback of such an approach is that it would be necessary to
> reimplement a large part of the heapam code. But this approach would
> allow us to eliminate the visibility check for temporary table tuples
> and decrease the size of the tuple header.

That sounds potentially cool, but perhaps as a "next step" thing? Allow the creation of global temp tables to specify reloptions, and you can add it as a reloption later. You can't actually eliminate visibility checks anyway, because they're still MVCC heaps: savepoints can create invisible tuples even if you're using temp tables that are cleared on commit, and of course so can DELETEs and UPDATEs. So I'm not sure how much use it'd really be in practice.
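
If someone does build it, it could look something like this (purely hypothetical; the reloption name here is invented for illustration and doesn't exist):

    CREATE GLOBAL TEMPORARY TABLE fast_scratch (id int)
    WITH (compact_tuple_header = on);   -- hypothetical reloption, not real

That way the default global temp table stays a normal MVCC heap and the optimized format is strictly opt-in.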



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
