Re: Global temporary tables

From: Konstantin Knizhnik
Subject: Re: Global temporary tables
Date:
Msg-id: 0baf9e02-23b3-f3c1-6fe9-fd35aee6fee1@postgrespro.ru
In reply to: Re: Global temporary tables  (Craig Ringer <craig@2ndquadrant.com>)
Responses: Re: Global temporary tables  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers


On 01.08.2019 6:10, Craig Ringer wrote:
 
3. It is not possible to use temporary tables on a replica.

For physical replicas, yes.

Yes, logical replicas (for example, our PgPro-EE multimaster based on logical replication) definitely do not suffer from this problem.
But in the case of multimaster we have another problem related to temporary tables: we have to use 2PC for each transaction, and using temporary tables in prepared transactions is currently prohibited.
This was the motivation for the patch proposed by Stas Kelvich which allows temporary tables to be used in prepared transactions under certain conditions.
 5. Inefficient memory usage and possible memory overflow: each backend
maintains its own local buffers for work with temporary tables.

Is there any reason that would change with global temp tables? We'd still be creating a backend-local relfilenode for each backend that actually writes to the temp table, and I don't see how it'd be useful or practical to keep those in shared_buffers.

Yes, my implementation of global temp tables uses shared buffers.
It was not strictly needed, since the data is local: it is possible to have shared metadata and private data accessed through local buffers.
But I did it for three reasons:
1. Make it possible to use parallel plans for temp tables.
2. Eliminate the memory overflow problem.
3. Make it possible to reschedule a session to another backend (connection pooler).

Using local buffers has big advantages too. It saves shared_buffers space for data where there's actually some possibility of getting cache hits, or for where we can benefit from lazy/async writeback and write combining. I wouldn't want to keep temp data there if I had the option.

Definitely local buffers have some advantages:
- they do not require synchronization
- they avoid flushing data from shared buffers

But global temp tables do not exclude the use of the original (local) temp tables.
So you will have a choice: either use local temp tables, which can be easily created on demand and accessed through local buffers,
or create global temp tables, which eliminate catalog bloat, allow parallel queries, and whose data is controlled by the same cache replacement discipline as for normal tables...
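To make the choice concrete, here is a sketch of how the two flavors could look at the SQL level (the GLOBAL TEMPORARY syntax below follows the SQL standard; the actual DDL accepted by the patch may differ):

```sql
-- Ordinary (local) temp table: catalog entries are created and
-- dropped per session, and the data lives in backend-local buffers.
CREATE TEMPORARY TABLE scratch(id int, payload text);

-- Global temp table (SQL-standard syntax, assumed here for
-- illustration): the metadata is created once and shared by all
-- sessions, while each session sees only its own rows.
CREATE GLOBAL TEMPORARY TABLE shared_scratch(id int, payload text)
    ON COMMIT PRESERVE ROWS;
```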



 
Default size of temporary buffers is 8MB. It seems to be too small for
modern servers having hundreds of gigabytes of RAM, causing extra
copying of data between the OS cache and local buffers. But if there are
thousands of backends, each executing queries with temporary tables,
then the total amount of memory used for temporary buffers can exceed
several tens of gigabytes.

Right. But what solution do you propose for this? Putting that in shared_buffers will do nothing except deprive shared_buffers of space that can be used for other more useful things. A server-wide temp buffer would add IPC and locking overheads and AFAICS little benefit. One of the big appeals of temp tables is that we don't need any of that.

I do not think that parallel execution and efficient connection pooling are "little benefit".

If you want to improve server-wide temp buffer memory accounting and management that makes sense. I can see it being useful to have things like a server-wide DSM/DSA pool of temp buffers that backends borrow from and return to based on memory pressure on a LRU-ish basis, maybe. But I can also see how that'd be complex and hard to get right. It'd also be prone to priority inversion problems where an idle/inactive backend must be woken up to release memory or release locks, depriving an actively executing backend of runtime. And it'd be as likely to create inefficiencies with copying and eviction as solve them since backends could easily land up taking turns kicking each other out of memory and re-reading their own data.

I don't think this is something that should be tackled as part of work on global temp tables personally.

My assumptions are the following: temporary tables are mostly used in OLAP queries, and an OLAP workload means there are a few concurrent queries working with large datasets.
So the size of the produced temporary tables can be quite big. For OLAP it seems very important to be able to use parallel query execution and the same cache eviction rule for both persistent and temp tables
(otherwise you get either swapping or extra copying of data between the OS and Postgres caches).
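For illustration, with stock PostgreSQL the per-backend pool is tuned by hand via the temp_buffers GUC, and parallel workers cannot see another backend's local buffers, so queries touching temp tables fall back to serial plans:

```sql
-- temp_buffers defaults to 8MB and can only be changed in a session
-- before the first use of temporary tables in that session.
SET temp_buffers = '1GB';

-- A large intermediate result, cached in backend-local buffers.
CREATE TEMPORARY TABLE big_intermediate AS
    SELECT g AS id, g % 10 AS bucket
    FROM generate_series(1, 1000000) g;

-- No Parallel Seq Scan will appear here: parallel workers cannot
-- read the leader's local buffers, so the scan stays serial.
EXPLAIN SELECT count(*) FROM big_intermediate;
```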


 
6. Connection pooler cannot reschedule a session which has created temporary tables to some other backend because its data is stored in local buffers.

Yeah, if you're using transaction-associative pooling. That's just part of a more general problem though, there are piles of related issues with temp tables, session GUCs, session advisory locks and more.

I don't see how global temp tables will do you the slightest bit of good here as the data in them will still be backend-local. If it isn't then you should just be using unlogged tables.

You cannot use the same unlogged table to save intermediate query results in two parallel sessions.

 
Definition of this table (metadata) is shared by all backends but data
is private to the backend. After session termination data is obviously lost.

+1 that's what a global temp table should be, and it's IIRC pretty much how the SQL standard specifies temp tables.

I suspect I'm overlooking some complexities here, because to me it seems like we could implement these fairly simply. A new relkind would identify it as a global temp table and the relfilenode would be 0. Same for indexes on temp tables. We'd extend the relfilenode mapper to support a backend-local non-persistent relfilenode map that's used to track temp table and index relfilenodes. If no relfilenode is defined for the table, the mapper would allocate one. We already happily create missing relfilenodes on write so we don't even have to pre-create the actual file. We'd register the relfilenode as a tempfile and use existing tempfile cleanup mechanisms, and we'd use the temp tablespace to store it.

I must be missing something important because it doesn't seem hard.

As I already wrote, I tried to kill two birds with one stone: eliminate catalog bloating and allow access to temp tables from multiple backends (to be able to perform parallel queries and connection pooling).
This is why I had to use shared buffers for global temp tables.
Maybe it was not such a good idea. But one of my primary intentions in publishing this patch was to learn the opinions of other people.
In PG-Pro some of my colleagues think that the most critical problem is the inability to use temporary tables on a replica.
Others think that it is not a problem at all if you are using logical replication.
From my point of view the most critical problem is the inability to use parallel plans for temporary tables.
But it looks like you don't think so.

I see three different activities related to temporary tables:
1. Shared metadata
2. Shared buffers
3. Alternative concurrency control & reduced tuple header size (a specialized table access method for temporary tables)

In my proposal I combined 1 and 2, leaving 3 for the next step.
I would be interested to hear other suggestions.

One more thing: 1 and 2 are really independent; you can share metadata without sharing buffers.
But introducing yet another kind of temporary table seems to be real overkill:
- local temp tables (private namespace and local buffers)
- tables with shared metadata but local buffers
- tables with shared metadata and buffers


 
The drawback of such an approach is that it would be necessary to
reimplement a large part of the heapam code.
But this approach allows eliminating visibility checks for temporary
table tuples and decreasing the size of the tuple header.

That sounds potentially cool, but perhaps a "next step" thing? Allow the creation of global temp tables to specify reloptions, and you can add it as a reloption later. You can't actually eliminate visibility checks anyway because they're still MVCC heaps.

Sorry?
I mean elimination of the MVCC overhead (visibility checks) for temp tables only.
I am not sure that we can really fully eliminate it if we support the use of temp tables in prepared transactions and autonomous transactions (yet another awful feature we have in PgPro-EE).
Also it looks like we need some analogue of CID to be able to correctly execute queries like "insert into T (select from T ...)" where T is a global temp table.
I haven't thought much about it, but I am seriously considering a new table access method API for reducing per-tuple storage overhead for temporary and append-only tables.
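To illustrate the CID point, a minimal example (GLOBAL TEMPORARY syntax as in the SQL standard, assumed here for illustration): the second INSERT scans the very table it writes to, so without a command-id-like mechanism the scan could see its own freshly inserted tuples and duplicate work.

```sql
CREATE GLOBAL TEMPORARY TABLE t(x int) ON COMMIT PRESERVE ROWS;
INSERT INTO t VALUES (1), (2);

-- With correct command-id visibility the inner SELECT sees only the
-- two original rows, so the table ends up with 1, 2, 11, 12 and the
-- INSERT cannot chase its own output.
INSERT INTO t (SELECT x + 10 FROM t);
```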

Savepoints can create invisible tuples even if you're using temp tables that are cleared on commit, and of course so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in practice.

Yeah, subtransactions can also be a problem for eliminating xmin/xmax in temp tables. Thanks for noticing it.


I noticed that I had not patched some extensions; a fixed and rebased version of the patch is attached.
You can also find this version in our github repository: https://github.com/postgrespro/postgresql.builtin_pool.git,
branch global_temp.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachments
