Обсуждение: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
[Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        Hello Some time ago we discussed an idea of "fast temporary tables": https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu In two words the idea is following. <The Idea> PostgreSQL stores information about all relations in pg_catalog. Some applications create and delete a lot of temporary tables. It causes a bloating of pg_catalog and running auto vacuum on it. It's quite an expensive operation which affects entire database performance. We could introduce a new type of temporary tables. Information about these tables is stored not in a catalog but in backend's memory. This way user can solve a pg_catalog bloating problem and improve overall database performance. </The Idea> I took me a few months but eventually I made it work. Attached patch has some flaws. I decided not to invest a lot of time in documenting it or pgindent'ing all files yet. In my experience it will be rewritten entirely 3 or 4 times before merging anyway :) But it _works_ and passes all tests I could think of, including non-trivial cases like index-only or bitmap scans of catalog tables. Usage example: ``` CREATE FAST TEMP TABLE fasttab_test1(x int, s text); INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc'); UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2; DELETE FROM fasttab_test1 WHERE x = 3; SELECT * FROM fasttab_test1 ORDER BY x; DROP TABLE fasttab_test1; ``` More sophisticated examples could be find in regression tests: ./src/test/regress/sql/fast_temp.sql Any feedback on this patch will be much appreciated! -- Best regards, Aleksander Alekseev
Вложения
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        Hi, On 07/29/2016 01:15 PM, Aleksander Alekseev wrote: > Hello > > Some time ago we discussed an idea of "fast temporary tables": > > https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu > > In two words the idea is following. > > <The Idea> > > PostgreSQL stores information about all relations in pg_catalog. Some > applications create and delete a lot of temporary tables. It causes a > bloating of pg_catalog and running auto vacuum on it. It's quite an > expensive operation which affects entire database performance. > > We could introduce a new type of temporary tables. Information about > these tables is stored not in a catalog but in backend's memory. This > way user can solve a pg_catalog bloating problem and improve overall > database performance. > > </The Idea> Great! Thanks for the patch, this is definitely an annoying issue worth fixing. I've spent a bit of time looking at the patch today, comments below ... > > I took me a few months but eventually I made it work. Attached patch > has some flaws. I decided not to invest a lot of time in documenting > it or pgindent'ing all files yet. In my experience it will be rewritten > entirely 3 or 4 times before merging anyway :) But it _works_ and > passes all tests I could think of, including non-trivial cases like > index-only or bitmap scans of catalog tables. > Well, jokes aside, that's a pretty lousy excuse for not writing any docs, and you're pretty much asking the reviewers to reverse-engineer your reasoning. So I doubt you'll get many serious reviews without fixing this gap. > Usage example: > > ``` > CREATE FAST TEMP TABLE fasttab_test1(x int, s text); > > INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc'); > > UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2; > > DELETE FROM fasttab_test1 WHERE x = 3; > > SELECT * FROM fasttab_test1 ORDER BY x; > > DROP TABLE fasttab_test1; > ``` > > More sophisticated examples could be find in regression tests: > > ./src/test/regress/sql/fast_temp.sql > > Any feedback on this patch will be much appreciated! > > 1) I wonder whether the FAST makes sense - does this really change the performance significantly? IMHO you only move the catalog rows to memory, so why should the tables be any faster? I also believe this conflicts with SQL standard specification of CREATE TABLE. 2) Why do we need the new relpersistence value? ISTM we could easily got with just RELPERSISTENCE_TEMP, which would got right away of many chances as the steps are exactly the same. IMHO if this patch gets in, we should use it as the only temp table implementation (Or can you think of cases where keeping rows in pg_class has advantages?). That'd also eliminate the need for FAST keyword in the CREATE TABLE command. The one thin I'm not sure about is that our handling of temporary tables is not standard compliant - we require each session to create it's own private temporary table. Moving the rows from pg_class into backend memory seems to go in the opposite direction, but as no one was planning to fix this, I don't think it matters much. 3) I think the heapam/indexam/xact and various other places needs a major rework. You've mostly randomly sprinkled the code with function calls to make the patch work - that's fine for an initial version, but a more principled approach is needed. 4) I'm getting failures in the regression suite - apparently the patch somehow affects costing of index only scans, so that a several queries switch from index only scans to bitmap index scans etc. I haven't investigated this more closely, but it seems quite consistent (and I don't see it without the patch). It seems the patch delays building of visibility map, or something like that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Pavel Stehule
		    Дата:
		        2016-07-30 1:46 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
 
 
Hi,
On 07/29/2016 01:15 PM, Aleksander Alekseev wrote:Hello
Some time ago we discussed an idea of "fast temporary tables":
https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu
In two words the idea is following.
<The Idea>
PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.
We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.
</The Idea>
Great! Thanks for the patch, this is definitely an annoying issue worth fixing. I've spent a bit of time looking at the patch today, comments below ...
Yes, it some what we need long time 
I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.
Well, jokes aside, that's a pretty lousy excuse for not writing any docs, and you're pretty much asking the reviewers to reverse-engineer your reasoning. So I doubt you'll get many serious reviews without fixing this gap.Usage example:
```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);
INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;
DELETE FROM fasttab_test1 WHERE x = 3;
SELECT * FROM fasttab_test1 ORDER BY x;
DROP TABLE fasttab_test1;
```
More sophisticated examples could be find in regression tests:
./src/test/regress/sql/fast_temp.sql
Any feedback on this patch will be much appreciated!
1) I wonder whether the FAST makes sense - does this really change the performance significantly? IMHO you only move the catalog rows to memory, so why should the tables be any faster? I also believe this conflicts with SQL standard specification of CREATE TABLE.
Probably has zero value to have slow and fast temp tables (from catalogue cost perspective). So the FAST implementation should be used everywhere. But there are some patterns used with work with temp tables,that should not working, and we would to decide if we prepare workaround or not. 
-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
  CREATE TEMP TABLE xxx()
ELSE
  TRUNCATE TABLE xxx;
END IF;
-- modern patter (new code) 
BEGIN
  TRUNCATE TABLE xxx;
EXCEPTION WHEN ..... THEN
  CREATE TEMP TABLE(...)
END;
In this case we can use GUC, because visible behave should be same. 
The benefit of zero catalogue cost temp tables is significant - and for some larger applications the temp tables did hard performance issues. 
2) Why do we need the new relpersistence value? ISTM we could easily got with just RELPERSISTENCE_TEMP, which would got right away of many chances as the steps are exactly the same.
IMHO if this patch gets in, we should use it as the only temp table implementation (Or can you think of cases where keeping rows in pg_class has advantages?). That'd also eliminate the need for FAST keyword in the CREATE TABLE command.
The one thin I'm not sure about is that our handling of temporary tables is not standard compliant - we require each session to create it's own private temporary table. Moving the rows from pg_class into backend memory seems to go in the opposite direction, but as no one was planning to fix this, I don't think it matters much.
3) I think the heapam/indexam/xact and various other places needs a major rework. You've mostly randomly sprinkled the code with function calls to make the patch work - that's fine for an initial version, but a more principled approach is needed.
4) I'm getting failures in the regression suite - apparently the patch somehow affects costing of index only scans, so that a several queries switch from index only scans to bitmap index scans etc. I haven't investigated this more closely, but it seems quite consistent (and I don't see it without the patch). It seems the patch delays building of visibility map, or something like that.
Some other random notes:
1. With this code should not be hard to implement global temp tables - shared persistent structure, temp local data - significant help for any who have to migrate from Oracle.
2. This should to work on slaves - it is one of ToDo
3. I didn't see support for memory store for column's statistics. Some separate questions is about production statistics - pg_stat_user_table, ..
Great and important work, thank you
Pavel
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 07/30/2016 06:49 AM, Pavel Stehule wrote: > 1) I wonder whether the FAST makes sense - does this really change > the performance significantly? IMHO you only move the catalog rows > to memory, so why should the tables be any faster? I also believe > this conflicts with SQL standard specification of CREATE TABLE. > > > Probably has zero value to have slow and fast temp tables (from > catalogue cost perspective). So the FAST implementation should be used > everywhere. But there are some patterns used with work with temp > tables,that should not working, and we would to decide if we prepare > workaround or not. > > -- problematic pattern (old code) > IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN > CREATE TEMP TABLE xxx() > ELSE > TRUNCATE TABLE xxx; > END IF; I'd argue that if you mess with catalogs directly, you're on your own. Not only it's fragile, but this pattern is also prone to race conditions (although a concurrent session can't create a conflicting temporary table). > > -- modern patter (new code) > BEGIN > TRUNCATE TABLE xxx; > EXCEPTION WHEN ..... THEN > CREATE TEMP TABLE(...) > END; > > In this case we can use GUC, because visible behave should be same. What GUC? > > The benefit of zero catalogue cost temp tables is significant - and for > some larger applications the temp tables did hard performance issues. Yeah, catalog bloat is a serious issue in such cases, and it's amplified by indexes created on the temporary tables. > > Some other random notes: > > 1. With this code should not be hard to implement global temp tables - > shared persistent structure, temp local data - significant help for any > who have to migrate from Oracle. The patch moves in pretty much the opposite direction - if anything, it'll make it more difficult to implement global temporary tables, because it removes the definitions from the catalog, thus impossible to share by catalogs. To get global temporary tables, I think the best approach would be to share the catalog definition and only override the filename. Or something like that. > > 2. This should to work on slaves - it is one of ToDo > No, it does not work on slaves, because it still does a read-write transaction. test=# begin read only; BEGIN test=# create fast temporary table x (id int); ERROR: cannot execute CREATE TABLE in a read-only transaction No idea how difficult it'd be to make it work. > 3. I didn't see support for memory store for column's statistics. Some > separate questions is about production statistics - pg_stat_user_table, .. That seems to work (both analyze and pg_stat_user_tables). Not sure where it's in the code, and I'm not willing to reverse engineer it. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule <pavel.stehule@gmail.com> writes:
> But there are some patterns used with work with temp tables,that should not
> working, and we would to decide if we prepare workaround or not.
> -- problematic pattern (old code)
> IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
>   CREATE TEMP TABLE xxx()
> ELSE
>   TRUNCATE TABLE xxx;
> END IF;
> -- modern patter (new code)
> BEGIN
>   TRUNCATE TABLE xxx;
> EXCEPTION WHEN ..... THEN
>   CREATE TEMP TABLE(...)
> END;
If the former stops working, that's a sufficient reason to reject the
patch: it hasn't been thought through carefully enough.  The key reason
why I don't think that's negotiable is that if there aren't (apparently)
catalog entries corresponding to the temp tables, that will almost
certainly break many things in the backend and third-party extensions,
not only user code patterns like this one.  We'd constantly be fielding
bug reports that "feature X doesn't work with temp tables anymore".
In short, I think that the way to make something like this work is to
figure out how to have "virtual" catalog rows describing a temp table.
Or maybe to partition the catalogs so that vacuuming away temp-table
rows is easier/cheaper than today.
        regards, tom lane
			
		Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	David Steele
		    Дата:
		        On 7/30/16 10:47 AM, Tom Lane wrote: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> But there are some patterns used with work with temp tables,that should not >> working, and we would to decide if we prepare workaround or not. > >> -- problematic pattern (old code) >> IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN >> CREATE TEMP TABLE xxx() >> ELSE >> TRUNCATE TABLE xxx; >> END IF; > >> -- modern patter (new code) >> BEGIN >> TRUNCATE TABLE xxx; >> EXCEPTION WHEN ..... THEN >> CREATE TEMP TABLE(...) >> END; > > If the former stops working, that's a sufficient reason to reject the > patch: it hasn't been thought through carefully enough. The key reason > why I don't think that's negotiable is that if there aren't (apparently) > catalog entries corresponding to the temp tables, that will almost > certainly break many things in the backend and third-party extensions, > not only user code patterns like this one. We'd constantly be fielding > bug reports that "feature X doesn't work with temp tables anymore". > > In short, I think that the way to make something like this work is to > figure out how to have "virtual" catalog rows describing a temp table. > Or maybe to partition the catalogs so that vacuuming away temp-table > rows is easier/cheaper than today. In addition the latter pattern burns an xid which can be a problem for high-volume databases. How about CREATE TEMP TABLE IF NOT EXISTS...? -- -David david@pgmasters.net
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 07/30/2016 04:47 PM, Tom Lane wrote: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> But there are some patterns used with work with temp tables,that should not >> working, and we would to decide if we prepare workaround or not. > >> -- problematic pattern (old code) >> IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN >> CREATE TEMP TABLE xxx() >> ELSE >> TRUNCATE TABLE xxx; >> END IF; > >> -- modern patter (new code) >> BEGIN >> TRUNCATE TABLE xxx; >> EXCEPTION WHEN ..... THEN >> CREATE TEMP TABLE(...) >> END; > > If the former stops working, that's a sufficient reason to reject the > patch: it hasn't been thought through carefully enough. The key reason > why I don't think that's negotiable is that if there aren't (apparently) > catalog entries corresponding to the temp tables, that will almost > certainly break many things in the backend and third-party extensions, > not only user code patterns like this one. We'd constantly be fielding > bug reports that "feature X doesn't work with temp tables anymore". > Agreed - breaking internal features for temporary tables is not acceptable. I was thinking more about external code messing with catalogs, but on second thought we probably need to keep the records in pg_class anyway. > > In short, I think that the way to make something like this work is > to figure out how to have "virtual" catalog rows describing a temp > table. Or maybe to partition the catalogs so that vacuuming away > temp-table rows is easier/cheaper than today. > Yeah, and I think the patch tries to do that, although in a rather invasive / unprincipled way. But this will only work for the current behavior (i.e. mostly what SQL standard means by LOCAL). For GLOBAL temporary tables I think we need to keep physical catalog row, and only override the storage filename. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        Hello. Thanks everyone for great comments! > Well, jokes aside, that's a pretty lousy excuse for not writing any > docs I think maybe I put it in a wrong way. There are currently a lot of comments in a code, more then enough to understand how this feature works. What I meant is that this is not a final version of a patch and a few paragraphs are yet to be written. At least it's how I see it. If you believe that some parts of the code are currently hard to understand and some comments could be improved, please name it and I will be happy to fix it. > IMHO if this patch gets in, we should use it as the only temp table > implementation (Or can you think of cases where keeping rows in > pg_class has advantages?). That'd also eliminate the need for FAST > keyword in the CREATE TABLE command. > Probably has zero value to have slow and fast temp tables (from > catalogue cost perspective). So the FAST implementation should be used > everywhere. If there are no objections I see no reason no to do it in a next version of a patch. > I'm getting failures in the regression suite I've run regression suite like 10 times in a row in different environments with different build flags but didn't manage to reproduce it. Also our DBAs are testing this feature for weeks now on real-world applications and they didn't report anything like this. Could you please describe how to reproduce this issue? > This should to work on slaves - it is one of ToDo Glad you noticed! In fact I'm currently researching a possibility of using the same approach for creating writable temporary tables on replicas. > The key reason why I don't think that's negotiable is that if there > aren't (apparently) catalog entries corresponding to the temp tables, > that will almost certainly break many things in the backend and > third-party extensions, not only user code patterns like this one. > In short, I think that the way to make something like this work is to > figure out how to have "virtual" catalog rows describing a temp table. I'm afraid once again I put it in a wrong way. What I meant by "Information about these tables is stored not in a catalog but in backend's memory" is in fact that _records_ of pg_class, pg_type and other catalog relations are stored in-memory. Naturally this records are visible to the user (otherwise \d or \d+ would not work) and you can do queries like ` select * from pg_class where relname = 'tt1' `. In other words part of the catalog is indeed "virtual". > I didn't see support for memory store for column's statistics. Some > separate questions is about production statistics - > pg_stat_user_table, .. > That seems to work (both analyze and pg_stat_user_tables). Not sure > where it's in the code, and I'm not willing to reverse engineer it. Right, `ANALYZE temp_table;` and everything else works. Besides pg_class, pg_type, pg_attribute and other relations pg_statistic records for temp tables are stored in-memory as well. IIRC a lot of pg_stat* relations are in fact views and thus don't require any special support. If you see that some statistics are broken please don't hesitate to report it and I will fix it. Hope I answered all questions so far. I look forward to receive more comments and questions regarding this patch! -- Best regards, Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/01/2016 11:45 AM, Aleksander Alekseev wrote: > Hello. > > Thanks everyone for great comments! > >> Well, jokes aside, that's a pretty lousy excuse for not writing any >> docs > > I think maybe I put it in a wrong way. There are currently a lot of > comments in a code, more then enough to understand how this feature > works. What I meant is that this is not a final version of a patch and > a few paragraphs are yet to be written. At least it's how I see it. If > you believe that some parts of the code are currently hard to understand > and some comments could be improved, please name it and I will be happy > to fix it. I don't think there's "a lot of comments in the code", not even remotely. At least not in the files I looked into - heapam, indexam, xact etc. There are a few comments in general, and most of them only comment obvious facts, like "ignore in-memory tuples" right before a trivial if statement. What is needed is an overview of the approach, so that the reviewers can read that first, instead of assembling the knowledge from pieces scattered over comments in many pieces. But I see the fasttab.c contains this: /* TODO TODO comment the general idea - in-memory tuples and indexes, hooks principle, FasttabSnapshots, etc */ The other thing that needs to happen is you need to modify comments in front of some of the modified methods - e.g. the comments may need a paragraph "But when the table is fast temporary, what happens is ..." > >> IMHO if this patch gets in, we should use it as the only temp table >> implementation (Or can you think of cases where keeping rows in >> pg_class has advantages?). That'd also eliminate the need for FAST >> keyword in the CREATE TABLE command. > >> Probably has zero value to have slow and fast temp tables (from >> catalogue cost perspective). So the FAST implementation should be used >> everywhere. > > If there are no objections I see no reason no to do it in a next > version of a patch. I believe there will be a lot of discussion about this. > >> I'm getting failures in the regression suite > > I've run regression suite like 10 times in a row in different > environments with different build flags but didn't manage to reproduce > it. Also our DBAs are testing this feature for weeks now on real-world > applications and they didn't report anything like this. Could you > please describe how to reproduce this issue? > Nothing special: $ ./configure --prefix=/home/user/pg-temporary --enable-debug \ --enable-cassert $ make -s clean && make -s -j4 install $ export PATH=/home/user/pg-temporary/bin:$PATH $ pg_ctl -D ~/tmp/data-temporary init $ pg_ctl -D ~/tmp/data-temporary -l ~/temporary.log start $ make installcheck I get the failures every time - regression diff attached. The first failure in "rolenames" is expected, because of clash with existing user name. The remaining two failures are not. I only get the failure for "installcheck" but not "check" for some reason. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> What is needed is an overview of the approach, so that the reviewers can 
> read that first, instead of assembling the knowledge from pieces 
> scattered over comments in many pieces. But I see the fasttab.c contains 
> this:
> /* TODO TODO comment the general idea - in-memory tuples and indexes, 
> hooks principle, FasttabSnapshots, etc */
A fairly common answer when some feature needs an implementation overview
is to create a README file for it, or add a new section in an existing
README file.
        regards, tom lane
			
		Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        Thanks everyone for your remarks and comments! Here is an improved version of a patch. Main changes: * Patch passes `make installcheck` * Code is fully commented, also no more TODO's I wish I sent this version of a patch last time. Now I realize it was really hard to read and understand. Hope I managed to correct this flaw. If you believe that some parts of the code are still poorly commented or could be improved in any other way please let me know. And as usual, any other comments, remarks or questions are highly appreciated! -- Best regards, Aleksander Alekseev
Вложения
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Geoff Winkless
		    Дата:
		        On 30 July 2016 at 13:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I'd argue that if you mess with catalogs directly, you're on your own. Interesting. What would you suggest people use instead? Geoff
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Robert Haas
		    Дата:
		        On Thu, Aug 4, 2016 at 8:14 AM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > Thanks everyone for your remarks and comments! > > Here is an improved version of a patch. > > Main changes: > * Patch passes `make installcheck` > * Code is fully commented, also no more TODO's > > I wish I sent this version of a patch last time. Now I realize it was > really hard to read and understand. Hope I managed to correct this > flaw. If you believe that some parts of the code are still poorly > commented or could be improved in any other way please let me know. > > And as usual, any other comments, remarks or questions are highly > appreciated! Three general comments: 1. It's always seemed to me that a huge problem with anything of this sort is dependencies. For example, suppose I create a fast temporary table and then I create a functional index on the fast temporary table that uses some SQL function defined in pg_proc. Then, another user drops the function. Then, I try to use the index. With regular temporary tables, dependencies protect us here, but if there are no catalog entries, I wonder how this can ever be made safe. Similar problems exist for triggers, constraints, RLS policies, and attribute defaults. 2. This inserts additional code in a bunch of really low-level places like heap_hot_search_buffer, heap_update, heap_delete, etc. I think what you've basically done here is create a new, in-memory heap AM and then, because we don't have an API for adding new storage managers, you've bolted it onto the existing heapam layer. That's certainly a reasonable approach for creating a PoC, but I think we actually need a real API here. Otherwise, when the next person comes along and wants to add a third heap implementation, they've got to modify all of these same places again. I don't think this code is reasonably maintainable in this form. 3. Currently, temporary tables are parallel-restricted: a query that references temporary tables can use parallelism, but access to the temporary tables is only possible from within the leader. I suspect, although I'm not entirely sure, that lifting this restriction would be easier with our current temporary table implementation than with this one, because the current temporary table implementation mostly relies on a set of buffers that could be moved from backend-private memory to DSM. On a quick look, this implementation uses a bunch of new data structures that are heavy on pointers, so that gets quite a bit more complicated to make parallel-safe (unless we adopt threads instead of processes!). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Dmitry Dolgov
		    Дата:
		        Hi
			
		I tried to dig into this patch (which seems pretty interesting) to help bring
it in good shape. Here are few random notes, I hope they can be helpful:
> I think we actually need a real API here.
Definitely, there are plenty places in the new code with the same pattern:
 * figure out if it's an action related to the fast temporary tables based on
   `ItemPointer`/relation OID/etc
* if it is, add some extra logic or skip something in original implementation
* if it is, add some extra logic or skip something in original implementation
in `heapam.c`, `indexam.c`, `xact.c`, `dependency.c`. I believe it's possible to
make it more generic (although it will contain almost the same logic), e.g.
separate regular and fasttable implementations completely, and decide which one
we should choose in that particular case.
Btw, I'm wondering about the `heap_*` functions in `heapam.c` - some of them are
wrapped and never used directly, although they're contains in
`INTERFACE_ROUTINES` (like `simple_heap_delete` -> `heap_delete`), some of them
aren't. It looks like inconsistency in terms of function names, probably it
should be unified?
> What is needed is an overview of the approach, so that the reviewers can read
> that first,
I feel lack of such information even in new version of this patch (but, I'm
probably a noob in these matters).  I noted that the `fasttab.c` file contains some
general overview, but in terms of "what are we doing", and "why are we doing
this". I think general overview of "how are we doing this" also may be useful.
And there are several slightly obvious commentaries like:
```
+ /* Is it a virtual TID? */
+ if (IsFasttabItemPointer(tid))
```
but I believe an introduction of a new API (see the previous note) will solve
this eventually.
> Why do we need the new relpersistence value? ISTM we could easily got with
> just RELPERSISTENCE_TEMP, which would got right away of many chances as the
> steps are exactly the same.
I agree, it looks like `RELPERSISTENCE_FAST_TEMP` hasn't any influence on the
code.
> For example, suppose I create a fast temporary table and then I create a
> functional index on the fast temporary table that uses some SQL function
> defined in pg_proc.
Just to clarify, did you mean something like this?
```
create fast temp table fasttab(x int, s text);
create or replace function test_function_for_index(t text) returns text as $$
begin
    return lower(t);
end;
$$ language plpgsql immutable;
create index fasttab_s_idx on fasttab (test_function_for_index(s));
drop function test_function_for_index(t text);
```
As far as I understand dependencies should protect in case of fasttable too,
because everything is visible as in regular case, isn't it?
And finally one more question, why items of `FasttabIndexMethodsTable[]` like
this one:
```
+ /* 2187, non-unique */
+ {InheritsParentIndexId, 1,
+ {Anum_pg_inherits_inhparent, 0, 0},
+ {CompareOid, CompareInvalid, CompareInvalid}
+ },
```
have this commentary before them? I assume it's an id and an extra information,
and I'm concerned that they can easily become outdated inside commentary block.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Robert Haas
		    Дата:
		        On Sat, Aug 6, 2016 at 4:05 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote: >> For example, suppose I create a fast temporary table and then I create a >> functional index on the fast temporary table that uses some SQL function >> defined in pg_proc. > Just to clarify, did you mean something like this? > ``` > create fast temp table fasttab(x int, s text); > create or replace function test_function_for_index(t text) returns text as > $$ > begin > return lower(t); > end; > $$ language plpgsql immutable; > create index fasttab_s_idx on fasttab (test_function_for_index(s)); > drop function test_function_for_index(t text); > ``` > As far as I understand dependencies should protect in case of fasttable too, > because everything is visible as in regular case, isn't it? I think the whole idea of a fast temporary table is that there are no catalog entries. If there are no catalog entries, then dependencies are not visible. If there ARE catalog entries, to what do they refer?Without a pg_class entry for the table, there's notable OID upon which to depend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I think the whole idea of a fast temporary table is that there are no
> catalog entries.  If there are no catalog entries, then dependencies
> are not visible.  If there ARE catalog entries, to what do they refer?
>  Without a pg_class entry for the table, there's no table OID upon
> which to depend.
TBH, I think that the chances of such a design getting committed are
not distinguishable from zero.  Tables have to have OIDs; there is just
too much code that assumes that.  And I seriously doubt that it will
work (for any large value of "work") without catalog entries.
        regards, tom lane
			
		Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Anastasia Lubennikova
		    Дата:
		        05.08.2016 19:41, Robert Haas: > > 2. This inserts additional code in a bunch of really low-level places > like heap_hot_search_buffer, heap_update, heap_delete, etc. I think > what you've basically done here is create a new, in-memory heap AM and > then, because we don't have an API for adding new storage managers, > you've bolted it onto the existing heapam layer. That's certainly a > reasonable approach for creating a PoC, but I think we actually need a > real API here. Otherwise, when the next person comes along and wants > to add a third heap implementation, they've got to modify all of these > same places again. I don't think this code is reasonably maintainable > in this form. As I can see, you recommend to clean up the API of storage management code. I strongly agree that it's high time to do it. So, I started the discussion about refactoring and improving API of heapam and heap relations. You can find it on commitfest: https://commitfest.postgresql.org/10/700/ I'll be glad to see your thoughts on the thread. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On 2016-08-07 14:46:06 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I think the whole idea of a fast temporary table is that there are no > > catalog entries. If there are no catalog entries, then dependencies > > are not visible. If there ARE catalog entries, to what do they refer? > > Without a pg_class entry for the table, there's no table OID upon > > which to depend. > > TBH, I think that the chances of such a design getting committed are > not distinguishable from zero. Tables have to have OIDs; there is just > too much code that assumes that. And I seriously doubt that it will > work (for any large value of "work") without catalog entries. That seems a bit too defeatist. It's obviously not a small change to get there - and I don't think the patch upthread is really attacking the relevant problems yet - but saying that we'll never have temp tables without pg_class/pg_depend bloat seems to be pretty close to just giving up. Having 8 byte oids (as explicit columns instead of magic? Or just oid64?) and then reserving ranges for temp objects stored in a local memory seems to be feasible. The pinning problem could potentially be solved by "session lifetime" pins in pg_depend, which prevents dependent objects being dropped. Obviously that's just spitballing; but I think the problem is too big to just give up. Andres
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think the whole idea of a fast temporary table is that there are no
>>> catalog entries.  If there are no catalog entries, then dependencies
>>> are not visible.  If there ARE catalog entries, to what do they refer?
>>> Without a pg_class entry for the table, there's no table OID upon
>>> which to depend.
>> TBH, I think that the chances of such a design getting committed are
>> not distinguishable from zero.  Tables have to have OIDs; there is just
>> too much code that assumes that.  And I seriously doubt that it will
>> work (for any large value of "work") without catalog entries.
> That seems a bit too defeatist.
Huh?  I didn't say we shouldn't work on the problem --- I just think that
this particular approach isn't good.  Which you seemed to agree with.
        regards, tom lane
			
		Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On 2016-08-14 21:04:57 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2016-08-07 14:46:06 -0400, Tom Lane wrote: > >> Robert Haas <robertmhaas@gmail.com> writes: > >>> I think the whole idea of a fast temporary table is that there are no > >>> catalog entries. If there are no catalog entries, then dependencies > >>> are not visible. If there ARE catalog entries, to what do they refer? > >>> Without a pg_class entry for the table, there's no table OID upon > >>> which to depend. > > >> TBH, I think that the chances of such a design getting committed are > >> not distinguishable from zero. Tables have to have OIDs; there is just > >> too much code that assumes that. And I seriously doubt that it will > >> work (for any large value of "work") without catalog entries. > > > That seems a bit too defeatist. > > Huh? I didn't say we shouldn't work on the problem --- I just think that > this particular approach isn't good. Which you seemed to agree with. I took your statement to mean that they need a pg_class entry - even if there were a partial solution to the pg_depend problem allowing to avoid pg_attribute entries, tha't still not really be a solution. If that's not what you mean, sorry - and nice that we agree ;)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        > > >>> I think the whole idea of a fast temporary table is that there > > >>> are no catalog entries. If there are no catalog entries, then > > >>> dependencies are not visible. If there ARE catalog entries, to > > >>> what do they refer? Without a pg_class entry for the table, > > >>> there's no table OID upon which to depend. > > > > >> TBH, I think that the chances of such a design getting committed > > >> are not distinguishable from zero. Tables have to have OIDs; > > >> there is just too much code that assumes that. And I seriously > > >> doubt that it will work (for any large value of "work") without > > >> catalog entries. > > > > > That seems a bit too defeatist. > > > > Huh? I didn't say we shouldn't work on the problem --- I just > > think that this particular approach isn't good. Which you seemed > > to agree with. > > I took your statement to mean that they need a pg_class entry - even > if there were a partial solution to the pg_depend problem allowing to > avoid pg_attribute entries, tha't still not really be a solution. If > that's not what you mean, sorry - and nice that we agree ;) > > Just to keep things sane I would like to remind that in this concrete patch there _are_ catalog entries: ``` [...] This file contents imlementation of special type of temporary tables --- fast temporary tables (FTT). From user perspective they work exactly as regular temporary tables. However there are no records about FTTs in pg_catalog. These records are stored in backend's memory instead and mixed with regular records during scans of catalog tables. We refer to corresponding tuples of catalog tables as "in-memory" or "virtual" tuples and to all these tuples together --- as "in-memory" or "virtual" catalog. [...] ``` As Tom pointed out a lot of PL/pgSQL code would stop working otherwise. Also I mentioned that in this case even \d and \d+ would not work. I personally find this discussion very confusing. Maybe we should concentrate on a concrete patch instead of some abstract ideas, and topics that are still open. For instance it surprises me that apparently there is no one who objects "lets make all temporary tables fast temporary tables" idea. Since in this case code would use more memory for keeping a virtual catalog wouldn't it be considered a major change of behavior that can break someones production environment? -- Best regards, Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Pavel Stehule
		    Дата:
		        2016-08-15 11:01 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
 
> > >>> I think the whole idea of a fast temporary table is that there
> > >>> are no catalog entries. If there are no catalog entries, then
> > >>> dependencies are not visible. If there ARE catalog entries, to
> > >>> what do they refer? Without a pg_class entry for the table,
> > >>> there's no table OID upon which to depend.
> >
> > >> TBH, I think that the chances of such a design getting committed
> > >> are not distinguishable from zero. Tables have to have OIDs;
> > >> there is just too much code that assumes that. And I seriously
> > >> doubt that it will work (for any large value of "work") without
> > >> catalog entries.
> >
> > > That seems a bit too defeatist.
> >
> > Huh? I didn't say we shouldn't work on the problem --- I just
> > think that this particular approach isn't good. Which you seemed
> > to agree with.
>
> I took your statement to mean that they need a pg_class entry - even
> if there were a partial solution to the pg_depend problem allowing to
> avoid pg_attribute entries, tha't still not really be a solution. If
> that's not what you mean, sorry - and nice that we agree ;)
>
>
Just to keep things sane I would like to remind that in this concrete
patch there _are_ catalog entries:
```
[...]
This file contents imlementation of special type of temporary tables ---
fast temporary tables (FTT). From user perspective they work exactly as
regular temporary tables. However there are no records about FTTs in
pg_catalog. These records are stored in backend's memory instead and
mixed with regular records during scans of catalog tables. We refer to
corresponding tuples of catalog tables as "in-memory" or "virtual"
tuples and to all these tuples together --- as "in-memory" or "virtual"
catalog.
[...]
```
As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
Also I mentioned that in this case even \d and \d+ would not work.
I personally find this discussion very confusing. Maybe we should
concentrate on a concrete patch instead of some abstract ideas and
topics that are still open.
For instance it surprises me that apparently there is no one who
objects "lets make all temporary tables fast temporary tables" idea.
Since in this case code would use more memory for keeping a virtual
catalog wouldn't it be considered a major change of behavior that could
break someones production environment?
It is pretty hard discussion about cost or usability of FTT. The small FTT (for usage in PLpgSQL) can be replaced by arrays. The overhead of pg_catalog of big TT is not significant. So introduction special proprietary table type is debatable. 
Probably size of metadata of temporary tables should be minimal - currently all metadata are cached in memory - and it is not a problem.
But we can change this discussion little bit different. I believe so solution should be *global temporary tables*. These tables has persistent catalogue entries. Data are joined with session. These tables can be effective solution of problem with temporary tables, can be strong benefit for developers (more comfortable, possible static analyse of PLpgSQL) and it simplify life to all people who has do migration from Oracle. So only benefits are there :).
Regards
Pavel
--
Best regards,
Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        > But we can change this discussion little bit different. I believe so > solution should be *global temporary tables*. These tables has > persistent catalogue entries. Data are joined with session. These > tables can be effective solution of problem with temporary tables, > can be strong benefit for developers (more comfortable, possible > static analyse of PLpgSQL) and it simplify life to all people who has > do migration from Oracle. So only benefits are there :). I don't think that global temporary tables solve "catalog bloating that causes auto vacuum" problem. I suggest we don't change a topic. Or maybe I don't know something about global temporary tables? -- Best regards, Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Pavel Stehule
		    Дата:
		        2016-08-15 12:00 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> But we can change this discussion little bit different. I believe so
> solution should be *global temporary tables*. These tables has
> persistent catalogue entries. Data are joined with session. These
> tables can be effective solution of problem with temporary tables,
> can be strong benefit for developers (more comfortable, possible
> static analyse of PLpgSQL) and it simplify life to all people who has
> do migration from Oracle. So only benefits are there :).
I don't think that global temporary tables solve "catalog bloating that
causes auto vacuum" problem. I suggest we don't change a topic. Or maybe
I don't know something about global temporary tables?
The global temporary tables has persistent rows in the catalogue. The mapping to files can be marked as special and real mapping should be only in memory. 
So the changes in catalogue related to global temporary tables are pretty less frequently.
Regards
Pavel
 
--
Best regards,
Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        > The global temporary tables has persistent rows in the catalogue. The > mapping to files can be marked as special and real mapping should be > only in memory. > > So the changes in catalogue related to global temporary tables are > pretty less frequently. I'm afraid I still don't get it. Let say I have an application that does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the time. Naturally there is not only one temp table per session. Could you explain please in more detail how exactly do these persistent rows help? -- Best regards, Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Pavel Stehule
		    Дата:
		        2016-08-15 12:18 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
 
> The global temporary tables has persistent rows in the catalogue. The
> mapping to files can be marked as special and real mapping should be
> only in memory.
>
> So the changes in catalogue related to global temporary tables are
> pretty less frequently.
I'm afraid I still don't get it. Let say I have an application that
does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
time. Naturally there is not only one temp table per session. Could you
explain please in more detail how exactly do these persistent rows help?
when you use global temporary tables, then you create it only once - like usual tables. 
you don't drop these tables.
Regards
Pavel
--
Best regards,
Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Christoph Berg
		    Дата:
		        Re: Tom Lane 2016-07-30 <1184.1469890030@sss.pgh.pa.us> > In short, I think that the way to make something like this work is to > figure out how to have "virtual" catalog rows describing a temp table. > Or maybe to partition the catalogs so that vacuuming away temp-table > rows is easier/cheaper than today. We should also be thinking about how the opposite idea of "global" temp tables (I believe that's what Oracle calls them) would work. These have a persistent structure in the catalogs, just the data is private to every session (or transaction); every connection starts with an empty temp table and for their use. I'd guess that type of global temp tables would fix the bloat problem also very efficiently. (Ad-hoc temp tables shouldn't occur that often so the bloat caused by them wouldn't matter that much. If they do, their structure is likely always the same, and they could be made "global" in the schema.) The bit that needs to be thought out here would be how to maintain statistics for these tables. Obviously ANALYZE shouldn't update any globally visible data. Christoph
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Christoph Berg
		    Дата:
		        Re: To Tom Lane 2016-08-15 <20160815111057.v2mqqjp4aabvwqnc@msg.df7cb.de> > Re: Tom Lane 2016-07-30 <1184.1469890030@sss.pgh.pa.us> > > In short, I think that the way to make something like this work is to > > figure out how to have "virtual" catalog rows describing a temp table. > > Or maybe to partition the catalogs so that vacuuming away temp-table > > rows is easier/cheaper than today. > > We should also be thinking about how the opposite idea of "global" > temp tables (Obviously I should catch up on the rest of the thread when postponing a message for an hour or two. Sorry for the duplicated idea here...) Christoph
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Robert Haas
		    Дата:
		        Why are you sending this off-list? Please let's keep the discussion on the mailing list. I suggest resending this there. On Mon, Aug 15, 2016 at 5:01 AM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: >> > >>> I think the whole idea of a fast temporary table is that there >> > >>> are no catalog entries. If there are no catalog entries, then >> > >>> dependencies are not visible. If there ARE catalog entries, to >> > >>> what do they refer? Without a pg_class entry for the table, >> > >>> there's no table OID upon which to depend. >> > >> > >> TBH, I think that the chances of such a design getting committed >> > >> are not distinguishable from zero. Tables have to have OIDs; >> > >> there is just too much code that assumes that. And I seriously >> > >> doubt that it will work (for any large value of "work") without >> > >> catalog entries. >> > >> > > That seems a bit too defeatist. >> > >> > Huh? I didn't say we shouldn't work on the problem --- I just >> > think that this particular approach isn't good. Which you seemed >> > to agree with. >> >> I took your statement to mean that they need a pg_class entry - even >> if there were a partial solution to the pg_depend problem allowing to >> avoid pg_attribute entries, tha't still not really be a solution. If >> that's not what you mean, sorry - and nice that we agree ;) >> >> > > Just to keep things sane I would like to remind that in this concrete > patch there _are_ catalog entries: > > ``` > [...] > This file contents imlementation of special type of temporary tables --- > fast temporary tables (FTT). From user perspective they work exactly as > regular temporary tables. However there are no records about FTTs in > pg_catalog. These records are stored in backend's memory instead and > mixed with regular records during scans of catalog tables. We refer to > corresponding tuples of catalog tables as "in-memory" or "virtual" > tuples and to all these tuples together --- as "in-memory" or "virtual" > catalog. > [...] > ``` > > As Tom pointed out a lot of PL/pgSQL code would stop working otherwise. > Also I mentioned that in this case even \d and \d+ would not work. > > I personally find this discussion very confusing. Maybe we should > concentrate on a concrete patch instead of some abstract ideas and > topics that are still open. > > For instance it surprises me that apparently there is no one who > objects "lets make all temporary tables fast temporary tables" idea. > Since in this case code would use more memory for keeping a virtual > catalog wouldn't it be considered a major change of behavior that could > break someones production environment? > > -- > Best regards, > Aleksander Alekseev -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        > Why are you sending this off-list? Please let's keep the discussion > on the mailing list. I suggest resending this there. Sorry for that. I accidentally removed pgsql-hackers@ from CC list or maybe my email client somehow did it for me. Short after that I realized my mistake and sent a copy to the mailing list. -- Best regards, Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Robert Haas
		    Дата:
		        On Mon, Aug 15, 2016 at 5:12 AM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > Just to keep things sane I would like to remind that in this concrete > patch there _are_ catalog entries: > > ``` > [...] > This file contents imlementation of special type of temporary tables --- > fast temporary tables (FTT). From user perspective they work exactly as > regular temporary tables. However there are no records about FTTs in > pg_catalog. These records are stored in backend's memory instead and > mixed with regular records during scans of catalog tables. We refer to > corresponding tuples of catalog tables as "in-memory" or "virtual" > tuples and to all these tuples together --- as "in-memory" or "virtual" > catalog. > [...] > ``` That doesn't really solve the problem, because OTHER backends won't be able to see them. So, if I create a fast temporary table in one session that depends on a permanent object, some other session can drop the permanent object. If there were REAL catalog entries, that wouldn't work, because the other session would see the dependency. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8/16/16 11:59 AM, Robert Haas wrote: ... > That doesn't really solve the problem, because OTHER backends won't be > able to see them. So, if I create a fast temporary table in one > session that depends on a permanent object, some other session can > drop the permanent object. If there were REAL catalog entries, that > wouldn't work, because the other session would see the dependency. Some discussion about TEMP functions is happening on -general right now, and there's other things where temp objects are good to have, so it'd be nice to have a more generic fix for this stuff. Is the idea of "partitioning" the catalogs to store temp objects separate from permanent fatally flawed? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        > That doesn't really solve the problem, because OTHER backends won't be > able to see them. So, if I create a fast temporary table in one > session that depends on a permanent object, some other session can > drop the permanent object. If there were REAL catalog entries, that > wouldn't work, because the other session would see the dependency. > This is a good point. However current implementation doesn't allow to do that. There is a related bug though, a minor one. In session 1: ``` CREATE TABLE cities2 (name text, population float, altitude int); CREATE FAST TEMPORARY TABLE capitals2 (state char(2)) INHERITS (cities2); ``` In session 2: ``` DROP TABLE cities2; ERROR: cache lookup failed for relation 16401 ``` Instead of "cache lookup failed" probably a better error message should be displayed. Something like "cannot drop table cities2 because other objects depend on it". I will send a corrected patch shortly. Everything else seems to work as expected. If you discover any other bugs please let me know! -- Best regards, Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/17/2016 11:50 AM, Aleksander Alekseev wrote: >> That doesn't really solve the problem, because OTHER backends won't be >> able to see them. So, if I create a fast temporary table in one >> session that depends on a permanent object, some other session can >> drop the permanent object. If there were REAL catalog entries, that >> wouldn't work, because the other session would see the dependency. >> > > This is a good point. However current implementation doesn't allow to > do that. IMHO without handling that, the design is effectively broken and has very little change (or rather none at all) to get committed. I think one way to fix that would be to store the virtual tuples in shared memory (instead of process memory). That will certainly require locking and synchronization, but well - it needs to be shared. > There is a related bug though, a minor one. > > In session 1: > > ``` > CREATE TABLE cities2 (name text, population float, altitude int); > CREATE FAST TEMPORARY TABLE capitals2 (state char(2)) INHERITS (cities2); > ``` > > In session 2: > > ``` > DROP TABLE cities2; > > ERROR: cache lookup failed for relation 16401 > ``` > > Instead of "cache lookup failed" probably a better error message > should be displayed. Something like "cannot drop table cities2 > because other objects depend on it". I will send a corrected patch > shortly. > > Everything else seems to work as expected. > > If you discover any other bugs please let me know! > While a better error message would be nice, this is curing the symptoms and not the cause. I think a proper design needs to prevent the DROP by using dependencies. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/15/2016 12:23 PM, Pavel Stehule wrote: > > > 2016-08-15 12:18 GMT+02:00 Aleksander Alekseev > <a.alekseev@postgrespro.ru <mailto:a.alekseev@postgrespro.ru>>: > > > The global temporary tables has persistent rows in the catalogue. The > > mapping to files can be marked as special and real mapping should be > > only in memory. > > > > So the changes in catalogue related to global temporary tables are > > pretty less frequently. > > I'm afraid I still don't get it. Let say I have an application that > does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the > time. Naturally there is not only one temp table per session. Could you > explain please in more detail how exactly do these persistent rows help? > > > when you use global temporary tables, then you create it only once - > like usual tables. > > you don't drop these tables. > I share the view that this is a better/simpler solution to the problem. It will still require virtual (in-memory) tuples for pg_statistic records, but everything else works pretty much as for regular tables. In particular there are no problems with dependencies. The obvious disadvantage is that it requires changes to applications. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Pavel Stehule
		    Дата:
		        2016-08-18 16:33 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
On 08/15/2016 12:23 PM, Pavel Stehule wrote:
2016-08-15 12:18 GMT+02:00 Aleksander Alekseev
<a.alekseev@postgrespro.ru <mailto:a.alekseev@postgrespro.ru>>: 
> The global temporary tables has persistent rows in the catalogue. The
> mapping to files can be marked as special and real mapping should be
> only in memory.
>
> So the changes in catalogue related to global temporary tables are
> pretty less frequently.
I'm afraid I still don't get it. Let say I have an application that
does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
time. Naturally there is not only one temp table per session. Could you
explain please in more detail how exactly do these persistent rows help?
when you use global temporary tables, then you create it only once -
like usual tables.
you don't drop these tables.
I share the view that this is a better/simpler solution to the problem. It will still require virtual (in-memory) tuples for pg_statistic records, but everything else works pretty much as for regular tables. In particular there are no problems with dependencies.
The obvious disadvantage is that it requires changes to applications.
sure - as plpgsql developer I can say, the global temp tables are much more friendly - so rewriting in application is enjoy work. 
Regards
Pavel
 
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Robert Haas
		    Дата:
		        On Tue, Aug 16, 2016 at 8:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > On 8/16/16 11:59 AM, Robert Haas wrote: > ... >> >> That doesn't really solve the problem, because OTHER backends won't be >> able to see them. So, if I create a fast temporary table in one >> session that depends on a permanent object, some other session can >> drop the permanent object. If there were REAL catalog entries, that >> wouldn't work, because the other session would see the dependency. > > Some discussion about TEMP functions is happening on -general right now, and > there's other things where temp objects are good to have, so it'd be nice to > have a more generic fix for this stuff. Is the idea of "partitioning" the > catalogs to store temp objects separate from permanent fatally flawed? I wouldn't say it's fatally flawed. But you might need a world-renowned team of physicians working round the clock for days in a class 1 trauma center to save it. If you imagine that you have a permanent pg_class which holds permanent tables and a temporary pg_class per-backend which stores temporary tables, then you very quickly end up with the same deadly flaw as in Aleksander's design: other backends cannot see all of the dependency entries and can drop things that they shouldn't be permitted to drop. However, you could have a permanent pg_class which holds the records for permanent tables and an *unlogged* table, say pg_class_unlogged, which holds records for temporary tables. Now everybody can see everybody else's data, yet we don't have to create permanent catalog entries. So we are not dead. All of the temporary catalog tables vanish on a crash, too, and in a very clean way, which is great. However: 1. The number of tables for which we would need to add a duplicate, unlogged table is formidable. You need pg_attribute, pg_attrdef, pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc. And the backend changes needed so that we used the unlogged copy for temp tables and the permanent copy for regular tables is probably really large. 2. You can't write to unlogged tables on standby servers, so this doesn't help solve the problem of wanting to use temporary tables on standbys. 3. While it makes creating temporary tables a lighter-weight operation, because you no longer need to write WAL for the catalog entries, there's probably still substantially more overhead than just stuffing them in backend-local RAM. So the performance benefits are probably fairly modest. Overall I feel like the development effort that it would take to make this work would almost certainly be better-expended elsewhere. But of course I'm not in charge of how people who work for other companies spend their time... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Alvaro Herrera
		    Дата:
		        Robert Haas wrote: > However: > > 1. The number of tables for which we would need to add a duplicate, > unlogged table is formidable. You need pg_attribute, pg_attrdef, > pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc. > And the backend changes needed so that we used the unlogged copy for > temp tables and the permanent copy for regular tables is probably > really large. Check. This is the most serious issue, IMV. > 2. You can't write to unlogged tables on standby servers, so this > doesn't help solve the problem of wanting to use temporary tables on > standbys. Check. We could think about relaxing this restriction, which would enable the feature to satisfy that use case. (I think the main complication there is the init fork of btrees on those catalogs; other relations could just be truncated to empty on restart.) > 3. While it makes creating temporary tables a lighter-weight > operation, because you no longer need to write WAL for the catalog > entries, there's probably still substantially more overhead than just > stuffing them in backend-local RAM. So the performance benefits are > probably fairly modest. You also save catalog bloat ... These benefits may not be tremendous, but I think they may be good enough for many users. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On 2016-08-22 17:50:11 -0300, Alvaro Herrera wrote: > > 2. You can't write to unlogged tables on standby servers, so this > > doesn't help solve the problem of wanting to use temporary tables on > > standbys. > > Check. We could think about relaxing this restriction, which would > enable the feature to satisfy that use case. (I think the main > complication there is the init fork of btrees on those catalogs; other > relations could just be truncated to empty on restart.) Isn't the main complication that visibility currently requires xids to be assigned?
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Aleksander Alekseev
		    Дата:
		        Thank you everyone for great comments! > have a permanent pg_class which holds the records for permanent tables > and an *unlogged* table, say pg_class_unlogged, which holds records > for temporary tables. Now everybody can see everybody else's data, > yet we don't have to create permanent catalog entries. So we are not > dead. All of the temporary catalog tables vanish on a crash, too, and > in a very clean way, which is great. > > [...] > > Overall I feel like the development effort that it would take to make > this work would almost certainly be better-expended elsewhere. Agree. This is an interesting idea but considering named drawbacks, especially: > 2. You can't write to unlogged tables on standby servers, so this > doesn't help solve the problem of wanting to use temporary tables on > standbys. ... I don't think it's worth an effort. >> when you use global temporary tables, then you create it only once - >> like usual tables. >> >> you don't drop these tables. > > I share the view that this is a better/simpler solution to the problem. > It will still require virtual (in-memory) tuples for pg_statistic > records, but everything else works pretty much as for regular tables. In > particular there are no problems with dependencies. > > The obvious disadvantage is that it requires changes to applications. Frankly I have much more faith in Tom's idea of using virtual part of the catalog for all temporary tables, i.e turning all temporary tables into "fast" temporary tables. Instead of introducing a new type of temporary tables that solve catalog bloating problem and forcing users to rewrite applications why not just not to create a problem in a first place? > I think one way to fix that would be to store the virtual tuples in > shared memory (instead of process memory). That will certainly require > locking and synchronization, but well - it needs to be shared. I believe currently this is the most promising course of action. In first implementation we could just place all virtual part of the catalog in a shared memory and protect it with a single lock. If it will work as expected the next step would be elimination of bottlenecks --- using multiple locks, moving part of a virtual catalog to local backend's memory, etc. As always, please don't hesitate to share any thoughts on this topic! -- Best regards, Aleksander Alekseev
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/22/2016 10:32 PM, Robert Haas wrote: > > ... > > 1. The number of tables for which we would need to add a duplicate, > unlogged table is formidable. You need pg_attribute, pg_attrdef, > pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc. > And the backend changes needed so that we used the unlogged copy for > temp tables and the permanent copy for regular tables is probably > really large. > > 2. You can't write to unlogged tables on standby servers, so this > doesn't help solve the problem of wanting to use temporary tables on > standbys. > > 3. While it makes creating temporary tables a lighter-weight > operation, because you no longer need to write WAL for the catalog > entries, there's probably still substantially more overhead than just > stuffing them in backend-local RAM. So the performance benefits are > probably fairly modest. > > Overall I feel like the development effort that it would take to make > this work would almost certainly be better-expended elsewhere. But of > course I'm not in charge of how people who work for other companies > spend their time... > Could someone please explain how the unlogged tables are supposed to fix the catalog bloat problem, as stated in the initial patch submission? We'd still need to insert/delete the catalog rows when creating/dropping the temporary tables, causing the bloat. Or is there something I'm missing? -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Claudio Freire
		    Дата:
		        On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 08/22/2016 10:32 PM, Robert Haas wrote: >> >> >> ... >> >> 1. The number of tables for which we would need to add a duplicate, >> unlogged table is formidable. You need pg_attribute, pg_attrdef, >> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc. >> And the backend changes needed so that we used the unlogged copy for >> temp tables and the permanent copy for regular tables is probably >> really large. >> >> 2. You can't write to unlogged tables on standby servers, so this >> doesn't help solve the problem of wanting to use temporary tables on >> standbys. >> >> 3. While it makes creating temporary tables a lighter-weight >> operation, because you no longer need to write WAL for the catalog >> entries, there's probably still substantially more overhead than just >> stuffing them in backend-local RAM. So the performance benefits are >> probably fairly modest. >> >> Overall I feel like the development effort that it would take to make >> this work would almost certainly be better-expended elsewhere. But of >> course I'm not in charge of how people who work for other companies >> spend their time... >> > > Could someone please explain how the unlogged tables are supposed to fix the > catalog bloat problem, as stated in the initial patch submission? We'd still > need to insert/delete the catalog rows when creating/dropping the temporary > tables, causing the bloat. Or is there something I'm missing? Wouldn't more aggressive vacuuming of catalog tables fix the bloat? Perhaps reserving a worker or N to run only on catalog schemas? That'd be far simpler.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On 2016-08-23 19:18:04 -0300, Claudio Freire wrote: > On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > Could someone please explain how the unlogged tables are supposed to fix the > > catalog bloat problem, as stated in the initial patch submission? We'd still > > need to insert/delete the catalog rows when creating/dropping the temporary > > tables, causing the bloat. Or is there something I'm missing? Beats me. > Wouldn't more aggressive vacuuming of catalog tables fix the bloat? Not really in my experience, at least not without more drastic vacuum changes. The issue is that if you have a single "long running" transaction (in some workloads that can even just be a 3 min taking query/xact), nothing will be cleaned up during that time. If you have a few hundred temp tables created per sec, you'll be in trouble even then. Not to speak of the case where you have queries taking hours (say a backup). Andres
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/24/2016 12:18 AM, Claudio Freire wrote: > On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> On 08/22/2016 10:32 PM, Robert Haas wrote: >>> >>> >>> ... >>> >>> 1. The number of tables for which we would need to add a duplicate, >>> unlogged table is formidable. You need pg_attribute, pg_attrdef, >>> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc. >>> And the backend changes needed so that we used the unlogged copy for >>> temp tables and the permanent copy for regular tables is probably >>> really large. >>> >>> 2. You can't write to unlogged tables on standby servers, so this >>> doesn't help solve the problem of wanting to use temporary tables on >>> standbys. >>> >>> 3. While it makes creating temporary tables a lighter-weight >>> operation, because you no longer need to write WAL for the catalog >>> entries, there's probably still substantially more overhead than just >>> stuffing them in backend-local RAM. So the performance benefits are >>> probably fairly modest. >>> >>> Overall I feel like the development effort that it would take to make >>> this work would almost certainly be better-expended elsewhere. But of >>> course I'm not in charge of how people who work for other companies >>> spend their time... >>> >> >> Could someone please explain how the unlogged tables are supposed to fix the >> catalog bloat problem, as stated in the initial patch submission? We'd still >> need to insert/delete the catalog rows when creating/dropping the temporary >> tables, causing the bloat. Or is there something I'm missing? > > Wouldn't more aggressive vacuuming of catalog tables fix the bloat? > > Perhaps reserving a worker or N to run only on catalog schemas? > > That'd be far simpler. Maybe, although IIRC the issues with catalog bloat were due to a combination of long queries and many temporary tables being created/dropped. In that case simply ramping up autovacuum (or even having a dedicated workers for catalogs) would not realy help due to the xmin horizon being blocked by the long-running queries. Maybe it's entirely crazy idea due to the wine I drank at the dinner, but couldn't we vacuum the temporary table records differently? For example, couldn't we just consider them removable as soon as the backend that owns them disappears? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Claudio Freire
		    Дата:
		        On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund <andres@anarazel.de> wrote: >> Wouldn't more aggressive vacuuming of catalog tables fix the bloat? > > Not really in my experience, at least not without more drastic vacuum > changes. The issue is that if you have a single "long running" > transaction (in some workloads that can even just be a 3 min taking > query/xact), nothing will be cleaned up during that time. If you have a > few hundred temp tables created per sec, you'll be in trouble even > then. Not to speak of the case where you have queries taking hours (say > a backup). Well, my experience isn't as extreme as that (just a few dozen temp tables per minute), but when I see bloat in catalog tables it's because all autovacuum workers are stuck vacuuming huge tables for huge periods of time (hours or days). So that's certainly another bloat case to consider.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On 2016-08-23 19:33:33 -0300, Claudio Freire wrote: > On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund <andres@anarazel.de> wrote: > >> Wouldn't more aggressive vacuuming of catalog tables fix the bloat? > > > > Not really in my experience, at least not without more drastic vacuum > > changes. The issue is that if you have a single "long running" > > transaction (in some workloads that can even just be a 3 min taking > > query/xact), nothing will be cleaned up during that time. If you have a > > few hundred temp tables created per sec, you'll be in trouble even > > then. Not to speak of the case where you have queries taking hours (say > > a backup). > > Well, my experience isn't as extreme as that (just a few dozen temp > tables per minute), but when I see bloat in catalog tables it's > because all autovacuum workers are stuck vacuuming huge tables for > huge periods of time (hours or days). Well, that's because our defaults are batshit stupid.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Claudio Freire
		    Дата:
		        On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >>> Could someone please explain how the unlogged tables are supposed to fix >>> the >>> catalog bloat problem, as stated in the initial patch submission? We'd >>> still >>> need to insert/delete the catalog rows when creating/dropping the >>> temporary >>> tables, causing the bloat. Or is there something I'm missing? >> >> >> Wouldn't more aggressive vacuuming of catalog tables fix the bloat? >> >> Perhaps reserving a worker or N to run only on catalog schemas? >> >> That'd be far simpler. > > > Maybe, although IIRC the issues with catalog bloat were due to a combination > of long queries and many temporary tables being created/dropped. In that > case simply ramping up autovacuum (or even having a dedicated workers for > catalogs) would not realy help due to the xmin horizon being blocked by the > long-running queries. > > Maybe it's entirely crazy idea due to the wine I drank at the dinner, but > couldn't we vacuum the temporary table records differently? For example, > couldn't we just consider them removable as soon as the backend that owns > them disappears? Or perhaps go all the way and generalize that to rows that never become visible outside their parent transaction. As in, delete of rows created by the deleting transaction could clean up, carefully to avoid voiding indexes and all that, but more aggressively than regular deletes.
On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > Frankly I have much more faith in Tom's idea of using virtual part of the > catalog for all temporary tables, i.e turning all temporary tables into > "fast" temporary tables. Instead of introducing a new type of temporary tables > that solve catalog bloating problem and forcing users to rewrite applications > why not just not to create a problem in a first place? Would applications really need to be rewritten? Are they really constructing temporary tables where the definition of the table is dynamic, not just the content? I think application authors would be pretty happy to not need to keep recreating the same tables over and over again and dealing with DDL in their run-time code. It's not really rewriting an application to just remove that DDL and move it to the one-time database schema creation. I think it's clear we got the idea of temporary tables wrong when we implemented them and the SQL standard is more useful. It's not just some implementation artifact that it's possible to implement them in an efficient way. It's a fundamental design change and experience shows that separating DDL and making it static while the DML is dynamic is just a better design. -- greg
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/24/2016 12:38 AM, Claudio Freire wrote: > On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >>>> Could someone please explain how the unlogged tables are supposed to fix >>>> the >>>> catalog bloat problem, as stated in the initial patch submission? We'd >>>> still >>>> need to insert/delete the catalog rows when creating/dropping the >>>> temporary >>>> tables, causing the bloat. Or is there something I'm missing? >>> >>> >>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat? >>> >>> Perhaps reserving a worker or N to run only on catalog schemas? >>> >>> That'd be far simpler. >> >> >> Maybe, although IIRC the issues with catalog bloat were due to a combination >> of long queries and many temporary tables being created/dropped. In that >> case simply ramping up autovacuum (or even having a dedicated workers for >> catalogs) would not realy help due to the xmin horizon being blocked by the >> long-running queries. >> >> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but >> couldn't we vacuum the temporary table records differently? For example, >> couldn't we just consider them removable as soon as the backend that owns >> them disappears? > > Or perhaps go all the way and generalize that to rows that never > become visible outside their parent transaction. > > As in, delete of rows created by the deleting transaction could clean > up, carefully to avoid voiding indexes and all that, but more > aggressively than regular deletes. > Maybe, but I wouldn't be surprised if such generalization would be an order of magnitude more complicated - and even the vacuuming changes I mentioned are undoubtedly a fair amount of work. Sadly, I don't see how this might fix the other issues mentioned in this thread (e.g. impossibility to create temp tables on standbys), regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Claudio Freire
		    Дата:
		        On Tue, Aug 23, 2016 at 8:50 PM, Greg Stark <stark@mit.edu> wrote: > On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev > <a.alekseev@postgrespro.ru> wrote: >> Frankly I have much more faith in Tom's idea of using virtual part of the >> catalog for all temporary tables, i.e turning all temporary tables into >> "fast" temporary tables. Instead of introducing a new type of temporary tables >> that solve catalog bloating problem and forcing users to rewrite applications >> why not just not to create a problem in a first place? > > Would applications really need to be rewritten? Are they really > constructing temporary tables where the definition of the table is > dynamic, not just the content? Mine is. But it wouldn't be a big deal to adapt.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Claudio Freire
		    Дата:
		        On Tue, Aug 23, 2016 at 9:12 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 08/24/2016 12:38 AM, Claudio Freire wrote: >> >> On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >>>>> >>>>> Could someone please explain how the unlogged tables are supposed to >>>>> fix >>>>> the >>>>> catalog bloat problem, as stated in the initial patch submission? We'd >>>>> still >>>>> need to insert/delete the catalog rows when creating/dropping the >>>>> temporary >>>>> tables, causing the bloat. Or is there something I'm missing? >>>> >>>> >>>> >>>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat? >>>> >>>> Perhaps reserving a worker or N to run only on catalog schemas? >>>> >>>> That'd be far simpler. >>> >>> >>> >>> Maybe, although IIRC the issues with catalog bloat were due to a >>> combination >>> of long queries and many temporary tables being created/dropped. In that >>> case simply ramping up autovacuum (or even having a dedicated workers for >>> catalogs) would not realy help due to the xmin horizon being blocked by >>> the >>> long-running queries. >>> >>> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but >>> couldn't we vacuum the temporary table records differently? For example, >>> couldn't we just consider them removable as soon as the backend that owns >>> them disappears? >> >> >> Or perhaps go all the way and generalize that to rows that never >> become visible outside their parent transaction. >> >> As in, delete of rows created by the deleting transaction could clean >> up, carefully to avoid voiding indexes and all that, but more >> aggressively than regular deletes. >> > > Maybe, but I wouldn't be surprised if such generalization would be an order > of magnitude more complicated - and even the vacuuming changes I mentioned > are undoubtedly a fair amount of work. After looking at it from a birdseye view, I agree it's conceptually complex (reading HeapTupleSatisfiesSelf already makes one dizzy). But other than that, the implementation seems rather simple. It seems to me, if one figures out that it is safe to do so (a-priori, xmin not committed, xmax is current transaction), it would simply be a matter of chasing the HOT chain root, setting all LP except the first to LP_UNUSED and the first one to LP_DEAD. Of course I may be missing a ton of stuff. > Sadly, I don't see how this might fix the other issues mentioned in this > thread (e.g. impossibility to create temp tables on standbys), No it doesn't :(
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Alvaro Herrera
		    Дата:
		        Claudio Freire wrote: > After looking at it from a birdseye view, I agree it's conceptually > complex (reading HeapTupleSatisfiesSelf already makes one dizzy). > > But other than that, the implementation seems rather simple. It seems > to me, if one figures out that it is safe to do so (a-priori, xmin not > committed, xmax is current transaction), it would simply be a matter > of chasing the HOT chain root, setting all LP except the first to > LP_UNUSED and the first one to LP_DEAD. > > Of course I may be missing a ton of stuff. What you seem to be missing is that rows corresponding to temp tables are not "visible to its own transaction only". The rows are valid after the transaction is gone; what makes the tables temporary is the fact that they are in a temporary schema. And what makes them invisible to one backend is the fact that they are in the temporary schema for another backend. Not that they are uncommitted. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Claudio Freire
		    Дата:
		        On Wed, Aug 24, 2016 at 2:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Claudio Freire wrote: > >> After looking at it from a birdseye view, I agree it's conceptually >> complex (reading HeapTupleSatisfiesSelf already makes one dizzy). >> >> But other than that, the implementation seems rather simple. It seems >> to me, if one figures out that it is safe to do so (a-priori, xmin not >> committed, xmax is current transaction), it would simply be a matter >> of chasing the HOT chain root, setting all LP except the first to >> LP_UNUSED and the first one to LP_DEAD. >> >> Of course I may be missing a ton of stuff. > > What you seem to be missing is that rows corresponding to temp tables > are not "visible to its own transaction only". The rows are valid > after the transaction is gone; what makes the tables temporary is the > fact that they are in a temporary schema. And what makes them invisible > to one backend is the fact that they are in the temporary schema for > another backend. Not that they are uncommitted. Yeah, I was thinking of "on commit drop" behavior, but granted there's all the others.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Robert Haas
		    Дата:
		        On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Could someone please explain how the unlogged tables are supposed to fix the > catalog bloat problem, as stated in the initial patch submission? We'd still > need to insert/delete the catalog rows when creating/dropping the temporary > tables, causing the bloat. Or is there something I'm missing? No, not really. Jim just asked if the idea of partitioning the columns was completely dead in the water, and I said, no, you could theoretically salvage it. Whether that does you much good is another question. IMV, the point here is that you MUST have globally visible dependency entries for this to work sanely. If they're not in a catalog, they have to be someplace else, and backend-private memory isn't good enough, because that's not globally visible. Until we've got a strategy for that problem, this whole effort is going nowhere - even though in other respects it may be a terrific idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/24/2016 12:20 AM, Andres Freund wrote: > On 2016-08-23 19:18:04 -0300, Claudio Freire wrote: >> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >>> Could someone please explain how the unlogged tables are supposed to fix the >>> catalog bloat problem, as stated in the initial patch submission? We'd still >>> need to insert/delete the catalog rows when creating/dropping the temporary >>> tables, causing the bloat. Or is there something I'm missing? > > Beats me. > Are you puzzled just like me, or are you puzzled why I'm puzzled? -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On August 24, 2016 9:32:48 AM PDT, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > >On 08/24/2016 12:20 AM, Andres Freund wrote: >> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote: >>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra >>> <tomas.vondra@2ndquadrant.com> wrote: >>>> Could someone please explain how the unlogged tables are supposed >to fix the >>>> catalog bloat problem, as stated in the initial patch submission? >We'd still >>>> need to insert/delete the catalog rows when creating/dropping the >temporary >>>> tables, causing the bloat. Or is there something I'm missing? >> >> Beats me. >> > >Are you puzzled just like me, or are you puzzled why I'm puzzled? Like you. I don't think this addresses the problem to a significant enough degree to care. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Corey Huinker
		    Дата:
		        <div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Aug 24, 2016 at 12:39 PM, Andres Freund <span dir="ltr"><<ahref="mailto:andres@anarazel.de" target="_blank">andres@anarazel.de</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br/><br /> On August 24, 2016 9:32:48 AM PDT, Tomas Vondra <<a href="mailto:tomas.vondra@2ndquadrant.com">tomas.vondra@2ndquadrant.com</a>>wrote:<br /> ><br /> ><br /> >On08/24/2016 12:20 AM, Andres Freund wrote:<br /> >> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:<br />>>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra<br /> >>> <<a href="mailto:tomas.vondra@2ndquadrant.com">tomas.vondra@2ndquadrant.com</a>>wrote:<br /> >>>> Could someoneplease explain how the unlogged tables are supposed<br /> >to fix the<br /> >>>> catalog bloat problem,as stated in the initial patch submission?<br /> >We'd still<br /> >>>> need to insert/delete thecatalog rows when creating/dropping the<br /> >temporary<br /> >>>> tables, causing the bloat. Or is theresomething I'm missing?<br /> >><br /> >> Beats me.<br /> >><br /> ><br /> >Are you puzzled justlike me, or are you puzzled why I'm puzzled?<br /><br /></span>Like you. I don't think this addresses the problem toa significant enough degree to care.<br /><span class="HOEnZb"><font color="#888888"><br /> Andres<br /> --<br /> Sentfrom my Android device with K-9 Mail. Please excuse my brevity.<br /></font></span><div class="HOEnZb"><div class="h5"><br/><br /> --<br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" rel="noreferrer" target="_blank">http://www.postgresql.org/<wbr/>mailpref/pgsql-hackers</a><br /></div></div></blockquote></div><br /></div><divclass="gmail_extra">Ok, here's a wild idea, and it probably depends on having native partitioning implemented.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Propagate relpersistence, or a boolean flagon (relpersistence = 't') from pg_class into the child pg_attribute records.</div><div class="gmail_extra"><br /></div><divclass="gmail_extra">Partition the tables pg_class and pg_attribute first by relpersistence, and then by oid.<br/><br />The partitions holding data on persistent objects would basically stay as-is, but the partition wouldn't havemuch activity and no temp-table churn.<br /><br />The temporary ones, however, would fall into essentially a rotatingset of partitions. Pick enough partitions such that the active transactions only cover some of the partitions. Therest can be safely truncated by vacuum.<br /><br />It would mitigate the bloat, existing dictionary queries would stillwork, but the additional lookup cost might not be worth it.</div></div>
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Vik Fearing
		    Дата:
		        On 08/24/2016 06:16 PM, Robert Haas wrote: > On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> Could someone please explain how the unlogged tables are supposed to fix the >> catalog bloat problem, as stated in the initial patch submission? We'd still >> need to insert/delete the catalog rows when creating/dropping the temporary >> tables, causing the bloat. Or is there something I'm missing? > > No, not really. Jim just asked if the idea of partitioning the > columns was completely dead in the water, and I said, no, you could > theoretically salvage it. Whether that does you much good is another > question. > > IMV, the point here is that you MUST have globally visible dependency > entries for this to work sanely. If they're not in a catalog, they > have to be someplace else, and backend-private memory isn't good > enough, because that's not globally visible. Until we've got a > strategy for that problem, this whole effort is going nowhere - even > though in other respects it may be a terrific idea. Why not just have a regular-looking table, with a "global temporary" relpersistence (I don't care which letter it gets) and when a backend tries to access it, it uses its own private relfilenode instead of whatever is in pg_class, creating one if necessary. That way the structure of the table is fixed, with all the dependencies and whatnot, but the content is private to each backend. What's wrong with this idea? -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/31/2016 09:20 PM, Vik Fearing wrote: > On 08/24/2016 06:16 PM, Robert Haas wrote: >> On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >>> Could someone please explain how the unlogged tables are supposed to fix the >>> catalog bloat problem, as stated in the initial patch submission? We'd still >>> need to insert/delete the catalog rows when creating/dropping the temporary >>> tables, causing the bloat. Or is there something I'm missing? >> >> No, not really. Jim just asked if the idea of partitioning the >> columns was completely dead in the water, and I said, no, you could >> theoretically salvage it. Whether that does you much good is another >> question. >> >> IMV, the point here is that you MUST have globally visible dependency >> entries for this to work sanely. If they're not in a catalog, they >> have to be someplace else, and backend-private memory isn't good >> enough, because that's not globally visible. Until we've got a >> strategy for that problem, this whole effort is going nowhere - even >> though in other respects it may be a terrific idea. > > Why not just have a regular-looking table, with a "global temporary" > relpersistence (I don't care which letter it gets) and when a backend > tries to access it, it uses its own private relfilenode instead of > whatever is in pg_class, creating one if necessary. That way the > structure of the table is fixed, with all the dependencies and whatnot, > but the content is private to each backend. What's wrong with this idea? > It's an improvement (and it's pretty much exactly what I proposed upthread). But it does not solve the problems with pg_statistic for example (each backend needs it's own statistics. So we'd either bloat the pg_statistic (if we manage to solve the problem that the table has the same oid in all backends), or we would need in-memory tuples (just like discussed in the thread so far). -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote: > It's an improvement (and it's pretty much exactly what I proposed > upthread). But it does not solve the problems with pg_statistic for > example (each backend needs it's own statistics. So we'd either bloat > the pg_statistic (if we manage to solve the problem that the table has > the same oid in all backends), or we would need in-memory tuples (just > like discussed in the thread so far). Creating a session private version of pg_statistic would be fairly simple.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Tomas Vondra
		    Дата:
		        On 08/31/2016 11:43 PM, Andres Freund wrote: > On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote: >> It's an improvement (and it's pretty much exactly what I proposed >> upthread). But it does not solve the problems with pg_statistic for >> example (each backend needs it's own statistics. So we'd either bloat >> the pg_statistic (if we manage to solve the problem that the table has >> the same oid in all backends), or we would need in-memory tuples (just >> like discussed in the thread so far). > > Creating a session private version of pg_statistic would be fairly > simple. Sure. I'm just saying it's not as simple as overriding relpath. ISTM we only need the pg_statistics (as other catalogs are connected to the pg_class entry), which does not have the dependency issues. Or do we need other catalogs? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Andres Freund
		    Дата:
		        On August 31, 2016 3:00:15 PM PDT, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > >On 08/31/2016 11:43 PM, Andres Freund wrote: >> On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote: >>> It's an improvement (and it's pretty much exactly what I proposed >>> upthread). But it does not solve the problems with pg_statistic for >>> example (each backend needs it's own statistics. So we'd either >bloat >>> the pg_statistic (if we manage to solve the problem that the table >has >>> the same oid in all backends), or we would need in-memory tuples >(just >>> like discussed in the thread so far). >> >> Creating a session private version of pg_statistic would be fairly >> simple. > >Sure. I'm just saying it's not as simple as overriding relpath. > >ISTM we only need the pg_statistics (as other catalogs are connected to >the pg_class entry), which does not have the dependency issues. Or do >we >need other catalogs? In my experience pg attribute is usually the worst affected. Many tech takes won't even have stays entries... Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
От
 
		    	Corey Huinker
		    Дата:
		        On Wed, Aug 31, 2016 at 6:07 PM, Andres Freund <andres@anarazel.de> wrote:
Mine too. One database currently has a 400GB pg_attribute table, because we chew through temp tables like popcorn.
In my experience pg attribute is usually the worst affected. Many tech takes won't even have stays entries...
Mine too. One database currently has a 400GB pg_attribute table, because we chew through temp tables like popcorn.