Обсуждение: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Поиск
Список
Период
Сортировка

[Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
Hello

Some time ago we discussed an idea of "fast temporary tables":

https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu

In two words the idea is following.

<The Idea>

PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.

We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.

</The Idea>

I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.

Usage example:

```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);

INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');

UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;

DELETE FROM fasttab_test1 WHERE x = 3;

SELECT * FROM fasttab_test1 ORDER BY x;

DROP TABLE fasttab_test1;
```

More sophisticated examples could be find in regression tests:

./src/test/regress/sql/fast_temp.sql

Any feedback on this patch will be much appreciated!

--
Best regards,
Aleksander Alekseev

Вложения

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:
Hi,

On 07/29/2016 01:15 PM, Aleksander Alekseev wrote:
> Hello
>
> Some time ago we discussed an idea of "fast temporary tables":
>
> https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu
>
> In two words the idea is following.
>
> <The Idea>
>
> PostgreSQL stores information about all relations in pg_catalog. Some
> applications create and delete a lot of temporary tables. It causes a
> bloating of pg_catalog and running auto vacuum on it. It's quite an
> expensive operation which affects entire database performance.
>
> We could introduce a new type of temporary tables. Information about
> these tables is stored not in a catalog but in backend's memory. This
> way user can solve a pg_catalog bloating problem and improve overall
> database performance.
>
> </The Idea>

Great! Thanks for the patch, this is definitely an annoying issue worth 
fixing. I've spent a bit of time looking at the patch today, comments 
below ...

>
> I took me a few months but eventually I made it work. Attached patch
> has some flaws. I decided not to invest a lot of time in documenting
> it or pgindent'ing all files yet. In my experience it will be rewritten
> entirely 3 or 4 times before merging anyway :) But it _works_ and
> passes all tests I could think of, including non-trivial cases like
> index-only or bitmap scans of catalog tables.
>

Well, jokes aside, that's a pretty lousy excuse for not writing any 
docs, and you're pretty much asking the reviewers to reverse-engineer 
your reasoning. So I doubt you'll get many serious reviews without 
fixing this gap.

> Usage example:
>
> ```
> CREATE FAST TEMP TABLE fasttab_test1(x int, s text);
>
> INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
>
> UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;
>
> DELETE FROM fasttab_test1 WHERE x = 3;
>
> SELECT * FROM fasttab_test1 ORDER BY x;
>
> DROP TABLE fasttab_test1;
> ```
>
> More sophisticated examples could be find in regression tests:
>
> ./src/test/regress/sql/fast_temp.sql
>
> Any feedback on this patch will be much appreciated!
>
>

1) I wonder whether the FAST makes sense - does this really change the 
performance significantly? IMHO you only move the catalog rows to 
memory, so why should the tables be any faster? I also believe this 
conflicts with SQL standard specification of CREATE TABLE.

2) Why do we need the new relpersistence value? ISTM we could easily got 
with just RELPERSISTENCE_TEMP, which would got right away of many 
chances as the steps are exactly the same.

IMHO if this patch gets in, we should use it as the only temp table 
implementation (Or can you think of cases where keeping rows in pg_class 
has advantages?). That'd also eliminate the need for FAST keyword in the 
CREATE TABLE command.

The one thin I'm not sure about is that our handling of temporary tables 
is not standard compliant - we require each session to create it's own 
private temporary table. Moving the rows from pg_class into backend 
memory seems to go in the opposite direction, but as no one was planning 
to fix this, I don't think it matters much.

3) I think the heapam/indexam/xact and various other places needs a 
major rework. You've mostly randomly sprinkled the code with function 
calls to make the patch work - that's fine for an initial version, but a 
more principled approach is needed.

4) I'm getting failures in the regression suite - apparently the patch 
somehow affects costing of index only scans, so that a several queries 
switch from index only scans to bitmap index scans etc. I haven't 
investigated this more closely, but it seems quite consistent (and I 
don't see it without the patch). It seems the patch delays building of 
visibility map, or something like that.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Pavel Stehule
Дата:


2016-07-30 1:46 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
Hi,

On 07/29/2016 01:15 PM, Aleksander Alekseev wrote:
Hello

Some time ago we discussed an idea of "fast temporary tables":

https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu

In two words the idea is following.

<The Idea>

PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.

We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.

</The Idea>

Great! Thanks for the patch, this is definitely an annoying issue worth fixing. I've spent a bit of time looking at the patch today, comments below ...

Yes, it some what we need long time


I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.


Well, jokes aside, that's a pretty lousy excuse for not writing any docs, and you're pretty much asking the reviewers to reverse-engineer your reasoning. So I doubt you'll get many serious reviews without fixing this gap.

Usage example:

```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);

INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');

UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;

DELETE FROM fasttab_test1 WHERE x = 3;

SELECT * FROM fasttab_test1 ORDER BY x;

DROP TABLE fasttab_test1;
```

More sophisticated examples could be find in regression tests:

./src/test/regress/sql/fast_temp.sql

Any feedback on this patch will be much appreciated!



1) I wonder whether the FAST makes sense - does this really change the performance significantly? IMHO you only move the catalog rows to memory, so why should the tables be any faster? I also believe this conflicts with SQL standard specification of CREATE TABLE.

Probably has zero value to have slow and fast temp tables (from catalogue cost perspective). So the FAST implementation should be used everywhere. But there are some patterns used with work with temp tables,that should not working, and we would to decide if we prepare workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
  CREATE TEMP TABLE xxx()
ELSE
  TRUNCATE TABLE xxx;
END IF;

-- modern patter (new code)
BEGIN
  TRUNCATE TABLE xxx;
EXCEPTION WHEN ..... THEN
  CREATE TEMP TABLE(...)
END;

In this case we can use GUC, because visible behave should be same.

The benefit of zero catalogue cost temp tables is significant - and for some larger applications the temp tables did hard performance issues.
 

2) Why do we need the new relpersistence value? ISTM we could easily got with just RELPERSISTENCE_TEMP, which would got right away of many chances as the steps are exactly the same.

IMHO if this patch gets in, we should use it as the only temp table implementation (Or can you think of cases where keeping rows in pg_class has advantages?). That'd also eliminate the need for FAST keyword in the CREATE TABLE command.

The one thin I'm not sure about is that our handling of temporary tables is not standard compliant - we require each session to create it's own private temporary table. Moving the rows from pg_class into backend memory seems to go in the opposite direction, but as no one was planning to fix this, I don't think it matters much.

3) I think the heapam/indexam/xact and various other places needs a major rework. You've mostly randomly sprinkled the code with function calls to make the patch work - that's fine for an initial version, but a more principled approach is needed.

4) I'm getting failures in the regression suite - apparently the patch somehow affects costing of index only scans, so that a several queries switch from index only scans to bitmap index scans etc. I haven't investigated this more closely, but it seems quite consistent (and I don't see it without the patch). It seems the patch delays building of visibility map, or something like that.

Some other random notes:

1. With this code should not be hard to implement global temp tables - shared persistent structure, temp local data - significant help for any who have to migrate from Oracle.

2. This should to work on slaves - it is one of ToDo

3. I didn't see support for memory store for column's statistics. Some separate questions is about production statistics - pg_stat_user_table, ..

Great and important work, thank you

Pavel


 

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:
On 07/30/2016 06:49 AM, Pavel Stehule wrote:
>     1) I wonder whether the FAST makes sense - does this really change
>     the performance significantly? IMHO you only move the catalog rows
>     to memory, so why should the tables be any faster? I also believe
>     this conflicts with SQL standard specification of CREATE TABLE.
>
>
> Probably has zero value to have slow and fast temp tables (from
> catalogue cost perspective). So the FAST implementation should be used
> everywhere. But there are some patterns used with work with temp
> tables,that should not working, and we would to decide if we prepare
> workaround or not.
>
> -- problematic pattern (old code)
> IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
>   CREATE TEMP TABLE xxx()
> ELSE
>   TRUNCATE TABLE xxx;
> END IF;

I'd argue that if you mess with catalogs directly, you're on your own. 
Not only it's fragile, but this pattern is also prone to race conditions 
(although a concurrent session can't create a conflicting temporary table).

>
> -- modern patter (new code)
> BEGIN
>   TRUNCATE TABLE xxx;
> EXCEPTION WHEN ..... THEN
>   CREATE TEMP TABLE(...)
> END;
>
> In this case we can use GUC, because visible behave should be same.

What GUC?

>
> The benefit of zero catalogue cost temp tables is significant - and for
> some larger applications the temp tables did hard performance issues.

Yeah, catalog bloat is a serious issue in such cases, and it's amplified 
by indexes created on the temporary tables.

>
> Some other random notes:
>
> 1. With this code should not be hard to implement global temp tables -
> shared persistent structure, temp local data - significant help for any
> who have to migrate from Oracle.

The patch moves in pretty much the opposite direction - if anything, 
it'll make it more difficult to implement global temporary tables, 
because it removes the definitions from the catalog, thus impossible to 
share by catalogs. To get global temporary tables, I think the best 
approach would be to share the catalog definition and only override the 
filename. Or something like that.

>
> 2. This should to work on slaves - it is one of ToDo
>

No, it does not work on slaves, because it still does a read-write 
transaction.

test=# begin read only;
BEGIN
test=# create fast temporary table x (id int);
ERROR:  cannot execute CREATE TABLE in a read-only transaction

No idea how difficult it'd be to make it work.

> 3. I didn't see support for memory store for column's statistics. Some
> separate questions is about production statistics - pg_stat_user_table, ..

That seems to work (both analyze and pg_stat_user_tables). Not sure 
where it's in the code, and I'm not willing to reverse engineer it.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tom Lane
Дата:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> But there are some patterns used with work with temp tables,that should not
> working, and we would to decide if we prepare workaround or not.

> -- problematic pattern (old code)
> IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
>   CREATE TEMP TABLE xxx()
> ELSE
>   TRUNCATE TABLE xxx;
> END IF;

> -- modern patter (new code)
> BEGIN
>   TRUNCATE TABLE xxx;
> EXCEPTION WHEN ..... THEN
>   CREATE TEMP TABLE(...)
> END;

If the former stops working, that's a sufficient reason to reject the
patch: it hasn't been thought through carefully enough.  The key reason
why I don't think that's negotiable is that if there aren't (apparently)
catalog entries corresponding to the temp tables, that will almost
certainly break many things in the backend and third-party extensions,
not only user code patterns like this one.  We'd constantly be fielding
bug reports that "feature X doesn't work with temp tables anymore".

In short, I think that the way to make something like this work is to
figure out how to have "virtual" catalog rows describing a temp table.
Or maybe to partition the catalogs so that vacuuming away temp-table
rows is easier/cheaper than today.
        regards, tom lane



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
David Steele
Дата:
On 7/30/16 10:47 AM, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> But there are some patterns used with work with temp tables,that should not
>> working, and we would to decide if we prepare workaround or not.
> 
>> -- problematic pattern (old code)
>> IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
>>   CREATE TEMP TABLE xxx()
>> ELSE
>>   TRUNCATE TABLE xxx;
>> END IF;
> 
>> -- modern patter (new code)
>> BEGIN
>>   TRUNCATE TABLE xxx;
>> EXCEPTION WHEN ..... THEN
>>   CREATE TEMP TABLE(...)
>> END;
> 
> If the former stops working, that's a sufficient reason to reject the
> patch: it hasn't been thought through carefully enough.  The key reason
> why I don't think that's negotiable is that if there aren't (apparently)
> catalog entries corresponding to the temp tables, that will almost
> certainly break many things in the backend and third-party extensions,
> not only user code patterns like this one.  We'd constantly be fielding
> bug reports that "feature X doesn't work with temp tables anymore".
> 
> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.
> Or maybe to partition the catalogs so that vacuuming away temp-table
> rows is easier/cheaper than today.

In addition the latter pattern burns an xid which can be a problem for
high-volume databases.

How about CREATE TEMP TABLE IF NOT EXISTS...?

-- 
-David
david@pgmasters.net



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:
On 07/30/2016 04:47 PM, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> But there are some patterns used with work with temp tables,that should not
>> working, and we would to decide if we prepare workaround or not.
>
>> -- problematic pattern (old code)
>> IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
>>   CREATE TEMP TABLE xxx()
>> ELSE
>>   TRUNCATE TABLE xxx;
>> END IF;
>
>> -- modern patter (new code)
>> BEGIN
>>   TRUNCATE TABLE xxx;
>> EXCEPTION WHEN ..... THEN
>>   CREATE TEMP TABLE(...)
>> END;
>
> If the former stops working, that's a sufficient reason to reject the
> patch: it hasn't been thought through carefully enough.  The key reason
> why I don't think that's negotiable is that if there aren't (apparently)
> catalog entries corresponding to the temp tables, that will almost
> certainly break many things in the backend and third-party extensions,
> not only user code patterns like this one.  We'd constantly be fielding
> bug reports that "feature X doesn't work with temp tables anymore".
>

Agreed - breaking internal features for temporary tables is not 
acceptable. I was thinking more about external code messing with 
catalogs, but on second thought we probably need to keep the records in 
pg_class anyway.
>
> In short, I think that the way to make something like this work is
> to figure out how to have "virtual" catalog rows describing a temp
> table. Or maybe to partition the catalogs so that vacuuming away
> temp-table rows is easier/cheaper than today.
>

Yeah, and I think the patch tries to do that, although in a rather 
invasive / unprincipled way. But this will only work for the current 
behavior (i.e. mostly what SQL standard means by LOCAL). For GLOBAL 
temporary tables I think we need to keep physical catalog row, and only 
override the storage filename.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
Hello.

Thanks everyone for great comments!

> Well, jokes aside, that's a pretty lousy excuse for not writing any 
> docs

I think maybe I put it in a wrong way. There are currently a lot of
comments in a code, more then enough to understand how this feature
works. What I meant is that this is not a final version of a patch and
a few paragraphs are yet to be written. At least it's how I see it. If
you believe that some parts of the code are currently hard to understand
and some comments could be improved, please name it and I will be happy
to fix it.

> IMHO if this patch gets in, we should use it as the only temp table 
> implementation (Or can you think of cases where keeping rows in
> pg_class has advantages?). That'd also eliminate the need for FAST
> keyword in the CREATE TABLE command.

> Probably has zero value to have slow and fast temp tables (from
> catalogue cost perspective). So the FAST implementation should be used
> everywhere.

If there are no objections I see no reason no to do it in a next
version of a patch.

> I'm getting failures in the regression suite

I've run regression suite like 10 times in a row in different
environments with different build flags but didn't manage to reproduce
it. Also our DBAs are testing this feature for weeks now on real-world
applications and they didn't report anything like this. Could you
please describe how to reproduce this issue?

> This should to work on slaves - it is one of ToDo

Glad you noticed! In fact I'm currently researching a possibility of
using the same approach for creating writable temporary tables on
replicas.

> The key reason why I don't think that's negotiable is that if there
> aren't (apparently) catalog entries corresponding to the temp tables,
> that will almost certainly break many things in the backend and
> third-party extensions, not only user code patterns like this one.

> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.

I'm afraid once again I put it in a wrong way. What I meant by
"Information about these tables is stored not in a catalog but in
backend's memory" is in fact that _records_ of pg_class, pg_type and
other catalog relations are stored in-memory. Naturally this records
are visible to the user (otherwise \d or \d+ would not work) and you
can do queries like ` select * from pg_class where relname = 'tt1' `.
In other words part of the catalog is indeed "virtual".

> I didn't see support for memory store for column's statistics. Some
> separate questions is about production statistics -
> pg_stat_user_table, ..

> That seems to work (both analyze and pg_stat_user_tables). Not sure 
> where it's in the code, and I'm not willing to reverse engineer it.

Right, `ANALYZE temp_table;` and everything else works. Besides
pg_class, pg_type, pg_attribute and other relations pg_statistic
records for temp tables are stored in-memory as well. IIRC a lot of
pg_stat* relations are in fact views and thus don't require any special
support. If you see that some statistics are broken please don't
hesitate to report it and I will fix it.

Hope I answered all questions so far. I look forward to receive more
comments and questions regarding this patch!

-- 
Best regards,
Aleksander Alekseev



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:
On 08/01/2016 11:45 AM, Aleksander Alekseev wrote:
> Hello.
>
> Thanks everyone for great comments!
>
>> Well, jokes aside, that's a pretty lousy excuse for not writing any
>> docs
>
> I think maybe I put it in a wrong way. There are currently a lot of
> comments in a code, more then enough to understand how this feature
> works. What I meant is that this is not a final version of a patch and
> a few paragraphs are yet to be written. At least it's how I see it. If
> you believe that some parts of the code are currently hard to understand
> and some comments could be improved, please name it and I will be happy
> to fix it.

I don't think there's "a lot of comments in the code", not even
remotely. At least not in the files I looked into - heapam, indexam,
xact etc. There are a few comments in general, and most of them only
comment obvious facts, like "ignore in-memory tuples" right before a
trivial if statement.

What is needed is an overview of the approach, so that the reviewers can
read that first, instead of assembling the knowledge from pieces
scattered over comments in many pieces. But I see the fasttab.c contains
this:

/* TODO TODO comment the general idea - in-memory tuples and indexes,
hooks principle, FasttabSnapshots, etc */

The other thing that needs to happen is you need to modify comments in
front of some of the modified methods - e.g. the comments may need a
paragraph "But when the table is fast temporary, what happens is ..."

>
>> IMHO if this patch gets in, we should use it as the only temp table
>> implementation (Or can you think of cases where keeping rows in
>> pg_class has advantages?). That'd also eliminate the need for FAST
>> keyword in the CREATE TABLE command.
>
>> Probably has zero value to have slow and fast temp tables (from
>> catalogue cost perspective). So the FAST implementation should be used
>> everywhere.
>
> If there are no objections I see no reason no to do it in a next
> version of a patch.

I believe there will be a lot of discussion about this.

>
>> I'm getting failures in the regression suite
>
> I've run regression suite like 10 times in a row in different
> environments with different build flags but didn't manage to reproduce
> it. Also our DBAs are testing this feature for weeks now on real-world
> applications and they didn't report anything like this. Could you
> please describe how to reproduce this issue?
>

Nothing special:

$ ./configure --prefix=/home/user/pg-temporary --enable-debug \
   --enable-cassert

$ make -s clean && make -s -j4 install

$ export PATH=/home/user/pg-temporary/bin:$PATH

$ pg_ctl -D ~/tmp/data-temporary init

$ pg_ctl -D ~/tmp/data-temporary -l ~/temporary.log start

$ make installcheck

I get the failures every time - regression diff attached. The first
failure in "rolenames" is expected, because of clash with existing user
name. The remaining two failures are not.

I only get the failure for "installcheck" but not "check" for some reason.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> What is needed is an overview of the approach, so that the reviewers can 
> read that first, instead of assembling the knowledge from pieces 
> scattered over comments in many pieces. But I see the fasttab.c contains 
> this:

> /* TODO TODO comment the general idea - in-memory tuples and indexes, 
> hooks principle, FasttabSnapshots, etc */

A fairly common answer when some feature needs an implementation overview
is to create a README file for it, or add a new section in an existing
README file.
        regards, tom lane



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
Thanks everyone for your remarks and comments!

Here is an improved version of a patch.

Main changes:
* Patch passes `make installcheck`
* Code is fully commented, also no more TODO's

I wish I sent this version of a patch last time. Now I realize it was
really hard to read and understand. Hope I managed to correct this
flaw. If you believe that some parts of the code are still poorly
commented or could be improved in any other way please let me know.

And as usual, any other comments, remarks or questions are highly
appreciated!

--
Best regards,
Aleksander Alekseev

Вложения

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Geoff Winkless
Дата:
On 30 July 2016 at 13:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> I'd argue that if you mess with catalogs directly, you're on your own.

Interesting. What would you suggest people use instead?

Geoff



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Robert Haas
Дата:
On Thu, Aug 4, 2016 at 8:14 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> Thanks everyone for your remarks and comments!
>
> Here is an improved version of a patch.
>
> Main changes:
> * Patch passes `make installcheck`
> * Code is fully commented, also no more TODO's
>
> I wish I sent this version of a patch last time. Now I realize it was
> really hard to read and understand. Hope I managed to correct this
> flaw. If you believe that some parts of the code are still poorly
> commented or could be improved in any other way please let me know.
>
> And as usual, any other comments, remarks or questions are highly
> appreciated!

Three general comments:

1. It's always seemed to me that a huge problem with anything of this
sort is dependencies.  For example, suppose I create a fast temporary
table and then I create a functional index on the fast temporary table
that uses some SQL function defined in pg_proc.  Then, another user
drops the function.  Then, I try to use the index.  With regular
temporary tables, dependencies protect us here, but if there are no
catalog entries, I wonder how this can ever be made safe.  Similar
problems exist for triggers, constraints, RLS policies, and attribute
defaults.

2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc.  I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer.  That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here.  Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again.  I don't think this code is reasonably maintainable
in this form.

3. Currently, temporary tables are parallel-restricted: a query that
references temporary tables can use parallelism, but access to the
temporary tables is only possible from within the leader.  I suspect,
although I'm not entirely sure, that lifting this restriction would be
easier with our current temporary table implementation than with this
one, because the current temporary table implementation mostly relies
on a set of buffers that could be moved from backend-private memory to
DSM.  On a quick look, this implementation uses a bunch of new data
structures that are heavy on pointers, so that gets quite a bit more
complicated to make parallel-safe (unless we adopt threads instead of
processes!).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Dmitry Dolgov
Дата:
​Hi

I tried to dig into this patch (which seems pretty interesting) to help bring
it in good shape. Here are few random notes, I hope they can be helpful:

> I think we actually need a real API here.
Definitely, there are plenty places in the new code with the same pattern:
 * figure out if it's an action related to the fast temporary tables based on
   `ItemPointer`/relation OID/etc
 * if it is, add some extra logic or skip something in original implementation

in `heapam.c`, `indexam.c`, `xact.c`, `dependency.c`. I believe it's possible to
make it more generic (although it will contain almost the same logic), e.g.
separate regular and fasttable implementations completely, and decide which one
we should choose in that particular case.

Btw, I'm wondering about the `heap_*` functions in `heapam.c` - some of them are
wrapped and never used directly, although they're contains in
`INTERFACE_ROUTINES` (like `simple_heap_delete` -> `heap_delete`), some of them
aren't. It looks like inconsistency in terms of function names, probably it
should be unified?

> What is needed is an overview of the approach, so that the reviewers can read
> that first,
I feel lack of such information even in new version of this patch (but, I'm
probably a noob in these matters).  I noted that the `fasttab.c` file contains some
general overview, but in terms of "what are we doing", and "why are we doing
this". I think general overview of "how are we doing this" also may be useful.
And there are several slightly obvious commentaries like:

```
+ /* Is it a virtual TID? */
+ if (IsFasttabItemPointer(tid))
```

but I believe an introduction of a new API (see the previous note) will solve
this eventually.

> Why do we need the new relpersistence value? ISTM we could easily got with
> just RELPERSISTENCE_TEMP, which would got right away of many chances as the
> steps are exactly the same.
I agree, it looks like `RELPERSISTENCE_FAST_TEMP` hasn't any influence on the
code.

> For example, suppose I create a fast temporary table and then I create a
> functional index on the fast temporary table that uses some SQL function
> defined in pg_proc.
Just to clarify, did you mean something like this?
```
create fast temp table fasttab(x int, s text);
create or replace function test_function_for_index(t text) returns text as $$
begin
    return lower(t);
end;
$$ language plpgsql immutable;
create index fasttab_s_idx on fasttab (test_function_for_index(s));
drop function test_function_for_index(t text);
```
As far as I understand dependencies should protect in case of fasttable too,
because everything is visible as in regular case, isn't it?

And finally one more question, why items of `FasttabIndexMethodsTable[]` like
this one:
```
+ /* 2187, non-unique */
+ {InheritsParentIndexId, 1,
+ {Anum_pg_inherits_inhparent, 0, 0},
+ {CompareOid, CompareInvalid, CompareInvalid}
+ },
```
have this commentary before them? I assume it's an id and an extra information,
and I'm concerned that they can easily become outdated inside commentary block.

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Robert Haas
Дата:
On Sat, Aug 6, 2016 at 4:05 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>> For example, suppose I create a fast temporary table and then I create a
>> functional index on the fast temporary table that uses some SQL function
>> defined in pg_proc.
> Just to clarify, did you mean something like this?
> ```
> create fast temp table fasttab(x int, s text);
> create or replace function test_function_for_index(t text) returns text as
> $$
> begin
>     return lower(t);
> end;
> $$ language plpgsql immutable;
> create index fasttab_s_idx on fasttab (test_function_for_index(s));
> drop function test_function_for_index(t text);
> ```
> As far as I understand dependencies should protect in case of fasttable too,
> because everything is visible as in regular case, isn't it?

I think the whole idea of a fast temporary table is that there are no
catalog entries.  If there are no catalog entries, then dependencies
are not visible.  If there ARE catalog entries, to what do they refer?Without a pg_class entry for the table, there's
notable OID upon
 
which to depend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I think the whole idea of a fast temporary table is that there are no
> catalog entries.  If there are no catalog entries, then dependencies
> are not visible.  If there ARE catalog entries, to what do they refer?
>  Without a pg_class entry for the table, there's no table OID upon
> which to depend.

TBH, I think that the chances of such a design getting committed are
not distinguishable from zero.  Tables have to have OIDs; there is just
too much code that assumes that.  And I seriously doubt that it will
work (for any large value of "work") without catalog entries.
        regards, tom lane



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Anastasia Lubennikova
Дата:
05.08.2016 19:41, Robert Haas:
>
> 2. This inserts additional code in a bunch of really low-level places
> like heap_hot_search_buffer, heap_update, heap_delete, etc.  I think
> what you've basically done here is create a new, in-memory heap AM and
> then, because we don't have an API for adding new storage managers,
> you've bolted it onto the existing heapam layer.  That's certainly a
> reasonable approach for creating a PoC, but I think we actually need a
> real API here.  Otherwise, when the next person comes along and wants
> to add a third heap implementation, they've got to modify all of these
> same places again.  I don't think this code is reasonably maintainable
> in this form.

As I can see, you recommend to clean up the API of storage
management code. I strongly agree that it's high time to do it.

So, I started the discussion about refactoring and improving API
of heapam and heap relations.
You can find it on commitfest:
https://commitfest.postgresql.org/10/700/

I'll be glad to see your thoughts on the thread.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:
On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I think the whole idea of a fast temporary table is that there are no
> > catalog entries.  If there are no catalog entries, then dependencies
> > are not visible.  If there ARE catalog entries, to what do they refer?
> >  Without a pg_class entry for the table, there's no table OID upon
> > which to depend.
>
> TBH, I think that the chances of such a design getting committed are
> not distinguishable from zero.  Tables have to have OIDs; there is just
> too much code that assumes that.  And I seriously doubt that it will
> work (for any large value of "work") without catalog entries.

That seems a bit too defeatist. It's obviously not a small change to get
there - and I don't think the patch upthread is really attacking the
relevant problems yet - but saying that we'll never have temp tables
without pg_class/pg_depend bloat seems to be pretty close to just giving
up.  Having 8 byte oids (as explicit columns instead of magic? Or just
oid64?) and then reserving ranges for temp objects stored in a local
memory seems to be feasible.  The pinning problem could potentially be
solved by "session lifetime" pins in pg_depend, which prevents dependent
objects being dropped.  Obviously that's just spitballing; but I think
the problem is too big to just give up.

Andres



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think the whole idea of a fast temporary table is that there are no
>>> catalog entries.  If there are no catalog entries, then dependencies
>>> are not visible.  If there ARE catalog entries, to what do they refer?
>>> Without a pg_class entry for the table, there's no table OID upon
>>> which to depend.

>> TBH, I think that the chances of such a design getting committed are
>> not distinguishable from zero.  Tables have to have OIDs; there is just
>> too much code that assumes that.  And I seriously doubt that it will
>> work (for any large value of "work") without catalog entries.

> That seems a bit too defeatist.

Huh?  I didn't say we shouldn't work on the problem --- I just think that
this particular approach isn't good.  Which you seemed to agree with.
        regards, tom lane



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:
On 2016-08-14 21:04:57 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> >> Robert Haas <robertmhaas@gmail.com> writes:
> >>> I think the whole idea of a fast temporary table is that there are no
> >>> catalog entries.  If there are no catalog entries, then dependencies
> >>> are not visible.  If there ARE catalog entries, to what do they refer?
> >>> Without a pg_class entry for the table, there's no table OID upon
> >>> which to depend.
> 
> >> TBH, I think that the chances of such a design getting committed are
> >> not distinguishable from zero.  Tables have to have OIDs; there is just
> >> too much code that assumes that.  And I seriously doubt that it will
> >> work (for any large value of "work") without catalog entries.
> 
> > That seems a bit too defeatist.
> 
> Huh?  I didn't say we shouldn't work on the problem --- I just think that
> this particular approach isn't good.  Which you seemed to agree with.

I took your statement to mean that they need a pg_class entry - even if
there were a partial solution to the pg_depend problem allowing to avoid
pg_attribute entries, tha't still not really be a solution. If that's
not what you mean, sorry - and nice that we agree ;)



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
> > >>> I think the whole idea of a fast temporary table is that there
> > >>> are no catalog entries.  If there are no catalog entries, then
> > >>> dependencies are not visible.  If there ARE catalog entries, to
> > >>> what do they refer? Without a pg_class entry for the table,
> > >>> there's no table OID upon which to depend.    
> >     
> > >> TBH, I think that the chances of such a design getting committed
> > >> are not distinguishable from zero.  Tables have to have OIDs;
> > >> there is just too much code that assumes that.  And I seriously
> > >> doubt that it will work (for any large value of "work") without
> > >> catalog entries.    
> >     
> > > That seems a bit too defeatist.    
> > 
> > Huh?  I didn't say we shouldn't work on the problem --- I just
> > think that this particular approach isn't good.  Which you seemed
> > to agree with.    
> 
> I took your statement to mean that they need a pg_class entry - even
> if there were a partial solution to the pg_depend problem allowing to
> avoid pg_attribute entries, tha't still not really be a solution. If
> that's not what you mean, sorry - and nice that we agree ;)
> 
>   

Just to keep things sane I would like to remind that in this concrete
patch there _are_ catalog entries:

```
[...]
This file contents imlementation of special type of temporary tables ---
fast temporary tables (FTT). From user perspective they work exactly as
regular temporary tables. However there are no records about FTTs in
pg_catalog. These records are stored in backend's memory instead and
mixed with regular records during scans of catalog tables. We refer to
corresponding tuples of catalog tables as "in-memory" or "virtual"
tuples and to all these tuples together --- as "in-memory" or "virtual"
catalog.
[...]
```

As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
Also I mentioned that in this case even \d and \d+ would not work.

I personally find this discussion very confusing. Maybe we should
concentrate on a concrete patch instead of some abstract ideas, and
topics that are still open.

For instance it surprises me that apparently there is no one who
objects "lets make all temporary tables fast temporary tables" idea.
Since in this case code would use more memory for keeping a virtual
catalog wouldn't it be considered a major change of behavior that can
break someones production environment?

-- 
Best regards,
Aleksander Alekseev



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Pavel Stehule
Дата:


2016-08-15 11:01 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> > >>> I think the whole idea of a fast temporary table is that there
> > >>> are no catalog entries.  If there are no catalog entries, then
> > >>> dependencies are not visible.  If there ARE catalog entries, to
> > >>> what do they refer? Without a pg_class entry for the table,
> > >>> there's no table OID upon which to depend.
> >
> > >> TBH, I think that the chances of such a design getting committed
> > >> are not distinguishable from zero.  Tables have to have OIDs;
> > >> there is just too much code that assumes that.  And I seriously
> > >> doubt that it will work (for any large value of "work") without
> > >> catalog entries.
> >
> > > That seems a bit too defeatist.
> >
> > Huh?  I didn't say we shouldn't work on the problem --- I just
> > think that this particular approach isn't good.  Which you seemed
> > to agree with.
>
> I took your statement to mean that they need a pg_class entry - even
> if there were a partial solution to the pg_depend problem allowing to
> avoid pg_attribute entries, tha't still not really be a solution. If
> that's not what you mean, sorry - and nice that we agree ;)
>
>

Just to keep things sane I would like to remind that in this concrete
patch there _are_ catalog entries:

```
[...]
This file contents imlementation of special type of temporary tables ---
fast temporary tables (FTT). From user perspective they work exactly as
regular temporary tables. However there are no records about FTTs in
pg_catalog. These records are stored in backend's memory instead and
mixed with regular records during scans of catalog tables. We refer to
corresponding tuples of catalog tables as "in-memory" or "virtual"
tuples and to all these tuples together --- as "in-memory" or "virtual"
catalog.
[...]
```

As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
Also I mentioned that in this case even \d and \d+ would not work.

I personally find this discussion very confusing. Maybe we should
concentrate on a concrete patch instead of some abstract ideas and
topics that are still open.

For instance it surprises me that apparently there is no one who
objects "lets make all temporary tables fast temporary tables" idea.
Since in this case code would use more memory for keeping a virtual
catalog wouldn't it be considered a major change of behavior that could
break someones production environment?

It is pretty hard discussion about cost or usability of FTT. The small FTT (for usage in PLpgSQL) can be replaced by arrays. The overhead of pg_catalog of big TT is not significant. So introduction special proprietary table type is debatable.

Probably size of metadata of temporary tables should be minimal - currently all metadata are cached in memory - and it is not a problem.

But we can change this discussion little bit different. I believe so solution should be *global temporary tables*. These tables has persistent catalogue entries. Data are joined with session. These tables can be effective solution of problem with temporary tables, can be strong benefit for developers (more comfortable, possible static analyse of PLpgSQL) and it simplify life to all people who has do migration from Oracle. So only benefits are there :).

Regards

Pavel
 

--
Best regards,
Aleksander Alekseev

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
> But we can change this discussion little bit different. I believe so
> solution should be *global temporary tables*. These tables has
> persistent catalogue entries. Data are joined with session. These
> tables can be effective solution of problem with temporary tables,
> can be strong benefit for developers (more comfortable, possible
> static analyse of PLpgSQL) and it simplify life to all people who has
> do migration from Oracle. So only benefits are there :).

I don't think that global temporary tables solve "catalog bloating that
causes auto vacuum" problem. I suggest we don't change a topic. Or maybe
I don't know something about global temporary tables?

-- 
Best regards,
Aleksander Alekseev



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Pavel Stehule
Дата:


2016-08-15 12:00 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> But we can change this discussion little bit different. I believe so
> solution should be *global temporary tables*. These tables has
> persistent catalogue entries. Data are joined with session. These
> tables can be effective solution of problem with temporary tables,
> can be strong benefit for developers (more comfortable, possible
> static analyse of PLpgSQL) and it simplify life to all people who has
> do migration from Oracle. So only benefits are there :).

I don't think that global temporary tables solve "catalog bloating that
causes auto vacuum" problem. I suggest we don't change a topic. Or maybe
I don't know something about global temporary tables?

The global temporary tables has persistent rows in the catalogue. The mapping to files can be marked as special and real mapping should be only in memory.

So the changes in catalogue related to global temporary tables are pretty less frequently.

Regards

Pavel
 

--
Best regards,
Aleksander Alekseev

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
> The global temporary tables has persistent rows in the catalogue. The
> mapping to files can be marked as special and real mapping should be
> only in memory.
> 
> So the changes in catalogue related to global temporary tables are
> pretty less frequently.

I'm afraid I still don't get it. Let say I have an application that
does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
time. Naturally there is not only one temp table per session. Could you
explain please in more detail how exactly do these persistent rows help?

-- 
Best regards,
Aleksander Alekseev



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Pavel Stehule
Дата:


2016-08-15 12:18 GMT+02:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> The global temporary tables has persistent rows in the catalogue. The
> mapping to files can be marked as special and real mapping should be
> only in memory.
>
> So the changes in catalogue related to global temporary tables are
> pretty less frequently.

I'm afraid I still don't get it. Let say I have an application that
does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
time. Naturally there is not only one temp table per session. Could you
explain please in more detail how exactly do these persistent rows help?


when you use global temporary tables, then you create it only once - like usual tables.

you don't drop these tables.

Regards

Pavel
 
--
Best regards,
Aleksander Alekseev

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Christoph Berg
Дата:
Re: Tom Lane 2016-07-30 <1184.1469890030@sss.pgh.pa.us>
> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.
> Or maybe to partition the catalogs so that vacuuming away temp-table
> rows is easier/cheaper than today.

We should also be thinking about how the opposite idea of "global"
temp tables (I believe that's what Oracle calls them) would work.
These have a persistent structure in the catalogs, just the data is
private to every session (or transaction); every connection starts
with an empty temp table and for their use.

I'd guess that type of global temp tables would fix the bloat problem
also very efficiently. (Ad-hoc temp tables shouldn't occur that often
so the bloat caused by them wouldn't matter that much. If they do,
their structure is likely always the same, and they could be made
"global" in the schema.)

The bit that needs to be thought out here would be how to maintain
statistics for these tables. Obviously ANALYZE shouldn't update any
globally visible data.

Christoph



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Christoph Berg
Дата:
Re: To Tom Lane 2016-08-15 <20160815111057.v2mqqjp4aabvwqnc@msg.df7cb.de>
> Re: Tom Lane 2016-07-30 <1184.1469890030@sss.pgh.pa.us>
> > In short, I think that the way to make something like this work is to
> > figure out how to have "virtual" catalog rows describing a temp table.
> > Or maybe to partition the catalogs so that vacuuming away temp-table
> > rows is easier/cheaper than today.
> 
> We should also be thinking about how the opposite idea of "global"
> temp tables

(Obviously I should catch up on the rest of the thread when postponing
a message for an hour or two. Sorry for the duplicated idea here...)

Christoph



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Robert Haas
Дата:
Why are you sending this off-list?  Please let's keep the discussion
on the mailing list.  I suggest resending this there.

On Mon, Aug 15, 2016 at 5:01 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
>> > >>> I think the whole idea of a fast temporary table is that there
>> > >>> are no catalog entries.  If there are no catalog entries, then
>> > >>> dependencies are not visible.  If there ARE catalog entries, to
>> > >>> what do they refer? Without a pg_class entry for the table,
>> > >>> there's no table OID upon which to depend.
>> >
>> > >> TBH, I think that the chances of such a design getting committed
>> > >> are not distinguishable from zero.  Tables have to have OIDs;
>> > >> there is just too much code that assumes that.  And I seriously
>> > >> doubt that it will work (for any large value of "work") without
>> > >> catalog entries.
>> >
>> > > That seems a bit too defeatist.
>> >
>> > Huh?  I didn't say we shouldn't work on the problem --- I just
>> > think that this particular approach isn't good.  Which you seemed
>> > to agree with.
>>
>> I took your statement to mean that they need a pg_class entry - even
>> if there were a partial solution to the pg_depend problem allowing to
>> avoid pg_attribute entries, tha't still not really be a solution. If
>> that's not what you mean, sorry - and nice that we agree ;)
>>
>>
>
> Just to keep things sane I would like to remind that in this concrete
> patch there _are_ catalog entries:
>
> ```
> [...]
> This file contents imlementation of special type of temporary tables ---
> fast temporary tables (FTT). From user perspective they work exactly as
> regular temporary tables. However there are no records about FTTs in
> pg_catalog. These records are stored in backend's memory instead and
> mixed with regular records during scans of catalog tables. We refer to
> corresponding tuples of catalog tables as "in-memory" or "virtual"
> tuples and to all these tuples together --- as "in-memory" or "virtual"
> catalog.
> [...]
> ```
>
> As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
> Also I mentioned that in this case even \d and \d+ would not work.
>
> I personally find this discussion very confusing. Maybe we should
> concentrate on a concrete patch instead of some abstract ideas and
> topics that are still open.
>
> For instance it surprises me that apparently there is no one who
> objects "lets make all temporary tables fast temporary tables" idea.
> Since in this case code would use more memory for keeping a virtual
> catalog wouldn't it be considered a major change of behavior that could
> break someones production environment?
>
> --
> Best regards,
> Aleksander Alekseev



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
> Why are you sending this off-list?  Please let's keep the discussion
> on the mailing list.  I suggest resending this there.

Sorry for that. I accidentally removed pgsql-hackers@ from CC list or
maybe my email client somehow did it for me. Short after that I realized
my mistake and sent a copy to the mailing list. 

-- 
Best regards,
Aleksander Alekseev



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Robert Haas
Дата:
On Mon, Aug 15, 2016 at 5:12 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> Just to keep things sane I would like to remind that in this concrete
> patch there _are_ catalog entries:
>
> ```
> [...]
> This file contents imlementation of special type of temporary tables ---
> fast temporary tables (FTT). From user perspective they work exactly as
> regular temporary tables. However there are no records about FTTs in
> pg_catalog. These records are stored in backend's memory instead and
> mixed with regular records during scans of catalog tables. We refer to
> corresponding tuples of catalog tables as "in-memory" or "virtual"
> tuples and to all these tuples together --- as "in-memory" or "virtual"
> catalog.
> [...]
> ```

That doesn't really solve the problem, because OTHER backends won't be
able to see them.  So, if I create a fast temporary table in one
session that depends on a permanent object, some other session can
drop the permanent object.  If there were REAL catalog entries, that
wouldn't work, because the other session would see the dependency.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Jim Nasby
Дата:
On 8/16/16 11:59 AM, Robert Haas wrote:
...
> That doesn't really solve the problem, because OTHER backends won't be
> able to see them.  So, if I create a fast temporary table in one
> session that depends on a permanent object, some other session can
> drop the permanent object.  If there were REAL catalog entries, that
> wouldn't work, because the other session would see the dependency.

Some discussion about TEMP functions is happening on -general right now, 
and there's other things where temp objects are good to have, so it'd be 
nice to have a more generic fix for this stuff. Is the idea of 
"partitioning" the catalogs to store temp objects separate from 
permanent fatally flawed?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
> That doesn't really solve the problem, because OTHER backends won't be
> able to see them.  So, if I create a fast temporary table in one
> session that depends on a permanent object, some other session can
> drop the permanent object.  If there were REAL catalog entries, that
> wouldn't work, because the other session would see the dependency.
> 

This is a good point. However current implementation doesn't allow to
do that. There is a related bug though, a minor one.

In session 1:

```
CREATE TABLE cities2 (name text, population float, altitude int);
CREATE FAST TEMPORARY TABLE capitals2 (state char(2)) INHERITS (cities2);
```

In session 2:

```
DROP TABLE cities2;

ERROR:  cache lookup failed for relation 16401
```

Instead of "cache lookup failed" probably a better error message
should be displayed. Something like "cannot drop table cities2 because
other objects depend on it". I will send a corrected patch shortly.

Everything else seems to work as expected.

If you discover any other bugs please let me know!

-- 
Best regards,
Aleksander Alekseev



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:
On 08/17/2016 11:50 AM, Aleksander Alekseev wrote:
>> That doesn't really solve the problem, because OTHER backends won't be
>> able to see them.  So, if I create a fast temporary table in one
>> session that depends on a permanent object, some other session can
>> drop the permanent object.  If there were REAL catalog entries, that
>> wouldn't work, because the other session would see the dependency.
>>
>
> This is a good point. However current implementation doesn't allow to
> do that.

IMHO without handling that, the design is effectively broken and has 
very little change (or rather none at all) to get committed.

I think one way to fix that would be to store the virtual tuples in 
shared memory (instead of process memory). That will certainly require 
locking and synchronization, but well - it needs to be shared.
> There is a related bug though, a minor one.
>
> In session 1:
>
> ```
> CREATE TABLE cities2 (name text, population float, altitude int);
> CREATE FAST TEMPORARY TABLE capitals2 (state char(2)) INHERITS (cities2);
> ```
>
> In session 2:
>
> ```
> DROP TABLE cities2;
>
> ERROR:  cache lookup failed for relation 16401
> ```
>
> Instead of "cache lookup failed" probably a better error message
> should be displayed. Something like "cannot drop table cities2
> because other objects depend on it". I will send a corrected patch
> shortly.
>
> Everything else seems to work as expected.
>
> If you discover any other bugs please let me know!
>

While a better error message would be nice, this is curing the symptoms 
and not the cause. I think a proper design needs to prevent the DROP by 
using dependencies.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:
On 08/15/2016 12:23 PM, Pavel Stehule wrote:
>
>
> 2016-08-15 12:18 GMT+02:00 Aleksander Alekseev
> <a.alekseev@postgrespro.ru <mailto:a.alekseev@postgrespro.ru>>:
>
>     > The global temporary tables has persistent rows in the catalogue. The
>     > mapping to files can be marked as special and real mapping should be
>     > only in memory.
>     >
>     > So the changes in catalogue related to global temporary tables are
>     > pretty less frequently.
>
>     I'm afraid I still don't get it. Let say I have an application that
>     does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
>     time. Naturally there is not only one temp table per session. Could you
>     explain please in more detail how exactly do these persistent rows help?
>
>
> when you use global temporary tables, then you create it only once -
> like usual tables.
>
> you don't drop these tables.
>

I share the view that this is a better/simpler solution to the problem. 
It will still require virtual (in-memory) tuples for pg_statistic 
records, but everything else works pretty much as for regular tables. In 
particular there are no problems with dependencies.

The obvious disadvantage is that it requires changes to applications.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Pavel Stehule
Дата:


2016-08-18 16:33 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
On 08/15/2016 12:23 PM, Pavel Stehule wrote:


2016-08-15 12:18 GMT+02:00 Aleksander Alekseev
<a.alekseev@postgrespro.ru <mailto:a.alekseev@postgrespro.ru>>:

    > The global temporary tables has persistent rows in the catalogue. The
    > mapping to files can be marked as special and real mapping should be
    > only in memory.
    >
    > So the changes in catalogue related to global temporary tables are
    > pretty less frequently.

    I'm afraid I still don't get it. Let say I have an application that
    does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
    time. Naturally there is not only one temp table per session. Could you
    explain please in more detail how exactly do these persistent rows help?


when you use global temporary tables, then you create it only once -
like usual tables.

you don't drop these tables.


I share the view that this is a better/simpler solution to the problem. It will still require virtual (in-memory) tuples for pg_statistic records, but everything else works pretty much as for regular tables. In particular there are no problems with dependencies.

The obvious disadvantage is that it requires changes to applications.

sure - as plpgsql developer I can say, the global temp tables are much more friendly - so rewriting in application is enjoy work.

Regards

Pavel
 


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Robert Haas
Дата:
On Tue, Aug 16, 2016 at 8:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 8/16/16 11:59 AM, Robert Haas wrote:
> ...
>>
>> That doesn't really solve the problem, because OTHER backends won't be
>> able to see them.  So, if I create a fast temporary table in one
>> session that depends on a permanent object, some other session can
>> drop the permanent object.  If there were REAL catalog entries, that
>> wouldn't work, because the other session would see the dependency.
>
> Some discussion about TEMP functions is happening on -general right now, and
> there's other things where temp objects are good to have, so it'd be nice to
> have a more generic fix for this stuff. Is the idea of "partitioning" the
> catalogs to store temp objects separate from permanent fatally flawed?

I wouldn't say it's fatally flawed.  But you might need a
world-renowned team of physicians working round the clock for days in
a class 1 trauma center to save it.  If you imagine that you have a
permanent pg_class which holds permanent tables and a temporary
pg_class per-backend which stores temporary tables, then you very
quickly end up with the same deadly flaw as in Aleksander's design:
other backends cannot see all of the dependency entries and can drop
things that they shouldn't be permitted to drop.  However, you could
have a permanent pg_class which holds the records for permanent tables
and an *unlogged* table, say pg_class_unlogged, which holds records
for temporary tables.  Now everybody can see everybody else's data,
yet we don't have to create permanent catalog entries.  So we are not
dead.  All of the temporary catalog tables vanish on a crash, too, and
in a very clean way, which is great.

However:

1. The number of tables for which we would need to add a duplicate,
unlogged table is formidable.  You need pg_attribute, pg_attrdef,
pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
And the backend changes needed so that we used the unlogged copy for
temp tables and the permanent copy for regular tables is probably
really large.

2. You can't write to unlogged tables on standby servers, so this
doesn't help solve the problem of wanting to use temporary tables on
standbys.

3. While it makes creating temporary tables a lighter-weight
operation, because you no longer need to write WAL for the catalog
entries, there's probably still substantially more overhead than just
stuffing them in backend-local RAM.  So the performance benefits are
probably fairly modest.

Overall I feel like the development effort that it would take to make
this work would almost certainly be better-expended elsewhere.  But of
course I'm not in charge of how people who work for other companies
spend their time...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Alvaro Herrera
Дата:
Robert Haas wrote:

> However:
> 
> 1. The number of tables for which we would need to add a duplicate,
> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
> And the backend changes needed so that we used the unlogged copy for
> temp tables and the permanent copy for regular tables is probably
> really large.

Check.  This is the most serious issue, IMV.

> 2. You can't write to unlogged tables on standby servers, so this
> doesn't help solve the problem of wanting to use temporary tables on
> standbys.

Check.  We could think about relaxing this restriction, which would
enable the feature to satisfy that use case.  (I think the main
complication there is the init fork of btrees on those catalogs; other
relations could just be truncated to empty on restart.)

> 3. While it makes creating temporary tables a lighter-weight
> operation, because you no longer need to write WAL for the catalog
> entries, there's probably still substantially more overhead than just
> stuffing them in backend-local RAM.  So the performance benefits are
> probably fairly modest.

You also save catalog bloat ... These benefits may not be tremendous,
but I think they may be good enough for many users.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:
On 2016-08-22 17:50:11 -0300, Alvaro Herrera wrote:
> > 2. You can't write to unlogged tables on standby servers, so this
> > doesn't help solve the problem of wanting to use temporary tables on
> > standbys.
> 
> Check.  We could think about relaxing this restriction, which would
> enable the feature to satisfy that use case.  (I think the main
> complication there is the init fork of btrees on those catalogs; other
> relations could just be truncated to empty on restart.)

Isn't the main complication that visibility currently requires xids to
be assigned?



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Aleksander Alekseev
Дата:
Thank you everyone for great comments!

> have a permanent pg_class which holds the records for permanent tables
> and an *unlogged* table, say pg_class_unlogged, which holds records
> for temporary tables.  Now everybody can see everybody else's data,
> yet we don't have to create permanent catalog entries.  So we are not
> dead.  All of the temporary catalog tables vanish on a crash, too, and
> in a very clean way, which is great.
>
> [...]
>
> Overall I feel like the development effort that it would take to make
> this work would almost certainly be better-expended elsewhere.

Agree. This is an interesting idea but considering named drawbacks,
especially:

> 2. You can't write to unlogged tables on standby servers, so this
> doesn't help solve the problem of wanting to use temporary tables on
> standbys.

... I don't think it's worth an effort.

>> when you use global temporary tables, then you create it only once -
>> like usual tables.
>>
>> you don't drop these tables.
>
> I share the view that this is a better/simpler solution to the problem.
> It will still require virtual (in-memory) tuples for pg_statistic
> records, but everything else works pretty much as for regular tables. In
> particular there are no problems with dependencies.
>
> The obvious disadvantage is that it requires changes to applications.

Frankly I have much more faith in Tom's idea of using virtual part of the
catalog for all temporary tables, i.e turning all temporary tables into
"fast" temporary tables. Instead of introducing a new type of temporary tables
that solve catalog bloating problem and forcing users to rewrite applications
why not just not to create a problem in a first place?

> I think one way to fix that would be to store the virtual tuples in
> shared memory (instead of process memory). That will certainly require
> locking and synchronization, but well - it needs to be shared.

I believe currently this is the most promising course of action. In first
implementation we could just place all virtual part of the catalog in a shared
memory and protect it with a single lock. If it will work as expected the next
step would be elimination of bottlenecks --- using multiple locks, moving part
of a virtual catalog to local backend's memory, etc.

As always, please don't hesitate to share any thoughts on this topic!

-- 
Best regards,
Aleksander Alekseev



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:

On 08/22/2016 10:32 PM, Robert Haas wrote:
>
> ...
>
> 1. The number of tables for which we would need to add a duplicate,
> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
> And the backend changes needed so that we used the unlogged copy for
> temp tables and the permanent copy for regular tables is probably
> really large.
>
> 2. You can't write to unlogged tables on standby servers, so this
> doesn't help solve the problem of wanting to use temporary tables on
> standbys.
>
> 3. While it makes creating temporary tables a lighter-weight
> operation, because you no longer need to write WAL for the catalog
> entries, there's probably still substantially more overhead than just
> stuffing them in backend-local RAM.  So the performance benefits are
> probably fairly modest.
>
> Overall I feel like the development effort that it would take to make
> this work would almost certainly be better-expended elsewhere.  But of
> course I'm not in charge of how people who work for other companies
> spend their time...
>

Could someone please explain how the unlogged tables are supposed to fix 
the catalog bloat problem, as stated in the initial patch submission? 
We'd still need to insert/delete the catalog rows when creating/dropping 
the temporary tables, causing the bloat. Or is there something I'm missing?


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Claudio Freire
Дата:
On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 08/22/2016 10:32 PM, Robert Haas wrote:
>>
>>
>> ...
>>
>> 1. The number of tables for which we would need to add a duplicate,
>> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
>> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
>> And the backend changes needed so that we used the unlogged copy for
>> temp tables and the permanent copy for regular tables is probably
>> really large.
>>
>> 2. You can't write to unlogged tables on standby servers, so this
>> doesn't help solve the problem of wanting to use temporary tables on
>> standbys.
>>
>> 3. While it makes creating temporary tables a lighter-weight
>> operation, because you no longer need to write WAL for the catalog
>> entries, there's probably still substantially more overhead than just
>> stuffing them in backend-local RAM.  So the performance benefits are
>> probably fairly modest.
>>
>> Overall I feel like the development effort that it would take to make
>> this work would almost certainly be better-expended elsewhere.  But of
>> course I'm not in charge of how people who work for other companies
>> spend their time...
>>
>
> Could someone please explain how the unlogged tables are supposed to fix the
> catalog bloat problem, as stated in the initial patch submission? We'd still
> need to insert/delete the catalog rows when creating/dropping the temporary
> tables, causing the bloat. Or is there something I'm missing?

Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Perhaps reserving a worker or N to run only on catalog schemas?

That'd be far simpler.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:
On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > Could someone please explain how the unlogged tables are supposed to fix the
> > catalog bloat problem, as stated in the initial patch submission? We'd still
> > need to insert/delete the catalog rows when creating/dropping the temporary
> > tables, causing the bloat. Or is there something I'm missing?

Beats me.


> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Not really in my experience, at least not without more drastic vacuum
changes. The issue is that if you have a single "long running"
transaction (in some workloads that can even just be a 3 min taking
query/xact), nothing will be cleaned up during that time. If you have a
few hundred temp tables created per sec, you'll be in trouble even
then. Not to speak of the case where you have queries taking hours (say
a backup).

Andres



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:

On 08/24/2016 12:18 AM, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> On 08/22/2016 10:32 PM, Robert Haas wrote:
>>>
>>>
>>> ...
>>>
>>> 1. The number of tables for which we would need to add a duplicate,
>>> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
>>> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
>>> And the backend changes needed so that we used the unlogged copy for
>>> temp tables and the permanent copy for regular tables is probably
>>> really large.
>>>
>>> 2. You can't write to unlogged tables on standby servers, so this
>>> doesn't help solve the problem of wanting to use temporary tables on
>>> standbys.
>>>
>>> 3. While it makes creating temporary tables a lighter-weight
>>> operation, because you no longer need to write WAL for the catalog
>>> entries, there's probably still substantially more overhead than just
>>> stuffing them in backend-local RAM.  So the performance benefits are
>>> probably fairly modest.
>>>
>>> Overall I feel like the development effort that it would take to make
>>> this work would almost certainly be better-expended elsewhere.  But of
>>> course I'm not in charge of how people who work for other companies
>>> spend their time...
>>>
>>
>> Could someone please explain how the unlogged tables are supposed to fix the
>> catalog bloat problem, as stated in the initial patch submission? We'd still
>> need to insert/delete the catalog rows when creating/dropping the temporary
>> tables, causing the bloat. Or is there something I'm missing?
>
> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>
> Perhaps reserving a worker or N to run only on catalog schemas?
>
> That'd be far simpler.

Maybe, although IIRC the issues with catalog bloat were due to a 
combination of long queries and many temporary tables being 
created/dropped. In that case simply ramping up autovacuum (or even 
having a dedicated workers for catalogs) would not realy help due to the 
xmin horizon being blocked by the long-running queries.

Maybe it's entirely crazy idea due to the wine I drank at the dinner, 
but couldn't we vacuum the temporary table records differently? For 
example, couldn't we just consider them removable as soon as the backend 
that owns them disappears?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Claudio Freire
Дата:
On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund <andres@anarazel.de> wrote:
>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>
> Not really in my experience, at least not without more drastic vacuum
> changes. The issue is that if you have a single "long running"
> transaction (in some workloads that can even just be a 3 min taking
> query/xact), nothing will be cleaned up during that time. If you have a
> few hundred temp tables created per sec, you'll be in trouble even
> then. Not to speak of the case where you have queries taking hours (say
> a backup).

Well, my experience isn't as extreme as that (just a few dozen temp
tables per minute), but when I see bloat in catalog tables it's
because all autovacuum workers are stuck vacuuming huge tables for
huge periods of time (hours or days).

So that's certainly another bloat case to consider.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:
On 2016-08-23 19:33:33 -0300, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund <andres@anarazel.de> wrote:
> >> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
> >
> > Not really in my experience, at least not without more drastic vacuum
> > changes. The issue is that if you have a single "long running"
> > transaction (in some workloads that can even just be a 3 min taking
> > query/xact), nothing will be cleaned up during that time. If you have a
> > few hundred temp tables created per sec, you'll be in trouble even
> > then. Not to speak of the case where you have queries taking hours (say
> > a backup).
> 
> Well, my experience isn't as extreme as that (just a few dozen temp
> tables per minute), but when I see bloat in catalog tables it's
> because all autovacuum workers are stuck vacuuming huge tables for
> huge periods of time (hours or days).

Well, that's because our defaults are batshit stupid.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Claudio Freire
Дата:
On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>>> Could someone please explain how the unlogged tables are supposed to fix
>>> the
>>> catalog bloat problem, as stated in the initial patch submission? We'd
>>> still
>>> need to insert/delete the catalog rows when creating/dropping the
>>> temporary
>>> tables, causing the bloat. Or is there something I'm missing?
>>
>>
>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>>
>> Perhaps reserving a worker or N to run only on catalog schemas?
>>
>> That'd be far simpler.
>
>
> Maybe, although IIRC the issues with catalog bloat were due to a combination
> of long queries and many temporary tables being created/dropped. In that
> case simply ramping up autovacuum (or even having a dedicated workers for
> catalogs) would not realy help due to the xmin horizon being blocked by the
> long-running queries.
>
> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
> couldn't we vacuum the temporary table records differently? For example,
> couldn't we just consider them removable as soon as the backend that owns
> them disappears?

Or perhaps go all the way and generalize that to rows that never
become visible outside their parent transaction.

As in, delete of rows created by the deleting transaction could clean
up, carefully to avoid voiding indexes and all that, but more
aggressively than regular deletes.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Greg Stark
Дата:
On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> Frankly I have much more faith in Tom's idea of using virtual part of the
> catalog for all temporary tables, i.e turning all temporary tables into
> "fast" temporary tables. Instead of introducing a new type of temporary tables
> that solve catalog bloating problem and forcing users to rewrite applications
> why not just not to create a problem in a first place?

Would applications really need to be rewritten? Are they really
constructing temporary tables where the definition of the table is
dynamic, not just the content? I think application authors would be
pretty happy to not need to keep recreating the same tables over and
over again and dealing with DDL in their run-time code. It's not
really rewriting an application to just remove that DDL and move it to
the one-time database schema creation.

I think it's clear we got the idea of temporary tables wrong when we
implemented them and the SQL standard is more useful. It's not just
some implementation artifact that it's possible to implement them in
an efficient way. It's a fundamental design change and experience
shows that separating DDL and making it static while the DML is
dynamic is just a better design.


-- 
greg



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:

On 08/24/2016 12:38 AM, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>>> Could someone please explain how the unlogged tables are supposed to fix
>>>> the
>>>> catalog bloat problem, as stated in the initial patch submission? We'd
>>>> still
>>>> need to insert/delete the catalog rows when creating/dropping the
>>>> temporary
>>>> tables, causing the bloat. Or is there something I'm missing?
>>>
>>>
>>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>>>
>>> Perhaps reserving a worker or N to run only on catalog schemas?
>>>
>>> That'd be far simpler.
>>
>>
>> Maybe, although IIRC the issues with catalog bloat were due to a combination
>> of long queries and many temporary tables being created/dropped. In that
>> case simply ramping up autovacuum (or even having a dedicated workers for
>> catalogs) would not realy help due to the xmin horizon being blocked by the
>> long-running queries.
>>
>> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
>> couldn't we vacuum the temporary table records differently? For example,
>> couldn't we just consider them removable as soon as the backend that owns
>> them disappears?
>
> Or perhaps go all the way and generalize that to rows that never
> become visible outside their parent transaction.
>
> As in, delete of rows created by the deleting transaction could clean
> up, carefully to avoid voiding indexes and all that, but more
> aggressively than regular deletes.
>

Maybe, but I wouldn't be surprised if such generalization would be an 
order of magnitude more complicated - and even the vacuuming changes I 
mentioned are undoubtedly a fair amount of work.

Sadly, I don't see how this might fix the other issues mentioned in this 
thread (e.g. impossibility to create temp tables on standbys),


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Claudio Freire
Дата:
On Tue, Aug 23, 2016 at 8:50 PM, Greg Stark <stark@mit.edu> wrote:
> On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev
> <a.alekseev@postgrespro.ru> wrote:
>> Frankly I have much more faith in Tom's idea of using virtual part of the
>> catalog for all temporary tables, i.e turning all temporary tables into
>> "fast" temporary tables. Instead of introducing a new type of temporary tables
>> that solve catalog bloating problem and forcing users to rewrite applications
>> why not just not to create a problem in a first place?
>
> Would applications really need to be rewritten? Are they really
> constructing temporary tables where the definition of the table is
> dynamic, not just the content?

Mine is. But it wouldn't be a big deal to adapt.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Claudio Freire
Дата:
On Tue, Aug 23, 2016 at 9:12 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 08/24/2016 12:38 AM, Claudio Freire wrote:
>>
>> On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>>>>
>>>>> Could someone please explain how the unlogged tables are supposed to
>>>>> fix
>>>>> the
>>>>> catalog bloat problem, as stated in the initial patch submission? We'd
>>>>> still
>>>>> need to insert/delete the catalog rows when creating/dropping the
>>>>> temporary
>>>>> tables, causing the bloat. Or is there something I'm missing?
>>>>
>>>>
>>>>
>>>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>>>>
>>>> Perhaps reserving a worker or N to run only on catalog schemas?
>>>>
>>>> That'd be far simpler.
>>>
>>>
>>>
>>> Maybe, although IIRC the issues with catalog bloat were due to a
>>> combination
>>> of long queries and many temporary tables being created/dropped. In that
>>> case simply ramping up autovacuum (or even having a dedicated workers for
>>> catalogs) would not realy help due to the xmin horizon being blocked by
>>> the
>>> long-running queries.
>>>
>>> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
>>> couldn't we vacuum the temporary table records differently? For example,
>>> couldn't we just consider them removable as soon as the backend that owns
>>> them disappears?
>>
>>
>> Or perhaps go all the way and generalize that to rows that never
>> become visible outside their parent transaction.
>>
>> As in, delete of rows created by the deleting transaction could clean
>> up, carefully to avoid voiding indexes and all that, but more
>> aggressively than regular deletes.
>>
>
> Maybe, but I wouldn't be surprised if such generalization would be an order
> of magnitude more complicated - and even the vacuuming changes I mentioned
> are undoubtedly a fair amount of work.

After looking at it from a birdseye view, I agree it's conceptually
complex (reading HeapTupleSatisfiesSelf already makes one dizzy).

But other than that, the implementation seems rather simple. It seems
to me, if one figures out that it is safe to do so (a-priori, xmin not
committed, xmax is current transaction), it would simply be a matter
of chasing the HOT chain root, setting all LP except the first to
LP_UNUSED and the first one to LP_DEAD.

Of course I may be missing a ton of stuff.

> Sadly, I don't see how this might fix the other issues mentioned in this
> thread (e.g. impossibility to create temp tables on standbys),

No it doesn't :(



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Alvaro Herrera
Дата:
Claudio Freire wrote:

> After looking at it from a birdseye view, I agree it's conceptually
> complex (reading HeapTupleSatisfiesSelf already makes one dizzy).
> 
> But other than that, the implementation seems rather simple. It seems
> to me, if one figures out that it is safe to do so (a-priori, xmin not
> committed, xmax is current transaction), it would simply be a matter
> of chasing the HOT chain root, setting all LP except the first to
> LP_UNUSED and the first one to LP_DEAD.
> 
> Of course I may be missing a ton of stuff.

What you seem to be missing is that rows corresponding to temp tables
are not "visible to its own transaction only".  The rows are valid
after the transaction is gone; what makes the tables temporary is the
fact that they are in a temporary schema.  And what makes them invisible
to one backend is the fact that they are in the temporary schema for
another backend.  Not that they are uncommitted.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Claudio Freire
Дата:
On Wed, Aug 24, 2016 at 2:04 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Claudio Freire wrote:
>
>> After looking at it from a birdseye view, I agree it's conceptually
>> complex (reading HeapTupleSatisfiesSelf already makes one dizzy).
>>
>> But other than that, the implementation seems rather simple. It seems
>> to me, if one figures out that it is safe to do so (a-priori, xmin not
>> committed, xmax is current transaction), it would simply be a matter
>> of chasing the HOT chain root, setting all LP except the first to
>> LP_UNUSED and the first one to LP_DEAD.
>>
>> Of course I may be missing a ton of stuff.
>
> What you seem to be missing is that rows corresponding to temp tables
> are not "visible to its own transaction only".  The rows are valid
> after the transaction is gone; what makes the tables temporary is the
> fact that they are in a temporary schema.  And what makes them invisible
> to one backend is the fact that they are in the temporary schema for
> another backend.  Not that they are uncommitted.

Yeah, I was thinking of "on commit drop" behavior, but granted there's
all the others.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Robert Haas
Дата:
On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Could someone please explain how the unlogged tables are supposed to fix the
> catalog bloat problem, as stated in the initial patch submission? We'd still
> need to insert/delete the catalog rows when creating/dropping the temporary
> tables, causing the bloat. Or is there something I'm missing?

No, not really.  Jim just asked if the idea of partitioning the
columns was completely dead in the water, and I said, no, you could
theoretically salvage it.  Whether that does you much good is another
question.

IMV, the point here is that you MUST have globally visible dependency
entries for this to work sanely.  If they're not in a catalog, they
have to be someplace else, and backend-private memory isn't good
enough, because that's not globally visible.  Until we've got a
strategy for that problem, this whole effort is going nowhere - even
though in other respects it may be a terrific idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:

On 08/24/2016 12:20 AM, Andres Freund wrote:
> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> Could someone please explain how the unlogged tables are supposed to fix the
>>> catalog bloat problem, as stated in the initial patch submission? We'd still
>>> need to insert/delete the catalog rows when creating/dropping the temporary
>>> tables, causing the bloat. Or is there something I'm missing?
>
> Beats me.
>

Are you puzzled just like me, or are you puzzled why I'm puzzled?



-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:

On August 24, 2016 9:32:48 AM PDT, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>
>On 08/24/2016 12:20 AM, Andres Freund wrote:
>> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
>>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>>> Could someone please explain how the unlogged tables are supposed
>to fix the
>>>> catalog bloat problem, as stated in the initial patch submission?
>We'd still
>>>> need to insert/delete the catalog rows when creating/dropping the
>temporary
>>>> tables, causing the bloat. Or is there something I'm missing?
>>
>> Beats me.
>>
>
>Are you puzzled just like me, or are you puzzled why I'm puzzled?

Like you. I don't think this addresses the problem to a significant enough degree to care.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Corey Huinker
Дата:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Aug 24, 2016 at 12:39 PM, Andres Freund <span
dir="ltr"><<ahref="mailto:andres@anarazel.de" target="_blank">andres@anarazel.de</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class=""><br/><br /> On August 24, 2016 9:32:48 AM PDT, Tomas Vondra <<a
href="mailto:tomas.vondra@2ndquadrant.com">tomas.vondra@2ndquadrant.com</a>>wrote:<br /> ><br /> ><br />
>On08/24/2016 12:20 AM, Andres Freund wrote:<br /> >> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:<br
/>>>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra<br /> >>> <<a
href="mailto:tomas.vondra@2ndquadrant.com">tomas.vondra@2ndquadrant.com</a>>wrote:<br /> >>>> Could
someoneplease explain how the unlogged tables are supposed<br /> >to fix the<br /> >>>> catalog bloat
problem,as stated in the initial patch submission?<br /> >We'd still<br /> >>>> need to insert/delete
thecatalog rows when creating/dropping the<br /> >temporary<br /> >>>> tables, causing the bloat. Or is
theresomething I'm missing?<br /> >><br /> >> Beats me.<br /> >><br /> ><br /> >Are you puzzled
justlike me, or are you puzzled why I'm puzzled?<br /><br /></span>Like you. I don't think this addresses the problem
toa significant enough degree to care.<br /><span class="HOEnZb"><font color="#888888"><br /> Andres<br /> --<br />
Sentfrom my Android device with K-9 Mail. Please excuse my brevity.<br /></font></span><div class="HOEnZb"><div
class="h5"><br/><br /> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" rel="noreferrer"
target="_blank">http://www.postgresql.org/<wbr/>mailpref/pgsql-hackers</a><br /></div></div></blockquote></div><br
/></div><divclass="gmail_extra">Ok, here's a wild idea, and it probably depends on having native partitioning
implemented.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Propagate relpersistence, or a boolean
flagon (relpersistence = 't') from pg_class into the child pg_attribute records.</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">Partition the tables pg_class and pg_attribute first by relpersistence, and then by
oid.<br/><br />The partitions holding data on persistent objects would basically stay as-is, but the partition wouldn't
havemuch activity and no temp-table churn.<br /><br />The temporary ones, however, would fall into essentially a
rotatingset of partitions. Pick enough partitions such that the active transactions only cover some of the partitions.
Therest can be safely truncated by vacuum.<br /><br />It would mitigate the bloat, existing dictionary queries would
stillwork, but the additional lookup cost might not be worth it.</div></div> 

Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Vik Fearing
Дата:
On 08/24/2016 06:16 PM, Robert Haas wrote:
> On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Could someone please explain how the unlogged tables are supposed to fix the
>> catalog bloat problem, as stated in the initial patch submission? We'd still
>> need to insert/delete the catalog rows when creating/dropping the temporary
>> tables, causing the bloat. Or is there something I'm missing?
> 
> No, not really.  Jim just asked if the idea of partitioning the
> columns was completely dead in the water, and I said, no, you could
> theoretically salvage it.  Whether that does you much good is another
> question.
> 
> IMV, the point here is that you MUST have globally visible dependency
> entries for this to work sanely.  If they're not in a catalog, they
> have to be someplace else, and backend-private memory isn't good
> enough, because that's not globally visible.  Until we've got a
> strategy for that problem, this whole effort is going nowhere - even
> though in other respects it may be a terrific idea.

Why not just have a regular-looking table, with a "global temporary"
relpersistence (I don't care which letter it gets) and when a backend
tries to access it, it uses its own private relfilenode instead of
whatever is in pg_class, creating one if necessary.  That way the
structure of the table is fixed, with all the dependencies and whatnot,
but the content is private to each backend.  What's wrong with this idea?
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:

On 08/31/2016 09:20 PM, Vik Fearing wrote:
> On 08/24/2016 06:16 PM, Robert Haas wrote:
>> On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> Could someone please explain how the unlogged tables are supposed to fix the
>>> catalog bloat problem, as stated in the initial patch submission? We'd still
>>> need to insert/delete the catalog rows when creating/dropping the temporary
>>> tables, causing the bloat. Or is there something I'm missing?
>>
>> No, not really.  Jim just asked if the idea of partitioning the
>> columns was completely dead in the water, and I said, no, you could
>> theoretically salvage it.  Whether that does you much good is another
>> question.
>>
>> IMV, the point here is that you MUST have globally visible dependency
>> entries for this to work sanely.  If they're not in a catalog, they
>> have to be someplace else, and backend-private memory isn't good
>> enough, because that's not globally visible.  Until we've got a
>> strategy for that problem, this whole effort is going nowhere - even
>> though in other respects it may be a terrific idea.
> 
> Why not just have a regular-looking table, with a "global temporary"
> relpersistence (I don't care which letter it gets) and when a backend
> tries to access it, it uses its own private relfilenode instead of
> whatever is in pg_class, creating one if necessary.  That way the
> structure of the table is fixed, with all the dependencies and whatnot,
> but the content is private to each backend.  What's wrong with this idea?
> 

It's an improvement (and it's pretty much exactly what I proposed
upthread). But it does not solve the problems with pg_statistic for
example (each backend needs it's own statistics. So we'd either bloat
the pg_statistic (if we manage to solve the problem that the table has
the same oid in all backends), or we would need in-memory tuples (just
like discussed in the thread so far).



-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:
On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote:
> It's an improvement (and it's pretty much exactly what I proposed
> upthread). But it does not solve the problems with pg_statistic for
> example (each backend needs it's own statistics. So we'd either bloat
> the pg_statistic (if we manage to solve the problem that the table has
> the same oid in all backends), or we would need in-memory tuples (just
> like discussed in the thread so far).

Creating a session private version of pg_statistic would be fairly
simple.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Tomas Vondra
Дата:

On 08/31/2016 11:43 PM, Andres Freund wrote:
> On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote:
>> It's an improvement (and it's pretty much exactly what I proposed
>> upthread). But it does not solve the problems with pg_statistic for
>> example (each backend needs it's own statistics. So we'd either bloat
>> the pg_statistic (if we manage to solve the problem that the table has
>> the same oid in all backends), or we would need in-memory tuples (just
>> like discussed in the thread so far).
> 
> Creating a session private version of pg_statistic would be fairly
> simple.

Sure. I'm just saying it's not as simple as overriding relpath.

ISTM we only need the pg_statistics (as other catalogs are connected to
the pg_class entry), which does not have the dependency issues. Or do we
need other catalogs?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Andres Freund
Дата:

On August 31, 2016 3:00:15 PM PDT, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>
>On 08/31/2016 11:43 PM, Andres Freund wrote:
>> On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote:
>>> It's an improvement (and it's pretty much exactly what I proposed
>>> upthread). But it does not solve the problems with pg_statistic for
>>> example (each backend needs it's own statistics. So we'd either
>bloat
>>> the pg_statistic (if we manage to solve the problem that the table
>has
>>> the same oid in all backends), or we would need in-memory tuples
>(just
>>> like discussed in the thread so far).
>> 
>> Creating a session private version of pg_statistic would be fairly
>> simple.
>
>Sure. I'm just saying it's not as simple as overriding relpath.
>
>ISTM we only need the pg_statistics (as other catalogs are connected to
>the pg_class entry), which does not have the dependency issues. Or do
>we
>need other catalogs?

In my experience pg attribute is usually the worst affected. Many tech takes won't even have stays entries...

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

От
Corey Huinker
Дата:
On Wed, Aug 31, 2016 at 6:07 PM, Andres Freund <andres@anarazel.de> wrote:

In my experience pg attribute is usually the worst affected. Many tech takes won't even have stays entries...


Mine too. One database currently has a 400GB pg_attribute table, because we chew through temp tables like popcorn.