Обсуждение: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)

Поиск
Список
Период
Сортировка

Temp table handling after anti-wraparound shutdown (Was: BUG #15840)

От
Andres Freund
Дата:
Hi,

(Moving a part of this discussion to hackers)

In #15840 Thierry had the situation that autovacuum apparently could not
keep up, and he ended up with a wraparound situation. Following the
hints and shutting down the cluster and vacuuming the relevant DB in
single user mode did not resolve the issue however. That's because there
was a session with temp tables:

On 2019-06-07 16:40:27 -0500, Thierry Husson wrote:
>   oid   |                 oid                  | relkind | relfrozenxid |
> age
> --------+--------------------------------------+---------+--------------+------------
>  460564 | pg_temp_3.cur_semt700_progsync_4996  | r       |        36464 |
> 2146483652
>  460764 | pg_temp_8.cur_semt700_progsync_5568  | r       |     19836544 |
> 2126683572
>  460718 | pg_temp_4.cur_semt700_progsync_5564  | r       |     19836544 |
> 2126683572
>  460721 | pg_temp_5.cur_semt700_progsync_5565  | r       |     19836544 |
> 2126683572
>  461068 | pg_temp_22.cur_semt700_progsync_5581 | r       |     19836544 |
> 2126683572
> 
> These are temporary tables to manage concurrency & server load. It seems the
> sudden disconnection due to wraparound protection didn't get them removed. I
> removed them manually under single mode and there is no more warning now,
> vacuum command included. Your command is very interesting to know.

And our logic for dropping temp tables only kicks in autovacuum, but not
in a database manual VACUUM.

Which means that currently the advice we give, namely to shut down and
vacuum the database in singleuser mode plainly doesn't work. Without any
warnings hinting in the right direction.

Do we need to move the orphan temp cleanup code into database vacuums or
such?

Greetings,

Andres Freund



Re: Temp table handling after anti-wraparound shutdown (Was: BUG#15840)

От
Michael Paquier
Дата:
On Fri, Jun 07, 2019 at 03:58:43PM -0700, Andres Freund wrote:
> Do we need to move the orphan temp cleanup code into database vacuums or
> such?

When entering into the vacuum() code path for an autovacuum, only one
relation at a time is processed, and we have prior that extra
processing related to toast relations when selecting the relations to
work on, or potentially delete orphaned temp tables.  For a manual
vacuum, we finish by deciding which relation to process in
get_all_vacuum_rels(), so the localized processing is a bit different
than what's done in do_autovacuum() when scanning pg_class for
relations.

Technically, I think that it would work to give up on the gathering of
the orphaned OIDs in a gathering and let them be gathered in the list
of items to vacuum, and then put the deletion logic down to
vacuum_rel().  However, there is a take: for autovacuum we gather the
orphaned entries and the other relations to process, then drop all the
orphaned OIDs, and finally vacuum/analyze the entries collected.  So
if you put the deletion logic down into vacuum_rel() then we won't be
able to drop orphaned tables before working on a database, which would
be bad if we know about an orphaned set, but autovacuum works for a
long time on other legit entries first.
--
Michael

Вложения

Re: Temp table handling after anti-wraparound shutdown (Was: BUG#15840)

От
Andres Freund
Дата:
Hi,

On 2019-06-08 08:59:37 +0900, Michael Paquier wrote:
> On Fri, Jun 07, 2019 at 03:58:43PM -0700, Andres Freund wrote:
> > Do we need to move the orphan temp cleanup code into database vacuums or
> > such?
> 
> When entering into the vacuum() code path for an autovacuum, only one
> relation at a time is processed, and we have prior that extra
> processing related to toast relations when selecting the relations to
> work on, or potentially delete orphaned temp tables.  For a manual
> vacuum, we finish by deciding which relation to process in
> get_all_vacuum_rels(), so the localized processing is a bit different
> than what's done in do_autovacuum() when scanning pg_class for
> relations.

Yea, I know.  I didn't mean that we should only handle orphan cleanup
only within database wide vacuums, just *also* there. ISTM that'd mean
that at least some of the code ought to be in vacuum.c, and then also
called by autovacuum.c.


> Technically, I think that it would work to give up on the gathering of
> the orphaned OIDs in a gathering and let them be gathered in the list
> of items to vacuum, and then put the deletion logic down to
> vacuum_rel().

I don't think it makes much sense to go there. The API would probably
look pretty bad.

I was more thinking that we'd move the check for orphaned-ness into a
separate function (maybe IsOrphanedRelation()), and move the code to
drop orphan relations into a separate function (maybe
DropOrphanRelations()). That'd limit the amount of code duplication for
doing this both in autovacuum and all-database vacuums quite
considerably.

A more aggressive approach would be to teach vac_update_datfrozenxid()
to ignore orphaned temp tables - perhaps even by heap_inplace'ing an
orphaned table's relfrozenxid/relminmxid with InvalidTransactionId. We'd
not want to do that in do_autovacuum() - otherwise the schema won't get
cleaned up, but for database widevacuums that seems like it could be
good approach.



Random observation while re-reading this code: Having do_autovacuum()
and ExecVacuum() both go through vacuum() seems like it adds too much
complexity to be worth it. Like half of the file is only concerned with
checks related to that.

Greetings,

Andres Freund



Re: Temp table handling after anti-wraparound shutdown (Was: BUG#15840)

От
Michael Paquier
Дата:
On Fri, Jun 07, 2019 at 05:26:32PM -0700, Andres Freund wrote:
> I was more thinking that we'd move the check for orphaned-ness into a
> separate function (maybe IsOrphanedRelation()), and move the code to
> drop orphan relations into a separate function (maybe
> DropOrphanRelations()). That'd limit the amount of code duplication for
> doing this both in autovacuum and all-database vacuums quite
> considerably.

A separation makes sense.  At some point we should actually try to
separate vacuum and orphan relation cleanup, so separate functions
make sense.  The only reason why we are doing it with autovacuum is
that it is the only thing in-core spawning a worker connected to a
database which does a full scan of pg_class.
--
Michael

Вложения

Re: Temp table handling after anti-wraparound shutdown (Was: BUG#15840)

От
Thierry Husson
Дата:
I like the approach proposed by Andres: A more aggressive approach  
would be to teach vac_update_datfrozenxid() to ignore orphaned temp  
tables... In fact, I suppose all temporary tables and their content  
could be completly ignored by MVCC principles as they are not subject  
to concurrency being unmodifiable/unreadable by other connections.

That would solve a major problem I have because I automaticaly create  
an empty temporary table for each connection in each DB process to  
manage users' activities/system resources. Even when everything goes  
well, these tables take age as long as they exists, even if I  
explicitly vacuum them, frozen or not. So any connection kept open for  
a long time will finish by causing a anti-wraparound shutdown. For now  
the only solution I have is to kill my deamons connections every day.

I suppose this could be tested by a simple PSQL left open after a  
CREATE TEMP TABLE toto (a INT). Any vacuum can't reduce its age.

The separate situation, as noted by Michael, could be done at  
connection time, when PG gives a temporay schema to it. When it create  
a pg_temp_XXX schema, it could make sure it's completely empty and  
otherwise remove everything in it. I already had a DB corruption  
because system tables weren't in sync about these tables/schemas after  
a badly closed connection, so it was impossible to make a drop table  
on them. So it could be even safer to clear everything directly from  
system tables instead of calling drop table for each leftover temp  
table.

Thierry

Michael Paquier <michael@paquier.xyz> a écrit :

> On Fri, Jun 07, 2019 at 05:26:32PM -0700, Andres Freund wrote:
>> I was more thinking that we'd move the check for orphaned-ness into a
>> separate function (maybe IsOrphanedRelation()), and move the code to
>> drop orphan relations into a separate function (maybe
>> DropOrphanRelations()). That'd limit the amount of code duplication for
>> doing this both in autovacuum and all-database vacuums quite
>> considerably.
>
> A separation makes sense.  At some point we should actually try to
> separate vacuum and orphan relation cleanup, so separate functions
> make sense.  The only reason why we are doing it with autovacuum is
> that it is the only thing in-core spawning a worker connected to a
> database which does a full scan of pg_class.
> --
> Michael






Re: Temp table handling after anti-wraparound shutdown (Was: BUG#15840)

От
Andres Freund
Дата:
Hi,

(on postgres lists, please do not top-quote).

On 2019-06-08 04:06:39 -0500, Thierry Husson wrote:
> In fact, I suppose all temporary tables and their content could be
> completly ignored by MVCC principles as they are not subject to
> concurrency being unmodifiable/unreadable by other connections.

That'd cause corruption, because vacuum would then remove resources that
the temp table might rely on (commit log, multixacts, ...).


> The separate situation, as noted by Michael, could be done at connection
> time, when PG gives a temporay schema to it. When it create a pg_temp_XXX
> schema, it could make sure it's completely empty and otherwise remove
> everything in it.

That already happens, but unfortunately only too late. IIRC We only do
so once the first temp table in a session is created.


> I already had a DB corruption because system tables weren't in sync
> about these tables/schemas after a badly closed connection, so it was
> impossible to make a drop table on them. So it could be even safer to
> clear everything directly from system tables instead of calling drop
> table for each leftover temp table.

Hm, I'd like to know more about that corruption. Did you report it when
it occured?

Greetings,

Andres Freund



Re: Temp table handling after anti-wraparound shutdown (Was: BUG#15840)

От
Thierry Husson
Дата:
> Hm, I'd like to know more about that corruption. Did you report it when
> it occured?
>
> Greetings,
>
> Andres Freund

Thanks Andres for explanations, sorry for my previous mess. I didn't  
reported the corruption when it occured as it was my fault, not a PG  
bug, as the main cause was that I was using network drive, knowingly  
it's unreliable for DB but management didn't believe me.

I had these kind of errors:

pg_dump emet_istina -F c -n usr_...
pg_dump: schema with OID 308991 does not exist

\dt+ pg_temp*.*
ERROR:  catalog is missing 1 attribute(s) for relid 5733555

drop schema pg_temp_9;
ERROR:  cache lookup failed for relation 5733715

drop schema pg_temp_6;
ERROR:  cannot drop schema pg_temp_6 because other objects depend on it
DETAIL:  table pg_temp_6.cur_dde000_105577 depends on schema pg_temp_6
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

I had to manualy remove/edit records from pg_class, pg_type, pg_namespace,
pg_depend, pg_shdepend.

I finaly managed to make it works and could dump everything and  
rebuild the DB for more security. Server was down for 1 week, and that  
event gave me proven arguments to have local storage. That was with  
9.6 and I took the opportunity to upgrade to 10.3 at the same time.

Now it's more clear it's a PG9/10/12 problem (didn't tried 11)  with  
vacuum/autovacuum not changing xid on temp tables. So, as long a temp  
table exists, it take age and finish by causing a wraparound protection.

Thierry





Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)

От
Masahiko Sawada
Дата:
On Sat, Jun 8, 2019 at 9:26 AM Andres Freund <andres@anarazel.de> wrote:
>
>
> A more aggressive approach would be to teach vac_update_datfrozenxid()
> to ignore orphaned temp tables - perhaps even by heap_inplace'ing an
> orphaned table's relfrozenxid/relminmxid with InvalidTransactionId. We'd
> not want to do that in do_autovacuum() - otherwise the schema won't get
> cleaned up, but for database widevacuums that seems like it could be
> good approach.
>

FWIW I like this approach. We don't need to calculate new datfrozenxid
while including orphaned temp tables. It both improves behavior and
fixes this issue. Also with that approach we will not need stop
database cluster and do vacuuming in single user mode. The making the
vacuum command cleanup orphaned temp tables would be helpful in the
case where we reached to wraparound while having active temp tables,
it doesn't happen in normal use case though.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center