Generally, a subsystem can avoid needing a callback at subtransaction start (or transaction start) by detecting new levels of subtransactions at time of use.
Yes, I agree with this argument.
A typical practice is to maintain a stack which has entries only for those transaction nesting levels where the functionality was used. The attached patch implements this method for async.c.
I have reviewed your patch, and it seems to correctly implement the
per-subtransaction actions using a stack. At least I could not find
any flaw in your implementation.
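For anyone following along, here is a rough sketch of the lazy-stack idea as I understand it. This is not the code from the patch: the names (LevelState, get_level_state, at_subcommit, at_subabort), the pendingCount field, and the use of malloc are all made up for illustration. The point is only that an entry gets pushed at time of use, so subtransactions that never touch the feature cost nothing at start.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-level state; none of these names come from async.c. */
typedef struct LevelState
{
    int         nestingLevel;   /* transaction nesting level of this entry */
    int         pendingCount;   /* stand-in for the real per-level data */
    struct LevelState *upper;   /* nearest outer level that used the feature */
} LevelState;

static LevelState *stackTop = NULL;

/*
 * Called at time of use.  Push a new entry only if the current nesting
 * level does not have one yet; no callback at subtransaction start needed.
 */
static LevelState *
get_level_state(int curLevel)
{
    if (stackTop == NULL || stackTop->nestingLevel < curLevel)
    {
        LevelState *s = malloc(sizeof(LevelState));

        if (s == NULL)
            abort();
        s->nestingLevel = curLevel;
        s->pendingCount = 0;
        s->upper = stackTop;
        stackTop = s;
    }
    /* Entries for deeper levels must already have been merged or dropped. */
    assert(stackTop->nestingLevel == curLevel);
    return stackTop;
}

/* Example "use" of the feature at the given nesting level. */
static void
record_use(int curLevel)
{
    get_level_state(curLevel)->pendingCount++;
}

/* Subtransaction commit: hand this level's state to the parent level. */
static void
at_subcommit(int curLevel)
{
    LevelState *s = stackTop;

    if (s == NULL || s->nestingLevel < curLevel)
        return;                 /* feature unused at this level */

    if (s->upper != NULL && s->upper->nestingLevel == curLevel - 1)
    {
        /* Parent has its own entry: merge into it and pop ours. */
        s->upper->pendingCount += s->pendingCount;
        stackTop = s->upper;
        free(s);
    }
    else
        s->nestingLevel = curLevel - 1; /* no parent entry: just relabel */
}

/* Subtransaction abort: discard everything at this level or deeper. */
static void
at_subabort(int curLevel)
{
    while (stackTop != NULL && stackTop->nestingLevel >= curLevel)
    {
        LevelState *s = stackTop;

        stackTop = s->upper;
        free(s);
    }
}
```

With this shape, a subtransaction that never calls record_use() only pays for one pointer comparison in at_subcommit() or at_subabort(), which is presumably where the savings for trivial subtransactions would come from.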
I was a little surprised to find that it makes a pretty noticeable performance difference when starting and ending trivial subtransactions. I used this test case:
\timing
do $$
begin
  for i in 1 .. 10000000 loop
    begin
      null;
    exception when others then
      null;
    end;
  end loop;
end;
$$;
I ran your test case, and on my VM I get numbers like 3593.801 ms
without the patch and 3593.801 ms with the patch (average of 5 runs each).