Discussion: Minor postmaster state machine bugs
In pursuit of the problem with standby servers sometimes not responding
to fast shutdowns [1], I spent awhile staring at the postmaster's
state-machine logic. I have not found a cause for that problem, but
I have identified some other things that seem like bugs:

1. sigusr1_handler ignores PMSIGNAL_ADVANCE_STATE_MACHINE unless the
current state is PM_WAIT_BACKUP or PM_WAIT_BACKENDS. This restriction
seems useless and shortsighted: PostmasterStateMachine should behave
sanely regardless of our state, and sigusr1_handler really has no
business assuming anything about why a child is asking for a state
machine reconsideration. But it's not just not future-proof, it's a
live bug even for the one existing use-case, which is that a new
walsender sends this signal after it's re-marked itself as being a
walsender rather than a normal backend. Consider this sequence of
events:

* System is running as a hot standby and allowing cascaded replication.
  There are no live backends.

* New replication connection request is received and forked off.
  (At this point the postmaster thinks this child is a normal session
  backend.)

* SIGTERM (Smart Shutdown) is received. Postmaster will transition
  to PM_WAIT_READONLY. I don't think it would have autovac or bgworker
  or bgwriter or walwriter children, but if so, assume they all exit
  before the next step. Postmaster will continue to sleep, waiting for
  its one "normal" child backend to finish.

* Replication connection request completes, so child re-marks itself
  as a walsender and sends PMSIGNAL_ADVANCE_STATE_MACHINE.

* Postmaster ignores signal because it's in the "wrong" state, so it
  doesn't realize it now has no normal backend children.

* Postmaster waits forever, or at least till DBA loses patience and
  sends a stronger signal.

This scenario doesn't explain the buildfarm failures since those don't
involve smart shutdowns (and I think they don't involve cascaded
replication either). Still, it's clearly a bug, which I think we should
fix by removing the pointless restriction on whether
PostmasterStateMachine can be called.

Also, I'm inclined to think that that should be the *last* step in
sigusr1_handler, not randomly somewhere in the middle. As coded, it's
basically assuming that no later action in sigusr1_handler could affect
anything that PostmasterStateMachine cares about, which even if it's
true today is another highly not-future-proof assumption.

2. MaybeStartWalReceiver will clear the WalReceiverRequested flag even
if it fails to launch a child process for some reason. This is just
dumb; it should leave the flag set so that we'll try again next time
through the postmaster's idle loop.

3. PostmasterStateMachine's handling of PM_SHUTDOWN_2 is:

    if (pmState == PM_SHUTDOWN_2)
    {
        /*
         * PM_SHUTDOWN_2 state ends when there's no other children than
         * dead_end children left. There shouldn't be any regular backends
         * left by now anyway; what we're really waiting for is walsenders and
         * archiver.
         *
         * Walreceiver should normally be dead by now, but not when a fast
         * shutdown is performed during recovery.
         */
        if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0 &&
            WalReceiverPID == 0)
        {
            pmState = PM_WAIT_DEAD_END;
        }
    }

The comment about walreceivers is confusing, and it's also wrong. Maybe
it was valid when written, but today it's easy to trace the logic and
see that we can only get to PM_SHUTDOWN_2 state from PM_SHUTDOWN state,
and we can only get to PM_SHUTDOWN state when there is no live
walreceiver (cf processing of PM_WAIT_BACKENDS state), and we won't
attempt to launch a new walreceiver while in PM_SHUTDOWN or
PM_SHUTDOWN_2 state, so it's impossible for there to be any walreceiver
here. I think we should just remove that comment and the
WalReceiverPID == 0 test.

Comments? I think at least the first two points need to be
back-patched.

regards, tom lane

[1] https://www.postgresql.org/message-id/20190416070119.GK2673@paquier.xyz
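
For illustration only, point 2 above can be sketched as a standalone C model. This is not the postmaster's actual code: the real MaybeStartWalReceiver checks more conditions, and StartWalReceiverStub, simulate_launch_failure, demo_retry and the PID value 4242 are hypothetical names and values invented for this sketch. It shows the proposed behavior of clearing the request flag only after a child has really been launched.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the postmaster's globals (assumption: plain int PID). */
static bool WalReceiverRequested = false;
static int  WalReceiverPID = 0;

/* Hypothetical knob so the sketch can simulate a failed launch. */
static bool simulate_launch_failure = false;

/* Hypothetical launcher: returns the child's PID, or 0 on failure. */
static int
StartWalReceiverStub(void)
{
    return simulate_launch_failure ? 0 : 4242;
}

static void
MaybeStartWalReceiver(void)
{
    if (WalReceiverPID == 0 && WalReceiverRequested)
    {
        WalReceiverPID = StartWalReceiverStub();

        /* Clear the request only once a child actually exists. */
        if (WalReceiverPID != 0)
            WalReceiverRequested = false;
        /* Otherwise leave the flag set so the next idle-loop pass retries. */
    }
}

/* Usage: one failed attempt followed by a successful retry. */
static void
demo_retry(void)
{
    WalReceiverRequested = true;

    simulate_launch_failure = true;
    MaybeStartWalReceiver();
    assert(WalReceiverPID == 0 && WalReceiverRequested);    /* still pending */

    simulate_launch_failure = false;
    MaybeStartWalReceiver();
    assert(WalReceiverPID != 0 && !WalReceiverRequested);   /* launched */
}
```

With the flag cleared unconditionally instead, the first assertion in demo_retry would fail: the request would be forgotten after the failed launch and nothing would prompt a retry.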
Hi,
I am working on an FDW for a database that does not support any operator other than "=" in a JOIN condition. Some queries are generating plans whose pushed-down JOIN contains a "<" operator. How, and at what stage, can I stop the FDW from producing such a plan? Here is my sample query.
tpch=# select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-22'
and l_shipdate > date '1995-03-22'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue,
o_orderdate
LIMIT 10;
QUERY PLAN
...
Merge Cond: (orders.o_orderkey = lineitem.l_orderkey)
-> Foreign Scan (cost=1.00..-1.00 rows=1000 width=50)
Output: orders.o_orderdate, orders.o_shippriority, orders.o_orderkey
Relations: (customer) INNER JOIN (orders)
Remote SQL: SELECT r2.o_orderdate, r2.o_shippriority, r2.o_orderkey FROM db.customer r1 ALL INNER JOIN db.orders r2 ON (((r1.c_custkey = r2.o_custkey)) AND ((r2.o_orderdate < '1995-03-22')) AND ((r1.c_mktsegment = 'BUILDING'))) ORDER BY r2.o_orderkey, r2.o_orderdate, r2.o_shippriority
...
--
Ibrar Ahmed
Ibrar Ahmed <ibrar.ahmad@gmail.com> writes:
> I am working on an FDW where the database does not support any operator
> other than "=" in JOIN condition. Some queries are generating the plan with
> JOIN having "<" operator. How and at what stage I can stop FDW to not make
> such a plan. Here is my sample query.
What exactly do you think should happen instead? You can't just tell
users not to ask such a query. (Well, you can try, but they'll probably
go looking for a less broken FDW.)
If what you really mean is you don't want to generate pushed-down
foreign join paths containing non-equality conditions, the answer is
to just not do that. That'd be the FDW's own fault, not that of
the core planner, if it creates a path representing a join it
can't actually implement. You'll end up running the join locally,
which might not be great, but if you have no other alternative
then that's what you gotta do.
If what you mean is you don't know how to inspect the join quals
to see if you can implement them, take a look at postgres_fdw
to see how it handles the same issue.
regards, tom lane
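
As a rough illustration of the advice above, here is a standalone C model of the check an FDW could make before offering a pushed-down join. It does not use PostgreSQL's planner structures: JoinClause, join_is_pushable and demo_join_check are hypothetical names for this sketch, and a real FDW would inspect the actual join quals (postgres_fdw does its equivalent test while deciding whether to add a foreign join path in its GetForeignJoinPaths callback).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one join clause: only its operator name. */
typedef struct JoinClause
{
    const char *opname;
} JoinClause;

/*
 * Returns true only if every clause uses "=".
 * An empty clause list is treated as pushable here; a real FDW may
 * want to reject that case too if the remote side cannot handle it.
 */
static bool
join_is_pushable(const JoinClause *clauses, size_t nclauses)
{
    for (size_t i = 0; i < nclauses; i++)
    {
        if (strcmp(clauses[i].opname, "=") != 0)
            return false;   /* remote side cannot evaluate this operator */
    }
    return true;
}

/* Usage: an equality-only join is accepted, a join with "<" is not. */
static void
demo_join_check(void)
{
    const JoinClause eq_only[] = {{"="}};
    const JoinClause mixed[] = {{"="}, {"<"}};

    assert(join_is_pushable(eq_only, 1));
    assert(!join_is_pushable(mixed, 2));
}
```

When the check returns false, the FDW simply does not create the foreign join path, and the planner falls back to joining the two foreign scans locally.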