The main problem I ran into was that the instrumentation nodes currently are
nested. That is, all the time for your children counts against you as well. Is that what we want for I/O costs?
As for me, I see nothing wrong with such a cost model. I think it is good to know things like "the whole query took 3244 I/O" and "this join takes 34 I/O". So instrumentation nodes need not try to work out their intrinsic I/O counts.
Another point is that the "time" and "I/O" metrics should behave the same way. I do not see a reason to change the current "actual time" behaviour.
If it is, then I think it's fairly simple: have a global set of counters for various I/O events which are zeroed when the executor starts. Every time an instrumentation node starts, it notes the starting point for all those counters; whenever it ends, it takes the difference and adds that to its personal counts.
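
To make the snapshot-and-diff idea concrete, here is a minimal sketch in C. All names (IOCounters, InstrNode, global_io, executor_start, instr_start, instr_stop) are hypothetical and do not correspond to existing backend code; the sketch only assumes that the buffer manager increments a set of global counters.

```c
/* Hypothetical global I/O counters, assumed to be bumped by the buffer manager. */
typedef struct IOCounters {
    long blks_read;
    long blks_hit;
    long blks_written;
} IOCounters;

IOCounters global_io;

/* Hypothetical per-node instrumentation state. */
typedef struct InstrNode {
    IOCounters start;  /* snapshot of global_io taken when the node starts */
    IOCounters total;  /* accumulated I/O, including that of child nodes */
} InstrNode;

/* Zero the global counters when the executor starts. */
void executor_start(void)
{
    global_io.blks_read = 0;
    global_io.blks_hit = 0;
    global_io.blks_written = 0;
}

/* Note the starting point of every counter. */
void instr_start(InstrNode *node)
{
    node->start = global_io;
}

/* Add the difference since instr_start to the node's own totals. */
void instr_stop(InstrNode *node)
{
    node->total.blks_read    += global_io.blks_read    - node->start.blks_read;
    node->total.blks_hit     += global_io.blks_hit     - node->start.blks_hit;
    node->total.blks_written += global_io.blks_written - node->start.blks_written;
}
```

Because a child starts and stops inside its parent's start/stop window, the parent's totals automatically include the child's I/O, which matches how nested timing behaves.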
I've tried to use "ReadBufferCount and friends" from storage\buffer\buf_init.c; however, they show zeroes for some as yet unknown reason. I hope there is no fundamental problem behind this.
One more problem with the current counters in buf_init.c is that ResetBufferUsage zeroes them at unpredictable points (e.g. if log_executor_stats or log_statement_stats is enabled). I am not sure whether that could interfere with instrumentation.
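
To illustrate the kind of interference I have in mind, here is a small sketch in C. It assumes instrumentation takes a snapshot of a counter at node start and subtracts it at node stop; read_count, reset_counter and node_delta are hypothetical stand-ins, not the real buf_init.c code.

```c
/* Hypothetical stand-in for one global buffer counter. */
long read_count = 0;

/* Stand-in for a reset (such as ResetBufferUsage) zeroing the counter. */
void reset_counter(void)
{
    read_count = 0;
}

/* Delta a node would record: counter at stop minus the snapshot at start. */
long node_delta(long start_snapshot)
{
    return read_count - start_snapshot;
}
```

If a reset happens between a node's start and stop, the recorded delta no longer reflects the reads done in that window and can even go negative, so under this assumption the counters used for instrumentation would have to be protected from such resets.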