Yeah, my concern is not whether the overhead will be zero; it's whether it will be small while still allowing large gains on other operations. Like, how much slower will it be to pull a moderately complex 1MB JSON blob (not just a big string) out of a single-row, single-column table? If it's 5% slower, that's probably OK, since this is a reasonable approximation of a worst-case scenario. If it's 50% slower, that sounds painful. It would also be worth testing with a much smaller size, such as a 1K object with lots of internal structure. In both cases, all data cached in shared_buffers, etc.
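To make the worst case concrete: a toy model in Python (not PostgreSQL itself), under the assumption that a text-format column can emit the stored string as-is while a binary (pre-parsed) format has to re-serialize the whole tree on output. The document shape and sizes here are made up for illustration.

```python
import json
import random
import string
import timeit

# Build a moderately complex JSON document, roughly 1 MB when serialized,
# approximating the "pull the whole blob out" worst case.
def make_doc(n):
    return [{"id": i,
             "name": "".join(random.choices(string.ascii_letters, k=16)),
             "tags": [f"tag{j}" for j in range(5)],
             "nested": {"a": i * 2, "b": [i, i + 1, i + 2]}}
            for i in range(n)]

doc = make_doc(4000)
text = json.dumps(doc)    # what a text-format column would store
tree = json.loads(text)   # stand-in for a pre-parsed binary representation

# Text format: output is essentially a copy of the stored string.
t_text = timeit.timeit(lambda: text[:], number=20)
# Binary format: output requires re-serializing the whole tree to text.
t_bin = timeit.timeit(lambda: json.dumps(tree), number=20)

print(f"text copy: {t_text:.4f}s  re-serialize: {t_bin:.4f}s  "
      f"ratio: {t_bin / t_text:.1f}x")
```

This only models the serialization side of the cost, of course; the real number has to come from measuring the server with the data cached.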
Then, on the flip side, how do we do on val[37]["whatever"]? You'd hope that this will be significantly faster than the text encoding on both large and small objects. If it's not, there's probably not much point.
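The flip side can be sketched the same way: with a text encoding, extracting val[37]["whatever"] means parsing the entire value first, while a pre-parsed representation can navigate straight to the element. Again a toy Python model, with an invented document shape:

```python
import json
import timeit

# A list of 1000 objects, padded so the serialized text is nontrivial.
doc = [{"whatever": i, "pad": "x" * 200} for i in range(1000)]
text = json.dumps(doc)    # text-format storage
tree = json.loads(text)   # stand-in for a binary (pre-parsed) format

# Text format: must parse the whole value before indexing into it.
t_text = timeit.timeit(lambda: json.loads(text)[37]["whatever"], number=200)
# Binary format: index directly without reparsing.
t_bin = timeit.timeit(lambda: tree[37]["whatever"], number=200)

print(f"parse-then-index: {t_text:.4f}s  direct index: {t_bin:.4f}s  "
      f"speedup: {t_text / t_bin:.0f}x")
```

The gap here is the whole motivation for the binary format: the extraction cost stops scaling with the size of the surrounding document.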
We're on the same page. I'm implementing the basic cases now and then will come up with some benchmarks.