Обсуждение: Broken stuff in new dtrace probes
I notice that we have in md.c TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode,relpath(reln->smgr_rnode, forknum), nbytes, BLCKSZ); TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode,relpath(reln->smgr_rnode, forknum), nbytes, BLCKSZ); relpath() returns a palloc'd string, which will not get freed, which means that any serious use of these probe points will shortly blow out the backend's memory. So far as I can see the path argument is redundant with the other probe arguments and we might as well just remove it --- objections? There's another problem in tuplesort.c: TRACE_POSTGRESQL_SORT_DONE(state->tapeset, (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) : (state->allowedMem- state->availMem + 1023) / 1024)); This is called after state->tapeset has been freed, which means that the LogicalTapeSetBlocks call will very likely give a wrong answer, especially in assert-enabled builds but maybe even in a regular one. Seems that we need to have been quality-checking the dtrace patches a bit more carefully. regards, tom lane
I wrote: > Seems that we need to have been quality-checking the dtrace patches > a bit more carefully. I went over all the existing dtrace probe calls and fixed what seemed clearly bogus, but there was one issue that seems like a bit of a judgement call. In ReadBuffer_common() we have isExtend = (blockNum == P_NEW); /* Substitute proper block number if caller asked for P_NEW */ if (isExtend) blockNum = smgrnblocks(smgr, forkNum); TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum, smgr->smgr_rnode.spcNode, smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf); if (isLocalBuf) ... etc etc ... What's bothering me about this is that in the isExtend case, the potentially significant cost of the smgrnblocks() call is not going to get charged as part of the buffer read time. An easy answer is to move the TRACE call in front of that, which would mean that in the isExtend case the trace would see blockNum == P_NEW rather than the actual block number. You could argue it either way as to whether that's good or bad, perhaps -- it would make it a tad harder to match up read_start and read_done calls, but it would also make it clearer which calls are for file extension and which are ordinary reads. Furthermore, an isExtend call doesn't actually do a read(), so lumping them together with regular reads doesn't seem like quite the right thing for performance measurement purposes anyway. Maybe we actually ought to have different probes for isExtend and regular cases. Comments? regards, tom lane
On Wed, Mar 11, 2009 at 11:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Furthermore, an isExtend call doesn't actually do a read(), so lumping > them together with regular reads doesn't seem like quite the right thing > for performance measurement purposes anyway. Maybe we actually ought to > have different probes for isExtend and regular cases. i like the idea of just have a separate pair of probes for table extension. I bet there are people who would actually like to see that alone sometimes too. I'm sure these probes will be refined over time as we get more experience analyzing with them. They don't have to be perfect right away... -- greg
Greg Stark <stark@enterprisedb.com> writes: > On Wed, Mar 11, 2009 at 11:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Furthermore, an isExtend call doesn't actually do a read(), so lumping >> them together with regular reads doesn't seem like quite the right thing >> for performance measurement purposes anyway. �Maybe we actually ought to >> have different probes for isExtend and regular cases. > i like the idea of just have a separate pair of probes for table > extension. I bet there are people who would actually like to see that > alone sometimes too. After further thought I concluded that the best solution for this is to add the isExtend flag to the buffer_read_start/read_done probe parameter lists. This allows the dtrace script writer to make the distinction if he chooses, without adding any extra overhead for normal non-traced operation. AFAICS using a separate probe type would add at least a couple of if-tests even with tracing turned off. regards, tom lane
Just returned from vacation ... Tom Lane wrote: > I notice that we have in md.c > > TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode,relpath(reln->smgr_rnode, forknum), nbytes, BLCKSZ); > > TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode,relpath(reln->smgr_rnode, forknum), nbytes, BLCKSZ); > > relpath() returns a palloc'd string, which will not get freed, which > means that any serious use of these probe points will shortly blow out > the backend's memory. So far as I can see the path argument is > redundant with the other probe arguments and we might as well just > remove it --- objections? > agreed; the other arguments can be used to get the same result. > There's another problem in tuplesort.c: > > TRACE_POSTGRESQL_SORT_DONE(state->tapeset, > (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) : > (state->allowedMem - state->availMem + 1023) / 1024)); > > This is called after state->tapeset has been freed, which means that the > LogicalTapeSetBlocks call will very likely give a wrong answer, > especially in assert-enabled builds but maybe even in a regular one. > > Good catch and thanks for reviewing the probes. -Robert
Tom Lane wrote: > Greg Stark <stark@enterprisedb.com> writes: > >> i like the idea of just have a separate pair of probes for table >> extension. I bet there are people who would actually like to see that >> alone sometimes too. >> > > After further thought I concluded that the best solution for this is to > add the isExtend flag to the buffer_read_start/read_done probe parameter > lists. This allows the dtrace script writer to make the distinction if > he chooses, without adding any extra overhead for normal non-traced > operation. AFAICS using a separate probe type would add at least a > couple of if-tests even with tracing turned off. > I like this solution. From my perspective, it's always better to give the script writer the flexibility to pick and choose the data s/she wants to see and at the same time avoid adding new probes unnecessarily. -Robert