Thread: [HACKERS] pgstattuple documentation clarification
Recently a client was confused because there was a substantial difference between the reported table_len of a table and the sum of the corresponding tuple_len, dead_tuple_len and free_space. The docs are fairly silent on this point, and I agree that in the absence of explanation it is confusing, so I propose that we add a clarification note along the lines of:

    The table_len will always be greater than the sum of the tuple_len, dead_tuple_len and free_space. The difference is accounted for by page overhead and space that is not free but cannot be attributed to any particular tuple.

Or perhaps we should be more explicit and refer to the item pointers on the page.

Thoughts?

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Recently a client was confused because there was a substantial
> difference between the reported table_len of a table and the sum of the
> corresponding tuple_len, dead_tuple_len and free_space. The docs are
> fairly silent on this point, and I agree that in the absence of
> explanation it is confusing, so I propose that we add a clarification
> note along the lines of:

> The table_len will always be greater than the sum of the tuple_len,
> dead_tuple_len and free_space. The difference is accounted for by
> page overhead and space that is not free but cannot be attributed to
> any particular tuple.

> Or perhaps we should be more explicit and refer to the item pointers on
> the page.

I find "not free but cannot be attributed to any particular tuple" to be entirely useless weasel wording, not to mention wrong with respect to item pointers in particular.

Perhaps we should start counting the item pointers in tuple_len. We'd still have to explain about page header overhead, but that would be a pretty small and fixed-size discrepancy.

regards, tom lane
On 12/20/2016 10:01 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Recently a client was confused because there was a substantial
>> difference between the reported table_len of a table and the sum of the
>> corresponding tuple_len, dead_tuple_len and free_space. The docs are
>> fairly silent on this point, and I agree that in the absence of
>> explanation it is confusing, so I propose that we add a clarification
>> note along the lines of:
>> The table_len will always be greater than the sum of the tuple_len,
>> dead_tuple_len and free_space. The difference is accounted for by
>> page overhead and space that is not free but cannot be attributed to
>> any particular tuple.
>> Or perhaps we should be more explicit and refer to the item pointers on
>> the page.
> I find "not free but cannot be attributed to any particular tuple"
> to be entirely useless weasel wording, not to mention wrong with
> respect to item pointers in particular.

Well, the reason I put it like that was that in my experimentation, after I vacuumed the table after a large delete the item pointer table didn't seem to shrink (at least according to the pgstattuple output), so we had a page with 0 dead tuples but some non-live line pointer space. If that's not what's happening then something is going on that I don't understand. (Wouldn't be a first.)

> Perhaps we should start counting the item pointers in tuple_len.
> We'd still have to explain about page header overhead, but that
> would be a pretty small and fixed-size discrepancy.

Sure, sounds like a good idea. Meanwhile it would be nice to explain to people exactly what we currently have. If you have a good formulation I'm all ears.

cheers

andrew
On Tue, Dec 20, 2016 at 10:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Recently a client was confused because there was a substantial
>> difference between the reported table_len of a table and the sum of the
>> corresponding tuple_len, dead_tuple_len and free_space. The docs are
>> fairly silent on this point, and I agree that in the absence of
>> explanation it is confusing, so I propose that we add a clarification
>> note along the lines of:
>
>> The table_len will always be greater than the sum of the tuple_len,
>> dead_tuple_len and free_space. The difference is accounted for by
>> page overhead and space that is not free but cannot be attributed to
>> any particular tuple.
>
>> Or perhaps we should be more explicit and refer to the item pointers on
>> the page.
>
> I find "not free but cannot be attributed to any particular tuple"
> to be entirely useless weasel wording, not to mention wrong with
> respect to item pointers in particular.
>
> Perhaps we should start counting the item pointers in tuple_len.
> We'd still have to explain about page header overhead, but that
> would be a pretty small and fixed-size discrepancy.

It's pretty weird to count unused or dead line pointers as part of tuple_len, and it would screw things up for anybody trying to calculate the average width of their tuples, which is an entirely reasonable thing to want to do. I think if we're going to count item pointers as anything, it needs to be some new category -- either item pointers specifically, or an "other stuff" bucket.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
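To illustrate the concern about average tuple width, here is a minimal sketch with made-up numbers; it is not output from pgstattuple. The only figure taken from PostgreSQL is the 4-byte size of a line pointer. The function names and the table figures are hypothetical.

```python
# Hypothetical illustration: how folding line pointers into tuple_len
# would distort an average-tuple-width calculation.

ITEM_POINTER_BYTES = 4  # size of one heap line pointer (ItemIdData)


def average_tuple_width(tuple_len, tuple_count):
    """Average width of live tuples, as a user might compute it."""
    return tuple_len / tuple_count if tuple_count else 0.0


def tuple_len_including_pointers(tuple_len, pointer_count):
    """tuple_len as it would look if every line pointer were counted in it."""
    return tuple_len + ITEM_POINTER_BYTES * pointer_count


# Made-up table: 1000 live tuples totalling 64000 bytes, plus 3000 line
# pointers left unused or dead after a large delete.
tuple_count = 1000
tuple_len = 64000
unused_or_dead_pointers = 3000

plain_avg = average_tuple_width(tuple_len, tuple_count)
inflated_avg = average_tuple_width(
    tuple_len_including_pointers(tuple_len, tuple_count + unused_or_dead_pointers),
    tuple_count,
)

print(plain_avg)     # 64.0
print(inflated_avg)  # 80.0
```

With these assumed numbers the apparent average width grows from 64 to 80 bytes although no tuple changed size, which is the distortion a separate category would avoid.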
On 12/20/2016 11:41 PM, Robert Haas wrote:
> On Tue, Dec 20, 2016 at 10:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Recently a client was confused because there was a substantial
>>> difference between the reported table_len of a table and the sum of the
>>> corresponding tuple_len, dead_tuple_len and free_space. The docs are
>>> fairly silent on this point, and I agree that in the absence of
>>> explanation it is confusing, so I propose that we add a clarification
>>> note along the lines of:
>>> The table_len will always be greater than the sum of the tuple_len,
>>> dead_tuple_len and free_space. The difference is accounted for by
>>> page overhead and space that is not free but cannot be attributed to
>>> any particular tuple.
>>> Or perhaps we should be more explicit and refer to the item pointers on
>>> the page.
>> I find "not free but cannot be attributed to any particular tuple"
>> to be entirely useless weasel wording, not to mention wrong with
>> respect to item pointers in particular.
>>
>> Perhaps we should start counting the item pointers in tuple_len.
>> We'd still have to explain about page header overhead, but that
>> would be a pretty small and fixed-size discrepancy.
> It's pretty weird to count unused or dead line pointers as part of
> tuple_len, and it would screw things up for anybody trying to
> calculate the average width of their tuples, which is an entirely
> reasonable thing to want to do. I think if we're going to count item
> pointers as anything, it needs to be some new category -- either item
> pointers specifically, or an "other stuff" bucket.

Yes, I agree. In any case, before we change anything can we agree on a description of what we currently do?

Here's a second attempt:

    The table_len will always be greater than the sum of the tuple_len, dead_tuple_len and free_space. The difference is accounted for by fixed page overhead, the per-page table of pointers to tuples, and padding to ensure that tuples are correctly aligned.

I don't think any of that is weaselish :-)

cheers

andrew
On 12/21/2016 09:04 AM, Andrew Dunstan wrote:
> Yes, I agree. In any case, before we change anything can we agree on a
> description of what we currently do?
>
> Here's a second attempt:
>
> The table_len will always be greater than the sum of the tuple_len,
> dead_tuple_len and free_space. The difference is accounted for by
> fixed page overhead, the per-page table of pointers to tuples, and
> padding to ensure that tuples are correctly aligned.

In the absence of further comment I will proceed along these lines.

cheers

andrew
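As a rough worked example of the description above, the sketch below models a single heap page. It assumes the default 8192-byte block size, a 24-byte page header, 4-byte line pointers and 8-byte alignment (typical for 64-bit builds). The function name and the tuple sizes are hypothetical, all tuples are treated as live, and the free-space figure is a simplification that may not match what pgstattuple reports exactly.

```python
# Simplified single-page model of where table_len goes. This is an
# illustration under stated assumptions, not pgstattuple's actual code.

BLOCK_SIZE = 8192          # default PostgreSQL block size (assumed)
PAGE_HEADER_BYTES = 24     # fixed page header
ITEM_POINTER_BYTES = 4     # one line pointer per tuple slot
MAXALIGN = 8               # alignment boundary (assumed 64-bit build)


def maxalign(n):
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def page_accounting(tuple_lengths, unused_pointers=0):
    """Break one page down into the categories discussed in the thread."""
    tuple_len = sum(tuple_lengths)
    padding = sum(maxalign(n) - n for n in tuple_lengths)
    pointer_bytes = (len(tuple_lengths) + unused_pointers) * ITEM_POINTER_BYTES
    free_space = (BLOCK_SIZE - PAGE_HEADER_BYTES - pointer_bytes
                  - tuple_len - padding)
    return {
        "table_len": BLOCK_SIZE,
        "tuple_len": tuple_len,
        "free_space": free_space,
        "page_header": PAGE_HEADER_BYTES,
        "item_pointers": pointer_bytes,
        "alignment_padding": padding,
        # what table_len exceeds tuple_len + free_space by
        "unaccounted": BLOCK_SIZE - tuple_len - free_space,
    }


# One page holding 100 tuples of 61 bytes each (made-up sizes).
acct = page_accounting([61] * 100)
print(acct["free_space"])   # 1368
print(acct["unaccounted"])  # 724 = 24 header + 400 pointers + 300 padding
```

In this model the gap between table_len and the reported lengths is exactly the sum of the three items named in the proposed wording: fixed page overhead, the per-page pointer table, and alignment padding.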