Обсуждение: Two aesthetic bugs in the 1-byte packed varlena code
Korry spotted two violations of the rule about accessing the first varlena field directly which I had missed. In both cases I believe the accesses are actually safe because they follow an int4 column and the data are either explicitly detoasted or passed to textout which can handle 1-byte headers fine. One instance is in flatfiles.c when it accesses the password field which is a text field. Here I don't see any reasonable way to avoid it and think we should just document that we're bending the rules. The other instance is in inv_api.c where it would be quite possible to use fastgetattr() instead. But the column is always at the same fixed offset and again it follows an int4 so it'll always be 4-byte aligned and work fine. So for performance reasons perhaps we should keep this as well? I've attached three patches. A patch which adds a comment for flatfiles.c. And two patches for inv_api.c one which just adds a comment and one which converts it to use fastgetattr(). Again, credit to Korry for spotting these. To do so he commented out all the varlena fields (excluding int2vector and oidvector) from the struct definitions and saw what broke. Why do we even have those fields in the structs if they're unsafe to use? Perhaps we should #if 0 out all the unsafe fields from all the struct definitions? That would avoid one of the most common new programmer bugs and also let us document the few exceptions and why they're allowed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Вложения
Gregory Stark <stark@enterprisedb.com> writes: > Why do we even have those fields in the structs if they're unsafe to use? 1. genbki.sh 2. As you note, they're not always unsafe to use. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> Why do we even have those fields in the structs if they're unsafe to use? > > 1. genbki.sh But genbki.sh wouldn't care if we #if 0 around the unsafe ones would it? > 2. As you note, they're not always unsafe to use. Well, it seems like it would be nice to mark which ones are and aren't unsafe rather than have them all be there waiting to trip people up. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Well, it seems like it would be nice to mark which ones are and aren't unsafe > rather than have them all be there waiting to trip people up. We already do mark 'em --- that's what all the VARIABLE LENGTH FIELDS START HERE comments are about. I'm not especially worried about mechanizing it since in practice code that tries to access those fields will crash pretty obviously and consistently, except for the first such field which is safe anyway. regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes: > The other instance is in inv_api.c where it would be quite possible to use > fastgetattr() instead. But the column is always at the same fixed offset and > again it follows an int4 so it'll always be 4-byte aligned and work fine. So > for performance reasons perhaps we should keep this as well? After looking closer, I think there are worse problems here: the code is still using VARSIZE/VARDATA etc, which it should not be because the field could easily be in 1-byte-header form. And it seems like we should be checking for NULL, too, because initdb only marks the first two fields as NOT NULL. The latter would qualify as a security hole except that you'd have to be superuser to put a record with a null data field into pg_largeobject. I concur that we probably want to avoid adding fastgetattr for performance reasons, seeing that this table's layout seems unlikely to change. But it seems like we might want to trouble with a null test. Thoughts? regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> The other instance is in inv_api.c where it would be quite possible to use >> fastgetattr() instead. But the column is always at the same fixed offset and >> again it follows an int4 so it'll always be 4-byte aligned and work fine. So >> for performance reasons perhaps we should keep this as well? > > After looking closer, I think there are worse problems here: the code is > still using VARSIZE/VARDATA etc, which it should not be because the > field could easily be in 1-byte-header form. Well that's ok because VARATT_IS_EXTENDED returns true for 1-byte forms so it'll detoast them first. We could avoid the detoasting but given that it's expecting the chunks to be compressed anyways the memcpys of the smallest chunks probably don't matter much either way. I'm assuming it's like toast in that only the last chunk will be smaller than LOBLKSIZE anyways, right? > I concur that we probably want to avoid adding fastgetattr for > performance reasons, seeing that this table's layout seems unlikely > to change. But it seems like we might want to trouble with a null > test. Thoughts? There should never even be a null bitmap right? Maybe we should just elog(ERROR) if we find HeapTupleHasNulls(tuple) to be true at all. Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> After looking closer, I think there are worse problems here: the code is >> still using VARSIZE/VARDATA etc, which it should not be because the >> field could easily be in 1-byte-header form. > Well that's ok because VARATT_IS_EXTENDED returns true for 1-byte forms so > it'll detoast them first. Ah, right. > We could avoid the detoasting but given that it's > expecting the chunks to be compressed anyways the memcpys of the smallest > chunks probably don't matter much either way. I'm assuming it's like toast in > that only the last chunk will be smaller than LOBLKSIZE anyways, right? Well, it's like toast except that there can be unwritten "holes" in a LO. Still, in normal cases you'd expect only the last partial page to be potentially short enough for 1-B format, and even then only about 1/16th of the time. OK, not worth changing then. > There should never even be a null bitmap right? Maybe we should just > elog(ERROR) if we find HeapTupleHasNulls(tuple) to be true at all. That sounds like a good and cheap test. Will make it so. regards, tom lane