Обсуждение: Fix incorrect comments in tuplesort.c
Hi,
The incorrect comment:
```
/*
* Initial size of memtuples array. We're trying to select this size so that
* array doesn't exceed ALLOCSET_SEPARATE_THRESHOLD and so that the overhead of
* allocation might possibly be lowered. However, we don't consider array sizes
* less than 1024.
*
*/
#define INITIAL_MEMTUPSIZE Max(1024, \
ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)
```
The "doesn't exceed" are contrary to the macro. Maybe "slightly exceed"?
Attach a small patch. (also remove the empty line in this comment)
--
Regards,
ChangAo Chen
Вложения
> On Dec 6, 2025, at 22:56, cca5507 <cca5507@qq.com> wrote: > > Hi, > > The incorrect comment: > > ``` > /* > * Initial size of memtuples array. We're trying to select this size so that > * array doesn't exceed ALLOCSET_SEPARATE_THRESHOLD and so that the overhead of > * allocation might possibly be lowered. However, we don't consider array sizes > * less than 1024. > * > */ > #define INITIAL_MEMTUPSIZE Max(1024, \ > ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1) > ``` > > The "doesn't exceed" are contrary to the macro. Maybe "slightly exceed"? > > Attach a small patch. (also remove the empty line in this comment) > > -- > Regards, > ChangAo Chen > <v1-0001-Fix-incorrect-comments-in-tuplesort.c.patch> I guess a native English speaker may read the sentence in a different way than our non-English speakers. So, it’s betterto wait for an English-speaking committer to decide if this comment needs to be updated. But I agree from a non-English speaker’s view, this sentence is a little bit confusing, at least makes us to spend more timeto understand its real meaning. However, if we really want to update the comment, I just feel “slightly” is not be best choice here, because “slightly” justmeans “not too much”, it doesn’t precisely describe the real meaning of the macro. I would suggest a rephrasing like:"We choose the smallest number of SortTuple entries whose total size exceeds ALLOCSET_SEPARATE_THRESHOLD, and so that…" Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Hi Chao, Thank you for your reply. I feed the comment to Github Copilot and he says it's incorrect. The "slightly" is just what he suggests. Your suggestion also LGTM. -- Regards, ChangAo Chen
On Saturday, December 6, 2025, cca5507 <cca5507@qq.com> wrote:
Hi Chao,
Thank you for your reply.
I feed the comment to Github Copilot and he says it's incorrect. The "slightly" is just what he suggests.
Your suggestion also LGTM.
I don’t think just adding the word “slightly” is a good fix here. Moving the comment from line 695 here would be better as that explains and gives reference to why the formula is used, close to the formula itself. It feels like the identical comment at line 757 also is just a bad copy-paste outcome and should be removed and probably replaced with one describing the complicated conditional code it precedes.
David J.
Hi, I find that the initial size of memtuples array must be more than ALLOCSET_SEPARATE_THRESHOLD is not for lower overhead of allocation, but for a bug fixed in 8ea3e7a75c0d22c41c57f59c8b367059b97d0b66. Attach a new patch. -- Regards, ChangAo Chen
Вложения
On Sun, 7 Dec 2025 at 21:34, cca5507 <cca5507@qq.com> wrote: > I find that the initial size of memtuples array must be more than ALLOCSET_SEPARATE_THRESHOLD > is not for lower overhead of allocation, but for a bug fixed in 8ea3e7a75c0d22c41c57f59c8b367059b97d0b66. > > Attach a new patch. I find the current comment perfectly understandable and the text you're proposing to be incorrect. What you might be missing is that with aset.c, a palloc() request above ALLOCSET_SEPARATE_THRESHOLD is certain to require a malloc() and a dedicated block. Sizes under that may be able to use an existing AllocBlock. The comment is effectively explaining that we don't want to make the array big enough so that a malloc will always be required. You may need to read AllocSetContextCreateInternal() around where allocChunkLimit is being set to understand what this is all about. David
On Sun, Dec 7, 2025 at 3:09 PM David Rowley <dgrowleyml@gmail.com> wrote:
The comment is effectively
explaining that we don't want to make the array big enough so that a
malloc will always be required.
Doesn't what you are saying contradict both the formula (the +1 post-division) and the comments:
/*
* Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
* see comments in grow_memtuples().
*/
state->memtupsize = INITIAL_MEMTUPSIZE;
state->memtuples = NULL;
* Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
* see comments in grow_memtuples().
*/
state->memtupsize = INITIAL_MEMTUPSIZE;
state->memtuples = NULL;
and in grow_memtuples:
* That shouldn't happen because we chose the
* initial array size large enough to ensure that palloc will be treating
* both old and new arrays as separate chunks.
* initial array size large enough to ensure that palloc will be treating
* both old and new arrays as separate chunks.
?
David J.
On Mon, 8 Dec 2025 at 12:10, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Sun, Dec 7, 2025 at 3:09 PM David Rowley <dgrowleyml@gmail.com> wrote: >> >> The comment is effectively >> explaining that we don't want to make the array big enough so that a >> malloc will always be required. > > > Doesn't what you are saying contradict both the formula (the +1 post-division) and the comments: > > /* > * Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD; > * see comments in grow_memtuples(). > */ Looks like I put too much faith into what the comment was telling me. I now agree that the comment is describing the opposite of what the code is doing. The comment seems to have first appeared in the patch submitted in [1], and per the "Not sure myself, let's ask the other Alexander." in [2], indicates that Alexander Kuzmenkov reverse-engineered the comment from his incorrect understanding of the code. With that, I think the v2 patch is almost ok. One small quibble I have is: + * in grow_memtuples(). However, we don't consider array sizes + * less than 1024. Using "However" here indicates some exception to what's just been said, but there is no longer an exception. To write about what the 1024 is for, we might need to reverse engineer what that's for. I assume it's something like "Clamp at 1024 elements to avoid excessive reallocs of the array". Or perhaps that without the " of the array" part. David [1] https://www.postgresql.org/message-id/CAPpHfdtKHETXhf062CPvkjpG1wnjQ7rv4uLhZgYQ6VZjwqDYpg%40mail.gmail.com [2] https://www.postgresql.org/message-id/b45ff523-780b-502b-8d03-2763182aa3c6%40postgrespro.ru
> On Dec 8, 2025, at 08:28, David Rowley <dgrowleyml@gmail.com> wrote: > > + * in grow_memtuples(). However, we don't consider array sizes > + * less than 1024. > > Using "However" here indicates some exception to what's just been > said, but there is no longer an exception. To write about what the > 1024 is for, we might need to reverse engineer what that's for. I > assume it's something like "Clamp at 1024 elements to avoid excessive > reallocs of the array". Or perhaps that without the " of the array" > part. +1 for the “however” concern. Also, I found 8ea3e7a75c might explain why we want initial memtuples array size to exceed ALLOCSET_SEPARATE_THRESHOLD. That'sbecause we want to avoid switching between block chunk and separate chunk when growing. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Hi, > Using "However" here indicates some exception to what's just been > said, but there is no longer an exception. To write about what the > 1024 is for, we might need to reverse engineer what that's for. I > assume it's something like "Clamp at 1024 elements to avoid excessive > reallocs of the array". Or perhaps that without the " of the array" > part. Agreed. I prefer without the " of the array" part. -- Regards, ChangAo Chen
On Mon, 8 Dec 2025 at 15:08, cca5507 <cca5507@qq.com> wrote: > > Using "However" here indicates some exception to what's just been > > said, but there is no longer an exception. To write about what the > > 1024 is for, we might need to reverse engineer what that's for. I > > assume it's something like "Clamp at 1024 elements to avoid excessive > > reallocs of the array". Or perhaps that without the " of the array" > > part. > > Agreed. I prefer without the " of the array" part. OK. Done like that. Thanks. David