Обсуждение: nitpick about poor style in MergeAttributes
Hackers, I have been auditing the v12 source code for places which inappropriately ignore the return value of a function and have found another example which seems to me a fertile source of future bugs. In src/backend/nodes/list.c, list_delete_cell frees the list and returns NIL when you delete the last element of a list, placing a responsibility on any caller to check the return value. In tablecmds.c, MergeAttributes fails to do this. My inspection of the surrounding code leads me to suspect that logically the cell being deleted can never be the last cell, and hence the failure to check the return value does not manifest as a bug. But the surrounding code is rather large and convoluted, and I have little confidence that the code couldn't be changed such that the return value would be NIL, possibly leading to memory bugs. What to do about this is harder to say. In the following patch, I'm just doing what I think is standard for callers of list_delete_cell, and assigning the return value back to the list (similar to how a call to repalloc should do). But since there is an implicit assumption that the list is never emptied by this operation, perhaps checking against NIL and elog'ing makes more sense? diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 602a8dbd1c..96d6833274 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -2088,7 +2088,7 @@ MergeAttributes(List *schema, List *supers, char relpersistence, coldef->cooked_default = restdef->cooked_default; coldef->constraints = restdef->constraints; coldef->is_from_type = false; - list_delete_cell(schema, rest, prev); + schema = list_delete_cell(schema, rest, prev); } else ereport(ERROR,
On Wed, May 22, 2019 at 06:20:01PM -0700, Mark Dilger wrote: > What to do about this is harder to say. In the following > patch, I'm just doing what I think is standard for callers > of list_delete_cell, and assigning the return value back > to the list (similar to how a call to repalloc should do). > But since there is an implicit assumption that the list > is never emptied by this operation, perhaps checking > against NIL and elog'ing makes more sense? Yes, I agree that this is a bit fuzzy, and this code is new as of 705d433. As you say, I agree that making sure that the return value of list_delete_cell is not NIL is a sensible choice. I don't think that an elog() is in place here though as this does not rely directly on catalog contents, what about just an assertion? Here is an idea of message for the elog(ERROR) if we go that way: "no remaining columns after merging column \"%s\"". -- Michael
Вложения
On Wed, May 22, 2019 at 10:21 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, May 22, 2019 at 06:20:01PM -0700, Mark Dilger wrote: > > What to do about this is harder to say. In the following > > patch, I'm just doing what I think is standard for callers > > of list_delete_cell, and assigning the return value back > > to the list (similar to how a call to repalloc should do). > > But since there is an implicit assumption that the list > > is never emptied by this operation, perhaps checking > > against NIL and elog'ing makes more sense? > > Yes, I agree that this is a bit fuzzy, and this code is new as of > 705d433. As you say, I agree that making sure that the return value > of list_delete_cell is not NIL is a sensible choice. > > I don't think that an elog() is in place here though as this does not > rely directly on catalog contents, what about just an assertion? I think assigning the return value (as I did in my small patch) and then asserting that 'schema' is not NIL would be good. > Here is an idea of message for the elog(ERROR) if we go that way: > "no remaining columns after merging column \"%s\"". Perhaps. I like your idea of adding an assertion better. mark
Michael Paquier <michael@paquier.xyz> writes: > On Wed, May 22, 2019 at 06:20:01PM -0700, Mark Dilger wrote: >> What to do about this is harder to say. In the following >> patch, I'm just doing what I think is standard for callers >> of list_delete_cell, and assigning the return value back >> to the list (similar to how a call to repalloc should do). >> But since there is an implicit assumption that the list >> is never emptied by this operation, perhaps checking >> against NIL and elog'ing makes more sense? > Yes, I agree that this is a bit fuzzy, and this code is new as of > 705d433. As you say, I agree that making sure that the return value > of list_delete_cell is not NIL is a sensible choice. Are we sure that's not just a newly-introduced bug, ie it has not been tested in cases where the tlist could become empty? My first thought would be to assign the list pointer value back as per usual coding convention, not to double down on the assumption that this was well-considered code. regards, tom lane
On Thu, May 23, 2019 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Michael Paquier <michael@paquier.xyz> writes: > > On Wed, May 22, 2019 at 06:20:01PM -0700, Mark Dilger wrote: > >> What to do about this is harder to say. In the following > >> patch, I'm just doing what I think is standard for callers > >> of list_delete_cell, and assigning the return value back > >> to the list (similar to how a call to repalloc should do). > >> But since there is an implicit assumption that the list > >> is never emptied by this operation, perhaps checking > >> against NIL and elog'ing makes more sense? > > > Yes, I agree that this is a bit fuzzy, and this code is new as of > > 705d433. As you say, I agree that making sure that the return value > > of list_delete_cell is not NIL is a sensible choice. > > Are we sure that's not just a newly-introduced bug, ie it has not > been tested in cases where the tlist could become empty? My first > thought would be to assign the list pointer value back as per usual > coding convention, not to double down on the assumption that this > was well-considered code. I don't think that is disputed. I was debating between assigning it back and also asserting that it is not NIL vs. assigning it back and elog/ereporting if it is NIL. Of course, this is assuming the code was designed with the expectation that the list can never become empty. If you think it might become empty, and that the surrounding code can handle that sensibly, then perhaps we need neither the assertion nor the elog/ereport, though we still need the assignment. mark
On Thu, May 23, 2019 at 08:27:19AM -0700, Mark Dilger wrote: > On Thu, May 23, 2019 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Are we sure that's not just a newly-introduced bug, ie it has not >> been tested in cases where the tlist could become empty? My first >> thought would be to assign the list pointer value back as per usual >> coding convention, not to double down on the assumption that this >> was well-considered code. > > I don't think that is disputed. I was debating between assigning > it back and also asserting that it is not NIL vs. assigning it back > and elog/ereporting if it is NIL. Of course, this is assuming the > code was designed with the expectation that the list can never > become empty. If you think it might become empty, and that the > surrounding code can handle that sensibly, then perhaps we > need neither the assertion nor the elog/ereport, though we still > need the assignment. Looking closer, this code is not new as of v12. We have that since e7b3349 which has introduced CREATE TABLE OF. Anyway, I think that assigning the result of list_delete_cell and adding an assertion like in the attached are saner things to do. This code scans each entry in the list and removes columns with duplicate names, so we should never finish with an empty list as we will in the first case always merge down to at least one column. That's rather a nit, but I guess that this is better than the previous code which assumed that silently? -- Michael
Вложения
On Thu, May 23, 2019 at 5:24 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, May 23, 2019 at 08:27:19AM -0700, Mark Dilger wrote: > > On Thu, May 23, 2019 at 7:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Are we sure that's not just a newly-introduced bug, ie it has not > >> been tested in cases where the tlist could become empty? My first > >> thought would be to assign the list pointer value back as per usual > >> coding convention, not to double down on the assumption that this > >> was well-considered code. > > > > I don't think that is disputed. I was debating between assigning > > it back and also asserting that it is not NIL vs. assigning it back > > and elog/ereporting if it is NIL. Of course, this is assuming the > > code was designed with the expectation that the list can never > > become empty. If you think it might become empty, and that the > > surrounding code can handle that sensibly, then perhaps we > > need neither the assertion nor the elog/ereport, though we still > > need the assignment. > > Looking closer, this code is not new as of v12. We have that since > e7b3349 which has introduced CREATE TABLE OF. Anyway, I think that > assigning the result of list_delete_cell and adding an assertion like > in the attached are saner things to do. This code scans each entry in > the list and removes columns with duplicate names, so we should never > finish with an empty list as we will in the first case always merge > down to at least one column. That's rather a nit, but I guess that > this is better than the previous code which assumed that silently? I like it better because it makes static analysis of the code easier, and because if anybody ever changed list_delete_cell to return a different list object in more cases than just when the list is completely empty, this call site would be silently wrong. Thanks for the patch! mark
On 2019-May-23, Mark Dilger wrote: > On Thu, May 23, 2019 at 5:24 PM Michael Paquier <michael@paquier.xyz> wrote: > > Looking closer, this code is not new as of v12. We have that since > > e7b3349 which has introduced CREATE TABLE OF. Yeah, I was not quite understanding why it was being blamed on a commit that actually *removed* one other callsite that did the same thing. (I didn't actually realize at the time that this bug was there, mind.) > > Anyway, I think that assigning the result of list_delete_cell and > > adding an assertion like in the attached are saner things to do. Looks good to me. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 04, 2019 at 04:18:30PM -0400, Alvaro Herrera wrote: > Yeah, I was not quite understanding why it was being blamed on a commit > that actually *removed* one other callsite that did the same thing. (I > didn't actually realize at the time that this bug was there, mind.) I completely forgot about this thread as an effect of last week's activity. Committed now. Thanks for the input, Alvaro. -- Michael