Re: Reducing output size of nodeToString

Поиск
Список
Период
Сортировка
От Peter Eisentraut
Тема Re: Reducing output size of nodeToString
Дата
Msg-id fb1f1eae-1c1d-4ece-af41-f306436dcc0d@eisentraut.org
обсуждение исходный текст
Ответ на Re: Reducing output size of nodeToString  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: Reducing output size of nodeToString  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
On 04.01.24 00:23, Matthias van de Meent wrote:
> Something like the attached? It splits out into the following
> 0001: basic 'omit default values'

  /* Write an integer field (anything written as ":fldname %d") */
-#define WRITE_INT_FIELD(fldname) \
+#define WRITE_INT_FIELD_DIRECT(fldname) \
         appendStringInfo(str, " :" CppAsString(fldname) " %d", 
node->fldname)
+#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
+       ((node->fldname == default) ? (0) : WRITE_INT_FIELD_DIRECT(fldname))
+#define WRITE_INT_FIELD(fldname) \
+       WRITE_INT_FIELD_DEFAULT(fldname, 0)

Do we need the _DIRECT macros at all?  Could we not combine that into 
the _DEFAULT ones?

I think the way the conditional operator (?:) is written is not 
technically correct C, because one side has an integer result (0) and 
the other a void result (from appendStringInfo()).  Also, this could 
break accidentally even more if the result type of appendStringInfo() 
was changed for some reason.  I think it would be better to write this 
in a more straightforward way like

#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
do { \
     if (node->fldname == default) \
         appendStringInfo(str, " :" CppAsString(fldname) " %d", 
node->fldname); \
while (0)

Relatedly, this

+/* a scaffold function to read an optionally-omitted field */
+#define READ_OPT_SCAFFOLD(fldname, read_field_code, default_value) \
+       if (pg_strtoken_next(":" CppAsString(fldname))) \
+       { \
+               read_field_code; \
+       } \
+       else \
+               local_node->fldname = default_value

would need to be written with a do { } while (0) wrapper around it.


> 0002: reset location and other querystring-related node fields for all
> catalogs of type pg_node_tree.

This goal makes sense, but I think it can be done in a better way.  If 
you look into the area of stringToNode(), stringToNodeWithLocations(), 
and stringToNodeInternal(), there already is support for selectively 
resetting or omitting location fields.  Notably, this works with the 
existing automated knowledge of where the location fields are and 
doesn't require a new hand-maintained table.  I think the way forward 
here would be to apply a similar toggle to nodeToString() (the reverse).


> 0003: add default marking on typmod fields.
> 0004 & 0006: various node fields marked with default() based on
> observed common or initial values of those fields

I think we could get about half the benefit here more automatically, by 
creating a separate type for typmods, like

typedef int32 TypMod;

and then having the node support automatically generate the 
serialization support with a -1 default.

(A similar thing could be applied to the location fields, which would 
allow us to get rid of the current hack of parsing out the name.)

Most of the other defaults I'm doubtful about.  First, we are colliding 
here between the goals of minimizing the storage size and making the 
debug output more readable.  If a Query dump would now omit the 
commandType field if it is CMD_SELECT, I think that would be widely 
confusing, and one would need to check the source code to identify the 
reason.  Also, what if we later decide to change a "default" for a 
field.  Then output between version would differ.  Of course, node 
output does change between versions in general, but these kinds of 
differences would be confusing.  Second, this relies on hand-maintained 
annotations that were created by you presumably through a combination of 
intuition and testing, based on what is in the template database.  Do we 
know whether this matches real-world queries created by users later? 
Also, my experience dealing with the node support over the last little 
while is that these manually maintained exceptions get ossified and 
outdated and create a maintenance headache for the future.


> 0005: truncate trailing 0s from outDatum

Does this significantly affect anything other than the "name" type? 
User views don't usually use the "name" type, so this would have limited 
impact outside of system views.


> 0007 (new): do run-length + gap coding for bitmapset and the various
> integer list types. This saves a surprising amount of bytes.

Can you show examples of this?  How would this affects the ability to 
manually interpret the output?




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Laurenz Albe
Дата:
Сообщение: Re: psql JSON output format
Следующее
От: vignesh C
Дата:
Сообщение: Re: Custom explain options