Обсуждение: splitting pg_resetwal output strings
Hello, For a long time I've been annoyed that certain tools such as (but not only) pg_resetwal have translatable strings full of whitespace that translators have to be careful about so that things look nice. For Spanish I have made the lines a bit wider so that everything fits, which means that every time someone adds a new line that's not yet translated, the output looks bad; and if I want to add one more space to align a new longer string, then I have to edit every single one of them. This is horrible. Here's a proposal. The idea is to have a separate file (entries.h right now but proposal for better names are welcome) which lists those strings, together with the printf specifiers needed to actually print them. This way, we can measure the length of each exactly as they translate before printing anything, and then line up everything to the same output length. One curious thing of note is that I had to add an internal_wcswidth() function, to avoid having to link libpq just to be able to do pg_wcswidth(). This is not complete. It modifies PrintControlValues(), which is easy because I just print each item in the entries.h file in the same order they appear there. But for PrintNewControlValues() I'll need to add a symbolic identifier to each string, that the code can use to scan the array and print just those elements. I'll do that if there are no objections to this idea here. Also, pg_controldata could probably something very similar or maybe the same entries.h file. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "If you want to have good ideas, you must have many ideas. Most of them will be wrong, and what you have to learn is which ones to throw away." (Linus Pauling)
Вложения
On 2026-Jan-31, Álvaro Herrera wrote: > This is not complete. It modifies PrintControlValues(), which is easy > because I just print each item in the entries.h file in the same order > they appear there. But for PrintNewControlValues() I'll need to add a > symbolic identifier to each string, that the code can use to scan the > array and print just those elements. I'll do that if there are no > objections to this idea here. It looks more or less like this. Patch 0001 is the same as before, and 0002 adds the symbolic names, which is used to create an enum and then to search for the correct lines to print in PrintNewControlValues. I decided to reuse simple_oid_list to store the integers values, which is kinda icky, so getting to something committable I think would have me add simple_int_list. Also, there's a few blocks that are duplicate cases of the same line measuring and printing logic; I suppose I should have a routine to simplify. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Most hackers will be perfectly comfortable conceptualizing users as entropy sources, so let's move on." (Nathaniel Smith) https://mail.gnu.org/archive/html/monotone-devel/2007-01/msg00080.html
Вложения
> On Jan 31, 2026, at 23:08, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2026-Jan-31, Álvaro Herrera wrote:
>
>> This is not complete. It modifies PrintControlValues(), which is easy
>> because I just print each item in the entries.h file in the same order
>> they appear there. But for PrintNewControlValues() I'll need to add a
>> symbolic identifier to each string, that the code can use to scan the
>> array and print just those elements. I'll do that if there are no
>> objections to this idea here.
>
> It looks more or less like this. Patch 0001 is the same as before, and
> 0002 adds the symbolic names, which is used to create an enum and then
> to search for the correct lines to print in PrintNewControlValues.
>
> I decided to reuse simple_oid_list to store the integers values, which
> is kinda icky, so getting to something committable I think would have me
> add simple_int_list. Also, there's a few blocks that are duplicate
> cases of the same line measuring and printing logic; I suppose I should
> have a routine to simplify.
>
Hi Alvaro,
I don't think we necessarily need to add a simple_int_list. Since OIDs are not globally unique across all object types,
wecan reasonably treat ControlData items as "objects" and assign them OIDs within this context.
If we take that approach, the enum could be named ControlDataOid:
```
/* Define the string enums */
#define CONTROLDATA_LINE(symbol, description, fmt, ...) \
symbol,
enum {
#include "entries.h”
} ControlDataOid;
#undef CONTROLDATA_LINE
```
This makes reusing simple_oid_list feel appropriate. If you aren't fond of that idea, I’d suggest naming the enum
ControlDataSymbolrather than ControlDataStrings, as the former better reflects what the members actually are.
Regarding patch 0002:
```
+#define CONTROLDATA_LINE(symbol, description, fmt, ...) \
+ if (simple_oid_list_member(&toprint, symbol)) \
+ { \
+ thislen = internal_wcswidth(_(description), \
+ strlen(_(description)), \
+ encoding); \
+ if (thislen > maxlen) \
+ maxlen = thislen; \
}
+#include "entries.h"
+#undef CONTROLDATA_LINE
```
The code block inside the first if statement is missing an indentation level.
Otherwise, the patch looks good to me. I've tested it and confirmed that calculating the padding at runtime generates
better-lookingoutput.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Hello Evan,
On 2026-Feb-02, Chao Li wrote:
> I don't think we necessarily need to add a simple_int_list. Since OIDs
> are not globally unique across all object types, we can reasonably
> treat ControlData items as "objects" and assign them OIDs within this
> context.
>
> If we take that approach, the enum could be named ControlDataOid:
> ```
> /* Define the string enums */
> #define CONTROLDATA_LINE(symbol, description, fmt, ...) \
> symbol,
> enum {
> #include "entries.h”
> } ControlDataOid;
> #undef CONTROLDATA_LINE
> ```
>
> This makes reusing simple_oid_list feel appropriate.
True, but that's a bit cheating with the naming. Really, type Oid is
related to identifying database objects, that is, what is used in each
catalog row. Someday we may want to decide to enlarge that to 64 bits,
and if that happens and yet the C compiler continues to use 32-bit
integers to implement C-level enums, which is what this patch uses, what
then?
If I go with just simple_int_list, then that's always (by definition)
going to be a C integer, which is (presumably) always going to be what
the C compiler uses to implement a C enum.
(I think in reality we're unlikely to change Oid to be 64-bit, because
there's no reason to do so since it's quite unusual to have such very
many database objects. I think it's more likely that we'd change the
type that underlies TOAST pointers, which is where the real problem with
Oid resides. Even so, it seems ... dangerous? ... to me to do what
you propose.)
> If you aren't fond of that idea, I’d suggest naming the enum
> ControlDataSymbol rather than ControlDataStrings, as the former better
> reflects what the members actually are.
Yeah, this is probably a good idea anyway, given that these objects are
not strings anymore.
Thanks for the review!
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/