Обсуждение: splitting pg_resetwal output strings

Поиск
Список
Период
Сортировка

splitting pg_resetwal output strings

От
Álvaro Herrera
Дата:
Hello,

For a long time I've been annoyed that certain tools such as (but not
only) pg_resetwal have translatable strings full of whitespace that
translators have to be careful about so that things look nice.
For Spanish I have made the lines a bit wider so that everything fits,
which means that every time someone adds a new line that's not yet
translated, the output looks bad; and if I want to add one more space
to align a new longer string, then I have to edit every single one of
them.  This is horrible.

Here's a proposal.  The idea is to have a separate file (entries.h right
now but proposal for better names are welcome) which lists those
strings, together with the printf specifiers needed to actually print
them.  This way, we can measure the length of each exactly as they
translate before printing anything, and then line up everything to the
same output length.

One curious thing of note is that I had to add an internal_wcswidth()
function, to avoid having to link libpq just to be able to do
pg_wcswidth().

This is not complete.  It modifies PrintControlValues(), which is easy
because I just print each item in the entries.h file in the same order
they appear there.   But for PrintNewControlValues() I'll need to add a
symbolic identifier to each string, that the code can use to scan the
array and print just those elements.  I'll do that if there are no
objections to this idea here.  Also, pg_controldata could probably
something very similar or maybe the same entries.h file.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas.  Most of them
will be wrong, and what you have to learn is which ones to throw away."
                                                         (Linus Pauling)

Вложения

Re: splitting pg_resetwal output strings

От
Álvaro Herrera
Дата:
On 2026-Jan-31, Álvaro Herrera wrote:

> This is not complete.  It modifies PrintControlValues(), which is easy
> because I just print each item in the entries.h file in the same order
> they appear there.   But for PrintNewControlValues() I'll need to add a
> symbolic identifier to each string, that the code can use to scan the
> array and print just those elements.  I'll do that if there are no
> objections to this idea here.

It looks more or less like this.  Patch 0001 is the same as before, and
0002 adds the symbolic names, which is used to create an enum and then
to search for the correct lines to print in PrintNewControlValues.

I decided to reuse simple_oid_list to store the integers values, which
is kinda icky, so getting to something committable I think would have me
add simple_int_list.  Also, there's a few blocks that are duplicate
cases of the same line measuring and printing logic; I suppose I should
have a routine to simplify.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
 sources, so let's move on."                               (Nathaniel Smith)
      https://mail.gnu.org/archive/html/monotone-devel/2007-01/msg00080.html

Вложения

Re: splitting pg_resetwal output strings

От
Chao Li
Дата:

> On Jan 31, 2026, at 23:08, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2026-Jan-31, Álvaro Herrera wrote:
>
>> This is not complete.  It modifies PrintControlValues(), which is easy
>> because I just print each item in the entries.h file in the same order
>> they appear there.   But for PrintNewControlValues() I'll need to add a
>> symbolic identifier to each string, that the code can use to scan the
>> array and print just those elements.  I'll do that if there are no
>> objections to this idea here.
>
> It looks more or less like this.  Patch 0001 is the same as before, and
> 0002 adds the symbolic names, which is used to create an enum and then
> to search for the correct lines to print in PrintNewControlValues.
>
> I decided to reuse simple_oid_list to store the integers values, which
> is kinda icky, so getting to something committable I think would have me
> add simple_int_list.  Also, there's a few blocks that are duplicate
> cases of the same line measuring and printing logic; I suppose I should
> have a routine to simplify.
>

Hi Alvaro,

I don't think we necessarily need to add a simple_int_list. Since OIDs are not globally unique across all object types,
wecan reasonably treat ControlData items as "objects" and assign them OIDs within this context. 

If we take that approach, the enum could be named ControlDataOid:
```
/* Define the string enums */
#define CONTROLDATA_LINE(symbol, description, fmt, ...) \
symbol,
enum {
#include "entries.h”
} ControlDataOid;
#undef CONTROLDATA_LINE
```

This makes reusing simple_oid_list feel appropriate. If you aren't fond of that idea, I’d suggest naming the enum
ControlDataSymbolrather than ControlDataStrings, as the former better reflects what the members actually are. 

Regarding patch 0002:
```
+#define CONTROLDATA_LINE(symbol, description, fmt, ...)            \
+    if (simple_oid_list_member(&toprint, symbol))                \
+    {                                                            \
+    thislen = internal_wcswidth(_(description),                    \
+                                strlen(_(description)),            \
+                                encoding);                        \
+    if (thislen > maxlen)                                        \
+        maxlen = thislen;                                        \
     }
+#include "entries.h"
+#undef CONTROLDATA_LINE
```
The code block inside the first if statement is missing an indentation level.

Otherwise, the patch looks good to me. I've tested it and confirmed that calculating the padding at runtime generates
better-lookingoutput. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: splitting pg_resetwal output strings

От
Álvaro Herrera
Дата:
Hello Evan,

On 2026-Feb-02, Chao Li wrote:

> I don't think we necessarily need to add a simple_int_list. Since OIDs
> are not globally unique across all object types, we can reasonably
> treat ControlData items as "objects" and assign them OIDs within this
> context.
> 
> If we take that approach, the enum could be named ControlDataOid:
> ```
> /* Define the string enums */
> #define CONTROLDATA_LINE(symbol, description, fmt, ...) \
> symbol,
> enum {
> #include "entries.h”
> } ControlDataOid;
> #undef CONTROLDATA_LINE
> ```
> 
> This makes reusing simple_oid_list feel appropriate.

True, but that's a bit cheating with the naming.  Really, type Oid is
related to identifying database objects, that is, what is used in each
catalog row.  Someday we may want to decide to enlarge that to 64 bits,
and if that happens and yet the C compiler continues to use 32-bit
integers to implement C-level enums, which is what this patch uses, what
then?

If I go with just simple_int_list, then that's always (by definition)
going to be a C integer, which is (presumably) always going to be what
the C compiler uses to implement a C enum.

(I think in reality we're unlikely to change Oid to be 64-bit, because
there's no reason to do so since it's quite unusual to have such very
many database objects.  I think it's more likely that we'd change the
type that underlies TOAST pointers, which is where the real problem with
Oid resides.  Even so, it seems ...  dangerous? ... to me to do what
you propose.)

> If you aren't fond of that idea, I’d suggest naming the enum
> ControlDataSymbol rather than ControlDataStrings, as the former better
> reflects what the members actually are.

Yeah, this is probably a good idea anyway, given that these objects are
not strings anymore.

Thanks for the review!

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/