pg_trgm comparison bug on cross-architecture replication due to different char implementation

Поиск
Список
Период
Сортировка
От Guo, Adam
Тема pg_trgm comparison bug on cross-architecture replication due to different char implementation
Дата
Msg-id CB11ADBC-0C3F-4FE0-A678-666EE80CBB07@amazon.com
обсуждение исходный текст
Ответы Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers

Hi all,

 

I would like to report an issue with the pg_trgm extension on cross-architecture replication scenarios. When an x86_64 standby server is replicating from an aarch64 primary server or vice versa, the gist_trgm_ops opclass returns different results on the primary and standby. Masahiko previously reported a similar issue affecting the pg_bigm extension [1].

 

To reproduce, execute the following on the x86_64 primary server:

 

CREATE EXTENSION pg_trgm;

CREATE TABLE tbl (c text);

CREATE INDEX ON tbl USING gist (c gist_trgm_ops);

INSERT INTO tbl VALUES ('Bóbr');

 

On the x86_64 primary server:

 

postgres=> select * from tbl where c like '%Bób%';

  c

------

Bóbr

(1 row)

 

On the aarch64 replica server:

 

postgres=> select * from tbl where c like '%Bób%';

c

---

(0 rows)

 

The root cause looks the same as the pg_bigm issue that Masahiko reported. To compare trigrams, pg_trgm uses a numerical comparison of chars [2]. On x86_64 a char is signed by default, whereas on aarch64 it is unsigned by default. gist_trgm_ops expects the trigram list to be sorted, but due to the different signedness of chars, the sort order is broken when replicating the values across architectures.

 

The different sort behaviour can be demonstrated using show_trgm.

 

On the x86_64 primary server:

 

postgres=> SELECT show_trgm('Bóbr');

                show_trgm                

------------------------------------------

{0x89194c,"  b","br ",0x707c72,0x7f7849}

(1 row)

 

On the aarch64 replica server:

 

postgres=> SELECT show_trgm('Bóbr');

                show_trgm                

------------------------------------------

{"  b","br ",0x707c72,0x7f7849,0x89194c}

(1 row)

 

One simple solution for this specific case is to declare the char signedness in the CMPPCHAR macro.

 

--- a/contrib/pg_trgm/trgm.h

+++ b/contrib/pg_trgm/trgm.h

@@ -42,7 +42,7 @@

typedef char trgm[3];

 #define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )

-#define CMPPCHAR(a,b,i)  CMPCHAR( *(((const char*)(a))+i), *(((const char*)(b))+i) )

+#define CMPPCHAR(a,b,i)  CMPCHAR( *(((unsigned char*)(a))+i), *(((unsigned char*)(b))+i) )

#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) )

 #define CPTRGM(a,b) do {                                                           \

 

Alternatively, Postgres can be compiled with -funsigned-char or -fsigned-char. I came across a code comment suggesting that this may not be a good idea in general [3].

 

Given that this has problem has come up before and seems likely to come up again, I'm curious what other broad solutions there might be to resolve it? Looking forward to any feedback, thanks!

 

Best,

 

Adam Guo

Amazon Web Services: https://aws.amazon.com

 

[1] https://osdn.net/projects/pgbigm/lists/archive/hackers/2024-February/000370.html

[2] https://github.com/postgres/postgres/blob/480bc6e3ed3a5719cdec076d4943b119890e8171/contrib/pg_trgm/trgm.h#L45

[3] https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/cash.c#L114-L123

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation