Обсуждение: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Dear All
I have a weird problem, I am trying to improve performance on this query :
SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
The first system (linux) is a linux hosted in a cloud, kernel 3.16.0-4-amd64, 32GB mem, SSD, 4 x Intel(R) Xeon(R) CPU E7-4860 v2 @ 2.60GHz ,
The second (freebsd) system, used as test, is my local FreeBSD 13.1-RELEASE workstation, 32GB mem, ZFS/magnetic disks ,16 x AMD Ryzen 7 5800X 3800.16-MHz .
Overall my workstation is faster, but my issue is not plain speed. The problem is as follows :
FreeBSD
postgres@[local]/dynacom=# explain (analyze,buffers) SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..5.11 rows=1 width=32) (actual time=96.705..96.706 rows=1 loops=1)
Buffers: shared hit=71
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32) (actual time=96.704..96.705 rows=1 loops=1)
Filter: ('foo@bar.com'::text ~* address_regex)
Rows Removed by Filter: 14738
Heap Fetches: 0
Buffers: shared hit=71
Planning time: 0.082 ms
Execution time: 96.725 ms
(9 rows)
Time: 97.038 ms
postgres@[local]/dynacom=#
Linux
dynacom=# explain (analyze,buffers) SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..5.12 rows=1 width=32) (actual time=1768.725..1768.727 rows=1 loops=1)
Buffers: shared hit=530
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32) (actual time=1768.724..1768.725 rows=1 loops=1)
Filter: ('foo@bar.com'::text ~* address_regex)
Rows Removed by Filter: 97781
Heap Fetches: 0
Buffers: shared hit=530
Planning time: 1.269 ms
Execution time: 1768.998 ms
(9 rows)
The file in FreeBSD came by pg_dump from the linux system, I am puzzled why this huge difference in Buffers: shared hit. All table/index sizes are identical on both systems, I did vacuum full on the linux one, and also did vacuum freeze on both. I analyzed both, reindexed both (several times). Still the FreeBSD seems to access about 7 times less number of blocks from shared_buffers than linux : 71 vs 530 . There is no bloat , I tested with newly fresh table in both systems as well.
Thank you for any help.
Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> writes: > *FreeBSD* > > -> Index Only Scan using mail_vessel_addressbook_address_regex_idx > on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32) > (actual time=96.704..96.705 rows=1 loops=1) > Filter: ('foo@bar.com'::text ~* address_regex) > Rows Removed by Filter: 14738 > Heap Fetches: 0 > Buffers: shared hit=71 > > *Linux* > > -> Index Only Scan using mail_vessel_addressbook_address_regex_idx > on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32) > (actual time=1768.724..1768.725 rows=1 loops=1) > Filter: ('foo@bar.com'::text ~* address_regex) > Rows Removed by Filter: 97781 > Heap Fetches: 0 > Buffers: shared hit=530 > The file in FreeBSD came by pg_dump from the linux system, I am puzzled > why this huge difference in Buffers: shared hit. The "rows removed" value is also quite a bit different, so it's not just a matter of buffer touches --- there's evidently some real difference in how much of the index is being scanned. I speculate that you are using different collations on the two systems, and FreeBSD's collation happens to place the first matching row earlier in the index. regards, tom lane
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> writes:*FreeBSD* -> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32) (actual time=96.704..96.705 rows=1 loops=1) Filter: ('foo@bar.com'::text ~* address_regex) Rows Removed by Filter: 14738 Heap Fetches: 0 Buffers: shared hit=71 *Linux* -> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32) (actual time=1768.724..1768.725 rows=1 loops=1) Filter: ('foo@bar.com'::text ~* address_regex) Rows Removed by Filter: 97781 Heap Fetches: 0 Buffers: shared hit=530The file in FreeBSD came by pg_dump from the linux system, I am puzzled why this huge difference in Buffers: shared hit.The "rows removed" value is also quite a bit different, so it's not just a matter of buffer touches --- there's evidently some real difference in how much of the index is being scanned. I speculate that you are using different collations on the two systems, and FreeBSD's collation happens to place the first matching row earlier in the index.
Thank you, I see that both systems use en_US.UTF-8 as lc_collate and lc_ctype, and that in both systems :
dynacom=# \dOS+
List of collations
Schema | Name | Collate | Ctype | Provider | Description
------------+---------+---------+-------+----------+------------------------------
pg_catalog | C | C | C | libc | standard C collation
pg_catalog | POSIX | POSIX | POSIX | libc | standard POSIX collation
pg_catalog | default | | | default | database's default collation
(3 rows)
dynacom=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+-----------+-------------+-------------+------------------------
dynacom | postgres | SQL_ASCII | en_US.UTF-8 | en_US.UTF-8 |
the below seems ok
FreeBSD :
postgres@[local]/dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1
---------
_
.
0
a
Z
(5 rows)
Linux:
dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1
---------
_
.
0
a
Z
(5 rows)
dynacom=#
but :
Freebsd :
postgres@[local]/dynacom=# select distinct address_regex from mail_vessel_addressbook order by address_regex::text ASC limit 5;
address_regex
----------------------------------------------------------
_cmo.ship.inf@<hide>.<hid>
_EMD_REEFER@hide>.<hid>
_OfficeHayPoint@hide>.<hid>
_Sabtank_PCQ1_All_SSVSSouth_area@hide>.<hid>
_Sabtank_PCQ1_Lead_OperatorsSouth_area@hide>.<hid>
(5 rows)
While in Linux :
dynacom=# select distinct address_regex from mail_vessel_addressbook order by address_regex::text ASC limit 5;
address_regex
-----------------------------------
0033240902573@<hidden>.<hid>
0033442057364@<hidden>.<hid>
0072usl@<hidden>.<hid>
0081354426912@<hidden>.<hid>
00862163602861@<hidden>.<hid>
(5 rows)
somethings does not seem right.
regards, tom lane
-- Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt
Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com> writes: > Thank you, I see that both systems use en_US.UTF-8 as lc_collate and > lc_ctype, Doesn't necessarily mean they interpret that the same way, though :-( > the below seems ok > FreeBSD : > postgres@[local]/dynacom=# select * from (values > ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text; > column1 > --------- > _ > . > 0 > a > Z > (5 rows) Sadly, this proves very little about Linux's behavior. glibc's idea of en_US involves some very complicated multi-pass sort rules. AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US as "same as C except case-insensitive", whereas I'm pretty sure that underscores and other punctuation are nearly ignored in glibc's interpretation; they'll only be taken into account if the alphanumeric parts of the strings sort equal. regards, tom lane
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Στις 15/9/23 22:42, ο/η Tom Lane έγραψε: > Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com> writes: >> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and >> lc_ctype, > Doesn't necessarily mean they interpret that the same way, though :-( > >> the below seems ok >> FreeBSD : >> postgres@[local]/dynacom=# select * from (values >> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text; >> column1 >> --------- >> _ >> . >> 0 >> a >> Z >> (5 rows) > Sadly, this proves very little about Linux's behavior. glibc's idea > of en_US involves some very complicated multi-pass sort rules. > AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US > as "same as C except case-insensitive", whereas I'm pretty sure > that underscores and other punctuation are nearly ignored in > glibc's interpretation; they'll only be taken into account if the Thank you so much. Makes perfect sense. This begs the question asked also in the -sql list : how do I index on regex'es, or at least have a barely scalable solution? Here I try to match a given string against a stored regex, whereas in pg_trgm's case the user tries to match a stored text against a given regex. > alphanumeric parts of the strings sort equal. > > regards, tom lane -- Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt
On Sat, Sep 16, 2023 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sadly, this proves very little about Linux's behavior. glibc's idea > of en_US involves some very complicated multi-pass sort rules. > AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US > as "same as C except case-insensitive", whereas I'm pretty sure > that underscores and other punctuation are nearly ignored in > glibc's interpretation; they'll only be taken into account if the > alphanumeric parts of the strings sort equal. Achilleas didn't mention the glibc version, but based on the kernel vintage mentioned I guess that must be the "old" (pre 2.28) glibc sorting. In 2.28 they did a big sync-up with ISO 14651, while FreeBSD follows the UCA, a closely related standard[1]. I think newer Linux/glibc systems should agree with FreeBSD's libc in more cases (and also agree with ICU). [1] https://unicode.org/reports/tr10/#Synch_ISO14651
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Thank you Thomas , our linux's glibc is on version : 2.19-18+deb8u10, we need to upgrade on so many levels.On Sat, Sep 16, 2023 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:Sadly, this proves very little about Linux's behavior. glibc's idea of en_US involves some very complicated multi-pass sort rules. AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US as "same as C except case-insensitive", whereas I'm pretty sure that underscores and other punctuation are nearly ignored in glibc's interpretation; they'll only be taken into account if the alphanumeric parts of the strings sort equal.Achilleas didn't mention the glibc version, but based on the kernel vintage mentioned I guess that must be the "old" (pre 2.28) glibc sorting. In 2.28 they did a big sync-up with ISO 14651, while FreeBSD follows the UCA, a closely related standard[1]. I think newer Linux/glibc systems should agree with FreeBSD's libc in more cases (and also agree with ICU).
[1] https://unicode.org/reports/tr10/#Synch_ISO14651
-- Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt