Обсуждение: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Dear All
I have a weird problem, I am trying to improve performance on this query :
SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
The first system (linux) is a linux hosted in a cloud, kernel 3.16.0-4-amd64, 32GB mem, SSD, 4 x Intel(R) Xeon(R) CPU E7-4860 v2 @ 2.60GHz ,
The second (freebsd) system, used as test, is my local FreeBSD 13.1-RELEASE workstation, 32GB mem, ZFS/magnetic disks ,16 x AMD Ryzen 7 5800X 3800.16-MHz .
Overall my workstation is faster, but my issue is not plain speed. The problem is as follows :
FreeBSD
postgres@[local]/dynacom=# explain (analyze,buffers) SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..5.11 rows=1 width=32) (actual time=96.705..96.706 rows=1 loops=1)
Buffers: shared hit=71
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32) (actual time=96.704..96.705 rows=1 loops=1)
Filter: ('foo@bar.com'::text ~* address_regex)
Rows Removed by Filter: 14738
Heap Fetches: 0
Buffers: shared hit=71
Planning time: 0.082 ms
Execution time: 96.725 ms
(9 rows)
Time: 97.038 ms
postgres@[local]/dynacom=#
Linux
dynacom=# explain (analyze,buffers) SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..5.12 rows=1 width=32) (actual time=1768.725..1768.727 rows=1 loops=1)
Buffers: shared hit=530
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32) (actual time=1768.724..1768.725 rows=1 loops=1)
Filter: ('foo@bar.com'::text ~* address_regex)
Rows Removed by Filter: 97781
Heap Fetches: 0
Buffers: shared hit=530
Planning time: 1.269 ms
Execution time: 1768.998 ms
(9 rows)
The file in FreeBSD came by pg_dump from the linux system, I am puzzled why this huge difference in Buffers: shared hit. All table/index sizes are identical on both systems, I did vacuum full on the linux one, and also did vacuum freeze on both. I analyzed both, reindexed both (several times). Still the FreeBSD seems to access about 7 times less number of blocks from shared_buffers than linux : 71 vs 530 . There is no bloat , I tested with newly fresh table in both systems as well.
Thank you for any help.
Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> writes:
> *FreeBSD*
>
> -> Index Only Scan using mail_vessel_addressbook_address_regex_idx
> on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32)
> (actual time=96.704..96.705 rows=1 loops=1)
> Filter: ('foo@bar.com'::text ~* address_regex)
> Rows Removed by Filter: 14738
> Heap Fetches: 0
> Buffers: shared hit=71
>
> *Linux*
>
> -> Index Only Scan using mail_vessel_addressbook_address_regex_idx
> on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32)
> (actual time=1768.724..1768.725 rows=1 loops=1)
> Filter: ('foo@bar.com'::text ~* address_regex)
> Rows Removed by Filter: 97781
> Heap Fetches: 0
> Buffers: shared hit=530
> The file in FreeBSD came by pg_dump from the linux system, I am puzzled
> why this huge difference in Buffers: shared hit.
The "rows removed" value is also quite a bit different, so it's not
just a matter of buffer touches --- there's evidently some real difference
in how much of the index is being scanned. I speculate that you are
using different collations on the two systems, and FreeBSD's collation
happens to place the first matching row earlier in the index.
regards, tom lane
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> writes:*FreeBSD* -> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32) (actual time=96.704..96.705 rows=1 loops=1) Filter: ('foo@bar.com'::text ~* address_regex) Rows Removed by Filter: 14738 Heap Fetches: 0 Buffers: shared hit=71 *Linux* -> Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32) (actual time=1768.724..1768.725 rows=1 loops=1) Filter: ('foo@bar.com'::text ~* address_regex) Rows Removed by Filter: 97781 Heap Fetches: 0 Buffers: shared hit=530The file in FreeBSD came by pg_dump from the linux system, I am puzzled why this huge difference in Buffers: shared hit.The "rows removed" value is also quite a bit different, so it's not just a matter of buffer touches --- there's evidently some real difference in how much of the index is being scanned. I speculate that you are using different collations on the two systems, and FreeBSD's collation happens to place the first matching row earlier in the index.
Thank you, I see that both systems use en_US.UTF-8 as lc_collate and lc_ctype, and that in both systems :
dynacom=# \dOS+
List of collations
Schema | Name | Collate | Ctype | Provider | Description
------------+---------+---------+-------+----------+------------------------------
pg_catalog | C | C | C | libc | standard C collation
pg_catalog | POSIX | POSIX | POSIX | libc | standard POSIX collation
pg_catalog | default | | | default | database's default collation
(3 rows)
dynacom=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+-----------+-------------+-------------+------------------------
dynacom | postgres | SQL_ASCII | en_US.UTF-8 | en_US.UTF-8 |
the below seems ok
FreeBSD :
postgres@[local]/dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1
---------
_
.
0
a
Z
(5 rows)
Linux:
dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1
---------
_
.
0
a
Z
(5 rows)
dynacom=#
but :
Freebsd :
postgres@[local]/dynacom=# select distinct address_regex from mail_vessel_addressbook order by address_regex::text ASC limit 5;
address_regex
----------------------------------------------------------
_cmo.ship.inf@<hide>.<hid>
_EMD_REEFER@hide>.<hid>
_OfficeHayPoint@hide>.<hid>
_Sabtank_PCQ1_All_SSVSSouth_area@hide>.<hid>
_Sabtank_PCQ1_Lead_OperatorsSouth_area@hide>.<hid>
(5 rows)
While in Linux :
dynacom=# select distinct address_regex from mail_vessel_addressbook order by address_regex::text ASC limit 5;
address_regex
-----------------------------------
0033240902573@<hidden>.<hid>
0033442057364@<hidden>.<hid>
0072usl@<hidden>.<hid>
0081354426912@<hidden>.<hid>
00862163602861@<hidden>.<hid>
(5 rows)
somethings does not seem right.
regards, tom lane
-- Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt
Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com> writes:
> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
> lc_ctype,
Doesn't necessarily mean they interpret that the same way, though :-(
> the below seems ok
> FreeBSD :
> postgres@[local]/dynacom=# select * from (values
> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
> column1
> ---------
> _
> .
> 0
> a
> Z
> (5 rows)
Sadly, this proves very little about Linux's behavior. glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the
alphanumeric parts of the strings sort equal.
regards, tom lane
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Στις 15/9/23 22:42, ο/η Tom Lane έγραψε:
> Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com> writes:
>> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
>> lc_ctype,
> Doesn't necessarily mean they interpret that the same way, though :-(
>
>> the below seems ok
>> FreeBSD :
>> postgres@[local]/dynacom=# select * from (values
>> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
>> column1
>> ---------
>> _
>> .
>> 0
>> a
>> Z
>> (5 rows)
> Sadly, this proves very little about Linux's behavior. glibc's idea
> of en_US involves some very complicated multi-pass sort rules.
> AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
> as "same as C except case-insensitive", whereas I'm pretty sure
> that underscores and other punctuation are nearly ignored in
> glibc's interpretation; they'll only be taken into account if the
Thank you so much. Makes perfect sense.
This begs the question asked also in the -sql list : how do I index on
regex'es, or at least have a barely scalable solution? Here I try to
match a given string against a stored regex, whereas in pg_trgm's case
the user tries to match a stored text against a given regex.
> alphanumeric parts of the strings sort equal.
>
> regards, tom lane
--
Achilleas Mantzios
IT DEV - HEAD
IT DEPT
Dynacom Tankers Mgmt
On Sat, Sep 16, 2023 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sadly, this proves very little about Linux's behavior. glibc's idea > of en_US involves some very complicated multi-pass sort rules. > AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US > as "same as C except case-insensitive", whereas I'm pretty sure > that underscores and other punctuation are nearly ignored in > glibc's interpretation; they'll only be taken into account if the > alphanumeric parts of the strings sort equal. Achilleas didn't mention the glibc version, but based on the kernel vintage mentioned I guess that must be the "old" (pre 2.28) glibc sorting. In 2.28 they did a big sync-up with ISO 14651, while FreeBSD follows the UCA, a closely related standard[1]. I think newer Linux/glibc systems should agree with FreeBSD's libc in more cases (and also agree with ICU). [1] https://unicode.org/reports/tr10/#Synch_ISO14651
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Thank you Thomas , our linux's glibc is on version : 2.19-18+deb8u10, we need to upgrade on so many levels.On Sat, Sep 16, 2023 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:Sadly, this proves very little about Linux's behavior. glibc's idea of en_US involves some very complicated multi-pass sort rules. AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US as "same as C except case-insensitive", whereas I'm pretty sure that underscores and other punctuation are nearly ignored in glibc's interpretation; they'll only be taken into account if the alphanumeric parts of the strings sort equal.Achilleas didn't mention the glibc version, but based on the kernel vintage mentioned I guess that must be the "old" (pre 2.28) glibc sorting. In 2.28 they did a big sync-up with ISO 14651, while FreeBSD follows the UCA, a closely related standard[1]. I think newer Linux/glibc systems should agree with FreeBSD's libc in more cases (and also agree with ICU).
[1] https://unicode.org/reports/tr10/#Synch_ISO14651
-- Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt