Обсуждение: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

Поиск
Список
Период
Сортировка

pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

От
Achilleas Mantzios - cloud
Дата:

Dear All

I have a weird problem, I am trying to improve performance on this query :

SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;

The first system (linux) is a linux hosted in a cloud, kernel 3.16.0-4-amd64, 32GB mem, SSD, 4 x Intel(R) Xeon(R) CPU E7-4860 v2 @ 2.60GHz

The second (freebsd) system, used as test, is my local FreeBSD 13.1-RELEASE workstation, 32GB mem, ZFS/magnetic disks ,16 x AMD Ryzen 7 5800X 3800.16-MHz .

Overall my workstation is faster, but my issue is not plain speed. The problem is as follows :

FreeBSD

postgres@[local]/dynacom=# explain (analyze,buffers) SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
                                                                                     QUERY PLAN                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.42..5.11 rows=1 width=32) (actual time=96.705..96.706 rows=1 loops=1)
  Buffers: shared hit=71
  ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook  (cost=0.42..2912.06 rows=620 width=32) (actual time=96.704..96.705 rows=1 loops=1)
        Filter: ('foo@bar.com'::text ~* address_regex)
        Rows Removed by Filter: 14738
        Heap Fetches: 0
        Buffers: shared hit=71
Planning time: 0.082 ms
Execution time: 96.725 ms
(9 rows)

Time: 97.038 ms
postgres@[local]/dynacom=#

Linux

dynacom=# explain (analyze,buffers) SELECT text('foo@bar.com') from mail_vessel_addressbook where text('foo@bar.com') ~* address_regex limit 1;
                                                                                       QUERY PLAN                                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.42..5.12 rows=1 width=32) (actual time=1768.725..1768.727 rows=1 loops=1)
  Buffers: shared hit=530
  ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx on mail_vessel_addressbook  (cost=0.42..2913.04 rows=620 width=32) (actual time=1768.724..1768.725 rows=1 loops=1)
        Filter: ('foo@bar.com'::text ~* address_regex)
        Rows Removed by Filter: 97781
        Heap Fetches: 0
        Buffers: shared hit=530
Planning time: 1.269 ms
Execution time: 1768.998 ms
(9 rows)


The file in FreeBSD came by pg_dump from the linux system, I am puzzled why this huge difference in Buffers: shared hit. All table/index sizes are identical on both systems, I did vacuum full on the linux one, and also did vacuum freeze on both. I analyzed both, reindexed both (several times). Still the FreeBSD seems to access about 7 times less number of blocks from shared_buffers than linux : 71 vs 530 . There is no bloat , I tested with newly fresh table in both systems as well.

Thank you for any help.

Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

От
Tom Lane
Дата:
Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> writes:
> *FreeBSD*
> 
>    ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
> on mail_vessel_addressbook  (cost=0.42..2912.06 rows=620 width=32) 
> (actual time=96.704..96.705 rows=1 loops=1)
>          Filter: ('foo@bar.com'::text ~* address_regex)
>          Rows Removed by Filter: 14738
>          Heap Fetches: 0
>          Buffers: shared hit=71
> 
> *Linux*
> 
>    ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
> on mail_vessel_addressbook  (cost=0.42..2913.04 rows=620 width=32) 
> (actual time=1768.724..1768.725 rows=1 loops=1)
>          Filter: ('foo@bar.com'::text ~* address_regex)
>          Rows Removed by Filter: 97781
>          Heap Fetches: 0
>          Buffers: shared hit=530

> The file in FreeBSD came by pg_dump from the linux system, I am puzzled 
> why this huge difference in Buffers: shared hit.

The "rows removed" value is also quite a bit different, so it's not
just a matter of buffer touches --- there's evidently some real difference
in how much of the index is being scanned.  I speculate that you are
using different collations on the two systems, and FreeBSD's collation
happens to place the first matching row earlier in the index.

            regards, tom lane



Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

От
Achilleas Mantzios
Дата:
Στις 15/9/23 18:23, ο/η Tom Lane έγραψε:
Achilleas Mantzios - cloud <a.mantzios@cloud.gatewaynet.com> writes:
*FreeBSD*
   ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
on mail_vessel_addressbook  (cost=0.42..2912.06 rows=620 width=32) 
(actual time=96.704..96.705 rows=1 loops=1)         Filter: ('foo@bar.com'::text ~* address_regex)         Rows Removed by Filter: 14738         Heap Fetches: 0         Buffers: shared hit=71

*Linux*
   ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
on mail_vessel_addressbook  (cost=0.42..2913.04 rows=620 width=32) 
(actual time=1768.724..1768.725 rows=1 loops=1)         Filter: ('foo@bar.com'::text ~* address_regex)         Rows Removed by Filter: 97781         Heap Fetches: 0         Buffers: shared hit=530
The file in FreeBSD came by pg_dump from the linux system, I am puzzled 
why this huge difference in Buffers: shared hit.
The "rows removed" value is also quite a bit different, so it's not
just a matter of buffer touches --- there's evidently some real difference
in how much of the index is being scanned.  I speculate that you are
using different collations on the two systems, and FreeBSD's collation
happens to place the first matching row earlier in the index.

Thank you, I see that both systems use en_US.UTF-8 as  lc_collate and lc_ctype, and that in both systems :

dynacom=# \dOS+
                               List of collations
  Schema   |  Name   | Collate | Ctype | Provider |         Description           
------------+---------+---------+-------+----------+------------------------------
pg_catalog | C       | C       | C     | libc     | standard C collation
pg_catalog | POSIX   | POSIX   | POSIX | libc     | standard POSIX collation
pg_catalog | default |         |       | default  | database's default collation
(3 rows)


dynacom=# \l
                                  List of databases
  Name    |  Owner   | Encoding  |   Collate   |    Ctype    |   Access privileges     
-----------+----------+-----------+-------------+-------------+------------------------
dynacom   | postgres | SQL_ASCII | en_US.UTF-8 | en_US.UTF-8 |

the below seems ok

FreeBSD :

postgres@[local]/dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1  
---------
_
.
0
a
Z
(5 rows)

Linux:

dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1  
---------
_
.
0
a
Z
(5 rows)

dynacom=#

but :

Freebsd :

postgres@[local]/dynacom=# select distinct address_regex from mail_vessel_addressbook order by address_regex::text ASC limit 5;
                     address_regex                        
----------------------------------------------------------
_cmo.ship.inf@<hide>.<hid>
_EMD_REEFER@
hide>.<hid>
_OfficeHayPoint@
hide>.<hid>
_Sabtank_PCQ1_All_SSVSSouth_area@
hide>.<hid>
_Sabtank_PCQ1_Lead_OperatorsSouth_area@
hide>.<hid>
(5 rows)

While in Linux :

dynacom=# select distinct address_regex from mail_vessel_addressbook order by address_regex::text ASC limit 5;  
          address_regex            
-----------------------------------
0033240902573@<hidden>.<hid>
0033442057364@
<hidden>.<hid>
0072usl@
<hidden>.<hid>

0081354426912@<hidden>.<hid>
00862163602861@
<hidden>.<hid>
(5 rows)

somethings does not seem right.

			regards, tom lane
-- 
Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt

Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

От
Tom Lane
Дата:
Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com> writes:
> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and 
> lc_ctype,

Doesn't necessarily mean they interpret that the same way, though :-(

> the below seems ok

> FreeBSD :

> postgres@[local]/dynacom=# select * from (values 
> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
> column1
> ---------
> _
> .
> 0
> a
> Z
> (5 rows)

Sadly, this proves very little about Linux's behavior.  glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the
alphanumeric parts of the strings sort equal.

            regards, tom lane



Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

От
Achilleas Mantzios
Дата:
Στις 15/9/23 22:42, ο/η Tom Lane έγραψε:
> Achilleas Mantzios <a.mantzios@cloud.gatewaynet.com> writes:
>> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
>> lc_ctype,
> Doesn't necessarily mean they interpret that the same way, though :-(
>
>> the below seems ok
>> FreeBSD :
>> postgres@[local]/dynacom=# select * from (values
>> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
>> column1
>> ---------
>> _
>> .
>> 0
>> a
>> Z
>> (5 rows)
> Sadly, this proves very little about Linux's behavior.  glibc's idea
> of en_US involves some very complicated multi-pass sort rules.
> AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
> as "same as C except case-insensitive", whereas I'm pretty sure
> that underscores and other punctuation are nearly ignored in
> glibc's interpretation; they'll only be taken into account if the

Thank you so much. Makes perfect sense.

This begs the question asked also in the -sql list : how do I index on 
regex'es, or at least have a barely scalable solution? Here I try to 
match a given string against a stored regex, whereas in pg_trgm's case 
the user tries to match a stored text against a given regex.

> alphanumeric parts of the strings sort equal.
>
>             regards, tom lane

-- 
Achilleas Mantzios
  IT DEV - HEAD
  IT DEPT
  Dynacom Tankers Mgmt




Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

От
Thomas Munro
Дата:
On Sat, Sep 16, 2023 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sadly, this proves very little about Linux's behavior.  glibc's idea
> of en_US involves some very complicated multi-pass sort rules.
> AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
> as "same as C except case-insensitive", whereas I'm pretty sure
> that underscores and other punctuation are nearly ignored in
> glibc's interpretation; they'll only be taken into account if the
> alphanumeric parts of the strings sort equal.

Achilleas didn't mention the glibc version, but based on the kernel
vintage mentioned I guess that must be the "old" (pre 2.28) glibc
sorting.  In 2.28 they did a big sync-up with ISO 14651, while FreeBSD
follows the UCA, a closely related standard[1].  I think newer
Linux/glibc systems should agree with FreeBSD's libc in more cases
(and also agree with ICU).

[1] https://unicode.org/reports/tr10/#Synch_ISO14651



Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

От
Achilleas Mantzios
Дата:
Στις 16/9/23 02:08, ο/η Thomas Munro έγραψε:
On Sat, Sep 16, 2023 at 7:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sadly, this proves very little about Linux's behavior.  glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the
alphanumeric parts of the strings sort equal.
Achilleas didn't mention the glibc version, but based on the kernel
vintage mentioned I guess that must be the "old" (pre 2.28) glibc
sorting.  In 2.28 they did a big sync-up with ISO 14651, while FreeBSD
follows the UCA, a closely related standard[1].  I think newer
Linux/glibc systems should agree with FreeBSD's libc in more cases
(and also agree with ICU).
Thank you Thomas , our linux's glibc is on version : 2.19-18+deb8u10, we need to upgrade on so many levels.

[1] https://unicode.org/reports/tr10/#Synch_ISO14651
-- 
Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt