Discussion: Potential "AIO / io workers" inter-worker locking issue in PG18?
Hi,

I currently run PG18 + PostGIS 3.6.0 on an Ubuntu 24.04 guest running as a Windows 10 Hyper-V virtual machine.

The machine is a dedicated refurbished HP Z840 workstation with 2x 22 cores (E5-2699 v4), 512 GB RAM and a 10 TB NVMe RAID-0, with the Ubuntu guest having 400 GB RAM available.

On this machine, which is dedicated to a single custom-written geoprocessing workflow involving OpenStreetMap data, I have successfully processed data up to the global OpenStreetMap Facebook Daylight distribution, including a > 2.4 billion record polygon table for all Facebook Daylight buildings. So this has proven a very capable system.

However, after upgrading to PG18 and switching to the "io_method = worker" setting (tested with 3, 5, 16 and 22 workers), I am seeing what appears to be a major issue where io workers potentially get into some sort of locking conflict that takes hours to resolve.

The custom-written geoprocessing workflow uses Python multi-threading based on the 'concurrent.futures' framework, with either pyodbc or psycopg2 as the database connector, to implement a parallel processing solution that speeds up some of the computationally intensive tasks (which use UPDATEs). I generally run it with up to 44 threads to fully saturate the dual-CPU, 44-core system. The custom code creates a pool of jobs for the threads, and is designed to minimize inter-thread locking issues by taking PostgreSQL page locality into account (although the records to process are assigned by unique IDs in the tables, not by pages). Basically, the code is designed so that different threads never attempt to access the same database pages, as each thread gets its own unique pages assigned, thus avoiding inter-thread locking conflicts. This has worked really well in the past, with system usage maximized over all cores and processing significantly sped up. Jobs are implemented as database VIEWs that point to the records to process via their unique IDs. These views must of course be read by each thread, which is probably where the PG18 io workers kick in.

This has worked really well in previous versions of PostgreSQL (tested up to PG17). However, in PG18, during the multi-threaded processing, some of my submitted jobs, which in this case were run against a small OpenStreetMap Italy extract from Geofabrik, suddenly take > 1 hour to finish (up to 6 hours for this small extract), even though similar jobs from the same processing step finish in less than 10 seconds (and the affected jobs should as well). This seems to happen more or less at random. Many multi-threaded tasks before and after the affected processing steps finish normally.

When this happens, I observe the following:

- High processor activity: the jobs that should finish in seconds take hours, all the while showing high core usage.

- pgAdmin shows all sessions created by the Python threads as 'active', with *no* wait events attached.

- The pg_locks view does not show locking conflicts; all locks are granted. I did notice, however, that the relation / table locks were not "fastpath" locks but ordinary ones. All other locks taken, e.g. on indexes of the same table, were fastpath. I don't know if this has any relevance, as from what I have read about the difference, it shouldn't cause a gap this big, not seconds versus hours.

- Please note that the processing DOES eventually proceed, so it is not an infinite deadlock or something where I need to kill my Python code. It just takes hours to resolve.

- Switching to "io_method = sync" seems to resolve the issue, and I do not observe jobs of the same batch getting "stuck". This is the behavior I was used to seeing in <= PG17.

I am not too familiar with the internals of PostgreSQL and the new AIO framework and its "io workers", but it seems there may be some sort of locking issue between io workers that can occasionally happen in PG18 with "io_method = worker"? Is anyone else observing similar issues in heavily multi-threaded processing workflows?

Marco
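A minimal way to capture the "active but no wait event" observation together with the fastpath detail described above is a query along these lines (a sketch only; it assumes the standard pg_stat_activity and pg_locks system views and no particular filtering on database or application):

    -- Sketch: active backends, their wait state, and the locks they hold.
    -- Run while the jobs appear stuck; adjust filters as needed.
    SELECT a.pid,
           a.state,
           a.wait_event_type,
           a.wait_event,
           now() - a.query_start AS running_for,
           l.locktype,
           l.relation::regclass AS relation,
           l.mode,
           l.fastpath,
           l.granted
    FROM pg_stat_activity AS a
    JOIN pg_locks AS l ON l.pid = a.pid
    WHERE a.state = 'active'
    ORDER BY running_for DESC, a.pid, l.locktype;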
I am not a PostgreSQL contributor and have no clue about the actual technical details of the new AIO code, but reading your report the following questions came to my mind:

* Does the failure also happen with io_method = io_uring? If not, that would indicate it is really bound to io_method = worker, not to AIO in general.

* Does the failure also happen with io_method = worker when your Python code uses only 22 cores and PostgreSQL uses only 22 workers (so Python and PostgreSQL do not share CPU cores)? If not, it might indicate that the problem could be solved by adjusting the scheduling policy in favor of PostgreSQL, hinting to the scheduler that a CPU core should be given to PostgreSQL FIRST, as Python is most likely waiting on it to continue while PostgreSQL cannot continue because the scheduler gave all the cores to Python... (a classic starvation situation; it eventually resolves once enough CPU cores are free to finish the starving thread).

HTH

-Markus

Am 05.10.2025 um 10:55 schrieb Marco Boeringa:
> [...]
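For anyone who wants to experiment with the settings discussed here, the switch between AIO methods is a plain configuration change; a sketch (the values are examples, io_method only takes effect after a server restart, and 'io_uring' is only available if the server binaries were built with liburing):

    -- Sketch: trying the different PG18 AIO settings (example values).
    ALTER SYSTEM SET io_method = 'worker';   -- or 'io_uring' / 'sync'; restart required
    ALTER SYSTEM SET io_workers = 16;        -- only relevant for io_method = 'worker'
    -- After restarting the cluster (e.g. pg_ctl restart or systemctl), verify with:
    SHOW io_method;
    SHOW io_workers;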
Hi Thom,
I have now also witnessed this issue with "io_method = sync", so it may not be related to the number of workers set. I initially thought it did not occur with 'sync', as two runs completed successfully without delays; however, the last run did show the issue. Unfortunately, this is a very complex multi-stage geoprocessing workflow that cannot easily be cut down to a simple, single-SQL-statement reproducible case. And for this specific Italy extract it takes about 7 hours to complete if the run is successful and without the observed delays, so each test run costs considerable time whenever I adjust anything.

There is also a PostGIS upgrade in the mix (3.5.2 to 3.6.0) that may or may not be involved, as that PostGIS version is the minimum for PG18. I see that a 3.6.1 release is already planned and I will need to re-test once it is out. I definitely do use PostGIS functions at the stage where the processing gets heavily delayed.

As to the question about the pg_aios view, which I wasn't aware of: it appears to be empty at that point, but I will need to confirm that observation. With my last run, the moment I looked at the view, some of the very delayed multi-threaded jobs (> 6.5 hours instead of 10 seconds!) started slowly returning one by one, although some were still stuck for a while before all had returned, so the pg_aios view being empty is probably still representative of the stuck situation.

Also note that I adjust the storage parameters of the tables involved to force more aggressive vacuuming to avoid transaction ID wraparound (which shouldn't be an issue anyway with the small Italy test extract). This has all proven pretty reliable in the past and with previous PostgreSQL / PostGIS releases, up to the multi-billion record Facebook Daylight tables noted in the previous post. There is also no PostgreSQL partitioning involved in any of this; these are ordinary tables.
Marco
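The per-table storage parameters mentioned above are not spelled out in the thread; as a rough, hypothetical illustration of that kind of tuning (table name taken from the UPDATE statement shown later in the thread, parameter values invented for the example):

    -- Sketch: more aggressive per-table autovacuum settings (illustrative values only).
    ALTER TABLE osm.landcover_scrubs_small_scale_2_ply
        SET (autovacuum_vacuum_scale_factor = 0.01,       -- vacuum after ~1% dead tuples
             autovacuum_vacuum_cost_delay   = 0,          -- do not throttle vacuum I/O
             autovacuum_freeze_max_age      = 200000000); -- wraparound safety margin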
So, to confirm, you get the issue with as few as 3 io_workers?

Also, what is pg_aios telling you during this time?

Thom
Hi Thom,
I realized that my observation of the pg_aios view being empty was likely made with the "io_method = sync" option set, which I guess doesn't use or fill the pg_aios view? Can you confirm that the pg_aios view is unused with "io_method = sync"? This aspect is not documented in the PostgreSQL help. Anyway, I will need to re-test with 'worker' set.

I do see Tomas Vondra mentioning that even the 'sync' option in PG18 still goes "through the AIO infrastructure", but what that means exactly, also in relation to the pg_aios view, I don't know:
https://vondra.me/posts/tuning-aio-in-postgresql-18/
Marco
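For reference, checking the view is just a plain query; re-running it while a backend appears stuck shows whether any asynchronous I/O is actually in flight (a sketch; in psql the \watch meta-command can repeat it):

    -- Sketch: inspect in-flight asynchronous I/O handles (PG18).
    SELECT * FROM pg_aios;
    -- In psql, repeat every second while the stall is ongoing:
    -- \watch 1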
Hi Thom,
As an extension to what I already wrote: since the processing gets stuck during UPDATEs, I realized the pg_aios view is likely not involved, as the current AIO implementation in PG18 only covers read operations such as sequential and bitmap heap scans.

So not seeing anything listed in the pg_aios view might be normal? That said, I have attempted to view and refresh the view during other stages of the processing, with pgAdmin apparently showing read operations, but still no records displayed in pg_aios. Maybe I am hitting the "refresh" button at the wrong time though...

But maybe the whole new AIO mechanism isn't involved at all, and it is another issue in PG18. Just to summarize my observations once more:

- Multi-threaded processing implemented in Python using pyodbc and concurrent.futures apparently gets stuck waiting for PostgreSQL to return. The processing step involved should return in ***less than 10 seconds*** for the small Italy extract, but can take > 1 hour (up to > 6) when it randomly gets stuck (some runs are successful without delay, others not).

- pgAdmin shows all sessions associated with the threads as 'active', with no wait events or blocking PIDs, during the whole time the processing appears stuck in PostgreSQL.

- No other sessions such as VACUUM are visible in pgAdmin while the processing appears stuck, except the main 'postgres' user session.

- All locks shown in pg_locks are granted, and most if not all are fastpath, with only AccessShareLock and RowExclusiveLock on the table and its indexes, plus a couple of ExclusiveLock entries on virtualxid and transactionid.

- 'top' in Ubuntu shows multiple 'postgres' backend processes continuously at high core usage, one for each thread (each Python thread of course uses its own connection).

- The pg_aios view is empty, but the processing consists of UPDATEs, so that is probably no surprise.

- The processing *DOES* eventually continue after this anomaly, with no further consequences and the expected results at the end of the total processing flow, so it is not a true deadlock.

- I have noticed it gets stuck when processing OpenStreetMap scrub or grassland from the Geofabrik Italy extract. However, as written above, some processing runs are fine on the same data while others get stuck and delayed. Considering this, and the fact that the processing step getting stuck involves PostGIS functions, the issue may or may not involve PostGIS.

- In pgAdmin, the SQL statements generated by my geoprocessing workflow and being processed by PostgreSQL while the processing is stuck look like this:
UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
SET area_geo = t2.area_geo,
    perim_geo = t2.perim_geo,
    compact_geo = CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END,
    npoints_geo = t2.npoints_geo,
    comp_npoints_geo = CASE WHEN t2.npoints_geo > 0 THEN (CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END / t2.npoints_geo) ELSE 0 END,
    convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END
FROM (SELECT objectid,
             ST_Area(way::geography,true) AS area_geo,
             ST_Perimeter(way::geography,true) AS perim_geo,
             ST_NPoints(way) AS npoints_geo
      FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
WHERE (t2.objectid = t1.objectid)
  AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3)
- All of this worked fine in PG <= 17.
Marco
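Since the same statement sometimes finishes in seconds and sometimes runs for hours, one way to compare a fast and a slow run is to capture the actual plan. The sketch below is a shortened, hypothetical variant of the UPDATE above (only part of the SET list is kept); note that EXPLAIN ANALYZE really executes the statement, so it is wrapped in a transaction that is rolled back:

    -- Sketch: capture the executed plan of one of the UPDATE jobs (shortened SET list).
    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
    UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
    SET area_geo  = t2.area_geo,
        perim_geo = t2.perim_geo
    FROM (SELECT objectid,
                 ST_Area(way::geography, true)      AS area_geo,
                 ST_Perimeter(way::geography, true) AS perim_geo
          FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
    WHERE t2.objectid = t1.objectid
      AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3);
    ROLLBACK;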
Hi Markus,

On my Ubuntu virtual machine, io_uring cannot be started. Setting "io_method = io_uring" and trying to restart the cluster fails: it will not start, and I have attempted this multiple times. Only 'sync' and 'worker' allow restarting after modifying the PostgreSQL configuration file. As I understand it, the PostgreSQL binary needs to be compiled with the proper support; maybe my version on Ubuntu 24.04, which runs as a Windows Hyper-V virtual machine, doesn't have it. Although I did notice, when installing PG18 from Synaptic, that it installed an additional 'liburing' package, or something named like that, if I remember correctly...

As to your question about Python and a scheduling conflict: this is not the case. Python runs on the Windows host, not under Ubuntu inside the VM. I only have PostgreSQL installed on Ubuntu, as I use it with osm2pgsql there. I access the PostgreSQL instance via pyodbc or psycopg2 from the Windows host, so it is like a remote database server, just running on local hardware.

Marco
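A quick, hedged way to check whether a particular PG18 build offers io_uring at all is to look at the accepted values of the setting; if the server was built without liburing, 'io_uring' should not show up in the list:

    -- Sketch: which io_method values does this build accept?
    SELECT name, setting, enumvals
    FROM pg_settings
    WHERE name = 'io_method';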
Hi,

On 2025-10-05 22:22:32 +0200, Marco Boeringa wrote:
> I realized that my observation of the pg_aios view being empty was likely
> with the "io_method = sync" option set, which I guess doesn't use or fill
> the pg_aios view?

pg_aios is populated even with io_method = sync, albeit with at most one entry per backend. If there were no entries in pg_aios at the time of your hang, it doesn't seem likely - although not impossible - for AIO to be responsible.

Greetings,

Andres Freund
Hi,

On 2025-10-05 10:55:01 +0200, Marco Boeringa wrote:
> - High processor activity, even though the jobs that should finish in
>   seconds, take hours, all the while showing the high core usage.
>
> - PgAdmin shows all sessions created by the Python threads as 'active', with
>   *no* wait events attached.

I think we need CPU profiles of these tasks. If something is continually taking a lot more CPU than expected, that seems like an issue worth investigating.

Greetings,

Andres Freund
Hi Andres,

I should have phrased it better. The high processor and core activity is not unexpected: the code is designed to saturate the processor and maximize throughput through careful design of the Python threading. It is just that all the jobs sent to PostgreSQL via ODBC for this specific step of the processing, with the small Italy extract, should return in less than 10 seconds (which they do in those lucky runs where I do not observe the issue), but some of the jobs for this step don't. E.g. 30 jobs return within 10 seconds, then the remaining 14 unexpectedly get stuck for 2 hours before returning, all the while staying at the same high core usage they were started with.

So some of the PostgreSQL database sessions, which as I already explained show up in pgAdmin as 'active' with no wait events or blocking PIDs, simply take an excessive amount of time, but do ultimately return. The CPU time, as witnessed with 'top' in Ubuntu, is really spent in PostgreSQL and the database sessions, not Python, which runs on Windows and does not show high CPU usage in the Windows Task Manager.

This doesn't always happen; it is kind of random. One run with the Italy data will be OK, the next not.

Marco

Op 6-10-2025 om 16:34 schreef Andres Freund:
> [...]
Hi,

On 2025-10-06 18:09:25 +0200, Marco Boeringa wrote:
> [...]

We need a profile of those processes while they are using an excessive amount of time. If they don't have wait events they're using CPU time, and seeing a profile of where all that time is spent might provide enough information on where to look in more detail.

Greetings,

Andres Freund
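To find which backend PIDs to hand to a profiler, something along these lines can be used (a sketch; the 5-minute threshold is arbitrary):

    -- Sketch: PIDs of 'active' backends running far longer than expected.
    SELECT pid,
           now() - query_start AS running_for,
           wait_event_type,
           wait_event,
           left(query, 80) AS query_prefix
    FROM pg_stat_activity
    WHERE state = 'active'
      AND now() - query_start > interval '5 minutes'  -- arbitrary threshold
    ORDER BY running_for DESC;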
Hi Andres,

I am not really a Linux / Ubuntu expert. Can you give me a suggestion for how to create such a CPU profile for the specific PostgreSQL processes getting stuck?

Marco

Op 6-10-2025 om 18:13 schreef Andres Freund:
> [...]
Hi,

On 2025-10-06 18:17:11 +0200, Marco Boeringa wrote:
> [...]

https://wiki.postgresql.org/wiki/Profiling_with_perf is a good starting point.

Greetings,

Andres
Hi Andres,

Thanks for the suggestion, this seems a useful option.

However, when I attempt to run "perf top" in a terminal window, I get the following warning:

WARNING: perf not found for kernel 6.14.0-1012

I also see a suggestion to install the Azure linux-tools. However, if I type 'linux-tools' as a search keyword in Synaptic Package Manager, I see a whole bunch of 'linux-tools' packages, e.g. azure/aws/gcp/gke, which also include kernel version build numbers (at least that is what I assume they are).

What version do you suggest I install for an ordinary, locally running Ubuntu 24.04 VM? And do these packages indeed add the perf command?

Marco

Op 6-10-2025 om 18:21 schreef Andres Freund:
> [...]
Hi,

On 2025-10-06 19:01:37 +0200, Marco Boeringa wrote:
> [...]

There are meta-packages to install linux-tools for the right version, e.g. linux-tools-virtual. Unfortunately Ubuntu has multiple "kernel variants" (like -virtual) that you still have to choose between.

You can figure out which base kernel you have with "dpkg -l | grep linux" or such.

Greetings,

Andres
Hi Andres,

I now noticed that all the suggested 'linux-tools' packages that popped up in the warning message when I attempted to run "perf top", referencing Azure, are already displayed as installed in Synaptic Package Manager. I guess it makes sense that the packages, and likely the kernel of my machine, are for Azure, as it is a Windows Hyper-V virtual machine installed via the Microsoft-provided Ubuntu option.

However, if the packages are installed, why can't I run perf? Or do I need a 'linux-tools'-specific command for that instead of perf?

Marco

Op 6-10-2025 om 19:09 schreef Andres Freund:
> [...]
Hi Andres,

I have now found out that I do have a 'perf' living under one of the 'usr' folders, but unfortunately it is the 6.8 kernel version:

/usr/lib/linux-tools-6.8.0-85

None of the other suggested packages and their likely install folders seem to contain perf.

Since perf appears to need to match the kernel version exactly, which is understandable, I can't use this one, as my kernel was already upgraded to 6.14 by a more or less forced update in Software Updater. It is a pain that the linux-tools-common package, which I suspect is the source of the 6.8 'perf' version and folder and is tagged as that version in Synaptic, isn't updated at the same time to let you run 'perf' with the proper version. I guess I will need to wait for an update of it.

Marco

Op 6-10-2025 om 19:09 schreef Andres Freund:
> [...]
On 2025-10-06 22:41:31 +0200, Marco Boeringa wrote:
> [...]

I'm pretty sure that you can use any halfway-recent perf binary, they don't actually need to match exactly. I don't know why Ubuntu insists on a perfect match. I regularly run completely different versions.

Greetings,

Andres Freund
It didn't work: as soon as I attempted to run perf, it emitted the warning message about the kernel version mismatch, with suggestions of packages to install.

However, after further digging I realized that Ubuntu usually has multiple kernel versions installed. I have now enabled the GRUB boot menu, which should allow me to boot with the older 6.8 version of the kernel (which was available during configuration of GRUB), and hopefully run perf with that kernel version.

Marco

Op 6-10-2025 om 23:39 schreef Andres Freund:
> [...]
Hi Andres,

That worked, I successfully booted with kernel 6.8!

I can now run perf, but it emits a warning, see below. Do you have suggestions for how to set these perf 'paranoid' settings?

Marco

  Access to performance monitoring and observability operations is limited.
  Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
  access to performance monitoring and observability operations for processes
  without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
  More information can be found at 'Perf events and tool security' document:
  https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
  perf_event_paranoid setting is 4:
    -1: Allow use of (almost) all events by all users
        Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
  >= 0: Disallow raw and ftrace function tracepoint access
  >= 1: Disallow CPU event access
  >= 2: Disallow kernel profiling
  To make the adjusted perf_event_paranoid setting permanent preserve it
  in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)

Op 7-10-2025 om 09:15 schreef Marco Boeringa:
> [...]
Hi Andres,

I realized I simply need to run perf with 'sudo' privileges; that seems to work. All the examples I saw omitted it, hence the confusion.

I will attempt to capture the stall with the Italy data once another OpenStreetMap rendering test that is currently running has finished.

Marco

Op 7-10-2025 om 09:34 schreef Marco Boeringa:
> [...]
Hi Andres,

I have now been able to successfully run 'perf' after a VM restart and booting with the Linux 6.8 kernel.

I have copied in two results: 'perf top' output, and a recording of one of the roughly 10 stuck processes made by supplying the specific PID of the postgres backend. Let me know if this is of any use. Of course, the recordings themselves are much bigger; I just copied the top part as displayed by the 'perf report' option. If you need the actual underlying files, let me know. Also, if you have specific suggestions for 'perf' command options you would like to see run, please post them in your response and I will see whether I can run them.

Andres: I posted to you separately that I suspected a recent Linux kernel update (6.8 --> 6.14), which coincided with the PG18 / PostGIS 3.6.0 upgrade, might be involved, because I initially had multiple successful runs of my tool without the temporary stalling / delayed return of postgres after booting into the older 6.8 kernel. This has now proven wrong: these results are from the 6.8 kernel runs, so the kernel is not involved, and I also see the issue with the older kernel.

*** sudo perf top: ***

Samples: 1M of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 346785156250 lost: 0/0 drop: 0/0

Overhead  Shared Object   Symbol
 17,33%   postgres        [.] 0x0000000000584216
 12,21%   postgres        [.] 0x000000000058737a
 11,80%   [kernel]        [k] pv_native_safe_halt
 10,24%   postgres        [.] ReleaseAndReadBuffer
  4,92%   postgres        [.] heap_hot_search_buffer
  3,63%   postgres        [.] hash_search_with_hash_value
  2,36%   postgres        [.] 0x00000000005841f3
  1,85%   postgres        [.] 0x0000000000554edd
  1,61%   postgres        [.] 0x0000000000554e1f
  1,40%   postgres        [.] 0x00000000005567f9
  1,32%   postgres        [.] StartReadBuffer
  1,24%   postgres        [.] heap_page_prune_opt
  1,23%   postgres        [.] LWLockAcquire
  1,14%   postgres        [.] HeapTupleSatisfiesVisibility
  1,10%   postgres        [.] 0x00000000003977dd
  0,92%   postgres        [.] index_getnext_tid
  0,87%   postgres        [.] index_fetch_heap
  0,83%   postgres        [.] ExecScan
  0,82%   postgres        [.] ReleaseBuffer
  0,81%   postgres        [.] LWLockRelease
  0,71%   postgres        [.] XidInMVCCSnapshot
  0,66%   postgres        [.] _bt_checkkeys
  0,66%   postgres        [.] _bt_next
  0,55%   libxorgxrdp.so  [.] crc_process_data
  0,51%   postgres        [.] btgettuple
  0,41%   postgres        [.] 0x00000000005567c6
  0,40%   postgres        [.] 0x00000000003975d0
  0,35%   postgres        [.] ExecStoreBufferHeapTuple
  0,34%   postgres        [.] hash_bytes
  0,33%   postgres        [.] MemoryContextReset
  0,33%   postgres        [.] 0x00000000003975d3
  0,32%   postgres        [.] ExecEvalSysVar
  0,28%   postgres        [.] 0x00000000003975e3
  0,27%   postgres        [.] ResourceOwnerForget
  0,27%   postgres        [.] GlobalVisTestFor
  0,26%   postgres        [.] 0x00000000003975b6
  0,26%   postgres        [.] HeapTupleIsSurelyDead
  0,23%   postgres        [.] 0x00000000003975c3
  0,23%   postgres        [.] 0x00000000001ff80a
  0,22%   postgres        [.] PredicateLockTID
  0,21%   postgres        [.] slot_getsomeattrs_int
  0,20%   postgres        [.] 0x000000000058421d
  0,19%   postgres        [.] ReadBufferExtended
  0,16%   libc.so.6       [.] __memcmp_sse2
  0,15%   postgres        [.] 0x0000000000382e9e

*** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
*** sudo perf report -g ***

Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10018000000

Children   Self    Command   Shared Object  Symbol
+ 100,00%   0,00%  postgres  postgres       [.] _start
+ 100,00%   0,00%  postgres  libc.so.6      [.] __libc_start_main@@GLIBC_2.34
+ 100,00%   0,00%  postgres  libc.so.6      [.] __libc_start_call_main
+ 100,00%   0,00%  postgres  postgres       [.] main
+ 100,00%   0,00%  postgres  postgres       [.] PostmasterMain
+ 100,00%   0,00%  postgres  postgres       [.] 0x00005f35711b913e
+ 100,00%   0,00%  postgres  postgres       [.] postmaster_child_launch
+ 100,00%   0,00%  postgres  postgres       [.] 0x00005f3570fb9dbf
+ 100,00%   0,00%  postgres  postgres       [.] PostgresMain
+ 100,00%   0,00%  postgres  postgres       [.] 0x00005f3570fb6cb9
+ 100,00%   0,00%  postgres  postgres       [.] PortalRun
+ 100,00%   0,00%  postgres  postgres       [.] 0x00005f3570fbeca3
+ 100,00%   0,00%  postgres  postgres       [.] 0x00005f3570fbcf85
+ 100,00%   0,00%  postgres  postgres       [.] standard_ExecutorRun
+ 100,00%   0,00%  postgres  postgres       [.] 0x00005f3570de682b
+ 100,00%   0,00%  postgres  postgres       [.] 0x00005f3570de966d
+  97,51%   0,00%  postgres  postgres       [.] 0x00005f3570de959d
+  97,17%   0,95%  postgres  postgres       [.] ExecScan
+  82,75%   0,00%  postgres  postgres       [.] 0x00005f3570dda135
+  82,35%   1,04%  postgres  postgres       [.] index_fetch_heap
+  28,80%   0,00%  postgres  postgres       [.] 0x00005f3570bf3f87
+  23,67%   0,00%  postgres  postgres       [.] 0x00005f3570bf3ee8
+  22,19%   0,00%  postgres  postgres       [.] 0x00005f3570fa1216
+  22,18%  22,18%  postgres  postgres       [.] 0x0000000000584216
+  17,06%   0,89%  postgres  postgres       [.] LWLockRelease
+  16,18%   0,00%  postgres  postgres       [.] 0x00005f3570bf3f4f
+  15,59%   0,00%  postgres  postgres       [.] 0x00005f3570fa437a
+  15,57%  15,57%  postgres  postgres       [.] 0x000000000058737a
+  14,90%   0,22%  postgres  postgres       [.] ReadBufferExtended
+  14,18%  13,05%  postgres  postgres       [.] ReleaseAndReadBuffer
+  14,00%   1,82%  postgres  postgres       [.] StartReadBuffer
+   6,59%   0,00%  postgres  postgres       [.] 0x00005f3570bf3f18
+   5,97%   3,78%  postgres  postgres       [.] heap_hot_search_buffer
+   5,23%   0,00%  postgres  postgres       [.] 0x00005f3570da0613
+   5,20%   0,00%  postgres  postgres       [.] 0x00005f3570dda121
+   5,19%   0,22%  postgres  postgres       [.] slot_getsomeattrs_int
+   5,18%   0,68%  postgres  postgres       [.] index_getnext_tid
+   4,52%   0,65%  postgres  postgres       [.] btgettuple
+   3,92%   0,99%  postgres  postgres       [.] _bt_next
+   3,28%   3,28%  postgres  postgres       [.] hash_search_with_hash_value
+   2,96%   0,38%  postgres  postgres       [.] ExecStoreBufferHeapTuple
+   2,93%   0,00%  postgres  postgres       [.] 0x00005f3570bf3fe0
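As a non-authoritative aside on the profile above: time concentrated in buffer reads, heap_hot_search_buffer and heap_page_prune_opt would at least be consistent with index scans wading through many dead or HOT-updated row versions, so the per-table statistics are worth a look while a job is stuck (a sketch, using the table from the UPDATE shown earlier in the thread):

    -- Sketch: dead-tuple and vacuum statistics for the table being updated.
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           n_tup_upd,
           n_tup_hot_upd,
           last_vacuum,
           last_autovacuum
    FROM pg_stat_user_tables
    WHERE schemaname = 'osm'
      AND relname = 'landcover_scrubs_small_scale_2_ply';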
Hi Andres, Probably less useful, but just to further illustrate the issue, the results op 'top' which shows multiple stuck postgress processes already busy for hours and as written before all displaying as 'active' with no wait events in pgAdmin: top - 21:42:44 up 12:20, 2 users, load average: 16,14, 16,14, 16,09 Tasks: 664 total, 17 running, 647 sleeping, 0 stopped, 0 zombie %Cpu(s): 37,0 us, 0,1 sy, 0,0 ni, 62,9 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st MiB Mem : 393429,0 total, 297340,0 free, 53521,1 used, 91791,0 buff/cache MiB Swap: 153600,0 total, 153600,0 free, 0,0 used. 339907,9 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 118256 postgres 20 0 164,3g 994408 985600 R 100,0 0,2 309:57.22 postgres 118264 postgres 20 0 164,3g 1,0g 994,8m R 100,0 0,3 309:57.46 postgres 118367 postgres 20 0 164,3g 911516 907456 R 100,0 0,2 309:55.78 postgres 118368 postgres 20 0 164,3g 915388 911328 R 100,0 0,2 309:55.82 postgres 118369 postgres 20 0 164,3g 914684 910272 R 100,0 0,2 309:55.20 postgres 118372 postgres 20 0 164,3g 924188 916256 R 100,0 0,2 309:55.20 postgres 118254 postgres 20 0 164,3g 1,0g 1,0g R 99,9 0,3 309:52.71 postgres 118255 postgres 20 0 164,3g 993,6m 981,8m R 99,9 0,3 309:57.73 postgres 118259 postgres 20 0 164,3g 978,2m 988768 R 99,9 0,2 309:57.47 postgres 118261 postgres 20 0 164,3g 1,0g 987,2m R 99,9 0,3 309:57.52 postgres 118262 postgres 20 0 164,3g 998560 990176 R 99,9 0,2 309:57.04 postgres 118265 postgres 20 0 164,3g 1,0g 1,0g R 99,9 0,3 309:55.86 postgres 118365 postgres 20 0 164,3g 925244 917312 R 99,9 0,2 309:56.02 postgres 118366 postgres 20 0 164,3g 920668 912384 R 99,9 0,2 309:56.02 postgres 118370 postgres 20 0 164,3g 923132 915200 R 99,9 0,2 309:55.22 postgres 118373 postgres 20 0 164,3g 924188 916256 R 99,9 0,2 309:54.96 postgres 2546 ubuntu_+ 20 0 560524 266288 106704 S 21,8 0,1 7:48.62 Xorg 2954 ubuntu_+ 20 0 9050316 455464 167812 S 7,3 0,1 4:33.90 gnome-s+ 1855 xrdp 20 0 133248 16344 8800 S 1,3 0,0 0:37.48 xrdp 4329 ubuntu_+ 20 0 851956 56648 43292 S 0,7 0,0 0:25.66 gnome-t+ 123595 ubuntu_+ 20 0 14644 5632 3520 R 0,4 0,0 0:00.31 top 123432 root 20 0 0 0 0 I 0,2 0,0 0:00.09 kworker+ 1385 polkitd 20 0 384600 9744 7392 S 0,1 0,0 0:01.92 polkitd 1982 gdm 20 0 8805756 237412 128832 S 0,1 0,1 1:20.05 gnome-s+ 3259 ubuntu_+ 20 0 388852 11472 6688 S 0,1 0,0 0:08.03 ibus-da+ 3368 ubuntu_+ 20 0 347872 30504 17556 S 0,1 0,0 0:03.70 ibus-ex+ 3638 ubuntu_+ 20 0 236772 6688 6336 S 0,1 0,0 0:02.33 ibus-en+ 30304 ubuntu_+ 20 0 6490464 271864 129800 S 0,1 0,1 0:23.91 nautilus 1 root 20 0 23892 13024 8800 S 0,0 0,0 0:04.11 systemd 2 root 20 0 0 0 0 S 0,0 0,0 0:00.04 kthreadd 3 root 20 0 0 0 0 S 0,0 0,0 0:00.00 pool_wo+ 4 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+ 5 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+ 6 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+ 7 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+ 9 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+ 10 root 20 0 0 0 0 I 0,0 0,0 0:05.18 kworker+
Hi, On 2025-10-07 21:34:00 +0200, Marco Boeringa wrote: > I have now been able to successfully run 'perf' after a VM restart and > booting with the linux 6.8 kernel. Unfortunately the profile is lacking most symbols, so it's not particularly informative :(. You probably need to install the debug symbols package for postgres (and probably libc)... > Andres: I posted separately to you I had a suspicion a recent linux kernel > update (6.8 --> 6.14) might be involved that coincided with the PG18/PostGIS > 3.6.0 upgrade when I initially had multiple successful runs of my tool > without the temporary stalling / delay in returning of postgres after > booting to the older 6.8 kernel. This has now proven wrong: these results > are of the 6.8 kernel runs, so the kernel is not involved, and I also see > the issue with the older kernel. Good to know. Greetings, Andres Freund
Andres, Can you be a bit more specific as to what debug symbols package I need to install? There is a whole list of them for PG18 when I look with Synaptic Package Manager... Or do I simply need to install all PG18 listed debug symbols packages? It is a very long list though of packages, so I would prefer to limit it to the packages you think are really useful for your debug work. And do you have any comments as the command run? If I install the debug symbols, are the specific commands OK, or do you want other options? Marco Op 7-10-2025 om 21:56 schreef Andres Freund: > Hi, > > On 2025-10-07 21:34:00 +0200, Marco Boeringa wrote: >> I have now been able to successfully run 'perf' after a VM restart and >> booting with the linux 6.8 kernel. > Unfortunately the profile is lacking most symbols, so it's not particularly > informative :(. You probably need to install the debug symbols package for > postgres (and probably libc)... > > >> Andres: I posted separately to you I had a suspicion a recent linux kernel >> update (6.8 --> 6.14) might be involved that coincided with the PG18/PostGIS >> 3.6.0 upgrade when I initially had multiple successful runs of my tool >> without the temporary stalling / delay in returning of postgres after >> booting to the older 6.8 kernel. This has now proven wrong: these results >> are of the 6.8 kernel runs, so the kernel is not involved, and I also see >> the issue with the older kernel. > Good to know. > > Greetings, > > Andres Freund
Hi Andres, Searching a bit more, and reading the PostgreSQL wiki pages about debugging, I have now found the 'find-dbgsym-packages' command, part of the 'debian-goodies' package. Installing debian-goodies and running this command against a postgres process ID returned the following debug symbol packages. Does this list seem about right to you? Marco
lib32stdc++6-14-dbg
libkrb5-dbg
libstdc++6-14-dbg
libx32stdc++6-14-dbg
postgresql-18-dbgsym
Hi Andres, This looks much better, doesn't it? I hope this helps. Let me know if you need anything else. Marco *** sudo perf top: *** Samples: 2M of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 593240647834 lost: 0/0 drop: 0/0 Overhead Shared Object Symbol 24,66% [kernel] [k] pv_native_safe_halt 9,35% postgres [.] LWLockAttemptLock 7,69% postgres [.] heap_hot_search_buffer 6,68% postgres [.] tts_heap_getsomeattrs.lto_priv.0 4,93% postgres [.] LWLockReleaseInternal 4,26% postgres [.] ExecInterpExpr 4,12% postgres [.] hash_search_with_hash_value 3,42% postgres [.] PinBuffer 2,54% postgres [.] heapam_index_fetch_tuple.lto_priv.0 2,46% postgres [.] ReleaseAndReadBuffer 1,72% postgres [.] LWLockAcquire 1,72% postgres [.] HeapTupleSatisfiesVisibility 1,48% postgres [.] index_fetch_heap 1,44% postgres [.] index_getnext_tid 1,40% postgres [.] heap_page_prune_opt 1,28% postgres [.] _bt_readpage 1,28% postgres [.] ExecScan 1,23% postgres [.] LWLockRelease 1,10% postgres [.] IndexNext 1,10% postgres [.] XidInMVCCSnapshot 1,07% postgres [.] _bt_checkkeys 0,98% postgres [.] _bt_next 0,90% postgres [.] UnpinBufferNoOwner.lto_priv.0 0,82% postgres [.] GetPrivateRefCountEntry.lto_priv.0 0,81% postgres [.] StartReadBuffer 0,80% postgres [.] btgettuple 0,69% postgres [.] _bt_check_compare.lto_priv.0 0,67% libxorgxrdp.so [.] crc_process_data 0,56% postgres [.] ExecStoreBufferHeapTuple 0,55% postgres [.] MemoryContextReset 0,50% postgres [.] ExecEvalSysVar 0,50% postgres [.] hash_bytes 0,48% postgres [.] tts_virtual_clear.lto_priv.0 0,48% postgres [.] ExecNestLoop 0,43% postgres [.] ResourceOwnerForget 0,41% postgres [.] GlobalVisTestFor 0,37% postgres [.] HeapTupleIsSurelyDead 0,33% postgres [.] slot_getsomeattrs_int 0,33% postgres [.] PredicateLockTID 0,29% postgres [.] ReadBufferExtended 0,27% postgres [.] _bt_saveitem 0,23% postgres [.] _bt_setuppostingitems 0,22% libc.so.6 [.] __memcmp_sse2 0,21% postgres [.] ExecIndexScan 0,20% postgres [.] ReleaseBuffer 0,19% postgres [.] ResourceOwnerEnlarge *** sudo perf -p <PID of one stuck postgres backend> -g -d 10 *** *** sudo perf report -g *** Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000 Children Self Command Shared Object Symbol + 100,00% 0,00% postgres postgres [.] _start + 100,00% 0,00% postgres libc.so.6 [.] __libc_start_main@@GLIBC_2.34 + 100,00% 0,00% postgres libc.so.6 [.] __libc_start_call_main + 100,00% 0,00% postgres postgres [.] main + 100,00% 0,00% postgres postgres [.] PostmasterMain + 100,00% 0,00% postgres postgres [.] ServerLoop.isra.0 + 100,00% 0,00% postgres postgres [.] postmaster_child_launch + 100,00% 0,00% postgres postgres [.] 0x00005f3570fb9dbf + 100,00% 0,00% postgres postgres [.] PostgresMain + 100,00% 0,00% postgres postgres [.] exec_simple_query + 100,00% 0,63% postgres postgres [.] ExecNestLoop + 100,00% 0,00% postgres postgres [.] PortalRun + 100,00% 0,00% postgres postgres [.] PortalRunMulti + 100,00% 0,00% postgres postgres [.] ProcessQuery + 100,00% 0,00% postgres postgres [.] standard_ExecutorRun + 100,00% 0,00% postgres postgres [.] ExecModifyTable + 94,63% 1,47% postgres postgres [.] ExecScan + 78,76% 1,49% postgres postgres [.] IndexNext + 66,89% 1,96% postgres postgres [.] index_fetch_heap + 64,35% 3,61% postgres postgres [.] heapam_index_fetch_tuple.lto_priv.0 + 21,92% 0,40% postgres postgres [.] ReadBufferExtended + 20,36% 1,19% postgres postgres [.] StartReadBuffer + 16,23% 5,79% postgres postgres [.] ExecInterpExpr + 15,00% 10,70% postgres postgres [.] 
heap_hot_search_buffer + 12,81% 12,81% postgres postgres [.] LWLockAttemptLock + 10,31% 2,36% postgres postgres [.] index_getnext_tid + 9,69% 0,48% postgres postgres [.] slot_getsomeattrs_int + 9,27% 9,27% postgres postgres [.] tts_heap_getsomeattrs.lto_priv.0 + 8,45% 1,85% postgres postgres [.] LWLockRelease + 7,91% 1,08% postgres postgres [.] btgettuple + 6,89% 1,12% postgres postgres [.] _bt_next + 6,77% 6,77% postgres postgres [.] LWLockReleaseInternal + 5,89% 5,89% postgres postgres [.] hash_search_with_hash_value + 5,78% 0,10% postgres postgres [.] _bt_readnextpage + 4,48% 4,14% postgres postgres [.] PinBuffer + 4,41% 1,90% postgres postgres [.] _bt_readpage + 4,09% 3,00% postgres postgres [.] ReleaseAndReadBuffer + 2,99% 2,38% postgres postgres [.] HeapTupleSatisfiesVisibility + 2,58% 0,82% postgres postgres [.] ExecStoreBufferHeapTuple + 2,43% 2,43% postgres postgres [.] LWLockAcquire + 2,28% 1,47% postgres postgres [.] _bt_checkkeys + 1,74% 1,74% postgres postgres [.] heap_page_prune_opt + 1,67% 1,10% postgres postgres [.] UnpinBufferNoOwner.lto_priv.0 + 1,47% 1,47% postgres postgres [.] XidInMVCCSnapshot + 1,18% 1,18% postgres postgres [.] GetPrivateRefCountEntry.lto_priv.0 + 1,09% 1,09% postgres postgres [.] MemoryContextReset Op 7-10-2025 om 23:38 schreef Marco Boeringa: > Hi Andres, > > Searching a bit more, and reading the PostgreSQL wiki pages about > debugging, I have now found the 'find-dbgsym-packages' command, part > of the 'debian-goodies' > > Installing and running this against a postgres process ID, returned > the following debug symbol packages. Does this list seem about right > for you? > > Marco > > lib32stdc++6-14-dbg > > libkrb5-dbg > > libstdc++6-14-dbg > > libx32stdc++6-14-dbg > > postgresql-18-dbgsym >
Hi, On 2025-10-08 00:13:34 +0200, Marco Boeringa wrote: > This looks much better, doesn't it? It indeed does! > I hope this helps. Let me know if you need anything else. > *** sudo perf -p <PID of one stuck postgres backend> -g -d 10 *** > *** sudo perf report -g *** Could you show perf report --no-children? That would show us which individual functions, rather than call-stacks, take the longest. > Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000 > Children Self Command Shared Object Symbol > + 100,00% 0,00% postgres postgres [.] _start > + 100,00% 0,00% postgres libc.so.6 [.] > __libc_start_main@@GLIBC_2.34 > + 100,00% 0,00% postgres libc.so.6 [.] > __libc_start_call_main > + 100,00% 0,00% postgres postgres [.] main > + 100,00% 0,00% postgres postgres [.] PostmasterMain > + 100,00% 0,00% postgres postgres [.] ServerLoop.isra.0 > + 100,00% 0,00% postgres postgres [.] > postmaster_child_launch > + 100,00% 0,00% postgres postgres [.] 0x00005f3570fb9dbf > + 100,00% 0,00% postgres postgres [.] PostgresMain > + 100,00% 0,00% postgres postgres [.] exec_simple_query > + 100,00% 0,63% postgres postgres [.] ExecNestLoop > + 100,00% 0,00% postgres postgres [.] PortalRun > + 100,00% 0,00% postgres postgres [.] PortalRunMulti > + 100,00% 0,00% postgres postgres [.] ProcessQuery > + 100,00% 0,00% postgres postgres [.] standard_ExecutorRun > + 100,00% 0,00% postgres postgres [.] ExecModifyTable > + 94,63% 1,47% postgres postgres [.] ExecScan > + 78,76% 1,49% postgres postgres [.] IndexNext > + 66,89% 1,96% postgres postgres [.] index_fetch_heap > + 64,35% 3,61% postgres postgres [.] > heapam_index_fetch_tuple.lto_priv.0 So somehow >60% of the CPU time is spent fetching tuples corresponding to index entries. That seems ... a lot. Is it possible that you have a lot of dead rows in the involved tables? I don't immediately see how this could be related to AIO. Greetings, Andres Freund
Hi, On 2025-10-07 19:03:40 -0400, Andres Freund wrote: > On 2025-10-08 00:13:34 +0200, Marco Boeringa wrote: > So somehow >60% of the CPU time is spent fetching tuples corresponding to > index entries. That seems ... a lot. Is it possible that you have a lot of > dead rows in the involved tables? > > I don't immediately see how this could be related to AIO. Can you share the query and explain for it that was running in the "stuck" backend? Greetings, Andres Freund
Hi Andres,
> Could you show perf report --no-children? That would show us which individual
> functions, rather than call-stacks, take the longest.
See the full output at the bottom of this mail!
> I don't immediately see how this could be related to AIO.
Yes, you could be right that this is not related to AIO at all, but to another issue introduced in PG18. The only reason I
initially thought of AIO was of course that it is one of the main new features of PG18, and I could imagine "workers"
getting into some sort of inter-worker locking issue, just like threads can.
 
For sure, I did not see this issue in <= PG17, so some change in PG18 is causing it. Additionally, there is a small
chance it might be related to PostGIS, as that was upgraded as well (3.5.2 --> 3.6.0) during the PG upgrade, as PG18
requires PostGIS 3.6.0 minimum. And the query does use PostGIS functions, but none that AFAIK rely on e.g. a spatial
index. Functions like ST_Area just process an individual geometry, not the spatial relationship between multiple
geometries.
As I wrote before, this is a multi-threaded Python application (actually developed in a GIS), that uses Python's
'concurrent.futures' threading framework to create jobs of records to process for each thread, significantly speeding up
the processing. The queries are in fact dynamically built by the code, and part of a much larger geoprocessing workflow,
so it is hard to run them separately and provide a query plan (although in this case I could by rewriting part of the
query below).
 
However, I want to stress that any query plan is unlikely to yield anything. In normal circumstances and in PG17 and
below, this code runs fine! And it is only 1 in maybe 4 runs in PG18 that goes berserk and makes a processing step that
should take < 10 seconds all of a sudden take > 2 hours.
 
UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
SET area_geo = t2.area_geo,
    perim_geo = t2.perim_geo,
    compact_geo = CASE WHEN t2.area_geo > 0
                       THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi()))
                       ELSE 0 END,
    npoints_geo = t2.npoints_geo,
    comp_npoints_geo = CASE WHEN t2.npoints_geo > 0
                            THEN (CASE WHEN t2.area_geo > 0
                                       THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi()))
                                       ELSE 0 END / t2.npoints_geo)
                            ELSE 0 END,
    convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography,true) > 0
                            THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography,true))
                            ELSE 1 END
FROM (SELECT objectid,
             ST_Area(way::geography,true) AS area_geo,
             ST_Perimeter(way::geography,true) AS perim_geo,
             ST_NPoints(way) AS npoints_geo
      FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
WHERE (t2.objectid = t1.objectid)
  AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3)
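If a plan really is needed, I could probably capture it from inside the running workflow rather than rewriting the statement, e.g. by having my tool enable auto_explain on its sessions. A rough sketch (the threshold is just an illustrative value, this is not something I currently run):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '30s';  -- only log the statements that stall
SET auto_explain.log_analyze = on;          -- include actual row counts
SET auto_explain.log_buffers = on;          -- include buffer hit/read counts
SET auto_explain.log_timing = off;          -- keep per-node timing overhead low

That would write the plans of the dynamically built UPDATEs to the server log without changing the workflow itself.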
> So somehow >60% of the CPU time is spent fetching tuples corresponding to
> index entries. That seems ... a lot. Is it possible that you have a lot of
> dead rows in the involved tables?
Yes, that is perfectly possible. However, the particular table is only 
just over 100k records. It is true that my code is designed to process 
literally *every* record in a table. However, I specifically set 
adjusted table storage parameters with much more aggressive vacuum 
settings (essentially forcing a vacuum after something like 10k 
dead tuples, irrespective of the size of the table). This has worked 
really well, and I have successfully UPDATEd the > 1B record Facebook 
Daylight tables with the same code, without ever running into 
this particular issue, nor into transaction ID wraparound issues.
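For reference, the adjusted storage parameters are roughly along these lines (illustrative values, not a literal copy of my settings):

ALTER TABLE osm.landcover_scrubs_small_scale_2_ply
    SET (autovacuum_vacuum_scale_factor = 0.0,   -- ignore the table size
         autovacuum_vacuum_threshold = 10000);   -- vacuum after ~10k dead tuples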
One particular thing to note as well is that, due to the careful design of 
the jobs taking page locality into account, a deliberately low table 
fillfactor, and plenty of RAM, quite a few (but not all) of the steps in 
the geoprocessing workflow manage to run almost completely as 
PostgreSQL 'HOT' updates, even when updating all records in the table, 
and even for very large tables.
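That is something that can be verified from pg_stat_user_tables; roughly like this (just a sketch of what I look at):

SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct,
       n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'osm'
  AND relname = 'landcover_scrubs_small_scale_2_ply';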
*** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
*** sudo perf report --no-children ***
Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000
   Overhead  Command   Shared Object      Symbol
+   12,81%  postgres  postgres           [.] LWLockAttemptLock
+   10,70%  postgres  postgres           [.] heap_hot_search_buffer
+    9,27%  postgres  postgres           [.] tts_heap_getsomeattrs.lto_priv.0
+    6,77%  postgres  postgres           [.] LWLockReleaseInternal
+    5,89%  postgres  postgres           [.] hash_search_with_hash_value
+    5,79%  postgres  postgres           [.] ExecInterpExpr
+    4,14%  postgres  postgres           [.] PinBuffer
+    3,61%  postgres  postgres           [.] heapam_index_fetch_tuple.lto_priv.0
+    3,00%  postgres  postgres           [.] ReleaseAndReadBuffer
+    2,43%  postgres  postgres           [.] LWLockAcquire
+    2,38%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
+    2,36%  postgres  postgres           [.] index_getnext_tid
+    1,96%  postgres  postgres           [.] index_fetch_heap
+    1,90%  postgres  postgres           [.] _bt_readpage
+    1,85%  postgres  postgres           [.] LWLockRelease
+    1,74%  postgres  postgres           [.] heap_page_prune_opt
+    1,49%  postgres  postgres           [.] IndexNext
+    1,47%  postgres  postgres           [.] _bt_checkkeys
+    1,47%  postgres  postgres           [.] ExecScan
+    1,47%  postgres  postgres           [.] XidInMVCCSnapshot
+    1,19%  postgres  postgres           [.] StartReadBuffer
+    1,18%  postgres  postgres           [.] GetPrivateRefCountEntry.lto_priv.0
+    1,12%  postgres  postgres           [.] _bt_next
+    1,10%  postgres  postgres           [.] UnpinBufferNoOwner.lto_priv.0
+    1,09%  postgres  postgres           [.] MemoryContextReset
+    1,08%  postgres  postgres           [.] btgettuple
+    0,91%  postgres  postgres           [.] _bt_check_compare.lto_priv.0
+    0,82%  postgres  postgres           [.] ExecEvalSysVar
+    0,82%  postgres  postgres           [.] ExecStoreBufferHeapTuple
+    0,71%  postgres  postgres           [.] hash_bytes
+    0,65%  postgres  postgres           [.] tts_virtual_clear.lto_priv.0
+    0,63%  postgres  postgres           [.] GlobalVisTestFor
+    0,63%  postgres  postgres           [.] ExecNestLoop
+    0,55%  postgres  postgres           [.] HeapTupleIsSurelyDead
+    0,52%  postgres  postgres           [.] ResourceOwnerForget
      0,48%  postgres  postgres           [.] slot_getsomeattrs_int
      0,42%  postgres  postgres           [.] PredicateLockTID
      0,40%  postgres  postgres           [.] ReadBufferExtended
      0,37%  postgres  postgres           [.] _bt_saveitem
      0,31%  postgres  postgres           [.] ExecIndexScan
      0,30%  postgres  libc.so.6          [.] __memcmp_sse2
      0,28%  postgres  postgres           [.] _bt_setuppostingitems
      0,26%  postgres  postgres           [.] ReleaseBuffer
      0,23%  postgres  postgres           [.] ResourceOwnerEnlarge
      0,23%  postgres  postgres           [.] HeapCheckForSerializableConflictOut
      0,22%  postgres  postgres           [.] IncrBufferRefCount
      0,18%  postgres  postgres           [.] pgstat_count_io_op
> Could you show perf report --no-children? That would show us which individual functions, rather than call-stacks, take the longest.
Andres, I now captured a few more 'perf' sessions.
Run 1 is the original run I already showed you. For comparison, runs 2-4 are from a different backend captured during a new geoprocessing run, but all referring to the same PID. Run 5 is also from that geoprocessing run, but for another backend PID, i.e. another Python thread launched from my code.
There is not much difference between any of these, especially for the ordinary "perf report"; the "perf report --no-children" runs show a little more variation. The "perf report --no-children" runs are grouped together at the bottom of this email.
Despite the minor differences, maybe seeing different captures of perf is still of some use.
Marco
*** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
 *** sudo perf report -g ***
 RUN 1:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000
   Children      Self  Command   Shared Object      Symbol
 +  100,00%     0,00%  postgres  postgres           [.] _start
 +  100,00%     0,00%  postgres  libc.so.6          [.] __libc_start_main@@GLIBC_2.34
 +  100,00%     0,00%  postgres  libc.so.6          [.] __libc_start_call_main
 +  100,00%     0,00%  postgres  postgres           [.] main
 +  100,00%     0,00%  postgres  postgres           [.] PostmasterMain
 +  100,00%     0,00%  postgres  postgres           [.] ServerLoop.isra.0
 +  100,00%     0,00%  postgres  postgres           [.] postmaster_child_launch
 +  100,00%     0,00%  postgres  postgres           [.] 0x00005f3570fb9dbf
 +  100,00%     0,00%  postgres  postgres           [.] PostgresMain
 +  100,00%     0,00%  postgres  postgres           [.] exec_simple_query
 +  100,00%     0,63%  postgres  postgres           [.] ExecNestLoop
 +  100,00%     0,00%  postgres  postgres           [.] PortalRun
 +  100,00%     0,00%  postgres  postgres           [.] PortalRunMulti
 +  100,00%     0,00%  postgres  postgres           [.] ProcessQuery
 +  100,00%     0,00%  postgres  postgres           [.] standard_ExecutorRun
 +  100,00%     0,00%  postgres  postgres           [.] ExecModifyTable
 +   94,63%     1,47%  postgres  postgres           [.] ExecScan
 +   78,76%     1,49%  postgres  postgres           [.] IndexNext
 +   66,89%     1,96%  postgres  postgres           [.] index_fetch_heap
 +   64,35%     3,61%  postgres  postgres           [.] heapam_index_fetch_tuple.lto_priv.0
 +   21,92%     0,40%  postgres  postgres           [.] ReadBufferExtended
 +   20,36%     1,19%  postgres  postgres           [.] StartReadBuffer
 +   16,23%     5,79%  postgres  postgres           [.] ExecInterpExpr
 +   15,00%    10,70%  postgres  postgres           [.] heap_hot_search_buffer
 +   12,81%    12,81%  postgres  postgres           [.] LWLockAttemptLock
 +   10,31%     2,36%  postgres  postgres           [.] index_getnext_tid
 +    9,69%     0,48%  postgres  postgres           [.] slot_getsomeattrs_int
 +    9,27%     9,27%  postgres  postgres           [.] tts_heap_getsomeattrs.lto_priv.0
 +    8,45%     1,85%  postgres  postgres           [.] LWLockRelease
 +    7,91%     1,08%  postgres  postgres           [.] btgettuple
 +    6,89%     1,12%  postgres  postgres           [.] _bt_next
 +    6,77%     6,77%  postgres  postgres           [.] LWLockReleaseInternal
 +    5,89%     5,89%  postgres  postgres           [.] hash_search_with_hash_value
 +    5,78%     0,10%  postgres  postgres           [.] _bt_readnextpage
 +    4,48%     4,14%  postgres  postgres           [.] PinBuffer
 +    4,41%     1,90%  postgres  postgres           [.] _bt_readpage
 +    4,09%     3,00%  postgres  postgres           [.] ReleaseAndReadBuffer
 +    2,99%     2,38%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
 +    2,58%     0,82%  postgres  postgres           [.] ExecStoreBufferHeapTuple
 +    2,43%     2,43%  postgres  postgres           [.] LWLockAcquire
 +    2,28%     1,47%  postgres  postgres           [.] _bt_checkkeys
 +    1,74%     1,74%  postgres  postgres           [.] heap_page_prune_opt
 +    1,67%     1,10%  postgres  postgres           [.] UnpinBufferNoOwner.lto_priv.0
 +    1,47%     1,47%  postgres  postgres           [.] XidInMVCCSnapshot
 +    1,18%     1,18%  postgres  postgres           [.] GetPrivateRefCountEntry.lto_priv.0
 +    1,09%     1,09%  postgres  postgres           [.] MemoryContextReset
 RUN 2:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10020750000
   Children      Self  Command   Shared Object     Symbol
 +  100,00%     0,00%  postgres  postgres          [.] _start
 +  100,00%     0,00%  postgres  libc.so.6         [.] __libc_start_main@@GLIBC_2.34
 +  100,00%     0,00%  postgres  libc.so.6         [.] __libc_start_call_main
 +  100,00%     0,00%  postgres  postgres          [.] main
 +  100,00%     0,00%  postgres  postgres          [.] PostmasterMain
 +  100,00%     0,00%  postgres  postgres          [.] ServerLoop.isra.0
 +  100,00%     0,00%  postgres  postgres          [.] postmaster_child_launch
 +  100,00%     0,00%  postgres  postgres          [.] 0x00005f3570fb9dbf
 +  100,00%     0,00%  postgres  postgres          [.] PostgresMain
 +  100,00%     0,00%  postgres  postgres          [.] exec_simple_query
 +  100,00%     0,34%  postgres  postgres          [.] ExecNestLoop
 +  100,00%     0,00%  postgres  postgres          [.] PortalRun
 +  100,00%     0,00%  postgres  postgres          [.] PortalRunMulti
 +  100,00%     0,00%  postgres  postgres          [.] ProcessQuery
 +  100,00%     0,00%  postgres  postgres          [.] standard_ExecutorRun
 +  100,00%     0,00%  postgres  postgres          [.] ExecModifyTable
 +   97,34%     1,13%  postgres  postgres          [.] ExecScan
 +   88,55%     0,73%  postgres  postgres          [.] IndexNext
 +   82,41%     1,05%  postgres  postgres          [.] index_fetch_heap
 +   81,10%     1,78%  postgres  postgres          [.] heapam_index_fetch_tuple.lto_priv.0
 +   25,70%    25,70%  postgres  postgres          [.] LWLockAttemptLock
 +   16,67%     0,86%  postgres  postgres          [.] LWLockRelease
 +   15,91%    15,91%  postgres  postgres          [.] LWLockReleaseInternal
 +   15,27%     0,22%  postgres  postgres          [.] ReadBufferExtended
 +   14,41%     1,73%  postgres  postgres          [.] StartReadBuffer
 +   14,11%    13,01%  postgres  postgres          [.] ReleaseAndReadBuffer
 +    8,60%     3,08%  postgres  postgres          [.] ExecInterpExpr
 +    6,27%     3,90%  postgres  postgres          [.] heap_hot_search_buffer
 +    5,36%     1,11%  postgres  postgres          [.] index_getnext_tid
 +    5,20%     0,20%  postgres  postgres          [.] slot_getsomeattrs_int
 +    5,05%     5,05%  postgres  postgres          [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,74%     4,59%  postgres  postgres          [.] PinBuffer
 +    4,21%     0,54%  postgres  postgres          [.] btgettuple
 +    3,75%     0,72%  postgres  postgres          [.] _bt_next
 +    3,27%     3,27%  postgres  postgres          [.] hash_search_with_hash_value
 +    3,04%     0,03%  postgres  postgres          [.] _bt_readnextpage
 +    3,00%     0,44%  postgres  postgres          [.] ExecStoreBufferHeapTuple
 +    2,43%     2,16%  postgres  postgres          [.] UnpinBufferNoOwner.lto_priv.0
 +    2,24%     1,01%  postgres  postgres          [.] _bt_readpage
 +    1,62%     1,25%  postgres  postgres          [.] HeapTupleSatisfiesVisibility
 +    1,51%     1,51%  postgres  postgres          [.] LWLockAcquire
 +    1,18%     1,18%  postgres  postgres          [.] heap_page_prune_opt
 +    1,07%     0,72%  postgres  postgres          [.] _bt_checkkeys
 +    0,85%     0,85%  postgres  postgres          [.] XidInMVCCSnapshot
 +    0,82%     0,74%  postgres  postgres          [.] ReleaseBuffer
 +    0,56%     0,56%  postgres  postgres          [.] GetPrivateRefCountEntry.lto_priv.0
      0,41%     0,41%  postgres  postgres          [.] _bt_check_compare.lto_priv.0
 RUN 3:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10010750000
   Children      Self  Command   Shared Object     Symbol
 +  100,00%     0,00%  postgres  postgres          [.] _start
 +  100,00%     0,00%  postgres  libc.so.6         [.] __libc_start_main@@GLIBC_2.34
 +  100,00%     0,00%  postgres  libc.so.6         [.] __libc_start_call_main
 +  100,00%     0,00%  postgres  postgres          [.] main
 +  100,00%     0,00%  postgres  postgres          [.] PostmasterMain
 +  100,00%     0,00%  postgres  postgres          [.] ServerLoop.isra.0
 +  100,00%     0,00%  postgres  postgres          [.] postmaster_child_launch
 +  100,00%     0,00%  postgres  postgres          [.] 0x00005f3570fb9dbf
 +  100,00%     0,00%  postgres  postgres          [.] PostgresMain
 +  100,00%     0,00%  postgres  postgres          [.] exec_simple_query
 +  100,00%     0,34%  postgres  postgres          [.] ExecNestLoop
 +  100,00%     0,00%  postgres  postgres          [.] PortalRun
 +  100,00%     0,00%  postgres  postgres          [.] PortalRunMulti
 +  100,00%     0,00%  postgres  postgres          [.] ProcessQuery
 +  100,00%     0,00%  postgres  postgres          [.] standard_ExecutorRun
 +  100,00%     0,00%  postgres  postgres          [.] ExecModifyTable
 +   97,43%     1,10%  postgres  postgres          [.] ExecScan
 +   88,96%     0,79%  postgres  postgres          [.] IndexNext
 +   82,68%     1,03%  postgres  postgres          [.] index_fetch_heap
 +   81,42%     1,89%  postgres  postgres          [.] heapam_index_fetch_tuple.lto_priv.0
 +   25,54%    25,54%  postgres  postgres          [.] LWLockAttemptLock
 +   17,38%     1,01%  postgres  postgres          [.] LWLockRelease
 +   16,44%    16,44%  postgres  postgres          [.] LWLockReleaseInternal
 +   15,39%     0,25%  postgres  postgres          [.] ReadBufferExtended
 +   14,49%     1,69%  postgres  postgres          [.] StartReadBuffer
 +   13,77%    12,72%  postgres  postgres          [.] ReleaseAndReadBuffer
 +    8,28%     2,95%  postgres  postgres          [.] ExecInterpExpr
 +    6,05%     3,75%  postgres  postgres          [.] heap_hot_search_buffer
 +    5,39%     1,09%  postgres  postgres          [.] index_getnext_tid
 +    5,05%     0,21%  postgres  postgres          [.] slot_getsomeattrs_int
 +    4,88%     4,88%  postgres  postgres          [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,87%     4,69%  postgres  postgres          [.] PinBuffer
 +    4,29%     0,61%  postgres  postgres          [.] btgettuple
 +    3,74%     0,74%  postgres  postgres          [.] _bt_next
 +    3,47%     3,47%  postgres  postgres          [.] hash_search_with_hash_value
 +    3,00%     0,05%  postgres  postgres          [.] _bt_readnextpage
 +    2,99%     0,50%  postgres  postgres          [.] ExecStoreBufferHeapTuple
 +    2,27%     2,01%  postgres  postgres          [.] UnpinBufferNoOwner.lto_priv.0
 +    2,19%     0,99%  postgres  postgres          [.] _bt_readpage
 +    1,52%     1,14%  postgres  postgres          [.] HeapTupleSatisfiesVisibility
 +    1,50%     1,50%  postgres  postgres          [.] LWLockAcquire
 +    1,12%     1,12%  postgres  postgres          [.] heap_page_prune_opt
 +    1,06%     0,72%  postgres  postgres          [.] _bt_checkkeys
 +    0,87%     0,87%  postgres  postgres          [.] XidInMVCCSnapshot
 +    0,87%     0,76%  postgres  postgres          [.] ReleaseBuffer
 +    0,59%     0,59%  postgres  postgres          [.] GetPrivateRefCountEntry.lto_priv.0
      0,41%     0,41%  postgres  postgres          [.] _bt_check_compare.lto_priv.0
 RUN 4:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10010500000
   Children      Self  Command   Shared Object     Symbol
 +  100,00%     0,00%  postgres  postgres          [.] _start
 +  100,00%     0,00%  postgres  libc.so.6         [.] __libc_start_main@@GLIBC_2.34
 +  100,00%     0,00%  postgres  libc.so.6         [.] __libc_start_call_main
 +  100,00%     0,00%  postgres  postgres          [.] main
 +  100,00%     0,00%  postgres  postgres          [.] PostmasterMain
 +  100,00%     0,00%  postgres  postgres          [.] ServerLoop.isra.0
 +  100,00%     0,00%  postgres  postgres          [.] postmaster_child_launch
 +  100,00%     0,00%  postgres  postgres          [.] 0x00005f3570fb9dbf
 +  100,00%     0,00%  postgres  postgres          [.] PostgresMain
 +  100,00%     0,00%  postgres  postgres          [.] exec_simple_query
 +  100,00%     0,32%  postgres  postgres          [.] ExecNestLoop
 +  100,00%     0,00%  postgres  postgres          [.] PortalRun
 +  100,00%     0,00%  postgres  postgres          [.] PortalRunMulti
 +  100,00%     0,00%  postgres  postgres          [.] ProcessQuery
 +  100,00%     0,00%  postgres  postgres          [.] standard_ExecutorRun
 +  100,00%     0,00%  postgres  postgres          [.] ExecModifyTable
 +   97,30%     1,16%  postgres  postgres          [.] ExecScan
 +   88,81%     0,73%  postgres  postgres          [.] IndexNext
 +   82,49%     0,99%  postgres  postgres          [.] index_fetch_heap
 +   81,18%     2,01%  postgres  postgres          [.] heapam_index_fetch_tuple.lto_priv.0
 +   25,65%    25,65%  postgres  postgres          [.] LWLockAttemptLock
 +   16,65%     1,01%  postgres  postgres          [.] LWLockRelease
 +   15,79%     0,26%  postgres  postgres          [.] ReadBufferExtended
 +   15,76%    15,76%  postgres  postgres          [.] LWLockReleaseInternal
 +   15,00%     1,65%  postgres  postgres          [.] StartReadBuffer
 +   13,88%    12,75%  postgres  postgres          [.] ReleaseAndReadBuffer
 +    8,41%     3,03%  postgres  postgres          [.] ExecInterpExpr
 +    6,13%     3,82%  postgres  postgres          [.] heap_hot_search_buffer
 +    5,46%     1,04%  postgres  postgres          [.] index_getnext_tid
 +    5,02%     0,23%  postgres  postgres          [.] slot_getsomeattrs_int
 +    4,94%     4,78%  postgres  postgres          [.] PinBuffer
 +    4,83%     4,83%  postgres  postgres          [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,40%     0,58%  postgres  postgres          [.] btgettuple
 +    3,87%     0,79%  postgres  postgres          [.] _bt_next
 +    3,76%     3,76%  postgres  postgres          [.] hash_search_with_hash_value
 +    3,09%     0,05%  postgres  postgres          [.] _bt_readnextpage
 +    2,88%     0,46%  postgres  postgres          [.] ExecStoreBufferHeapTuple
 +    2,44%     2,11%  postgres  postgres          [.] UnpinBufferNoOwner.lto_priv.0
 +    2,26%     1,01%  postgres  postgres          [.] _bt_readpage
 +    1,53%     1,16%  postgres  postgres          [.] HeapTupleSatisfiesVisibility
 +    1,45%     1,45%  postgres  postgres          [.] LWLockAcquire
 +    1,17%     1,17%  postgres  postgres          [.] heap_page_prune_opt
 +    1,09%     0,72%  postgres  postgres          [.] _bt_checkkeys
 +    0,82%     0,82%  postgres  postgres          [.] XidInMVCCSnapshot
 +    0,80%     0,71%  postgres  postgres          [.] ReleaseBuffer
 +    0,63%     0,63%  postgres  postgres          [.] GetPrivateRefCountEntry.lto_priv.0
      0,44%     0,44%  postgres  postgres          [.] _bt_check_compare.lto_priv.0
 RUN 5:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10003250000
   Children      Self  Command   Shared Object      Symbol
 +  100,00%     0,00%  postgres  postgres           [.] _start
 +  100,00%     0,00%  postgres  libc.so.6          [.] __libc_start_main@@GLIBC_2.34
 +  100,00%     0,00%  postgres  libc.so.6          [.] __libc_start_call_main
 +  100,00%     0,00%  postgres  postgres           [.] main
 +  100,00%     0,00%  postgres  postgres           [.] PostmasterMain
 +  100,00%     0,00%  postgres  postgres           [.] ServerLoop.isra.0
 +  100,00%     0,00%  postgres  postgres           [.] postmaster_child_launch
 +  100,00%     0,00%  postgres  postgres           [.] 0x00005f3570fb9dbf
 +  100,00%     0,00%  postgres  postgres           [.] PostgresMain
 +  100,00%     0,00%  postgres  postgres           [.] exec_simple_query
 +  100,00%     0,00%  postgres  postgres           [.] PortalRun
 +  100,00%     0,00%  postgres  postgres           [.] PortalRunMulti
 +  100,00%     0,00%  postgres  postgres           [.] ProcessQuery
 +  100,00%     0,00%  postgres  postgres           [.] standard_ExecutorRun
 +  100,00%     0,00%  postgres  postgres           [.] ExecModifyTable
 +  100,00%     0,35%  postgres  postgres           [.] ExecNestLoop
 +   97,20%     0,90%  postgres  postgres           [.] ExecScan
 +   89,12%     0,85%  postgres  postgres           [.] IndexNext
 +   82,86%     1,00%  postgres  postgres           [.] index_fetch_heap
 +   81,60%     1,92%  postgres  postgres           [.] heapam_index_fetch_tuple.lto_priv.0
 +   26,49%    26,48%  postgres  postgres           [.] LWLockAttemptLock
 +   16,53%     0,81%  postgres  postgres           [.] LWLockRelease
 +   15,81%    15,81%  postgres  postgres           [.] LWLockReleaseInternal
 +   15,57%     0,22%  postgres  postgres           [.] ReadBufferExtended
 +   14,68%     1,88%  postgres  postgres           [.] StartReadBuffer
 +   14,10%    12,88%  postgres  postgres           [.] ReleaseAndReadBuffer
 +    8,27%     2,97%  postgres  postgres           [.] ExecInterpExpr
 +    5,87%     3,67%  postgres  postgres           [.] heap_hot_search_buffer
 +    5,38%     0,92%  postgres  postgres           [.] index_getnext_tid
 +    5,01%     0,21%  postgres  postgres           [.] slot_getsomeattrs_int
 +    4,85%     4,84%  postgres  postgres           [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,81%     4,64%  postgres  postgres           [.] PinBuffer
 +    4,41%     0,53%  postgres  postgres           [.] btgettuple
 +    3,96%     0,92%  postgres  postgres           [.] _bt_next
 +    3,47%     3,47%  postgres  postgres           [.] hash_search_with_hash_value
 +    3,03%     0,03%  postgres  postgres           [.] _bt_readnextpage
 +    2,77%     0,40%  postgres  postgres           [.] ExecStoreBufferHeapTuple
 +    2,41%     2,11%  postgres  postgres           [.] UnpinBufferNoOwner.lto_priv.0
 +    2,28%     0,97%  postgres  postgres           [.] _bt_readpage
 +    1,46%     1,09%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
 +    1,41%     1,41%  postgres  postgres           [.] LWLockAcquire
 +    1,14%     0,71%  postgres  postgres           [.] _bt_checkkeys
 +    1,12%     1,12%  postgres  postgres           [.] heap_page_prune_opt
 +    0,83%     0,83%  postgres  postgres           [.] XidInMVCCSnapshot
 +    0,81%     0,69%  postgres  postgres           [.] ReleaseBuffer
 +    0,57%     0,57%  postgres  postgres           [.] GetPrivateRefCountEntry.lto_priv.0
      0,48%     0,48%  postgres  postgres           [.] _bt_check_compare.lto_priv.0
 *** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
 *** sudo perf report --no-children ***
 RUN 1 - NO CHILDREN:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000
   Overhead  Command   Shared Object      Symbol
 +   12,81%  postgres  postgres           [.] LWLockAttemptLock
 +   10,70%  postgres  postgres           [.] heap_hot_search_buffer
 +    9,27%  postgres  postgres           [.] tts_heap_getsomeattrs.lto_priv.0
 +    6,77%  postgres  postgres           [.] LWLockReleaseInternal
 +    5,89%  postgres  postgres           [.] hash_search_with_hash_value
 +    5,79%  postgres  postgres           [.] ExecInterpExpr
 +    4,14%  postgres  postgres           [.] PinBuffer
 +    3,61%  postgres  postgres           [.] heapam_index_fetch_tuple.lto_priv.0
 +    3,00%  postgres  postgres           [.] ReleaseAndReadBuffer
 +    2,43%  postgres  postgres           [.] LWLockAcquire
 +    2,38%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
 +    2,36%  postgres  postgres           [.] index_getnext_tid
 +    1,96%  postgres  postgres           [.] index_fetch_heap
 +    1,90%  postgres  postgres           [.] _bt_readpage
 +    1,85%  postgres  postgres           [.] LWLockRelease
 +    1,74%  postgres  postgres           [.] heap_page_prune_opt
 +    1,49%  postgres  postgres           [.] IndexNext
 +    1,47%  postgres  postgres           [.] _bt_checkkeys
 +    1,47%  postgres  postgres           [.] ExecScan
 +    1,47%  postgres  postgres           [.] XidInMVCCSnapshot
 +    1,19%  postgres  postgres           [.] StartReadBuffer
 +    1,18%  postgres  postgres           [.] GetPrivateRefCountEntry.lto_priv.0
 +    1,12%  postgres  postgres           [.] _bt_next
 +    1,10%  postgres  postgres           [.] UnpinBufferNoOwner.lto_priv.0
 +    1,09%  postgres  postgres           [.] MemoryContextReset
 +    1,08%  postgres  postgres           [.] btgettuple
 +    0,91%  postgres  postgres           [.] _bt_check_compare.lto_priv.0
 +    0,82%  postgres  postgres           [.] ExecEvalSysVar
 +    0,82%  postgres  postgres           [.] ExecStoreBufferHeapTuple
 +    0,71%  postgres  postgres           [.] hash_bytes
 +    0,65%  postgres  postgres           [.] tts_virtual_clear.lto_priv.0
 +    0,63%  postgres  postgres           [.] GlobalVisTestFor
 +    0,63%  postgres  postgres           [.] ExecNestLoop
 +    0,55%  postgres  postgres           [.] HeapTupleIsSurelyDead
 +    0,52%  postgres  postgres           [.] ResourceOwnerForget
      0,48%  postgres  postgres           [.] slot_getsomeattrs_int
      0,42%  postgres  postgres           [.] PredicateLockTID
      0,40%  postgres  postgres           [.] ReadBufferExtended
      0,37%  postgres  postgres           [.] _bt_saveitem
      0,31%  postgres  postgres           [.] ExecIndexScan
      0,30%  postgres  libc.so.6          [.] __memcmp_sse2
      0,28%  postgres  postgres           [.] _bt_setuppostingitems
      0,26%  postgres  postgres           [.] ReleaseBuffer
      0,23%  postgres  postgres           [.] ResourceOwnerEnlarge
      0,23%  postgres  postgres           [.] HeapCheckForSerializableConflictOut
      0,22%  postgres  postgres           [.] IncrBufferRefCount
      0,18%  postgres  postgres           [.] pgstat_count_io_op
 RUN 2 - NO CHILDREN:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10020750000
   Overhead  Command   Shared Object  Symbol
 +   25,70%  postgres  postgres       [.] LWLockAttemptLock
 +   15,91%  postgres  postgres       [.] LWLockReleaseInternal
 +   13,01%  postgres  postgres       [.] ReleaseAndReadBuffer
 +    5,05%  postgres  postgres       [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,59%  postgres  postgres       [.] PinBuffer
 +    3,90%  postgres  postgres       [.] heap_hot_search_buffer
 +    3,27%  postgres  postgres       [.] hash_search_with_hash_value
 +    3,08%  postgres  postgres       [.] ExecInterpExpr
 +    2,16%  postgres  postgres       [.] UnpinBufferNoOwner.lto_priv.0
 +    1,78%  postgres  postgres       [.] heapam_index_fetch_tuple.lto_priv.0
 +    1,73%  postgres  postgres       [.] StartReadBuffer
 +    1,51%  postgres  postgres       [.] LWLockAcquire
 +    1,25%  postgres  postgres       [.] HeapTupleSatisfiesVisibility
 +    1,18%  postgres  postgres       [.] heap_page_prune_opt
 +    1,13%  postgres  postgres       [.] ExecScan
 +    1,11%  postgres  postgres       [.] index_getnext_tid
 +    1,05%  postgres  postgres       [.] index_fetch_heap
 +    1,01%  postgres  postgres       [.] _bt_readpage
 +    0,86%  postgres  postgres       [.] LWLockRelease
 +    0,85%  postgres  postgres       [.] XidInMVCCSnapshot
 +    0,74%  postgres  postgres       [.] ReleaseBuffer
 +    0,73%  postgres  postgres       [.] IndexNext
 +    0,72%  postgres  postgres       [.] _bt_checkkeys
 +    0,72%  postgres  postgres       [.] _bt_next
 +    0,56%  postgres  postgres       [.] GetPrivateRefCountEntry.lto_priv.0
 +    0,54%  postgres  postgres       [.] btgettuple
      0,44%  postgres  postgres       [.] ExecStoreBufferHeapTuple
      0,41%  postgres  postgres       [.] _bt_check_compare.lto_priv.0
      0,41%  postgres  postgres       [.] hash_bytes
      0,39%  postgres  postgres       [.] MemoryContextReset
      0,34%  postgres  postgres       [.] ExecEvalSysVar
      0,34%  postgres  postgres       [.] ExecNestLoop
      0,32%  postgres  postgres       [.] tts_virtual_clear.lto_priv.0
      0,32%  postgres  postgres       [.] ResourceOwnerForget
      0,30%  postgres  postgres       [.] GlobalVisTestFor
      0,30%  postgres  postgres       [.] HeapTupleIsSurelyDead
      0,26%  postgres  postgres       [.] PredicateLockTID
      0,22%  postgres  postgres       [.] ReadBufferExtended
      0,20%  postgres  postgres       [.] _bt_setuppostingitems
      0,20%  postgres  postgres       [.] ExecIndexScan
      0,20%  postgres  postgres       [.] slot_getsomeattrs_int
      0,17%  postgres  libc.so.6      [.] __memcmp_sse2
      0,16%  postgres  postgres       [.] _bt_saveitem
      0,13%  postgres  postgres       [.] ResourceOwnerEnlarge
      0,12%  postgres  postgres       [.] HeapCheckForSerializableConflictOut
      0,10%  postgres  postgres       [.] pgstat_count_io_op
      0,08%  postgres  postgres       [.] IncrBufferRefCount
 RUN 3 - NO CHILDREN:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10010750000
   Overhead  Command   Shared Object  Symbol
 +   25,54%  postgres  postgres       [.] LWLockAttemptLock
 +   16,44%  postgres  postgres       [.] LWLockReleaseInternal
 +   12,72%  postgres  postgres       [.] ReleaseAndReadBuffer
 +    4,88%  postgres  postgres       [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,69%  postgres  postgres       [.] PinBuffer
 +    3,75%  postgres  postgres       [.] heap_hot_search_buffer
 +    3,47%  postgres  postgres       [.] hash_search_with_hash_value
 +    2,95%  postgres  postgres       [.] ExecInterpExpr
 +    2,01%  postgres  postgres       [.] UnpinBufferNoOwner.lto_priv.0
 +    1,89%  postgres  postgres       [.] heapam_index_fetch_tuple.lto_priv.0
 +    1,69%  postgres  postgres       [.] StartReadBuffer
 +    1,50%  postgres  postgres       [.] LWLockAcquire
 +    1,14%  postgres  postgres       [.] HeapTupleSatisfiesVisibility
 +    1,12%  postgres  postgres       [.] heap_page_prune_opt
 +    1,10%  postgres  postgres       [.] ExecScan
 +    1,09%  postgres  postgres       [.] index_getnext_tid
 +    1,03%  postgres  postgres       [.] index_fetch_heap
 +    1,01%  postgres  postgres       [.] LWLockRelease
 +    0,99%  postgres  postgres       [.] _bt_readpage
 +    0,87%  postgres  postgres       [.] XidInMVCCSnapshot
 +    0,79%  postgres  postgres       [.] IndexNext
 +    0,76%  postgres  postgres       [.] ReleaseBuffer
 +    0,74%  postgres  postgres       [.] _bt_next
 +    0,72%  postgres  postgres       [.] _bt_checkkeys
 +    0,61%  postgres  postgres       [.] btgettuple
 +    0,59%  postgres  postgres       [.] GetPrivateRefCountEntry.lto_priv.0
      0,50%  postgres  postgres       [.] ExecStoreBufferHeapTuple
      0,41%  postgres  postgres       [.] _bt_check_compare.lto_priv.0
      0,41%  postgres  postgres       [.] hash_bytes
      0,38%  postgres  postgres       [.] MemoryContextReset
      0,34%  postgres  postgres       [.] ExecEvalSysVar
      0,34%  postgres  postgres       [.] HeapTupleIsSurelyDead
      0,34%  postgres  postgres       [.] ExecNestLoop
      0,32%  postgres  postgres       [.] ResourceOwnerForget
      0,31%  postgres  postgres       [.] GlobalVisTestFor
      0,31%  postgres  postgres       [.] tts_virtual_clear.lto_priv.0
      0,25%  postgres  postgres       [.] ReadBufferExtended
      0,22%  postgres  postgres       [.] PredicateLockTID
      0,21%  postgres  postgres       [.] slot_getsomeattrs_int
      0,17%  postgres  postgres       [.] ExecIndexScan
      0,17%  postgres  postgres       [.] _bt_saveitem
      0,16%  postgres  postgres       [.] _bt_setuppostingitems
      0,14%  postgres  postgres       [.] HeapCheckForSerializableConflictOut
      0,13%  postgres  libc.so.6      [.] __memcmp_sse2
      0,12%  postgres  postgres       [.] ResourceOwnerEnlarge
      0,11%  postgres  postgres       [.] IncrBufferRefCount
      0,10%  postgres  postgres       [.] pgstat_count_io_op
 RUN 4 - NO CHILDREN:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10010500000
   Overhead  Command   Shared Object  Symbol
 +   25,65%  postgres  postgres       [.] LWLockAttemptLock
 +   15,76%  postgres  postgres       [.] LWLockReleaseInternal
 +   12,75%  postgres  postgres       [.] ReleaseAndReadBuffer
 +    4,83%  postgres  postgres       [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,78%  postgres  postgres       [.] PinBuffer
 +    3,82%  postgres  postgres       [.] heap_hot_search_buffer
 +    3,76%  postgres  postgres       [.] hash_search_with_hash_value
 +    3,03%  postgres  postgres       [.] ExecInterpExpr
 +    2,11%  postgres  postgres       [.] UnpinBufferNoOwner.lto_priv.0
 +    2,01%  postgres  postgres       [.] heapam_index_fetch_tuple.lto_priv.0
 +    1,65%  postgres  postgres       [.] StartReadBuffer
 +    1,45%  postgres  postgres       [.] LWLockAcquire
 +    1,17%  postgres  postgres       [.] heap_page_prune_opt
 +    1,16%  postgres  postgres       [.] ExecScan
 +    1,16%  postgres  postgres       [.] HeapTupleSatisfiesVisibility
 +    1,04%  postgres  postgres       [.] index_getnext_tid
 +    1,01%  postgres  postgres       [.] LWLockRelease
 +    1,01%  postgres  postgres       [.] _bt_readpage
 +    0,99%  postgres  postgres       [.] index_fetch_heap
 +    0,82%  postgres  postgres       [.] XidInMVCCSnapshot
 +    0,79%  postgres  postgres       [.] _bt_next
 +    0,73%  postgres  postgres       [.] IndexNext
 +    0,72%  postgres  postgres       [.] _bt_checkkeys
 +    0,71%  postgres  postgres       [.] ReleaseBuffer
 +    0,63%  postgres  postgres       [.] GetPrivateRefCountEntry.lto_priv.0
 +    0,58%  postgres  postgres       [.] btgettuple
      0,46%  postgres  postgres       [.] ExecStoreBufferHeapTuple
      0,44%  postgres  postgres       [.] _bt_check_compare.lto_priv.0
      0,37%  postgres  postgres       [.] ExecEvalSysVar
      0,36%  postgres  postgres       [.] MemoryContextReset
      0,34%  postgres  postgres       [.] tts_virtual_clear.lto_priv.0
      0,32%  postgres  postgres       [.] ExecNestLoop
      0,31%  postgres  postgres       [.] GlobalVisTestFor
      0,30%  postgres  postgres       [.] HeapTupleIsSurelyDead
      0,30%  postgres  postgres       [.] hash_bytes
      0,27%  postgres  postgres       [.] PredicateLockTID
      0,26%  postgres  postgres       [.] ResourceOwnerForget
      0,26%  postgres  postgres       [.] ReadBufferExtended
      0,23%  postgres  postgres       [.] slot_getsomeattrs_int
      0,17%  postgres  postgres       [.] _bt_setuppostingitems
      0,17%  postgres  libc.so.6      [.] __memcmp_sse2
      0,15%  postgres  postgres       [.] ExecIndexScan
      0,14%  postgres  postgres       [.] _bt_saveitem
      0,12%  postgres  postgres       [.] int8eq
      0,12%  postgres  postgres       [.] ReservePrivateRefCountEntry.lto_priv.0
      0,12%  postgres  postgres       [.] ResourceOwnerEnlarge
      0,11%  postgres  postgres       [.] HeapCheckForSerializableConflictOut
 RUN 5 - NO CHILDREN:
 Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10003250000
   Overhead  Command   Shared Object      Symbol
 +   26,48%  postgres  postgres           [.] LWLockAttemptLock
 +   15,81%  postgres  postgres           [.] LWLockReleaseInternal
 +   12,88%  postgres  postgres           [.] ReleaseAndReadBuffer
 +    4,84%  postgres  postgres           [.] tts_heap_getsomeattrs.lto_priv.0
 +    4,64%  postgres  postgres           [.] PinBuffer
 +    3,67%  postgres  postgres           [.] heap_hot_search_buffer
 +    3,47%  postgres  postgres           [.] hash_search_with_hash_value
 +    2,97%  postgres  postgres           [.] ExecInterpExpr
 +    2,11%  postgres  postgres           [.] UnpinBufferNoOwner.lto_priv.0
 +    1,92%  postgres  postgres           [.] heapam_index_fetch_tuple.lto_priv.0
 +    1,88%  postgres  postgres           [.] StartReadBuffer
 +    1,41%  postgres  postgres           [.] LWLockAcquire
 +    1,12%  postgres  postgres           [.] heap_page_prune_opt
 +    1,09%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
 +    1,00%  postgres  postgres           [.] index_fetch_heap
 +    0,97%  postgres  postgres           [.] _bt_readpage
 +    0,92%  postgres  postgres           [.] _bt_next
 +    0,92%  postgres  postgres           [.] index_getnext_tid
 +    0,90%  postgres  postgres           [.] ExecScan
 +    0,85%  postgres  postgres           [.] IndexNext
 +    0,83%  postgres  postgres           [.] XidInMVCCSnapshot
 +    0,81%  postgres  postgres           [.] LWLockRelease
 +    0,71%  postgres  postgres           [.] _bt_checkkeys
 +    0,69%  postgres  postgres           [.] ReleaseBuffer
 +    0,57%  postgres  postgres           [.] GetPrivateRefCountEntry.lto_priv.0
 +    0,53%  postgres  postgres           [.] btgettuple
      0,48%  postgres  postgres           [.] _bt_check_compare.lto_priv.0
      0,46%  postgres  postgres           [.] MemoryContextReset
      0,42%  postgres  postgres           [.] hash_bytes
      0,40%  postgres  postgres           [.] ExecStoreBufferHeapTuple
      0,36%  postgres  postgres           [.] ResourceOwnerForget
      0,35%  postgres  postgres           [.] ExecNestLoop
      0,33%  postgres  postgres           [.] ExecEvalSysVar
      0,32%  postgres  postgres           [.] HeapTupleIsSurelyDead
      0,27%  postgres  postgres           [.] GlobalVisTestFor
      0,26%  postgres  postgres           [.] PredicateLockTID
      0,22%  postgres  postgres           [.] ReadBufferExtended
      0,22%  postgres  postgres           [.] tts_virtual_clear.lto_priv.0
      0,21%  postgres  postgres           [.] slot_getsomeattrs_int
      0,19%  postgres  postgres           [.] _bt_saveitem
      0,19%  postgres  postgres           [.] _bt_setuppostingitems
      0,18%  postgres  postgres           [.] ResourceOwnerEnlarge
      0,17%  postgres  postgres           [.] ExecIndexScan
      0,14%  postgres  libc.so.6          [.] __memcmp_sse2
      0,12%  postgres  postgres           [.] pgstat_count_io_op
      0,10%  postgres  postgres           [.] int8eq
      0,10%  postgres  postgres           [.] HeapCheckForSerializableConflictOut
Hi Andres, Looking at the "perf report --no-children" list, I noticed the high "LWLock" related CPU activity. Although I have already mentioned part of it, here are a few more observations of pgAdmin and pg_locks during the stuck situation:
- pgAdmin shows 15 active sessions without wait events.
- pg_locks shows 228 locks taken, of which 218 are "fastpath".
- Only 10 locks are not fastpath, and they refer to the "transactionid". I probably misunderstand this, but I would have expected to see 15 transactionids, one for each database session? But maybe I have this wrong... There are no other transactionids. There *are* also 15 "virtualxid" locks visible, all "fastpath". All transactionid and virtualxid locks are of "ExclusiveLock" type. The rest of the 228 locks are of "AccessShareLock" and "RowExclusiveLock" type.
- My 'max_locks_per_transaction', as mentioned in this commit (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c4d5cb71d) related to PG18 and fastpath changes, is set to the default of 64.
- If I look in the pg_stat_io view and regularly refresh it, I hardly see any changes in the table, except a few records (at least one related to autovacuum) now and then. This is also more or less confirmed by the disk RAIDs for tables and indexes, which show zero activity in Windows Task Manager (0 KB/s writes with 0 ms "average response time"), which doesn't happen in normal operation of the multi-threaded code, where I do see almost continuous activity on the RAIDs.
- It is likely the entire table being processed is cached in RAM, as there is plenty of it, far more than the size of the small Italy extract table being processed.
Marco
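PS: for completeness, the observations above come from queries roughly like the following (just a sketch of what I look at, not the exact views pgAdmin uses):

-- active backends and their wait events (if any)
SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_type = 'client backend' AND state = 'active';

-- lock counts, split by fastpath or not
SELECT locktype, mode, fastpath, granted, count(*)
FROM pg_locks
GROUP BY locktype, mode, fastpath, granted
ORDER BY count(*) DESC;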
Hi, On 2025-10-08 09:49:04 +0200, Marco Boeringa wrote: > Yes, you could be right this is not related to AIO at all, but another issue > introduced in PG18. Right. > For sure, I did not see this issue in <= PG17, so some change in PG18 is > causing it. Additionally, there is a small chance it might be related to > PostGIS, as that was upgraded as well (3.5.2 --> 3.6.0) during the PG > upgrade, as PG18 requires PostGIS 3.6.0 minimum. And the query does use > PostGIS functions, but none that AFAIK rely on e.g. a spatial > index. Functions like ST_Area just process an individual geometry, not the > spatial relationship between multiple geometries. Can you test this with postgis 3.6 on 17? > As I wrote before, this is a multi-threaded Python application (actually > developed in a GIS), that uses Python's 'concurrent.futures' threading > framework to create jobs of records to process for each thread, > significantly speeding up the processing. The queries are in fact > dynamically build by the code, and part of a much larger geoprocessing > workflow, so it is hard to run them separately and provide a query plan > (although in this case I could by rewriting part of the query below). > > However, I want to stress that any query plan is unlikely to yield > anything. In normal circumstances and in PG17 and below, this code runs > fine! And it is only 1 in maybe 4 runs in PG18 that goes berserk and makes a > processing step that should takes < 10 seconds, all of a sudden take > 2 > hours. The fact that it runs without a problem in 17 means it's actually rather meaningful to look at the query plan. It could have changed. Separately, it might help us to narrow down what changes to look at that could potentially be causing problems. > > So somehow >60% of the CPU time is spent fetching tuples corresponding to > index entries. That seems ... a lot. Is it possible that you have a lot of > dead rows in the involved tables? > > Yes, that is perfectly possible. However, the particular table is only just > over 100k records. > It is true that my code is designed to process literally *every* record in a > table. However, I specifically set adjusted table storage parameters with > much more aggressive vacuum settings (essentially forcing always vacuum > after something like 10k dead tuples irrespective of the size of the > table). This has worked really well, and I have successfully UPDATEd all of > Facebook Daylight size > 1B records tables with the same code, without ever > running into this particular issue, nor transaction ID wraparound issues. Making vacuum more aggressive won't really help much if you have longrunning queries/sessions, since vacuum can't clean up row versions that are still visibile to some of the transactions. > *** sudo perf -p <PID of one stuck postgres backend> -g -d 10 *** > *** sudo perf report --no-children *** > > Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000 > Overhead Command Shared Object Symbol > + 12,81% postgres postgres [.] LWLockAttemptLock > + 10,70% postgres postgres [.] heap_hot_search_buffer > + 9,27% postgres postgres [.] > tts_heap_getsomeattrs.lto_priv.0 > + 6,77% postgres postgres [.] LWLockReleaseInternal > + 5,89% postgres postgres [.] hash_search_with_hash_value > + 5,79% postgres postgres [.] ExecInterpExpr > + 4,14% postgres postgres [.] PinBuffer Could you "unfold" the callstacks for the top entries? 
And/or attach a perf report --no-children > somefile (when redirecting to a file perf will include much more detail than when using it interactively) Greetings, Andres Freund
Hi, On 2025-10-08 14:13:28 +0200, Marco Boeringa wrote: > Looking at the "perf report --no-children" list, I noticed the high "LWLock" > related CPU activity. Yes, that is remarkably high. It likely indicates that there is significant contention on some of the pages. > Although I have already mentioned part of it, here are a few more > observations of pgAdmin and pg_locks during the stuck situation: > > - pgAdmin shows 15 active session without wait events > > - pg_locks shows 228 locks taken, of which 218 are "fastpath" > > - Only 10 locks are not fastpath, and they refer to the "transactionid". I > probably misunderstand this, but I would have expected to see 15 > transactionids, one for each database session? But maybe I have this > wrong... There are no other transactionids. There *are* also 15 "virtualxid" > locks visible, all "fastpath". All transactionid and virtualxid locks are > "ExclusiveLock" type. The rest of the 228 locks are of "AccessShareLock" and > "RowExclusiveLock" type. lwlocks are not visible in pg_locks, they are lower-level / more granular. > - If I look in the pg_stat_io view and regularly refresh it, I hardly see > any changes in the table, except a few records (at least one related to > autovacuum) now and then. This is also more or less confirmed by the disk > RAIDs for tables and indexes, that show zero activity in Windows Task > Manager (0 KB/s writes with 0 ms "average response time" (which doesn't > happen in normal operation of the multi-threaded code where I do see almost > continuous activity on the RAIDs). Note that pg_stat_io is only updated at the end of a transaction, so just looking at it continuously while there are longrunning statements isn't necessarily going to tell you that much. Greetings, Andres Freund
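Because lwlock contention does not surface in pg_locks, the live wait state of the backends is better sampled from pg_stat_activity; a minimal sketch using only standard columns (a backend actually waiting on a lightweight lock would show wait_event_type = 'LWLock'):

SELECT pid, state, wait_event_type, wait_event, now() - query_start AS running_for
FROM pg_stat_activity
WHERE backend_type = 'client backend' AND state = 'active';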
Hi Andres, Thanks for all the help and suggestions so far! > Can you test this with postgis 3.6 on 17? If I find time... but looking through the release notes of PostGIS 3.6.0, I doubt the issue is in PostGIS, as there do not appear to be any changes to the specific PostGIS functions I am using. That doesn't exclude it though. > The fact that it runs without a problem in 17 means it's actually rather meaningful to look at the query plan. It could have changed. Separately, it might help us to narrow down what changes to look at that could potentially be causing problems. I would fully understand if this was an "ordinary" issue case with a simple self-contained query and with things going wrong each time in the same way. However, as said, besides the major issue of running the query separately from my geoprocessing workflow which involves many more steps - which would mean that any test outside of it would *not* be very much representative of what is going on inside my tool and geoprocessing workflow - there is the fact that things going wrong is a random anomaly. I cannot stress this enough: about 3-4 in 5 runs are OK, then a random follow up run *with exactly the same input data* turns out bad with the stall. Even if there was an easy way to run the query, I think the chance is highly likely the postgres query planner would come up with a decent plan, as in normal circumstances, there is no issue. > Making vacuum more aggressive won't really help much if you have longrunning queries/sessions, since vacuum can't clean up row versions that are still visibile to some of the transactions. My code batches the updates in sets of 2000 records at a time and then COMMITs, so the transactions themselves are limited in time and size, which should allow vacuum to do its job. > Could you "unfold" the callstacks for the top entries? And/or attach a perf report --no-children > somefile (when redirecting to a file perf will include much more detail than when using it interactively) See below: # Total Lost Samples: 0 # # Samples: 40K of event 'task-clock:ppp' # Event count (approx.): 10003250000 # # Overhead Command Shared Object Symbol # ........ ........ ................. ........................................... # 26.48% postgres postgres [.] LWLockAttemptLock | ---LWLockAttemptLock | |--23.15%--heapam_index_fetch_tuple.lto_priv.0 | index_fetch_heap | IndexNext | ExecScan | ExecNestLoop | ExecNestLoop | ExecModifyTable | standard_ExecutorRun | ProcessQuery | PortalRunMulti | PortalRun | exec_simple_query | PostgresMain | 0x5f3570fb9dbf | postmaster_child_launch | ServerLoop.isra.0 | PostmasterMain | main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | --3.30%--StartReadBuffer ReadBufferExtended | --3.25%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 15.81% postgres postgres [.] 
LWLockReleaseInternal | ---LWLockReleaseInternal | --15.71%--LWLockRelease | |--15.03%--heapam_index_fetch_tuple.lto_priv.0 | index_fetch_heap | IndexNext | ExecScan | ExecNestLoop | ExecNestLoop | ExecModifyTable | standard_ExecutorRun | ProcessQuery | PortalRunMulti | PortalRun | exec_simple_query | PostgresMain | 0x5f3570fb9dbf | postmaster_child_launch | ServerLoop.isra.0 | PostmasterMain | main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | --0.66%--StartReadBuffer ReadBufferExtended | --0.66%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 12.88% postgres postgres [.] ReleaseAndReadBuffer | ---ReleaseAndReadBuffer | --12.78%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 4.84% postgres postgres [.] tts_heap_getsomeattrs.lto_priv.0 | ---tts_heap_getsomeattrs.lto_priv.0 | --4.80%--slot_getsomeattrs_int ExecInterpExpr ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 4.64% postgres postgres [.] PinBuffer | ---PinBuffer | --4.64%--StartReadBuffer ReadBufferExtended | --4.57%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 3.67% postgres postgres [.] heap_hot_search_buffer | ---heap_hot_search_buffer | --3.61%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 3.47% postgres postgres [.] hash_search_with_hash_value | ---hash_search_with_hash_value | --3.45%--StartReadBuffer ReadBufferExtended | --3.35%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop | --3.34%--ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 2.97% postgres postgres [.] 
ExecInterpExpr | ---ExecInterpExpr | |--1.64%--ExecNestLoop | | | --1.60%--ExecNestLoop | ExecModifyTable | standard_ExecutorRun | ProcessQuery | PortalRunMulti | PortalRun | exec_simple_query | PostgresMain | 0x5f3570fb9dbf | postmaster_child_launch | ServerLoop.isra.0 | PostmasterMain | main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | --1.33%--ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 2.11% postgres postgres [.] UnpinBufferNoOwner.lto_priv.0 | ---UnpinBufferNoOwner.lto_priv.0 | |--1.24%--ExecStoreBufferHeapTuple | heapam_index_fetch_tuple.lto_priv.0 | index_fetch_heap | IndexNext | ExecScan | ExecNestLoop | ExecNestLoop | ExecModifyTable | standard_ExecutorRun | ProcessQuery | PortalRunMulti | PortalRun | exec_simple_query | PostgresMain | 0x5f3570fb9dbf | postmaster_child_launch | ServerLoop.isra.0 | PostmasterMain | main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | --0.81%--ReleaseAndReadBuffer heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 1.92% postgres postgres [.] heapam_index_fetch_tuple.lto_priv.0 | ---heapam_index_fetch_tuple.lto_priv.0 | --1.67%--index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 1.88% postgres postgres [.] StartReadBuffer | ---StartReadBuffer | --1.86%--ReadBufferExtended | --1.83%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 1.41% postgres postgres [.] LWLockAcquire | ---LWLockAcquire | --1.15%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 1.12% postgres postgres [.] heap_page_prune_opt | ---heap_page_prune_opt | --1.12%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 1.09% postgres postgres [.] 
HeapTupleSatisfiesVisibility | ---HeapTupleSatisfiesVisibility | --1.02%--heap_hot_search_buffer heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 1.00% postgres postgres [.] index_fetch_heap | ---index_fetch_heap | --0.83%--IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.97% postgres postgres [.] _bt_readpage | ---_bt_readpage | --0.97%--_bt_readnextpage _bt_next btgettuple index_getnext_tid IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.92% postgres postgres [.] _bt_next | ---_bt_next | --0.82%--btgettuple index_getnext_tid IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.92% postgres postgres [.] index_getnext_tid | ---index_getnext_tid | --0.82%--IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.90% postgres postgres [.] ExecScan | ---ExecScan ExecNestLoop | --0.83%--ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.85% postgres postgres [.] IndexNext | ---IndexNext | --0.80%--ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.83% postgres postgres [.] XidInMVCCSnapshot | ---XidInMVCCSnapshot 0.81% postgres postgres [.] LWLockRelease | ---LWLockRelease | --0.54%--heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.71% postgres postgres [.] 
_bt_checkkeys | ---_bt_checkkeys | --0.65%--_bt_readpage _bt_readnextpage _bt_next btgettuple index_getnext_tid IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.69% postgres postgres [.] ReleaseBuffer | ---ReleaseBuffer | --0.68%--ExecStoreBufferHeapTuple heapam_index_fetch_tuple.lto_priv.0 index_fetch_heap IndexNext ExecScan ExecNestLoop ExecNestLoop ExecModifyTable standard_ExecutorRun ProcessQuery PortalRunMulti PortalRun exec_simple_query PostgresMain 0x5f3570fb9dbf postmaster_child_launch ServerLoop.isra.0 PostmasterMain main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start 0.57% postgres postgres [.] GetPrivateRefCountEntry.lto_priv.0 | ---GetPrivateRefCountEntry.lto_priv.0 0.53% postgres postgres [.] btgettuple | ---btgettuple 0.48% postgres postgres [.] _bt_check_compare.lto_priv.0 0.46% postgres postgres [.] MemoryContextReset 0.42% postgres postgres [.] hash_bytes 0.40% postgres postgres [.] ExecStoreBufferHeapTuple 0.36% postgres postgres [.] ResourceOwnerForget 0.35% postgres postgres [.] ExecNestLoop 0.33% postgres postgres [.] ExecEvalSysVar 0.32% postgres postgres [.] HeapTupleIsSurelyDead 0.27% postgres postgres [.] GlobalVisTestFor 0.26% postgres postgres [.] PredicateLockTID 0.22% postgres postgres [.] ReadBufferExtended 0.22% postgres postgres [.] tts_virtual_clear.lto_priv.0 0.21% postgres postgres [.] slot_getsomeattrs_int 0.19% postgres postgres [.] _bt_saveitem 0.19% postgres postgres [.] _bt_setuppostingitems 0.18% postgres postgres [.] ResourceOwnerEnlarge 0.17% postgres postgres [.] ExecIndexScan 0.14% postgres libc.so.6 [.] __memcmp_sse2 0.12% postgres postgres [.] pgstat_count_io_op 0.10% postgres postgres [.] int8eq 0.10% postgres postgres [.] HeapCheckForSerializableConflictOut 0.10% postgres postgres [.] ReservePrivateRefCountEntry.lto_priv.0 0.09% postgres postgres [.] IncrBufferRefCount 0.06% postgres postgres [.] _bt_checkpage 0.03% postgres postgres [.] BufferGetLSNAtomic 0.03% postgres postgres [.] tag_hash 0.03% postgres postgres [.] LockBufHdr 0.03% postgres postgres [.] _bt_readnextpage 0.02% postgres postgres [.] tts_buffer_heap_getsomeattrs.lto_priv.0 0.02% postgres [kernel.kallsyms] [k] handle_softirqs 0.01% postgres postgres [.] _bt_steppage 0.01% postgres [kernel.kallsyms] [k] task_mm_cid_work 0.00% postgres [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.00% postgres [kernel.kallsyms] [k] irqentry_exit_to_user_mode 0.00% postgres postgres [.] TransactionIdIsCurrentTransactionId 0.00% postgres postgres [.] index_rescan 0.00% postgres [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 0.00% postgres postgres [.] AllocSetAlloc 0.00% postgres postgres [.] PredicateLockPage 0.00% postgres postgres [.] _bt_set_startikey
Hi,
On 2025-10-08 19:46:42 +0200, Marco Boeringa wrote:
> > The fact that it runs without a problem in 17 means it's actually rather
> meaningful to look at the query plan. It could have changed. Separately, it
> might help us to narrow down what changes to look at that could potentially
> be causing problems.
> 
> I would fully understand if this was an "ordinary" issue case with a simple
> self-contained query and with things going wrong each time in the same way.
> However, as said, besides the major issue of running the query separately
> from my geoprocessing workflow which involves many more steps - which would
> mean that any test outside of it would *not* be very much representative of
> what is going on inside my tool and geoprocessing workflow - there is the
> fact that things going wrong is a random anomaly. I cannot stress this
> enough: about 3-4 in 5 runs are OK, then a random follow up run *with
> exactly the same input data* turns out bad with the stall. Even if there was
> an easy way to run the query, I think the chance is highly likely the
> postgres query planner would come up with a decent plan, as in normal
> circumstances, there is no issue.
Even just knowing whether the "normal query plan" is the same one as we see in
profiles of "stuck" backends is valuable. Even if the query plan is perfectly
normal, it *still* is very important to know in which order the joins are
evaluated etc. But there also might be changes in the query plan between 17
and 18 that trigger the issue...
Without more details about what is expected to be run and what is actually
happening, it's just about impossible for us to debug this without a
reproducer that we can run and debug ourselves.
> > Making vacuum more aggressive won't really help much if you have
> longrunning queries/sessions, since vacuum can't clean up row versions that
> are still visibile to some of the transactions.
> 
> My code batches the updates in sets of 2000 records at a time and then
> COMMITs, so the transactions themselves are limited in time and size, which
> should allow vacuum to do its job.
Are the "stuck" backends stuck within one 2000 record batch, or are they
"just" slower processing each batch?
>     26.48%  postgres  postgres           [.] LWLockAttemptLock
>             |
>             ---LWLockAttemptLock
>                |
>                |--23.15%--heapam_index_fetch_tuple.lto_priv.0
>                |          index_fetch_heap
>                |          IndexNext
>                |          ExecScan
>                |          ExecNestLoop
>                |          ExecNestLoop
>                |          ExecModifyTable
>                |          standard_ExecutorRun
>                |          ProcessQuery
So the query plan we have is a nested loop between at least three tables
(there are two joins, c.f. the two ExecNestLoop calls), where there presumably
are a lot of row [versions] on the inner side of the innermost join.
In [1] you showed a query. Reformatted, that looks like this:
UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
SET area_geo = t2.area_geo,
    perim_geo = t2.perim_geo,
    compact_geo = CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END,
    npoints_geo = t2.npoints_geo,
    comp_npoints_geo = CASE WHEN t2.npoints_geo > 0 THEN (CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) /
t2.area_geo)/ (4 * pi())) ELSE 0 END / t2.npoints_geo) ELSE 0 END,
 
    convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo /
ST_Area(ST_ConvexHull(way)::geography,true))ELSE 1 END
 
FROM (
    SELECT
        objectid,
        ST_Area(way::geography,true) AS area_geo,
        ST_Perimeter(way::geography,true) AS perim_geo,
        ST_NPoints(way) AS npoints_geo
    FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
WHERE (t2.objectid = t1.objectid)
   AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3)
Which certainly fits with two nested loops, although I don't think I can infer
which order the joins are in.
Is osm.landcover_scrubs_small_scale_2_ply.object_id unique?
Can there be multiple rows for one object_id in
mini_test.osm.osm_tmp_28128_ch5?
Are there indexes on mini_test.osm.osm_tmp_28128_ch5.unique_id and
osm.landcover_scrubs_small_scale_2_ply?
Greetings,
Andres Freund
[1] https://www.postgresql.org/message-id/53b44572-0ceb-4149-b361-07da2da91032%40boeringa.demon.nl
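One quick way to answer the index questions above is the pg_indexes view; a minimal sketch for the main table named in the query (the same check applies to the chunk relation):

SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'osm'
  AND tablename = 'landcover_scrubs_small_scale_2_ply';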
			
Hi Andres,

> Even just knowing whether the "normal query plan" is the same one as we see in
> profiles of "stuck" backends is valuable. Even if the query plan is perfectly
> normal, it *still* is very important to know in which order the joins are
> evaluated etc. But there also might be changes in the query plan between 17
> and 18 that trigger the issue...
>
> Without more details about what is expected to be run and what is actually
> happening, it's just about impossible for us to debug this without a
> reproducer that we can run and debug ourselves.

I now encountered the auto_explain option in the PostgreSQL help. May sound stupid, but I hadn't been aware of this option. This might help in getting an explain during the actual execution of my tool, if I understand the option properly. This would be far more valuable - as being the "real" thing - than some contrived reproduction case. I will need to investigate this a bit more:

https://www.postgresql.org/docs/current/auto-explain.html

>>> Making vacuum more aggressive won't really help much if you have
>>> longrunning queries/sessions, since vacuum can't clean up row versions that
>>> are still visibile to some of the transactions.
>>
>> My code batches the updates in sets of 2000 records at a time and then
>> COMMITs, so the transactions themselves are limited in time and size, which
>> should allow vacuum to do its job.
>
> Are the "stuck" backends stuck within one 2000 record batch, or are they
> "just" slower processing each batch?

I can't tell. But to explain: each thread has its own set of jobs assigned, and each job is batched in sets of 2000 records until COMMIT. So if one job has 100k records to process, 50 commits should occur for that job by one Python thread. I take care to avoid processing records totally randomly, which could cause conflicts and locking issues between threads attempting to access the same locked database page, significantly slowing down the processing. Records are assigned by database page (and depending on some other parameters), which has worked really well so far.

Note that this is just a simplified version of the different processing modes I developed for different challenges and geoprocessing steps.

>>     26.48%  postgres  postgres           [.] LWLockAttemptLock
>>             |
>>             ---LWLockAttemptLock
>>                |
>>                |--23.15%--heapam_index_fetch_tuple.lto_priv.0
>>                |          index_fetch_heap
>>                |          IndexNext
>>                |          ExecScan
>>                |          ExecNestLoop
>>                |          ExecNestLoop
>>                |          ExecModifyTable
>>                |          standard_ExecutorRun
>>                |          ProcessQuery
>
> So the query plan we have is a nested loop between at least three tables
> (there are two joins, c.f. the two ExecNestLoop calls), where there presumably
> are a lot of row [versions] on the inner side of the innermost join.
>
> In [1] you showed a query. Reformatted, that looks like this:
>
> UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
> SET area_geo = t2.area_geo,
>     perim_geo = t2.perim_geo,
>     compact_geo = CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END,
>     npoints_geo = t2.npoints_geo,
>     comp_npoints_geo = CASE WHEN t2.npoints_geo > 0 THEN (CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END / t2.npoints_geo) ELSE 0 END,
>     convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END
> FROM (
>     SELECT
>         objectid,
>         ST_Area(way::geography,true) AS area_geo,
>         ST_Perimeter(way::geography,true) AS perim_geo,
>         ST_NPoints(way) AS npoints_geo
>     FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
> WHERE (t2.objectid = t1.objectid)
>    AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3)
>
> Which certainly fits with two nested loops, although I don't think I can infer
> which order the joins are in.
>
> Is osm.landcover_scrubs_small_scale_2_ply.object_id unique?

Yes.

> Can there be multiple rows for one object_id in
> mini_test.osm.osm_tmp_28128_ch5?

No. This table contains the records to process, which are unique. It is the job. It is a one-to-one join.

> Are there indexes on mini_test.osm.osm_tmp_28128_ch5.unique_id and
> osm.landcover_scrubs_small_scale_2_ply?

Yes, the unique ids / objectid fields are indexed to allow an efficient join.
Actually, the "*_ch<number>" database object that represents the records to process for one job, references a database view. Each thread gets its own view. All views reference the same secondary table that has an index on the objectid.
On 8-10-2025 at 21:08, Andres Freund wrote:
> Even just knowing whether the "normal query plan" is the same one as we see in
> profiles of "stuck" backends is valuable. Even if the query plan is perfectly
> normal, it *still* is very important to know in which order the joins are
> evaluated etc. But there also might be changes in the query plan between 17
> and 18 that trigger the issue...
>
> Without more details about what is expected to be run and what is actually
> happening, it's just about impossible for us to debug this without a
> reproducer that we can run and debug ourselves.
>
>
Hi Andres,
I have tried to get the auto_explain stuff to run, but that has not 
succeeded yet.
However, I realized that due to the extremely long "stall", it should be 
possible to simply copy the visible SQL statement out of pgAdmin while the 
processing appears stuck, as the required (temporary) tables and views for 
the statement are still there at that point, and simply run EXPLAIN on it 
from pgAdmin.
This resulted in the plan I pasted below in JSON format.
Any insights you gain from this in combination with the other stuff I 
shared and the answers I gave to your last questions?
Marco
"[
   {
     ""Plan"": {
       ""Node Type"": ""ModifyTable"",
       ""Operation"": ""Update"",
       ""Parallel Aware"": false,
       ""Async Capable"": false,
       ""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
       ""Schema"": ""osm"",
       ""Alias"": ""t1"",
       ""Startup Cost"": 1.14,
       ""Total Cost"": 9129.99,
       ""Plan Rows"": 0,
       ""Plan Width"": 0,
       ""Disabled"": false,
       ""Plans"": [
         {
           ""Node Type"": ""Nested Loop"",
           ""Parent Relationship"": ""Outer"",
           ""Parallel Aware"": false,
           ""Async Capable"": false,
           ""Join Type"": ""Inner"",
           ""Startup Cost"": 1.14,
           ""Total Cost"": 9129.99,
           ""Plan Rows"": 70,
           ""Plan Width"": 62,
           ""Disabled"": false,
           ""Output"": 
[""st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)"", 
""st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true)"", ""CASE WHEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) > 
'0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision 
END"", ""st_npoints(landcover_grassy_small_scale_2_ply.way)"", ""CASE 
WHEN (st_npoints(landcover_grassy_small_scale_2_ply.way) > 0) THEN (CASE 
WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) 
 > '0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision END / 
(st_npoints(landcover_grassy_small_scale_2_ply.way))::double precision) 
ELSE '0'::double precision END"", ""CASE WHEN 
(st_area((st_convexhull(t1.way))::geography, true) > '0'::double 
precision) THEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) / 
st_area((st_convexhull(t1.way))::geography, true)) ELSE '1'::double 
precision END"", ""t1.ctid"", 
""landcover_grassy_small_scale_2_ply.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.ctid""],
           ""Inner Unique"": true,
           ""Plans"": [
             {
               ""Node Type"": ""Nested Loop"",
               ""Parent Relationship"": ""Outer"",
               ""Parallel Aware"": false,
               ""Async Capable"": false,
               ""Join Type"": ""Inner"",
               ""Startup Cost"": 0.72,
               ""Total Cost"": 173.50,
               ""Plan Rows"": 70,
               ""Plan Width"": 828,
               ""Disabled"": false,
               ""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid"", 
""landcover_grassy_small_scale_2_ply_pg.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.objectid""],
               ""Inner Unique"": true,
               ""Plans"": [
                 {
                   ""Node Type"": ""Index Scan"",
                   ""Parent Relationship"": ""Outer"",
                   ""Parallel Aware"": false,
                   ""Async Capable"": false,
                   ""Scan Direction"": ""Forward"",
                   ""Index Name"": ""idx_osm_35"",
                   ""Relation Name"": 
""landcover_grassy_small_scale_2_ply_pg"",
                   ""Schema"": ""osm"",
                   ""Alias"": ""landcover_grassy_small_scale_2_ply_pg"",
                   ""Startup Cost"": 0.29,
                   ""Total Cost"": 3.70,
                   ""Plan Rows"": 70,
                   ""Plan Width"": 14,
                   ""Disabled"": false,
                   ""Output"": 
[""landcover_grassy_small_scale_2_ply_pg.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.objectid""],
                   ""Index Cond"": 
""((landcover_grassy_small_scale_2_ply_pg.page_number >= 28873) AND 
(landcover_grassy_small_scale_2_ply_pg.page_number < 29373))""
                 },
                 {
                   ""Node Type"": ""Index Scan"",
                   ""Parent Relationship"": ""Inner"",
                   ""Parallel Aware"": false,
                   ""Async Capable"": false,
                   ""Scan Direction"": ""Forward"",
                   ""Index Name"": 
""landcover_grassy_small_scale_2_ply_pkey"",
                   ""Relation Name"": 
""landcover_grassy_small_scale_2_ply"",
                   ""Schema"": ""osm"",
                   ""Alias"": ""t1"",
                   ""Startup Cost"": 0.42,
                   ""Total Cost"": 2.43,
                   ""Plan Rows"": 1,
                   ""Plan Width"": 814,
                   ""Disabled"": false,
                   ""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid""],
                   ""Index Cond"": ""(t1.objectid = 
landcover_grassy_small_scale_2_ply_pg.objectid)""
                 }
               ]
             },
             {
               ""Node Type"": ""Index Scan"",
               ""Parent Relationship"": ""Inner"",
               ""Parallel Aware"": false,
               ""Async Capable"": false,
               ""Scan Direction"": ""Forward"",
               ""Index Name"": ""landcover_grassy_small_scale_2_ply_pkey"",
               ""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
               ""Schema"": ""osm"",
               ""Alias"": ""landcover_grassy_small_scale_2_ply"",
               ""Startup Cost"": 0.42,
               ""Total Cost"": 0.64,
               ""Plan Rows"": 1,
               ""Plan Width"": 814,
               ""Disabled"": false,
               ""Output"": [""landcover_grassy_small_scale_2_ply.way"", 
""landcover_grassy_small_scale_2_ply.ctid"", 
""landcover_grassy_small_scale_2_ply.objectid""],
               ""Index Cond"": 
""(landcover_grassy_small_scale_2_ply.objectid = t1.objectid)""
             }
           ]
         }
       ]
     }
   }
]"
			
		Just found out about the other thread, which seems to describe an issue that may well be related to the problems I observe in my multi-threaded Python implementation using PostgreSQL 18: https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BHh7u3YpPLKQS%2BB7KJu0Bw99EjhuPuvBRVccDfgod95g%40mail.gmail.com#969d704180758e8dbd124604206175e0 As I have zero experience compiling PostgreSQL from scratch and PostgreSQL development in general, I will unfortunately avoid going down the rabbit hole of such an attempt to be able to test the patch, but will eagerly await PG18.1 which hopefully will have it included and might fix the issues I observed. Marco
Hi,

On October 10, 2025 5:26:38 PM EDT, Marco Boeringa <marco@boeringa.demon.nl> wrote:
> Just found out about the other thread, which seems to describe an issue that may well be related to the problems I observe in my multi-threaded Python implementation using PostgreSQL 18:
>
> https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BHh7u3YpPLKQS%2BB7KJu0Bw99EjhuPuvBRVccDfgod95g%40mail.gmail.com#969d704180758e8dbd124604206175e0
>
> As I have zero experience compiling PostgreSQL from scratch and PostgreSQL development in general, I will unfortunately avoid going down the rabbit hole of such an attempt to be able to test the patch, but will eagerly await PG18.1, which hopefully will have it included and might fix the issues I observed.

If that issue were causal you'd see something in pg_aios and the issue wouldn't be present with worker or sync io methods. Nor would you see high cpu utilization. This is a different issue, likely not aio related.

Greetings,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
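For completeness, the pg_aios view Andres mentions can simply be queried while a backend appears stuck; a minimal sketch (it lists the asynchronous I/O handles currently in use, so an empty result during a stall is consistent with the issue not being AIO related):

SELECT * FROM pg_aios;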
Then as a follow-up on that: is there anything you still need from me, or is the information I gave so far, including the last requested answers and the perf data, enough to give you at least a decent start to figure out where the issue is?

On 10-10-2025 at 23:30, Andres Freund wrote:
> Hi,
>
> On October 10, 2025 5:26:38 PM EDT, Marco Boeringa <marco@boeringa.demon.nl> wrote:
>> Just found out about the other thread, which seems to describe an issue that may well be related to the problems I observe in my multi-threaded Python implementation using PostgreSQL 18:
>>
>> https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BHh7u3YpPLKQS%2BB7KJu0Bw99EjhuPuvBRVccDfgod95g%40mail.gmail.com#969d704180758e8dbd124604206175e0
>>
>> As I have zero experience compiling PostgreSQL from scratch and PostgreSQL development in general, I will unfortunately avoid going down the rabbit hole of such an attempt to be able to test the patch, but will eagerly await PG18.1, which hopefully will have it included and might fix the issues I observed.
> If that issue were causal you'd see something in pg_aios and the issue wouldn't be present with worker or sync io methods. Nor would you see high cpu utilization. This is a different issue, likely not aio related.
>
> Greetings,
>
> Andres
Hi Andres,

I have been doing a bit more investigation. As I explained before, the problematic multi-threaded geoprocessing step is not some stand-alone query that can easily be reduced to a small, easily portable, reproducible case with attached data. In fact, this geoprocessing step is part of a large custom-built Python geoprocessing workflow, with total code probably in the 25k lines range.

However, based on the apparent poor query plan in PG18 / PostGIS 3.6.0, I reviewed the exact code once more. I noticed that just before entering the multi-threaded code that emits the queries as seen below, I am actually adding the primary key field 'objectid' as "GENERATED BY DEFAULT AS IDENTITY" to the 'osm.landcover_scrubs_small_scale_2_ply' table. I also noticed I did not run ANALYZE against the same table after that. I have now added this to the code. Although it is still preliminary, first tests seem to indicate that this resolves the issue and prevents the stalls, or rather the apparently hugely inefficient query plan (remember: a < 10 sec process was turned into a multi-hour one). I still need to do more thorough testing to be sure though.

However, this raises a couple of questions:

- While ANALYZE is of course hugely important for proper statistics and query planning, I have wondered whether PostgreSQL shouldn't automatically have updated the statistics for the addition of the primary key with IDENTITY? It seems to me that, based on the definition of the primary key column with IDENTITY and the table size, the actual distribution of values is essentially already known even before any sampling by ANALYZE to update statistics?

- Am I right to assume that only the statistics on the objectid field play any role in this issue? As you can see, the WHERE clause does not involve any fields other than the two objectid fields of the main table and the chunk table specifying the job. All other values computed are just derived straight from the geometry column.

- Were there any hints in all the other data I supplied as to where PG18's query planning, without the updated statistics from the newly added ANALYZE step, is going wrong? And why this was never an issue in <= PG17?

I also did some preliminary tests with the old PG17.6 / PostGIS 3.6.0 cluster with the same Italy extract data. I still need to do more thorough testing, both with and without the extra ANALYZE step, to fully exclude that there isn't something related to the upgrade to PostGIS 3.6.0, but first indications are, as I already saw with PG17.6 / PostGIS 3.5.3, that there are no issues with the <= PG17 / PostGIS combination as regards this apparent planner issue.

Marco

On 8-10-2025 at 21:08, Andres Freund wrote:
> In [1] you showed a query.
Reformated that looks like this: > > UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1 > SET area_geo = t2.area_geo, > perim_geo = t2.perim_geo, > compact_geo = CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END, > npoints_geo = t2.npoints_geo, > comp_npoints_geo = CASE WHEN t2.npoints_geo > 0 THEN (CASE WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo)/ (4 * pi())) ELSE 0 END / t2.npoints_geo) ELSE 0 END, > convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography,true))ELSE 1 END > FROM ( > SELECT > objectid, > ST_Area(way::geography,true) AS area_geo, > ST_Perimeter(way::geography,true) AS perim_geo, > ST_NPoints(way) AS npoints_geo > FROM osm.landcover_scrubs_small_scale_2_ply) AS t2 > WHERE (t2.objectid = t1.objectid) > AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3) > > > Which certainly fits with two nested loops, although I don't think I can infer > which order it the joins are in.
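A minimal sketch of the fix described above, i.e. running ANALYZE right after the identity primary key is added so the planner has statistics for the new column (the exact DDL generated by the tool may differ):

ALTER TABLE osm.landcover_scrubs_small_scale_2_ply
    ADD COLUMN objectid bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY;

-- without this, the planner has no statistics for the new column yet
ANALYZE osm.landcover_scrubs_small_scale_2_ply;

-- optional: confirm statistics now exist for the new column
SELECT attname, n_distinct
FROM pg_stats
WHERE schemaname = 'osm'
  AND tablename = 'landcover_scrubs_small_scale_2_ply'
  AND attname = 'objectid';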
Hi Andres,
I have now been able to capture an actual bad plan using PostgreSQL's 
'auto_explain' option, so this is the bad plan that makes a process and job 
that should take just a few seconds cost more than 7 hours to execute. 
And this is of course without updating the table's statistics with 
ANALYZE after adding the primary key objectid, as that solved the issue 
in PG18. It is now clear where PostgreSQL spends all of its time when 
the bad plan is generated; see the loop count of the second index scan. 
I am not sure why, in this nested loop, two index scans on essentially 
the same key and table are executed. You can compare this bad plan with 
the one below it, which was generated with EXPLAIN in pgAdmin rather than 
from actual auto_explain output, and which shows a different query execution plan.
Note again:
- I did not see the bad plan in PG<=17, even without the now added 
ANALYZE step that solves the issue in PG18.
- The two tables involved ('landcover_grassy_small_scale_2_ply' and 
'landcover_grassy_small_scale_2_ply_pg') both have primary keys and 
unique objectid indexes.
- The join is one-to-one. 'landcover_grassy_small_scale_2_ply_pg' has 
unique values for objectid, so there is only one record per corresponding 
record in 'landcover_grassy_small_scale_2_ply'.
- The 'osm_tmp_28232_ch3' references a database view and represents the 
chunk / multi-threaded job that needs to be executed from my Python 
code, and references a selection of 
the 'landcover_grassy_small_scale_2_ply_pg' table's records, which is 
the filter in the last index scan in the bad plan below.
Marco
*** ACTUAL BAD PLAN AS CAPTURED BY auto_explain ***:
2025-10-17 23:32:17.375 CEST [242803] osm@mini_test LOG: duration: 
27133980.536 ms  plan:
     Query Text: UPDATE osm.landcover_grassy_small_scale_2_ply AS t1 SET 
area_geo = t2.area_geo, perim_geo = t2.perim_geo, compact_geo = CASE 
WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * 
pi())) ELSE 0 END, npoints_geo = t2.npoints_geo, comp_npoints_geo = CASE 
WHEN t2.npoints_geo > 0 THEN (CASE WHEN t2.area_geo > 0 THEN 
((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END / 
t2.npoints_geo) ELSE 0 END, convex_ratio_geo = CASE WHEN 
ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo / 
ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END FROM (SELECT 
objectid,ST_Area(way::geography,true) AS 
area_geo,ST_Perimeter(way::geography,true) AS perim_geo,ST_NPoints(way) 
AS npoints_geo FROM osm.landcover_grassy_small_scale_2_ply)  AS t2 WHERE 
(t2.objectid = t1.objectid) AND t1.objectid IN (SELECT t3.objectid FROM 
mini_test.osm.osm_tmp_28232_ch3 AS t3)
     Update on osm.landcover_grassy_small_scale_2_ply t1 
(cost=1.17..134.56 rows=0 width=0) (actual 
time=27133980.532..27133980.534 rows=0.00 loops=1)
       Buffers: shared hit=7585627828
       ->  Nested Loop  (cost=1.17..134.56 rows=1 width=62) (actual 
time=225948.297..27133979.894 rows=8.00 loops=1)
             Output: 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true), 
st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, true), 
CASE WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, 
true) > '0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision END, 
st_npoints(landcover_grassy_small_scale_2_ply.way), CASE WHEN 
(st_npoints(landcover_grassy_small_scale_2_ply.way) > 0) THEN (CASE WHEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) > 
'0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision END / 
(st_npoints(landcover_grassy_small_scale_2_ply.way))::double precision) 
ELSE '0'::double precision END, CASE WHEN 
(st_area((st_convexhull(t1.way))::geography, true) > '0'::double 
precision) THEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) / 
st_area((st_convexhull(t1.way))::geography, true)) ELSE '1'::double 
precision END, t1.ctid, landcover_grassy_small_scale_2_ply.ctid, 
landcover_grassy_small_scale_2_ply_pg.ctid
             Inner Unique: true
             Join Filter: (landcover_grassy_small_scale_2_ply.objectid = 
landcover_grassy_small_scale_2_ply_pg.objectid)
             Buffers: shared hit=7585627705
             ->  Nested Loop  (cost=0.75..4.79 rows=1 width=620) (actual 
time=5.784..27131099.372 rows=222396.00 loops=1)
                   Output: t1.way, t1.ctid, t1.objectid, 
landcover_grassy_small_scale_2_ply.way, 
landcover_grassy_small_scale_2_ply.ctid, 
landcover_grassy_small_scale_2_ply.objectid
                   Inner Unique: true
                   Join Filter: (t1.objectid = 
landcover_grassy_small_scale_2_ply.objectid)
                   Rows Removed by Join Filter: 24729879210
                   Buffers: shared hit=7584737792
                   ->  Index Scan using 
landcover_grassy_small_scale_2_ply_pkey on 
osm.landcover_grassy_small_scale_2_ply t1  (cost=0.38..2.39 rows=1 
width=310) (actual time=5.176..462.613 rows=222396.00 loops=1)
                         Output: t1.way, t1.ctid, t1.objectid
                         Index Searches: 1
                         Buffers: shared hit=66411
                   ->  Index Scan using 
landcover_grassy_small_scale_2_ply_pkey on 
osm.landcover_grassy_small_scale_2_ply  (cost=0.38..2.39 rows=1 
width=310) (actual time=0.035..114.709 rows=111198.50 loops=222396)
                         Output: landcover_grassy_small_scale_2_ply.way, 
landcover_grassy_small_scale_2_ply.ctid, 
landcover_grassy_small_scale_2_ply.objectid
                         Index Searches: 222396
                         Buffers: shared hit=7584671381
             ->  Index Scan using idx_osm_41 on 
osm.landcover_grassy_small_scale_2_ply_pg  (cost=0.42..2.44 rows=1 
width=14) (actual time=0.009..0.009 rows=0.00 loops=222396)
                   Output: landcover_grassy_small_scale_2_ply_pg.ctid, 
landcover_grassy_small_scale_2_ply_pg.objectid
                   Index Cond: 
(landcover_grassy_small_scale_2_ply_pg.objectid = t1.objectid)
                   Filter: 
((landcover_grassy_small_scale_2_ply_pg.page_number >= 31276) AND 
(landcover_grassy_small_scale_2_ply_pg.page_number < 31766))
                   Rows Removed by Filter: 1
                   Index Searches: 222396
                   Buffers: shared hit=889584
*** LIKELY GOOD PLAN AS GENERATED BY CAPTURING THE SQL AND RUNNING 
'EXPLAIN' IN pgAdmin ***:
"[
   {
     ""Plan"": {
       ""Node Type"": ""ModifyTable"",
       ""Operation"": ""Update"",
       ""Parallel Aware"": false,
       ""Async Capable"": false,
       ""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
       ""Schema"": ""osm"",
       ""Alias"": ""t1"",
       ""Startup Cost"": 1.14,
       ""Total Cost"": 9129.99,
       ""Plan Rows"": 0,
       ""Plan Width"": 0,
       ""Disabled"": false,
       ""Plans"": [
         {
           ""Node Type"": ""Nested Loop"",
           ""Parent Relationship"": ""Outer"",
           ""Parallel Aware"": false,
           ""Async Capable"": false,
           ""Join Type"": ""Inner"",
           ""Startup Cost"": 1.14,
           ""Total Cost"": 9129.99,
           ""Plan Rows"": 70,
           ""Plan Width"": 62,
           ""Disabled"": false,
           ""Output"": 
[""st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)"",
""st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true)"", ""CASE WHEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) > 
'0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision 
END"", ""st_npoints(landcover_grassy_small_scale_2_ply.way)"", ""CASE 
WHEN (st_npoints(landcover_grassy_small_scale_2_ply.way) > 0) THEN (CASE 
WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) 
 > '0'::double precision) THEN 
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, 
true), '2'::double precision) / 
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) / 
'12.566370614359172'::double precision) ELSE '0'::double precision END / 
(st_npoints(landcover_grassy_small_scale_2_ply.way))::double precision) 
ELSE '0'::double precision END"", ""CASE WHEN 
(st_area((st_convexhull(t1.way))::geography, true) > '0'::double 
precision) THEN 
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) / 
st_area((st_convexhull(t1.way))::geography, true)) ELSE '1'::double 
precision END"", ""t1.ctid"", 
""landcover_grassy_small_scale_2_ply.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.ctid""],
           ""Inner Unique"": true,
           ""Plans"": [
             {
               ""Node Type"": ""Nested Loop"",
               ""Parent Relationship"": ""Outer"",
               ""Parallel Aware"": false,
               ""Async Capable"": false,
               ""Join Type"": ""Inner"",
               ""Startup Cost"": 0.72,
               ""Total Cost"": 173.50,
               ""Plan Rows"": 70,
               ""Plan Width"": 828,
               ""Disabled"": false,
               ""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid"", 
""landcover_grassy_small_scale_2_ply_pg.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.objectid""],
               ""Inner Unique"": true,
               ""Plans"": [
                 {
                   ""Node Type"": ""Index Scan"",
                   ""Parent Relationship"": ""Outer"",
                   ""Parallel Aware"": false,
                   ""Async Capable"": false,
                   ""Scan Direction"": ""Forward"",
                   ""Index Name"": ""idx_osm_35"",
                   ""Relation Name"": 
""landcover_grassy_small_scale_2_ply_pg"",
                   ""Schema"": ""osm"",
                   ""Alias"": ""landcover_grassy_small_scale_2_ply_pg"",
                   ""Startup Cost"": 0.29,
                   ""Total Cost"": 3.70,
                   ""Plan Rows"": 70,
                   ""Plan Width"": 14,
                   ""Disabled"": false,
                   ""Output"": 
[""landcover_grassy_small_scale_2_ply_pg.ctid"", 
""landcover_grassy_small_scale_2_ply_pg.objectid""],
                   ""Index Cond"": 
""((landcover_grassy_small_scale_2_ply_pg.page_number >= 28873) AND 
(landcover_grassy_small_scale_2_ply_pg.page_number < 29373))""
                 },
                 {
                   ""Node Type"": ""Index Scan"",
                   ""Parent Relationship"": ""Inner"",
                   ""Parallel Aware"": false,
                   ""Async Capable"": false,
                   ""Scan Direction"": ""Forward"",
                   ""Index Name"": 
""landcover_grassy_small_scale_2_ply_pkey"",
                   ""Relation Name"": 
""landcover_grassy_small_scale_2_ply"",
                   ""Schema"": ""osm"",
                   ""Alias"": ""t1"",
                   ""Startup Cost"": 0.42,
                   ""Total Cost"": 2.43,
                   ""Plan Rows"": 1,
                   ""Plan Width"": 814,
                   ""Disabled"": false,
                   ""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid""],
                   ""Index Cond"": ""(t1.objectid = 
landcover_grassy_small_scale_2_ply_pg.objectid)""
                 }
               ]
             },
             {
               ""Node Type"": ""Index Scan"",
               ""Parent Relationship"": ""Inner"",
               ""Parallel Aware"": false,
               ""Async Capable"": false,
               ""Scan Direction"": ""Forward"",
               ""Index Name"": ""landcover_grassy_small_scale_2_ply_pkey"",
               ""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
               ""Schema"": ""osm"",
               ""Alias"": ""landcover_grassy_small_scale_2_ply"",
               ""Startup Cost"": 0.42,
               ""Total Cost"": 0.64,
               ""Plan Rows"": 1,
               ""Plan Width"": 814,
               ""Disabled"": false,
               ""Output"": [""landcover_grassy_small_scale_2_ply.way"", 
""landcover_grassy_small_scale_2_ply.ctid"", 
""landcover_grassy_small_scale_2_ply.objectid""],
               ""Index Cond"": 
""(landcover_grassy_small_scale_2_ply.objectid = t1.objectid)""
             }
           ]
         }
       ]
     }
   }
]"
On 12-10-2025 at 10:24, Marco Boeringa wrote:
> Hi Andres,
>
> I have been doing a bit more investigation. As I explained before, the 
> problematic multi-threaded geoprocessing step is not some stand-alone 
> query that can easily be reduced to a small, portable, reproducible 
> case with attached data. In fact, this geoprocessing step is part of a 
> large custom-built Python geoprocessing workflow, with the total code 
> probably in the 25k-lines range.
>
> However, based on the apparent poor query plan in PG18 / PostGIS 
> 3.6.0, I now reviewed the exact code once more. I noticed that just 
> before entering the multi-threaded code that emits the queries as seen 
> below, I am actually adding the primary key field 'objectid' as 
> "GENERATED BY DEFAULT AS IDENTITY" to the 
> 'osm.landcover_scrubs_small_scale_2_ply' table.
>
> Now I also noticed I did not run ANALYZE after that against the same 
> table. I have now added this to the code. Although it is still 
> preliminary, first tests seem to indicate that this resolves the 
> issue and prevents the stalls, or better said, the apparently hugely 
> inefficient query plan (remember: a < 10 sec process was turned into 
> a multi-hour one). I still need to do more thorough testing to be sure though.
>
> However, this raises a couple of question:
>
> - While ANALYZE is of course hugely important for proper statistics 
> and query planning, I have wondered whether PostgreSQL shouldn't 
> automatically have updated the statistics for the addition of the 
> primary key with IDENTITY? It seems to me that, based on the definition 
> of the primary key column as IDENTITY and the table size, the actual 
> distribution of values is essentially already known even before any 
> ANALYZE sampling to update the statistics.
>
> - Am I right to assume that only the statistics on the objectid field 
> play any role in this issue? As you can see, the WHERE clause does not 
> involve any other fields than the two objectid fields of the main 
> table and the chunk table specifying the job. All other values 
> computed are just derived straight from the geometry column.
>
> - Were there any hints in all the other data I supplied as to 
> where PG18's query planning, without the updated statistics from the 
> newly added ANALYZE step, is going wrong? And why this was never an 
> issue in <= PG17?
>
> I also did some preliminary tests with the old PG17.6 / PostGIS 3.6.0 
> cluster with the same Italy extract data. I still need to do more 
> thorough testing, both with and without the extra ANALYZE step, to 
> fully exclude something related to the upgrade to PostGIS 3.6.0, but 
> first indications are, as I already saw with PG17.6 / PostGIS 3.5.3, 
> that there is no issue with the <= PG17 / PostGIS combination as 
> regards this apparent planner issue.
>
> Marco
>
> On 8-10-2025 at 21:08, Andres Freund wrote:
>> In [1] you showed a query. Reformatted, that looks like this:
>>
>> UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
>> SET area_geo = t2.area_geo,
>>      perim_geo = t2.perim_geo,
>>      compact_geo = CASE WHEN t2.area_geo > 0 THEN 
>> ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END,
>>      npoints_geo = t2.npoints_geo,
>>      comp_npoints_geo = CASE WHEN t2.npoints_geo > 0 THEN (CASE WHEN 
>> t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * 
>> pi())) ELSE 0 END / t2.npoints_geo) ELSE 0 END,
>>      convex_ratio_geo = CASE WHEN 
>> ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo / 
>> ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END
>> FROM (
>>      SELECT
>>          objectid,
>>          ST_Area(way::geography,true) AS area_geo,
>>          ST_Perimeter(way::geography,true) AS perim_geo,
>>          ST_NPoints(way) AS npoints_geo
>>      FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
>> WHERE (t2.objectid = t1.objectid)
>>     AND t1.objectid IN (SELECT t3.objectid FROM 
>> mini_test.osm.osm_tmp_28128_ch5 AS t3)
>>
>>
>> Which certainly fits with two nested loops, although I don't think I 
>> can infer which order the joins are in.
			
On Mon, 20 Oct 2025 at 09:37, Marco Boeringa <marco@boeringa.demon.nl> wrote:
> I am not sure why in this nested loop, two index scans on essentially
> the same key and table are executed. You can compare this bad plan with

The query contains a self-join, so that's why you're seeing the index 
scanned twice in the query plan. If that's not needed, then you should 
remove it from the query. If objectid is unique for this table then I 
don't see why you need to join the table again to access the very same 
row that you're updating. Just put those function calls in the UPDATE's 
SET clause.

(We do have self join elimination in v18, but I see that it's a bit 
overly strict in what it removes around looking for duplicate relations 
when one of them is the query's result relation. Likely that can be made 
better so it still looks for duplicate relations including the result 
relation, but just never considers removing that one, only the other 
duplicate(s).)

> *** ACTUAL BAD PLAN AS CAPTURED BY auto_explain ***:
>              ->  Index Scan using
> landcover_grassy_small_scale_2_ply_pkey on
> osm.landcover_grassy_small_scale_2_ply t1  (cost=0.38..2.39 rows=1
> width=310) (actual time=5.176..462.613 rows=222396.00 loops=1)
>                    Output: t1.way, t1.ctid, t1.objectid
>                    Index Searches: 1
>                    Buffers: shared hit=66411

This table must have been VACUUMed or ANALYZEd either when it was empty 
or when it contained 1 row. There's no predicate here, so that estimate, 
aside from clamping to 1, comes directly from pg_class.reltuples. A new 
table or truncated table would never estimate 1 row as the planner always 
plays it safe when there are no statistics generated yet and assumes 10 
pages worth of rows. I can't think of any specific reason why v18 behaves 
differently from v17 on this... Maybe you've gotten unlucky with an 
autovacuum timing thing and it's running at a slightly different time 
than in v17, perhaps because it completed the autovacuum of another table 
slightly quicker than v17 did. v18 can perform asynchronous reads for 
vacuum; maybe that could mean more vacuum_cost_page_hits and fewer 
vacuum_cost_page_misses when calculating vacuum_cost_limit.

Or, perhaps you're doing something like performing a manual VACUUM after 
the tables have had all of their rows deleted?

David
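For readers following along, a minimal sketch of what David's suggestion 
(dropping the self-join and computing directly in the SET clause) could 
look like for the UPDATE that Andres reformatted earlier in the thread. 
The derived-column expressions are copied from that query; this rewrite 
is untested against the actual workflow:

UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
SET area_geo    = ST_Area(t1.way::geography, true),
    perim_geo   = ST_Perimeter(t1.way::geography, true),
    npoints_geo = ST_NPoints(t1.way),
    compact_geo = CASE WHEN ST_Area(t1.way::geography, true) > 0
                       THEN (power(ST_Perimeter(t1.way::geography, true), 2)
                             / ST_Area(t1.way::geography, true)) / (4 * pi())
                       ELSE 0 END,
    comp_npoints_geo = CASE WHEN ST_NPoints(t1.way) > 0
                            THEN (CASE WHEN ST_Area(t1.way::geography, true) > 0
                                       THEN (power(ST_Perimeter(t1.way::geography, true), 2)
                                             / ST_Area(t1.way::geography, true)) / (4 * pi())
                                       ELSE 0 END) / ST_NPoints(t1.way)
                            ELSE 0 END,
    convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(t1.way)::geography, true) > 0
                            THEN ST_Area(t1.way::geography, true)
                                 / ST_Area(ST_ConvexHull(t1.way)::geography, true)
                            ELSE 1 END
WHERE t1.objectid IN (SELECT t3.objectid
                      FROM mini_test.osm.osm_tmp_28128_ch5 AS t3);

As David explains further down in the thread, this does not reduce the 
number of ST_Area / ST_Perimeter evaluations per row (the planner already 
flattens the original subquery); it only removes the redundant join.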
Hi David,

To be honest I am not a SQL wizard like some of you here on the list, but 
part of the reason I set up the query as it currently is, is that the 
PostGIS function calls like ST_Area and ST_Perimeter can be very 
expensive depending on the complexity and size of the geometry, and I 
thus want to avoid at all cost having to unnecessarily recalculate them 
multiple times in the same query. Maybe I am misunderstanding how 
PostgreSQL processes such queries, but I need the values multiple times 
to calculate some other parameters. So unless PostgreSQL is smart enough 
to cache the result and not execute ST_Area multiple times if it is used 
multiple times in the same query, I thought it wise to separate out the 
calculation and use the SELECT's results as input for the calculation of 
the other parameters. Maybe that isn't actually needed, but I think I 
remember seeing performance gains from this setup when I initially wrote 
it this way; I am not entirely sure at this point in time, it is a while 
ago. I know far too little about the internals of PostgreSQL and its 
exact query processing to say anything really sensible about this.

A couple of things that still strike me:

- As I wrote in a previous post, just before entering the Python 
multi-threaded processing that generates the jobs, I am adding a primary 
key column with unique objectids (GENERATED BY DEFAULT AS IDENTITY), that 
is subsequently used in the join. I realize this is actually an 
enhancement request, but I am wondering why, if PostgreSQL already needs 
to dig through the entire table to add a new column with unique objectid 
values, it doesn't automatically update the statistics of the newly added 
column and write out the proper number of records of the table, which 
must be known by the end of the addition of the column, to the 
'pg_class.reltuples' value you refer to. It seems to me it would remove 
the need for the ANALYZE here, and at least ensure that there are some 
useful statistics available on the vital primary key column and on the 
relation size.

- As I wrote in a previous post, the multi-threaded Python code creates 
multiple jobs (dozens). What I am seeing is that only part of the jobs is 
failing, and with some runs, none. E.g. I can run my tool 3 times without 
issues, then on the fourth run some of the jobs get the bad plan (I only 
see this behavior in PG18, I never saw it in <= PG17). In all of these 
cases, the input data is *exactly* the same. The planning behavior 
therefore appears non-deterministic. As I understood it, PostgreSQL may 
indeed have non-deterministic behavior if it switches to the genetic 
algorithm on complex queries with many joins, but I have the feeling that 
my query doesn't quite reach that level of complexity? Or am I wrong 
here, and do you consider it likely it went through the genetic 
algorithm? It would actually be desirable if EXPLAIN (and especially 
'auto_explain') output always showed whether the genetic algorithm was 
activated, so one could judge whether non-deterministic planner behavior 
is expected.

- Lastly, did you notice the likely "good" plan I posted below the "bad" 
one? I generated that one by simply copying the visible query to pgAdmin 
and hitting EXPLAIN, so it's not the real thing as from 'auto_explain', 
but it does show some marked differences between the plans. Do you have 
any comments to add as to the differences?
Marco
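As a side note on the genetic-optimizer question above: whether GEQO can 
even be involved is controlled by a couple of settings that can be 
checked directly. The threshold quoted below is the PostgreSQL default; 
the relation count is an estimate based on the query shown later in the 
thread:

SHOW geqo;            -- 'on' by default, but only used above the threshold
SHOW geqo_threshold;  -- 12 by default: GEQO kicks in at 12 or more FROM items
-- The UPDATE in question joins only a handful of relations (the target
-- table, the pulled-up subquery on the same table, and the chunk view),
-- so with default settings it is planned by the standard, deterministic
-- join search rather than by GEQO.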
Hi David,

I wrote in my last post somewhere: "In all of these cases, the input data 
is *exactly* the same." I think I need to correct this. Although the 
input base table 'landcover_grassy_small_scale_2_ply' is the same for 
each run, and the number of records it has as well, the actual contents 
of the database views that represent the multi-threading jobs (in the 
example below 'osm_tmp_28232_ch3'), and the number of records they refer 
to, *can* vary in a non-deterministic way, anywhere from 0 to about 5000 
records. Most jobs have several thousand, close to the limit of 5000, but 
some can have considerably fewer. Actually, it appears that most of the 
jobs getting the bad plan have the lower number of records (dozens to a 
few hundred, instead of a few thousand), at least that is what I saw with 
the last run, but I would need to do further testing to confirm.

That said, they are implemented as ordinary non-materialized database 
views, so they don't have their own statistics and such (the underlying 
table the views refer to is a secondary table called 
'landcover_grassy_small_scale_2_ply_pg', derived from 
'landcover_grassy_small_scale_2_ply', that was visible in the 
'auto_explain' output). So this shouldn't actually be that relevant?

Marco

See query below for reference:

UPDATE osm.landcover_grassy_small_scale_2_ply AS t1
SET area_geo = t2.area_geo,
    perim_geo = t2.perim_geo,
    compact_geo = CASE WHEN t2.area_geo > 0
                       THEN ((power(t2.perim_geo, 2) / t2.area_geo) / (4 * pi()))
                       ELSE 0 END,
    npoints_geo = t2.npoints_geo,
    comp_npoints_geo = CASE WHEN t2.npoints_geo > 0
                            THEN (CASE WHEN t2.area_geo > 0
                                       THEN ((power(t2.perim_geo, 2) / t2.area_geo) / (4 * pi()))
                                       ELSE 0 END / t2.npoints_geo)
                            ELSE 0 END,
    convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography, TRUE) > 0
                            THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography, TRUE))
                            ELSE 1 END
FROM (
    SELECT
        objectid,
        ST_Area(way::geography, TRUE) AS area_geo,
        ST_Perimeter(way::geography, TRUE) AS perim_geo,
        ST_NPoints(way) AS npoints_geo
    FROM osm.landcover_grassy_small_scale_2_ply) AS t2
WHERE (t2.objectid = t1.objectid)
    AND t1.objectid IN (
        SELECT t3.objectid
        FROM mini_test.osm.osm_tmp_28232_ch3 AS t3)
Hi David,

Looking through the 'auto_explain' output of the bad query plan, I 
noticed the below included clause as generated by the planner. In the 
context of what I actually wrote above about the desire to not run 
expensive function calls like ST_Area multiple times, do I understand it 
correctly from the 'auto_explain' output excerpt that PostgreSQL, by 
removing the self join, actually *does* run the ST_Area and ST_Perimeter 
multiple times? Is this how I need to interpret this part of the 
'auto_explain' output? If there is no caching of the function result, 
this could be expensive as well.

Marco

*** 'auto_explain' output excerpt ***:

st_area((landcover_grassy_small_scale_2_ply.way)::geography, TRUE),
st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, TRUE),
CASE WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, TRUE) > '0'::double PRECISION)
    THEN ((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, TRUE), '2'::double PRECISION)
           / st_area((landcover_grassy_small_scale_2_ply.way)::geography, TRUE))
          / '12.566370614359172'::double PRECISION)
    ELSE '0'::double PRECISION END,
st_npoints(landcover_grassy_small_scale_2_ply.way),
CASE WHEN (st_npoints(landcover_grassy_small_scale_2_ply.way) > 0)
    THEN (CASE WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, TRUE) > '0'::double PRECISION)
               THEN ((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, TRUE), '2'::double PRECISION)
                      / st_area((landcover_grassy_small_scale_2_ply.way)::geography, TRUE))
                     / '12.566370614359172'::double PRECISION)
               ELSE '0'::double PRECISION END
          / (st_npoints(landcover_grassy_small_scale_2_ply.way))::double PRECISION)
    ELSE '0'::double PRECISION END,
CASE WHEN (st_area((st_convexhull(t1.way))::geography, TRUE) > '0'::double PRECISION)
    THEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, TRUE)
          / st_area((st_convexhull(t1.way))::geography, TRUE))
    ELSE '1'::double PRECISION END,
t1.ctid, landcover_grassy_small_scale_2_ply.ctid,
landcover_grassy_small_scale_2_ply_pg.ctid
On Tue, 21 Oct 2025 at 03:42, Marco Boeringa <marco@boeringa.demon.nl> wrote:
> Looking through the 'auto_explain' output of the bad query plan, I
> noticed the below included clause as generated by the planner. In the
> context of what I actually wrote above about the desire to not run
> expensive function calls like ST_Area multiple times, do I understand it
> correctly from the 'auto_explain' output excerpt that PostgreSQL, by
> removing the self join, actually *does* run the ST_Area and ST_Perimeter
> multiple times? Is this how I need to interpret this part of the
> 'auto_explain' output? If there is no caching of the function result,
> this could be expensive as well.

So you basically have something like:

UPDATE t t1 SET col1 = t2.a1, col2 = t2.a2
FROM (SELECT unique_col, f1(col3) as a1, f2(col4) as a2 FROM t) AS t2
WHERE t1.unique_col = t2.unique_col
AND <other filter clauses>

Assuming here that unique_col has a UNIQUE or PK constraint. The self 
join basically amounts to wasted effort. There is no function result 
caching anywhere. Looking at the EXPLAIN output, it seems those functions 
are executed once per row that's output from the join and just below the 
"Update" node and they're executed 8 times. That won't change if you get 
rid of the self join.

David
Hi David,

Thanks for the explanation. Back when I developed this, I am pretty sure 
I tried to find out the answer to this, but was left somewhat confused as 
to the behavior of subqueries in relation to this aspect of avoiding 
unnecessary work from costly functions. E.g. this very old StackOverflow 
post by Erwin Brandstetter, whose name I have seen come up a lot in that 
discussion forum and whose answers suggest a pretty thorough knowledge of 
PostgreSQL and databases in general, at least suggested a subquery could 
work:

https://stackoverflow.com/questions/20718499/does-postgresql-cache-function-calls#comment31095072_20718499

And in relation to that post and thread, and the suggestion of WITH / 
CTE clause, would that be a suitable substitute and avoid the recalling 
of the functions? I assume with the MATERIALIZED option, it should, that 
is what the MATERIALIZED option is for, isn't it?

Marco
On Tue, 21 Oct 2025 at 09:02, Marco Boeringa <marco@boeringa.demon.nl> wrote:
> https://stackoverflow.com/questions/20718499/does-postgresql-cache-function-calls#comment31095072_20718499
>
> And in relation to that post and thread, and the suggestion of WITH /
> CTE clause, would that be a suitable substitute and avoid the recalling
> of the functions? I assume with the MATERIALIZED option, it should, that
> is what the MATERIALIZED option is for, isn't it?

That article states "this function is invoked multiple times with the 
same parameter", so doesn't sound very applicable for your case since 
your function parameter changes with every row.

I don't see how WITH MATERIALIZED could help you here as that's not a 
parameterized cache. I suppose we could adjust the planner to consider 
something similar to Memoize for caching results for expensive stable 
functions. We'd have to put a lot of trust into n_distinct estimates and 
the function(s) COST setting, however.

I suspect you're trying to optimise for something that's not an actual 
problem.

David
Hi David,

PostGIS functions can be very expensive, especially in the context of the 
fact that Polygon and Line geometries can vary vastly in size in terms of 
the number of vertices that constitute them, which has a profound impact 
on some PostGIS function calls, merely due to the enormous complexity of 
some shapes.

But of course you're right that any change will need some thorough 
testing before assuming it will actually benefit the queries.

Marco
On Tue, 21 Oct 2025 at 10:57, Marco Boeringa <marco@boeringa.demon.nl> wrote:
> PostGIS functions can be very expensive, especially in the context of
> the fact that Polygon and Line geometries can vary vastly in size in
> terms of the number of vertices that constitute them, which has a
> profound impact on some PostGIS function calls, merely due to the
> enormous complexity of some shapes.

How expensive the function call is is irrelevant as there simply is no 
function result caching going on and there's nothing wired up in any 
released version of PostgreSQL which gives you this with the query you've 
written.

You still seem to be under the illusion that the self-join is giving you 
some sort of caching. If you remain content in not trusting me on that, 
by all means, create a plpgsql function with a RAISE NOTICE and try it 
out for yourself.

> But of course you're right that any change will need some thorough
> testing before assuming it will actually benefit the queries.

I don't recall talking about testing... (It may help if you quote things 
you're replying to. This conversation will be quite hard to follow with 
your top post replies.)

This whole conversation has drifted well off what the original report was 
about, so I think it's better if you need more help on this to use 
pgsql-performance@lists.postgresql.org

David
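A minimal sketch of the kind of throwaway test David suggests here. The 
function name is hypothetical, and the body just wraps ST_Area so that 
every evaluation becomes visible as a NOTICE (assumes PostGIS is 
installed):

-- Hypothetical wrapper used only to count evaluations.
CREATE FUNCTION noisy_st_area(g geometry) RETURNS double precision
LANGUAGE plpgsql STABLE AS
$$
BEGIN
    RAISE NOTICE 'noisy_st_area called';
    RETURN ST_Area(g::geography, true);
END;
$$;

-- Substitute noisy_st_area(way) for ST_Area(way::geography, true) in a copy
-- of the UPDATE (or a small SELECT) and count the NOTICE lines: one line per
-- evaluation, showing there is no result caching across expressions.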
Hi David,

I totally understood your remarks about no caching going on, and don't 
intend to keep the query as-is with self join if I decide to make and 
benchmark any changes. Any changes will certainly either involve a WITH 
/ CTE materialized, or first writing out some temporary table with the 
results of the expensive function calls to be able to re-use them from 
there. Whether any of that will result in any measurable change and 
benefit will require thorough testing.

Yes, you are right this has drifted. I don't need any more help and this 
issue can essentially be closed; I just wanted to report the fact that an 
application and workflow that has run for years without issues was 
heavily affected by the (planner) changes in PG18. I have submitted all 
data I can on this, and the culprit is now clear, also thanks to Andres's 
useful (debugging) suggestions. It is also clear what the solution in my 
case is: the extra ANALYZE run I added to my code.

So, as said, I don't need more help, although answers to some of the 
unanswered questions regarding specific observations I made (e.g. PG18 
versus PG17 and lower) would still be welcome. To repeat from previous 
posts:

- As I wrote in a previous post, just before entering the Python 
multi-threaded processing that generates the jobs, I am adding a primary 
key column with unique objectids (GENERATED BY DEFAULT AS IDENTITY), that 
is subsequently used in the join. I realize this is actually an 
enhancement request, but I am wondering why, if PostgreSQL already needs 
to dig through the entire table to add a new column with unique objectid 
values, it doesn't automatically update the statistics of the newly added 
column and write out the proper number of records of the table, which 
must be known by the end of the addition of the column, to the 
'pg_class.reltuples' value you refer to. It seems to me it would remove 
the need for the ANALYZE here, and at least ensure that there are some 
useful statistics available on the vital primary key column and on the 
relation size.

- As I wrote in a previous post, the multi-threaded Python code creates 
multiple jobs (dozens). What I am seeing is that only part of the jobs is 
failing, and with some runs, none. E.g. I can run my tool 3 times without 
issues, then on the fourth run some of the jobs get the bad plan (I only 
see this behavior in PG18, I never saw it in <= PG17). In all of these 
cases, the input data is *exactly* the same. The planning behavior 
therefore appears non-deterministic. As I understood it, PostgreSQL may 
indeed have non-deterministic behavior if it switches to the genetic 
algorithm on complex queries with many joins, but I have the feeling that 
my query doesn't quite reach that level of complexity? Or am I wrong 
here, and do you consider it likely it went through the genetic 
algorithm? It would actually be desirable if EXPLAIN (and especially 
'auto_explain') output always showed whether the genetic algorithm was 
activated, so one could judge whether non-deterministic planner behavior 
is expected.

But having no answers to these questions is not a major issue, it is just 
my curiosity.
Thanks for your answers and contributions so far,

Marco
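The fix Marco settles on above, running ANALYZE right after the IDENTITY 
primary key column is added so the planner has fresh reltuples and column 
statistics before the multi-threaded UPDATEs start, might look roughly 
like this. This is only a sketch: the table name is taken from the 
thread, while the bigint type and the PRIMARY KEY placement in the same 
statement are assumptions about the actual tool's DDL:

-- Add the job/join key just before the multi-threaded processing starts.
ALTER TABLE osm.landcover_grassy_small_scale_2_ply
    ADD COLUMN objectid bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY;

-- Refresh pg_class.reltuples and the column statistics so the planner no
-- longer works from stale (e.g. 1-row) estimates for this table.
ANALYZE osm.landcover_grassy_small_scale_2_ply;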
On Tue, 21 Oct 2025 at 11:46, Marco Boeringa <marco@boeringa.demon.nl> wrote:
> I totally understood your remarks about no caching going on, and don't
> intend to keep the query as-is with self join if I decide to make and
> benchmark any changes. Any changes will certainly either involve a WITH
> / CTE materialized, or first writing out some temporary table with the
> results of the expensive function calls to be able to re-use them from
> there.

To get what you want, you'd need to do something like add OFFSET 0 to the 
subquery. That would prevent the planner from pulling it up into the main 
query. However, if you do that, it'll mean running those PostGIS 
functions on every row in the landcover_grassy_small_scale_2_ply table. 
You could get around that by either moving or duplicating the 
"t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28232_ch3 
AS t3)" and putting it in the WHERE clause of the subquery.

David
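For completeness, a sketch of the rewrite David describes here, applied 
to the query posted earlier in the thread. The OFFSET 0 fence and the 
duplicated chunk filter are the only changes, and this is untested 
against the actual workflow:

UPDATE osm.landcover_grassy_small_scale_2_ply AS t1
SET area_geo = t2.area_geo,
    perim_geo = t2.perim_geo,
    compact_geo = CASE WHEN t2.area_geo > 0
                       THEN ((power(t2.perim_geo, 2) / t2.area_geo) / (4 * pi()))
                       ELSE 0 END,
    npoints_geo = t2.npoints_geo,
    comp_npoints_geo = CASE WHEN t2.npoints_geo > 0
                            THEN (CASE WHEN t2.area_geo > 0
                                       THEN ((power(t2.perim_geo, 2) / t2.area_geo) / (4 * pi()))
                                       ELSE 0 END / t2.npoints_geo)
                            ELSE 0 END,
    convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography, TRUE) > 0
                            THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography, TRUE))
                            ELSE 1 END
FROM (
    SELECT
        objectid,
        ST_Area(way::geography, TRUE) AS area_geo,
        ST_Perimeter(way::geography, TRUE) AS perim_geo,
        ST_NPoints(way) AS npoints_geo
    FROM osm.landcover_grassy_small_scale_2_ply
    WHERE objectid IN (SELECT t3.objectid
                       FROM mini_test.osm.osm_tmp_28232_ch3 AS t3)
    OFFSET 0  -- keeps the planner from pulling the subquery up into the main query
) AS t2
WHERE (t2.objectid = t1.objectid)
    AND t1.objectid IN (SELECT t3.objectid
                        FROM mini_test.osm.osm_tmp_28232_ch3 AS t3)

With the chunk filter duplicated inside the subquery, the expensive 
functions are evaluated only for the chunk's rows; without it, the 
OFFSET 0 fence alone would force them to run over the whole table, which 
is the caveat David points out.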