Discussion: Potential "AIO / io workers" inter-worker locking issue in PG18?
Hi,

I currently run PG18 + PostGIS 3.6.0 on an Ubuntu 24.04 guest in a Windows 10 Hyper-V virtual machine.

The machine is a dedicated refurbished HP Z840 local workstation with 2x 22 cores (E5-2699 v4), 512 GB RAM and a 10 TB NVMe RAID-0, with the Ubuntu guest having 400 GB RAM available.

On this machine, which is dedicated to just one custom-written geoprocessing workflow involving OpenStreetMap data, I have successfully processed data up to the global OpenStreetMap Facebook Daylight distribution, including a > 2.4 B record polygon table for all Facebook Daylight buildings. So this has proven a very capable system.

However, after upgrading to PG18 and switching to the "io_method = worker" setting (tested with 3, 5, 16 and 22 workers), I am seeing what looks like a major issue where io workers potentially get into some sort of locking conflict that takes hours to resolve.

The custom geoprocessing workflow uses Python multi-threading based on the 'concurrent.futures' framework, with either pyodbc or psycopg2 as the database connector, to implement a parallel processing solution that speeds up some of the computationally intensive tasks (which use UPDATEs). I generally run it with up to 44 threads to fully saturate the dual-CPU 44-core system. The custom code creates a pool of jobs for the threads and is designed to minimize inter-thread locking by taking PostgreSQL page locality into account (although the records to process are assigned by unique IDs in the tables, not by pages). Basically, the code is designed so that different threads never attempt to access the same database pages, as each thread gets its own unique pages assigned, thus avoiding inter-thread locking conflicts. This has worked really well in the past, with system usage maximized over all cores and processing significantly sped up. Jobs are implemented as database VIEWs that point to the records to process via their unique IDs. These views must of course be read by each thread, which is probably where the PG18 io workers kick in.

This has worked really well in previous versions of PostgreSQL (tested up to PG17). However, in PG18, during the multi-threaded processing, some of my submitted jobs, which in this case were run against a small OpenStreetMap Italy extract from Geofabrik, suddenly take > 1 hour to finish (up to 6 hours for this small extract), even though similar jobs from the same processing step finish in less than 10 seconds (and the affected jobs should as well). This seems to happen more or less at random. Many multi-threaded tasks before and after the affected processing steps finish normally.

When this happens, I observe the following:

- High processor activity, even though the jobs that should finish in seconds take hours, all the while showing high core usage.

- pgAdmin shows all sessions created by the Python threads as 'active', with *no* wait events attached.

- The pg_locks view does not show locking conflicts; all locks are granted. I did notice, however, that the relation / table locks were not "fastpath" locks but ordinary ones, while all other locks taken, e.g. on indexes of the same table, were fastpath. I don't know whether this is relevant; from what I have read about the difference, it shouldn't cause anything like a seconds-to-hours slowdown.

- Please note that the processing DOES eventually proceed, so it is not an infinite deadlock or something that forces me to kill my Python code. It just takes hours to resolve.

- Switching to "io_method = sync" seems to resolve this issue, and I do not observe jobs of the same batch getting "stuck". This is the behaviour I was used to seeing in <= PG17.

I am not too familiar with the internals of PostgreSQL and the new AIO framework and its "io workers". However, it seems there may be some sort of locking issue between io workers that can occasionally happen in PG18 with "io_method = worker"? Is anyone else observing similar issues in heavily multi-threaded processing workflows?

Marco
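For reference, the session state, wait events and fastpath locks mentioned above can also be checked from the command line; a minimal sketch, with the database name and backend PID as placeholders:

# Sessions driven by the Python threads: state, wait events and runtime so far
psql -d <your_db> -c "SELECT pid, state, wait_event_type, wait_event, now() - query_start AS runtime
                      FROM pg_stat_activity
                      WHERE state = 'active'
                      ORDER BY runtime DESC;"

# Locks held by one of the long-running backends, including the fastpath flag
psql -d <your_db> -c "SELECT locktype, relation::regclass, mode, fastpath, granted
                      FROM pg_locks
                      WHERE pid = <stuck_backend_pid>;"

In the stalled state described above, the first query would be expected to show 'active' sessions with empty wait event columns.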
I am not a PostgreSQL contributor and have no clue about the actual technical details of the new AIO code, but reading your report the following questions came to mind:

* Does the failure also happen with io_method = io_uring? If not, that would show it is really tied to io_method = worker, not to AIO in general.

* Does the failure also happen with io_method = worker when your Python code uses only 22 cores and PostgreSQL uses only 22 workers (so Python and PostgreSQL do not share CPU cores)? If not, it might indicate that the problem could be solved by raising the scheduling priority of PostgreSQL, hinting to the scheduler that a CPU core should be given to PostgreSQL FIRST, as Python is most likely waiting on it to continue while PostgreSQL cannot continue because the scheduler gave all the cores to Python (a classic starvation scenario; it eventually resolves once enough CPU cores become free to finish the starving process).

HTH

-Markus
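To illustrate the scheduling-priority idea (as it turns out later in the thread, Python runs on the Windows host, so it does not apply here), a sketch with an arbitrary niceness value:

# Illustrative only: raise the scheduling priority of all postgres backends;
# the niceness value -5 is an arbitrary example, not a recommendation
sudo renice -n -5 -p $(pgrep -d ' ' -u postgres postgres)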
Hi Thom,
I now also witnessed this issue with "io_method = sync", so it may not be related to the number of workers set. I initially thought it did not occur with 'sync', as two runs completed successfully without delays; however, the last run did show the issue. Unfortunately, this is a very complex multi-stage geoprocessing workflow that cannot easily be cut down to a simple single-SQL-statement reproducible case. And for the specific Italy extract it takes about 7 hours to complete if the run is successful and without the observed delays, so each test run costs considerable time if I adjust anything.

There is also a PostGIS upgrade in the mix (3.5.2 to 3.6.0) that may or may not be involved, as that version of PostGIS is the minimum for PG18. I see a 3.6.1 is already planned and will need to re-test with that version once released. I definitely do use PostGIS functions at the stage where the processing gets heavily delayed.

As to the question about the pg_aios view, which I wasn't aware of: it appears to be empty at that point, but I will need to confirm that observation. With my last run, the moment I looked at the view, some of the very delayed multi-threaded jobs (> 6.5 hours instead of 10 seconds!) started slowly returning one by one, although some were still waiting / stuck for some time before all had returned, so the pg_aios view being empty is probably still representative of the stuck situation.

Note that I also adjust the storage parameters of the tables involved to force more aggressive vacuuming, to avoid transaction ID wraparound (which shouldn't be an issue anyway with the small test Italy extract). This has all proven pretty reliable in the past and with previous PostgreSQL / PostGIS releases, up to the Facebook Daylight multi-billion record tables as noted in the previous post. There is also no PostgreSQL partitioning involved in any of this; these are ordinary tables.
Marco
On 5 Oct 2025 at 12:51, Thom Brown wrote:
> So, to confirm, you get the issue with as little as 3 io_workers?
>
> Also, what is pg_aios telling you during this time?
>
> Thom
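For completeness, the storage-parameter adjustment mentioned above would look roughly like the following; the table name is taken from the UPDATE statement shown later in the thread, and the values are purely illustrative, as the actual settings used are not given:

# Hypothetical values; the actual settings used in the workflow are not stated in the thread
psql -d <your_db> -c "ALTER TABLE osm.landcover_scrubs_small_scale_2_ply
    SET (autovacuum_vacuum_scale_factor = 0.01,
         autovacuum_vacuum_cost_delay   = 0,
         autovacuum_freeze_max_age      = 100000000);"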
Hi Thom,
I realized that my observation of the pg_aios view being empty was likely made with the "io_method = sync" option set, which I guess doesn't use or fill the pg_aios view? Can you confirm that the pg_aios view is unused with "io_method = sync"? This aspect is not documented in the PostgreSQL help. Anyway, I will need to re-test with 'worker' set.
I do see Tomas Vondra mentioning that even the 'sync' option in PG18 still goes "through the AIO infrastructure", but what that exactly means, also in relation to the pg_aios view, I don't know:
https://vondra.me/posts/tuning-aio-in-postgresql-18/
Marco
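A minimal way to sample both pg_aios and the sessions' wait events from the shell while the jobs appear stuck (database name is a placeholder):

# In-flight AIO handles (PG18); empty output means no AIO in flight at that instant
psql -d <your_db> -c "SELECT * FROM pg_aios;"

# Wait events of the active backends at the same moment
psql -d <your_db> -c "SELECT pid, backend_type, state, wait_event_type, wait_event
                      FROM pg_stat_activity
                      WHERE state = 'active';"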
Hi Thom,
As an extension to what I already wrote: since the processing gets stuck during UPDATEs, I realized the pg_aios view is likely not that relevant here, as the current AIO implementation in PG18 only covers read operations such as sequential scans and bitmap heap scans.
So not seeing anything listed in the pg_aios view might be normal? That said, I have attempted to view and refresh the view during other stages of the processing, with pgAdmin apparently showing read operations, but still no records displayed in pg_aios. Maybe I am hitting the "refresh" button at the wrong time though...
But maybe the new AIO infrastructure isn't involved in this at all, and it is another issue in PG18. Just to summarize my observations once again:
- Multi-threaded processing, implemented in Python using pyodbc and concurrent.futures, apparently getting stuck waiting for PostgreSQL to return. The processing step involved should return in ***less than 10 seconds*** for the small Italy extract, but can take > 1 hour (up to > 6 hours) when it randomly gets stuck (some runs complete without delay, others not).
- pgAdmin showing all sessions associated with the threads as 'Active' with no wait events nor blocking PIDs during the whole time the processing appears stuck in PostgreSQL.
- No other sessions like VACUUM visible in pgAdmin during the time the processing appears stuck except the main 'postgres' user session.
- All locks as shown in pg_locks are granted, and most if not all are fastpath, with only AccessShareLock and RowExclusiveLock on the table and its indexes involved. A couple of ExclusiveLock on virtualxid and transactionid.
- 'Top' in Ubuntu showing multiple backend 'postgres' processes continuously at high core usage, one for each thread (each Python thread of course uses its own connection).
- pg_aios view empty, but the processing is UPDATEs, so probably no surprise.
- The processing *DOES* eventually continue after this particular anomaly, with no further consequences and the expected results at the end of the total processing flow, so it is not a true deadlock.
- I have noticed it gets stuck when processing OpenStreetMap scrub or grassland of the Italy extract from Geofabrik. However, as written above, some processing runs are fine on the same data, while others get stuck and delayed. Considering this, and the fact that the processing step that gets stuck uses PostGIS functions, PostGIS may or may not be involved.
- In pgAdmin, the SQL statements as generated by my geoprocessing workflow and as processed by PostgreSQL when the processing is stuck look like this:
UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
   SET area_geo = t2.area_geo,
       perim_geo = t2.perim_geo,
       compact_geo = CASE WHEN t2.area_geo > 0
                          THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi()))
                          ELSE 0 END,
       npoints_geo = t2.npoints_geo,
       comp_npoints_geo = CASE WHEN t2.npoints_geo > 0
                               THEN (CASE WHEN t2.area_geo > 0
                                          THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi()))
                                          ELSE 0 END / t2.npoints_geo)
                               ELSE 0 END,
       convex_ratio_geo = CASE WHEN ST_Area(ST_ConvexHull(way)::geography,true) > 0
                               THEN (t2.area_geo / ST_Area(ST_ConvexHull(way)::geography,true))
                               ELSE 1 END
  FROM (SELECT objectid,
               ST_Area(way::geography,true) AS area_geo,
               ST_Perimeter(way::geography,true) AS perim_geo,
               ST_NPoints(way) AS npoints_geo
          FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
 WHERE (t2.objectid = t1.objectid)
   AND t1.objectid IN (SELECT t3.objectid FROM mini_test.osm.osm_tmp_28128_ch5 AS t3)
- All of this worked fine in PG <= 17.
Marco
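One way to capture timing for a single affected statement without keeping its changes, assuming the temporary chunk table from the statement above still exists (this is a trimmed, illustrative variant of the full UPDATE, not the workflow's actual code):

psql -d mini_test <<'SQL'
BEGIN;
-- EXPLAIN ANALYZE actually executes the UPDATE, hence the ROLLBACK below
EXPLAIN (ANALYZE, BUFFERS)
UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
   SET area_geo = t2.area_geo
  FROM (SELECT objectid, ST_Area(way::geography,true) AS area_geo
          FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
 WHERE t2.objectid = t1.objectid
   AND t1.objectid IN (SELECT t3.objectid FROM osm.osm_tmp_28128_ch5 AS t3);
ROLLBACK;  -- discard the changes, keep only the plan and timing output
SQL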
Hi Markus,

On my Ubuntu virtual machine, io_uring cannot be started. With "io_method = io_uring" set, the cluster will not restart; I have attempted this multiple times. Only 'sync' and 'worker' allow a restart after modifying the PostgreSQL configuration file. As I understand it, the PostgreSQL binary needs to be compiled with the proper support, and maybe my build on Ubuntu 24.04, running as a Windows Hyper-V virtual machine, doesn't have it. Although I did notice when installing PG18 from Synaptic that it installed an additional 'liburing' package, or something named like that, if I remember well...

As to your question about Python and a scheduling conflict: that is not the case here. Python runs on the Windows host, not under Ubuntu inside the VM. I only have PostgreSQL installed on Ubuntu, as I use it with osm2pgsql there. I access the PostgreSQL instance via pyodbc or psycopg2 from the Windows host, so it behaves like a remote database server, just running on local hardware.

Marco
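Two quick checks that might narrow down why io_uring refuses to start, assuming the standard Debian/Ubuntu package layout and the default 'main' cluster (paths and package names may differ, and pg_config requires the server dev package):

# Was the server built with liburing support?
/usr/lib/postgresql/18/bin/pg_config --configure | grep -c liburing

# The actual reason for the failed restart should be in the server log
sudo tail -n 50 /var/log/postgresql/postgresql-18-main.log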
Hi,

On 2025-10-05 22:22:32 +0200, Marco Boeringa wrote:
> I realized that my observation of the pg_aios view being empty was likely
> with the "io_method = sync" option set, which I guess doesn't use or fill
> the pg_aios view?

pg_aios is populated even with io_method = sync, albeit with at most one
entry per backend. If there were no entries in pg_aios at the time of your
hang, it doesn't seem likely - although not impossible - for AIO to be
responsible.

Greetings,

Andres Freund
Hi,

On 2025-10-05 10:55:01 +0200, Marco Boeringa wrote:
> When this happens, I observe the following things:
>
> - High processor activity, even though the jobs that should finish in
>   seconds, take hours, all the while showing the high core usage.
>
> - PgAdmin shows all sessions created by the Python threads as 'active', with
>   *no* wait events attached.

I think we need CPU profiles of these tasks. If something is continually
taking a lot more CPU than expected, that seems like an issue worth
investigating.

Greetings,

Andres Freund
Hi Andres,

I should have phrased it better. The high processor and core activity is not unexpected: the code is designed to saturate the processor and maximize throughput through careful design of the Python threading. It is just that all the jobs sent to PostgreSQL via ODBC for this specific processing step, and with the small Italy extract, should return in less than 10 seconds (which they do in those lucky runs where I do not observe the issue), but some of the jobs for that step don't. E.g. 30 jobs return within 10 seconds, then the remaining 14 unexpectedly get stuck for 2 hours before returning, all the while staying at the same high core usage they started with.

So some of the PostgreSQL database sessions, which as I already explained show up in pgAdmin as 'active' with no wait events or blocking PIDs, simply take an excessive amount of time, but will ultimately return. The CPU time, as witnessed with 'top' in Ubuntu, is really spent in PostgreSQL and the database sessions, not in Python, which runs on Windows and doesn't show high CPU usage in the Windows Task Manager.

This doesn't always happen; it is kind of random. One run with the Italy data will be OK, the next not.

Marco
Hi,

On 2025-10-06 18:09:25 +0200, Marco Boeringa wrote:
> So some of the PostgreSQL database sessions, which as I already explained
> show up in pgAdmin as 'active' with no wait events or blocking PIDs, simply
> take an excessive amount of time, but will ultimately return.

We need a profile of those processes while they use an excessive amount of
time. If they don't have wait events they're using CPU time, and seeing a
profile of where all that time is spent might provide enough information on
where to look in more detail.

Greetings,

Andres Freund
Hi Andres,

I am not really a Linux / Ubuntu expert. Can you give me a suggestion for how to create such a CPU profile for the specific PostgreSQL processes that are getting stuck?

Marco
Hi,

On 2025-10-06 18:17:11 +0200, Marco Boeringa wrote:
> Can you give me a suggestion for how to create such a CPU profile for the
> specific PostgreSQL processes getting stuck?

https://wiki.postgresql.org/wiki/Profiling_with_perf is a good starting point.

Greetings,

Andres
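Condensed from that wiki page, profiling one of the stuck backends could look like this (the PID is a placeholder and the 30-second sampling window is arbitrary):

# Sample the call graph of one stuck backend for 30 seconds, then view the report
sudo perf record -g -p <stuck_backend_pid> -- sleep 30
sudo perf report -g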
Hi Andres,

Thanks for the suggestion, this seems a useful option. However, when I attempt to run "perf top" in a terminal window, I get the following warning:

WARNING: perf not found for kernel 6.14.0-1012

I also see a suggestion to install the Azure linux-tools. However, if I type 'linux-tools' as a search keyword in Synaptic Package Manager, I see a whole bunch of 'linux-tools' packages, e.g. azure/aws/gcp/gke, which also include kernel version build numbers (at least that is what I assume they are).

What version do you suggest I install for an ordinary locally running Ubuntu 24.04 VM? And do these packages indeed add the perf command?

Marco
Hi,

On 2025-10-06 19:01:37 +0200, Marco Boeringa wrote:
> What version do you suggest I install for an ordinary locally running Ubuntu
> 24.04 VM?

There are meta-packages to install linux-tools for the right version, e.g.
linux-tools-virtual. Unfortunately Ubuntu has multiple "kernel variants"
(like -virtual) that you still have to choose between.

You can figure out which base kernel you have with "dpkg -l|grep linux" or
such.

Greetings,

Andres
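Concretely, something along these lines shows the running kernel flavour and installs the matching perf build, if such a package exists for that flavour:

uname -r                              # running kernel version and flavour
dpkg -l | grep linux-image            # installed kernel packages
sudo apt install "linux-tools-$(uname -r)"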
Hi Andres,

I now noticed that all the suggested 'linux-tools' packages that popped up in the warning message when I attempted to run "perf top", the ones referencing Azure, are already displayed as installed in Synaptic Package Manager. I guess it makes sense that the packages, and likely the kernel of my machine, are for Azure, as it is a Windows Hyper-V virtual machine installed via the Microsoft-provided Ubuntu option.

However, if the packages are installed, why can't I run perf? Or do I need a 'linux-tools'-specific command for that instead of perf?

Marco
Hi Andres,

I now found out that I do have a 'perf' binary living under one of the 'usr' folders, but unfortunately it is the 6.8 kernel version:

/usr/lib/linux-tools-6.8.0-85

None of the other suggested packages and their likely install folders seem to contain perf.

Since perf appears, rather understandably, to need to exactly match the kernel version, I can't use this one, as my kernel was already upgraded to 6.14 by a more or less forced update in Software Updater. It is a pain that the linux-tools-common package, which I suspect is the source of the 6.8 'perf' version and folder and is tagged as that version in Synaptic, isn't updated at the same time to let you run 'perf' with the proper version. I guess I will need to wait for an update of it.

Marco
On 2025-10-06 22:41:31 +0200, Marco Boeringa wrote:
> Since perf appears, rather understandably, to need to exactly match the
> kernel version, I can't use this one, as my kernel was already upgraded to
> 6.14 by a more or less forced update in Software Updater.

I'm pretty sure that you can use any halfway-recent perf binary, they don't
actually need to match exactly. I don't know why Ubuntu insists on a perfect
match. I regularly run completely different versions.

Greetings,

Andres Freund
It didn't work: as soon as I attempted to run perf, it emitted the warning message about the kernel version mismatch, with suggestions for packages to install.

However, after further digging I realized that Ubuntu usually has multiple kernel versions installed. I have now attempted to enable the GRUB boot menu, which should allow me to boot the older 6.8 version of the kernel (which was available during configuration of GRUB) and hopefully run perf with that kernel version.

Marco
Hi Andres,

That worked, I successfully booted with kernel 6.8!

I can now run perf, but it emits a warning, see below. Do you have suggestions for how to set these perf 'paranoid' settings?

Marco

Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
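As the warning itself suggests, the restriction can be relaxed via sysctl; running perf under sudo, as done later in the thread, also works around it:

# Relax the restriction for the current boot; -1 allows (almost) all events for all users
sudo sysctl -w kernel.perf_event_paranoid=-1

# Or make it permanent, as the warning suggests
echo 'kernel.perf_event_paranoid = -1' | sudo tee -a /etc/sysctl.conf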
Hi Andres,

I realized I simply need to run perf with 'sudo' privileges; that seems to work. All examples I saw omitted it, hence the confusion.

I will attempt to capture the stall with the Italy data once another OpenStreetMap rendering test that is currently running finishes.

Marco
Hi Andres,

I have now been able to successfully run 'perf' after a VM restart and booting with the Linux 6.8 kernel.

I have copied in two results below: one for 'perf top', and one for a recording of one of the roughly 10 stuck processes, made by supplying the specific PID of the postgres backend. Let me know if this is of any use. Of course, the recordings themselves are much bigger; I just copied the top part as displayed by the 'perf report' option. If you need the actual underlying files, let me know. Also, if you have specific suggestions for 'perf' command options you would like to see run, please post them in your response and I will try whether I can run them.

Andres: I posted to you separately that I suspected a recent Linux kernel update (6.8 --> 6.14), which coincided with the PG18 / PostGIS 3.6.0 upgrade, might be involved, because after booting to the older 6.8 kernel I initially had multiple successful runs of my tool without the temporary stalling / delayed return of postgres. This has now proven wrong: the results below are from runs on the 6.8 kernel, so the kernel is not involved, and I also see the issue with the older kernel.

*** sudo perf top: ***

Samples: 1M of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 346785156250 lost: 0/0 drop: 0/0
Overhead  Shared Object    Symbol
 17,33%   postgres         [.] 0x0000000000584216
 12,21%   postgres         [.] 0x000000000058737a
 11,80%   [kernel]         [k] pv_native_safe_halt
 10,24%   postgres         [.] ReleaseAndReadBuffer
  4,92%   postgres         [.] heap_hot_search_buffer
  3,63%   postgres         [.] hash_search_with_hash_value
  2,36%   postgres         [.] 0x00000000005841f3
  1,85%   postgres         [.] 0x0000000000554edd
  1,61%   postgres         [.] 0x0000000000554e1f
  1,40%   postgres         [.] 0x00000000005567f9
  1,32%   postgres         [.] StartReadBuffer
  1,24%   postgres         [.] heap_page_prune_opt
  1,23%   postgres         [.] LWLockAcquire
  1,14%   postgres         [.] HeapTupleSatisfiesVisibility
  1,10%   postgres         [.] 0x00000000003977dd
  0,92%   postgres         [.] index_getnext_tid
  0,87%   postgres         [.] index_fetch_heap
  0,83%   postgres         [.] ExecScan
  0,82%   postgres         [.] ReleaseBuffer
  0,81%   postgres         [.] LWLockRelease
  0,71%   postgres         [.] XidInMVCCSnapshot
  0,66%   postgres         [.] _bt_checkkeys
  0,66%   postgres         [.] _bt_next
  0,55%   libxorgxrdp.so   [.] crc_process_data
  0,51%   postgres         [.] btgettuple
  0,41%   postgres         [.] 0x00000000005567c6
  0,40%   postgres         [.] 0x00000000003975d0
  0,35%   postgres         [.] ExecStoreBufferHeapTuple
  0,34%   postgres         [.] hash_bytes
  0,33%   postgres         [.] MemoryContextReset
  0,33%   postgres         [.] 0x00000000003975d3
  0,32%   postgres         [.] ExecEvalSysVar
  0,28%   postgres         [.] 0x00000000003975e3
  0,27%   postgres         [.] ResourceOwnerForget
  0,27%   postgres         [.] GlobalVisTestFor
  0,26%   postgres         [.] 0x00000000003975b6
  0,26%   postgres         [.] HeapTupleIsSurelyDead
  0,23%   postgres         [.] 0x00000000003975c3
  0,23%   postgres         [.] 0x00000000001ff80a
  0,22%   postgres         [.] PredicateLockTID
  0,21%   postgres         [.] slot_getsomeattrs_int
  0,20%   postgres         [.] 0x000000000058421d
  0,19%   postgres         [.] ReadBufferExtended
  0,16%   libc.so.6        [.] __memcmp_sse2
  0,15%   postgres         [.] 0x0000000000382e9e

*** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
*** sudo perf report -g ***

Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10018000000
  Children     Self  Command   Shared Object  Symbol
+  100,00%    0,00%  postgres  postgres       [.] _start
+  100,00%    0,00%  postgres  libc.so.6      [.] __libc_start_main@@GLIBC_2.34
+  100,00%    0,00%  postgres  libc.so.6      [.] __libc_start_call_main
+  100,00%    0,00%  postgres  postgres       [.] main
+  100,00%    0,00%  postgres  postgres       [.] PostmasterMain
+  100,00%    0,00%  postgres  postgres       [.] 0x00005f35711b913e
+  100,00%    0,00%  postgres  postgres       [.] postmaster_child_launch
+  100,00%    0,00%  postgres  postgres       [.] 0x00005f3570fb9dbf
+  100,00%    0,00%  postgres  postgres       [.] PostgresMain
+  100,00%    0,00%  postgres  postgres       [.] 0x00005f3570fb6cb9
+  100,00%    0,00%  postgres  postgres       [.] PortalRun
+  100,00%    0,00%  postgres  postgres       [.] 0x00005f3570fbeca3
+  100,00%    0,00%  postgres  postgres       [.] 0x00005f3570fbcf85
+  100,00%    0,00%  postgres  postgres       [.] standard_ExecutorRun
+  100,00%    0,00%  postgres  postgres       [.] 0x00005f3570de682b
+  100,00%    0,00%  postgres  postgres       [.] 0x00005f3570de966d
+   97,51%    0,00%  postgres  postgres       [.] 0x00005f3570de959d
+   97,17%    0,95%  postgres  postgres       [.] ExecScan
+   82,75%    0,00%  postgres  postgres       [.] 0x00005f3570dda135
+   82,35%    1,04%  postgres  postgres       [.] index_fetch_heap
+   28,80%    0,00%  postgres  postgres       [.] 0x00005f3570bf3f87
+   23,67%    0,00%  postgres  postgres       [.] 0x00005f3570bf3ee8
+   22,19%    0,00%  postgres  postgres       [.] 0x00005f3570fa1216
+   22,18%   22,18%  postgres  postgres       [.] 0x0000000000584216
+   17,06%    0,89%  postgres  postgres       [.] LWLockRelease
+   16,18%    0,00%  postgres  postgres       [.] 0x00005f3570bf3f4f
+   15,59%    0,00%  postgres  postgres       [.] 0x00005f3570fa437a
+   15,57%   15,57%  postgres  postgres       [.] 0x000000000058737a
+   14,90%    0,22%  postgres  postgres       [.] ReadBufferExtended
+   14,18%   13,05%  postgres  postgres       [.] ReleaseAndReadBuffer
+   14,00%    1,82%  postgres  postgres       [.] StartReadBuffer
+    6,59%    0,00%  postgres  postgres       [.] 0x00005f3570bf3f18
+    5,97%    3,78%  postgres  postgres       [.] heap_hot_search_buffer
+    5,23%    0,00%  postgres  postgres       [.] 0x00005f3570da0613
+    5,20%    0,00%  postgres  postgres       [.] 0x00005f3570dda121
+    5,19%    0,22%  postgres  postgres       [.] slot_getsomeattrs_int
+    5,18%    0,68%  postgres  postgres       [.] index_getnext_tid
+    4,52%    0,65%  postgres  postgres       [.] btgettuple
+    3,92%    0,99%  postgres  postgres       [.] _bt_next
+    3,28%    3,28%  postgres  postgres       [.] hash_search_with_hash_value
+    2,96%    0,38%  postgres  postgres       [.] ExecStoreBufferHeapTuple
+    2,93%    0,00%  postgres  postgres       [.] 0x00005f3570bf3fe0
Hi Andres,

Probably less useful, but just to further illustrate the issue, here are the results of 'top', which show multiple stuck postgres processes that have already been busy for hours and that, as written before, all display as 'active' with no wait events in pgAdmin:

top - 21:42:44 up 12:20, 2 users, load average: 16,14, 16,14, 16,09
Tasks: 664 total, 17 running, 647 sleeping, 0 stopped, 0 zombie
%Cpu(s): 37,0 us, 0,1 sy, 0,0 ni, 62,9 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Mem : 393429,0 total, 297340,0 free, 53521,1 used, 91791,0 buff/cache
MiB Swap: 153600,0 total, 153600,0 free, 0,0 used. 339907,9 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
118256 postgres 20 0 164,3g 994408 985600 R 100,0 0,2 309:57.22 postgres
118264 postgres 20 0 164,3g 1,0g 994,8m R 100,0 0,3 309:57.46 postgres
118367 postgres 20 0 164,3g 911516 907456 R 100,0 0,2 309:55.78 postgres
118368 postgres 20 0 164,3g 915388 911328 R 100,0 0,2 309:55.82 postgres
118369 postgres 20 0 164,3g 914684 910272 R 100,0 0,2 309:55.20 postgres
118372 postgres 20 0 164,3g 924188 916256 R 100,0 0,2 309:55.20 postgres
118254 postgres 20 0 164,3g 1,0g 1,0g R 99,9 0,3 309:52.71 postgres
118255 postgres 20 0 164,3g 993,6m 981,8m R 99,9 0,3 309:57.73 postgres
118259 postgres 20 0 164,3g 978,2m 988768 R 99,9 0,2 309:57.47 postgres
118261 postgres 20 0 164,3g 1,0g 987,2m R 99,9 0,3 309:57.52 postgres
118262 postgres 20 0 164,3g 998560 990176 R 99,9 0,2 309:57.04 postgres
118265 postgres 20 0 164,3g 1,0g 1,0g R 99,9 0,3 309:55.86 postgres
118365 postgres 20 0 164,3g 925244 917312 R 99,9 0,2 309:56.02 postgres
118366 postgres 20 0 164,3g 920668 912384 R 99,9 0,2 309:56.02 postgres
118370 postgres 20 0 164,3g 923132 915200 R 99,9 0,2 309:55.22 postgres
118373 postgres 20 0 164,3g 924188 916256 R 99,9 0,2 309:54.96 postgres
2546 ubuntu_+ 20 0 560524 266288 106704 S 21,8 0,1 7:48.62 Xorg
2954 ubuntu_+ 20 0 9050316 455464 167812 S 7,3 0,1 4:33.90 gnome-s+
1855 xrdp 20 0 133248 16344 8800 S 1,3 0,0 0:37.48 xrdp
4329 ubuntu_+ 20 0 851956 56648 43292 S 0,7 0,0 0:25.66 gnome-t+
123595 ubuntu_+ 20 0 14644 5632 3520 R 0,4 0,0 0:00.31 top
123432 root 20 0 0 0 0 I 0,2 0,0 0:00.09 kworker+
1385 polkitd 20 0 384600 9744 7392 S 0,1 0,0 0:01.92 polkitd
1982 gdm 20 0 8805756 237412 128832 S 0,1 0,1 1:20.05 gnome-s+
3259 ubuntu_+ 20 0 388852 11472 6688 S 0,1 0,0 0:08.03 ibus-da+
3368 ubuntu_+ 20 0 347872 30504 17556 S 0,1 0,0 0:03.70 ibus-ex+
3638 ubuntu_+ 20 0 236772 6688 6336 S 0,1 0,0 0:02.33 ibus-en+
30304 ubuntu_+ 20 0 6490464 271864 129800 S 0,1 0,1 0:23.91 nautilus
1 root 20 0 23892 13024 8800 S 0,0 0,0 0:04.11 systemd
2 root 20 0 0 0 0 S 0,0 0,0 0:00.04 kthreadd
3 root 20 0 0 0 0 S 0,0 0,0 0:00.00 pool_wo+
4 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+
5 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+
6 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+
7 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+
9 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker+
10 root 20 0 0 0 0 I 0,0 0,0 0:05.18 kworker+
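For reference, the same information pgAdmin shows can be pulled from psql by querying pg_stat_activity for the PIDs reported by top; a minimal sketch, with the database name as a placeholder:

# show state, wait events and current query of the busy backends seen in top
psql -d <your_db> -c "SELECT pid, state, wait_event_type, wait_event, query_start, query
                      FROM pg_stat_activity
                      WHERE pid IN (118256, 118264, 118367);"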
Hi,

On 2025-10-07 21:34:00 +0200, Marco Boeringa wrote:
> I have now been able to successfully run 'perf' after a VM restart and
> booting with the linux 6.8 kernel.

Unfortunately the profile is lacking most symbols, so it's not particularly informative :(. You probably need to install the debug symbols package for postgres (and probably libc)...

> Andres: I wrote to you separately that I suspected a recent linux kernel
> update (6.8 --> 6.14), which coincided with the PG18 / PostGIS 3.6.0
> upgrade, might be involved, because after booting back into the older 6.8
> kernel I initially had multiple successful runs of my tool without the
> temporary stalling / delayed return of postgres. This has now proven wrong:
> these results are from runs on the 6.8 kernel, and I also see the issue
> with the older kernel, so the kernel is not involved.

Good to know.

Greetings,

Andres Freund
Andres,

Can you be a bit more specific as to which debug symbols package I need to install? There is a whole list of them for PG18 when I look with Synaptic Package Manager... Or do I simply need to install all of the listed PG18 debug symbols packages? It is a very long list of packages though, so I would prefer to limit it to the packages you think are really useful for your debug work.

And do you have any comments on the commands I ran? If I install the debug symbols, are those specific commands OK, or do you want other options?

Marco

On 7-10-2025 at 21:56, Andres Freund wrote:
> Hi,
>
> On 2025-10-07 21:34:00 +0200, Marco Boeringa wrote:
>> I have now been able to successfully run 'perf' after a VM restart and
>> booting with the linux 6.8 kernel.
> Unfortunately the profile is lacking most symbols, so it's not particularly
> informative :(. You probably need to install the debug symbols package for
> postgres (and probably libc)...
>
>> Andres: I wrote to you separately that I suspected a recent linux kernel
>> update (6.8 --> 6.14), which coincided with the PG18 / PostGIS 3.6.0
>> upgrade, might be involved, because after booting back into the older 6.8
>> kernel I initially had multiple successful runs of my tool without the
>> temporary stalling / delayed return of postgres. This has now proven wrong:
>> these results are from runs on the 6.8 kernel, and I also see the issue
>> with the older kernel, so the kernel is not involved.
> Good to know.
>
> Greetings,
>
> Andres Freund
Hi Andres,

Searching a bit more, and reading the PostgreSQL wiki pages about debugging, I have now found the 'find-dbgsym-packages' command, part of the 'debian-goodies' package.

Installing and running this against a postgres process ID returned the following debug symbol packages. Does this list seem about right for you?

Marco

lib32stdc++6-14-dbg
libkrb5-dbg
libstdc++6-14-dbg
libx32stdc++6-14-dbg
postgresql-18-dbgsym
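For completeness, a sketch of how these can be installed on Ubuntu, assuming the dbgsym packages are available in an enabled (ddebs / PGDG) repository; only postgresql-18-dbgsym comes from the list above, the libc package name is an assumption:

# the helper used above ships with the debian-goodies package
sudo apt install debian-goodies
# list debug symbol packages relevant to a running backend (replace <PID>)
sudo find-dbgsym-packages <PID>
# install the server symbols plus libc debug symbols
sudo apt install postgresql-18-dbgsym libc6-dbg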
Hi Andres,

This looks much better, doesn't it? I hope this helps. Let me know if you need anything else.

Marco

*** sudo perf top: ***

Samples: 2M of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 593240647834 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
24,66% [kernel] [k] pv_native_safe_halt
9,35% postgres [.] LWLockAttemptLock
7,69% postgres [.] heap_hot_search_buffer
6,68% postgres [.] tts_heap_getsomeattrs.lto_priv.0
4,93% postgres [.] LWLockReleaseInternal
4,26% postgres [.] ExecInterpExpr
4,12% postgres [.] hash_search_with_hash_value
3,42% postgres [.] PinBuffer
2,54% postgres [.] heapam_index_fetch_tuple.lto_priv.0
2,46% postgres [.] ReleaseAndReadBuffer
1,72% postgres [.] LWLockAcquire
1,72% postgres [.] HeapTupleSatisfiesVisibility
1,48% postgres [.] index_fetch_heap
1,44% postgres [.] index_getnext_tid
1,40% postgres [.] heap_page_prune_opt
1,28% postgres [.] _bt_readpage
1,28% postgres [.] ExecScan
1,23% postgres [.] LWLockRelease
1,10% postgres [.] IndexNext
1,10% postgres [.] XidInMVCCSnapshot
1,07% postgres [.] _bt_checkkeys
0,98% postgres [.] _bt_next
0,90% postgres [.] UnpinBufferNoOwner.lto_priv.0
0,82% postgres [.] GetPrivateRefCountEntry.lto_priv.0
0,81% postgres [.] StartReadBuffer
0,80% postgres [.] btgettuple
0,69% postgres [.] _bt_check_compare.lto_priv.0
0,67% libxorgxrdp.so [.] crc_process_data
0,56% postgres [.] ExecStoreBufferHeapTuple
0,55% postgres [.] MemoryContextReset
0,50% postgres [.] ExecEvalSysVar
0,50% postgres [.] hash_bytes
0,48% postgres [.] tts_virtual_clear.lto_priv.0
0,48% postgres [.] ExecNestLoop
0,43% postgres [.] ResourceOwnerForget
0,41% postgres [.] GlobalVisTestFor
0,37% postgres [.] HeapTupleIsSurelyDead
0,33% postgres [.] slot_getsomeattrs_int
0,33% postgres [.] PredicateLockTID
0,29% postgres [.] ReadBufferExtended
0,27% postgres [.] _bt_saveitem
0,23% postgres [.] _bt_setuppostingitems
0,22% libc.so.6 [.] __memcmp_sse2
0,21% postgres [.] ExecIndexScan
0,20% postgres [.] ReleaseBuffer
0,19% postgres [.] ResourceOwnerEnlarge

*** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
*** sudo perf report -g ***

Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000
Children Self Command Shared Object Symbol
+ 100,00% 0,00% postgres postgres [.] _start
+ 100,00% 0,00% postgres libc.so.6 [.] __libc_start_main@@GLIBC_2.34
+ 100,00% 0,00% postgres libc.so.6 [.] __libc_start_call_main
+ 100,00% 0,00% postgres postgres [.] main
+ 100,00% 0,00% postgres postgres [.] PostmasterMain
+ 100,00% 0,00% postgres postgres [.] ServerLoop.isra.0
+ 100,00% 0,00% postgres postgres [.] postmaster_child_launch
+ 100,00% 0,00% postgres postgres [.] 0x00005f3570fb9dbf
+ 100,00% 0,00% postgres postgres [.] PostgresMain
+ 100,00% 0,00% postgres postgres [.] exec_simple_query
+ 100,00% 0,63% postgres postgres [.] ExecNestLoop
+ 100,00% 0,00% postgres postgres [.] PortalRun
+ 100,00% 0,00% postgres postgres [.] PortalRunMulti
+ 100,00% 0,00% postgres postgres [.] ProcessQuery
+ 100,00% 0,00% postgres postgres [.] standard_ExecutorRun
+ 100,00% 0,00% postgres postgres [.] ExecModifyTable
+ 94,63% 1,47% postgres postgres [.] ExecScan
+ 78,76% 1,49% postgres postgres [.] IndexNext
+ 66,89% 1,96% postgres postgres [.] index_fetch_heap
+ 64,35% 3,61% postgres postgres [.] heapam_index_fetch_tuple.lto_priv.0
+ 21,92% 0,40% postgres postgres [.] ReadBufferExtended
+ 20,36% 1,19% postgres postgres [.] StartReadBuffer
+ 16,23% 5,79% postgres postgres [.] ExecInterpExpr
+ 15,00% 10,70% postgres postgres [.] heap_hot_search_buffer
+ 12,81% 12,81% postgres postgres [.] LWLockAttemptLock
+ 10,31% 2,36% postgres postgres [.] index_getnext_tid
+ 9,69% 0,48% postgres postgres [.] slot_getsomeattrs_int
+ 9,27% 9,27% postgres postgres [.] tts_heap_getsomeattrs.lto_priv.0
+ 8,45% 1,85% postgres postgres [.] LWLockRelease
+ 7,91% 1,08% postgres postgres [.] btgettuple
+ 6,89% 1,12% postgres postgres [.] _bt_next
+ 6,77% 6,77% postgres postgres [.] LWLockReleaseInternal
+ 5,89% 5,89% postgres postgres [.] hash_search_with_hash_value
+ 5,78% 0,10% postgres postgres [.] _bt_readnextpage
+ 4,48% 4,14% postgres postgres [.] PinBuffer
+ 4,41% 1,90% postgres postgres [.] _bt_readpage
+ 4,09% 3,00% postgres postgres [.] ReleaseAndReadBuffer
+ 2,99% 2,38% postgres postgres [.] HeapTupleSatisfiesVisibility
+ 2,58% 0,82% postgres postgres [.] ExecStoreBufferHeapTuple
+ 2,43% 2,43% postgres postgres [.] LWLockAcquire
+ 2,28% 1,47% postgres postgres [.] _bt_checkkeys
+ 1,74% 1,74% postgres postgres [.] heap_page_prune_opt
+ 1,67% 1,10% postgres postgres [.] UnpinBufferNoOwner.lto_priv.0
+ 1,47% 1,47% postgres postgres [.] XidInMVCCSnapshot
+ 1,18% 1,18% postgres postgres [.] GetPrivateRefCountEntry.lto_priv.0
+ 1,09% 1,09% postgres postgres [.] MemoryContextReset

On 7-10-2025 at 23:38, Marco Boeringa wrote:
> Hi Andres,
>
> Searching a bit more, and reading the PostgreSQL wiki pages about
> debugging, I have now found the 'find-dbgsym-packages' command, part
> of the 'debian-goodies' package.
>
> Installing and running this against a postgres process ID returned
> the following debug symbol packages. Does this list seem about right
> for you?
>
> Marco
>
> lib32stdc++6-14-dbg
> libkrb5-dbg
> libstdc++6-14-dbg
> libx32stdc++6-14-dbg
> postgresql-18-dbgsym
>
Hi,

On 2025-10-08 00:13:34 +0200, Marco Boeringa wrote:
> This looks much better, doesn't it?

It indeed does!

> I hope this helps. Let me know if you need anything else.

> *** sudo perf -p <PID of one stuck postgres backend> -g -d 10 ***
> *** sudo perf report -g ***

Could you show perf report --no-children? That would show us which individual functions, rather than call-stacks, take the longest.

> Samples: 40K of event 'task-clock:ppp', Event count (approx.): 10008250000
> Children Self Command Shared Object Symbol
> + 100,00% 0,00% postgres postgres [.] _start
> + 100,00% 0,00% postgres libc.so.6 [.] __libc_start_main@@GLIBC_2.34
> + 100,00% 0,00% postgres libc.so.6 [.] __libc_start_call_main
> + 100,00% 0,00% postgres postgres [.] main
> + 100,00% 0,00% postgres postgres [.] PostmasterMain
> + 100,00% 0,00% postgres postgres [.] ServerLoop.isra.0
> + 100,00% 0,00% postgres postgres [.] postmaster_child_launch
> + 100,00% 0,00% postgres postgres [.] 0x00005f3570fb9dbf
> + 100,00% 0,00% postgres postgres [.] PostgresMain
> + 100,00% 0,00% postgres postgres [.] exec_simple_query
> + 100,00% 0,63% postgres postgres [.] ExecNestLoop
> + 100,00% 0,00% postgres postgres [.] PortalRun
> + 100,00% 0,00% postgres postgres [.] PortalRunMulti
> + 100,00% 0,00% postgres postgres [.] ProcessQuery
> + 100,00% 0,00% postgres postgres [.] standard_ExecutorRun
> + 100,00% 0,00% postgres postgres [.] ExecModifyTable
> + 94,63% 1,47% postgres postgres [.] ExecScan
> + 78,76% 1,49% postgres postgres [.] IndexNext
> + 66,89% 1,96% postgres postgres [.] index_fetch_heap
> + 64,35% 3,61% postgres postgres [.] heapam_index_fetch_tuple.lto_priv.0

So somehow >60% of the CPU time is spent fetching tuples corresponding to index entries. That seems ... a lot. Is it possible that you have a lot of dead rows in the involved tables?

I don't immediately see how this could be related to AIO.

Greetings,

Andres Freund
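A minimal sketch of both follow-ups; the database name and the LIMIT are placeholders, and the dead-row check via pg_stat_user_tables is just one way to answer the question:

# re-run the report against the existing perf.data, sorted by self time
sudo perf report --no-children
# check whether the tables touched by the stuck UPDATEs carry many dead rows
psql -d <your_db> -c "SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
                      FROM pg_stat_user_tables
                      ORDER BY n_dead_tup DESC
                      LIMIT 20;"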
Hi,

On 2025-10-07 19:03:40 -0400, Andres Freund wrote:
> On 2025-10-08 00:13:34 +0200, Marco Boeringa wrote:
> So somehow >60% of the CPU time is spent fetching tuples corresponding to
> index entries. That seems ... a lot. Is it possible that you have a lot of
> dead rows in the involved tables?
>
> I don't immediately see how this could be related to AIO.

Can you share the query that was running in the "stuck" backend, and the explain output for it?

Greetings,

Andres Freund
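One way to collect what is being asked for, as a sketch; <PID>, the database name, and the statement text are placeholders, and since the jobs here are UPDATEs, EXPLAIN ANALYZE is wrapped in a rolled-back transaction so it does not persist changes:

# grab the exact statement the stuck backend is running
psql -d <your_db> -c "SELECT query FROM pg_stat_activity WHERE pid = <PID>;"
# plain EXPLAIN is side-effect free; EXPLAIN ANALYZE executes the UPDATE, hence the ROLLBACK
psql -d <your_db> -c "BEGIN; EXPLAIN (ANALYZE, BUFFERS) <statement from above>; ROLLBACK;"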