Fwd: 'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2)

Поиск
Список
Период
Сортировка
От Marco Boeringa
Тема Fwd: 'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2)
Дата
Msg-id 7c34313b-d285-7df9-0561-540fe8e1b2d0@boeringa.demon.nl
обсуждение исходный текст
Ответ на 'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2)  (Marco Boeringa <marco@boeringa.demon.nl>)
Список pgsql-bugs

Well, I can now confirm that if the CLUSTER operation on the 'Planet' size database is finished, the CLUSTER on the secondary databases runs fine, no "missing chunk number 0" error.

I wonder if there is a relation with the currently discussed issue here, as I see in the PostgreSQL documentation, that CLUSTER also requires an ACCESS EXCLUSIVE lock, that is also referred in this thread?:

https://www.postgresql.org/message-id/flat/17812-206eaec1d133a94a%40postgresql.org

Is my issue possibly the "concrete use case", that David J. is referring to?:

"I agree this does seem like a poor risk/reward on the fixing side,
especially absent a concrete live use case problem. I am curious what led
to this discovery."

Marco


-------- Doorgestuurd bericht --------
Onderwerp: 'CLUSTER' in one database prevents running it in two others on the same database cluster (PG15.2)
Datum: Sun, 5 Mar 2023 22:38:40 +0100
Van: Marco Boeringa <marco@boeringa.demon.nl>
Aan: pgsql-bugs@lists.postgresql.org


Hi all,

I am running into a weird issue I haven't encountered before.

This is PostgreSQL 15.2 (Ubuntu 15.2-1.pgdg22.04+1) running on Ubuntu 22.04.2 LTS, with PostGIS (POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="150" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1" LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)").

Now the PostgreSQL database cluster contains multiple databases. In one of these, an OpenStreetMap 'Planet' size database, a CLUSTER operation is running against a very large table of buildings (> 400M records). This runs fine, and I can see the actual progress in pgAdmin by inspecting the 'pg_stat_progress_cluster' view. So far everything seems OK.

However, as part of two secondary Python scripts running that do all kinds of batch operations against spatial tables (generalization and such) in two other databases on the same database cluster, at the very end of the processing, these initialize 'CLUSTER' operations as well. Both of these now fail with a:

"missing chunk number 0 for toast value X in pg_toast_Y"

type error.

After these errors, I see the CLUSTER command listed as well in the PostgreSQL main log, but the ERROR line mentioned above is listed just above the actual CLUSTER statement in the log. No other anomaly / error is visible there. The CLUSTER command just fails though on the two secondary databases. The time displayed in the log is exactly the same for these two log lines one after another, also the number between the square brackets listed immediately after the "CET" of the time are the same for the same database. The pg_toast_Y table names mentioned in the errors differ by the way between the two databases.

If I inspect the table that is mentioned in the CLUSTER command in the log, the very first one that is going to be CLUSTERed in the secondary databases, the table can be displayed fine in DBeaver, even spatially. Checking the geometries with PostGIS's 'ST_IsValid' also shows them to be fine, no errors in the geometries.

Everything else seems OK, database cluster is still up&running, all tables in all databases can be viewed and accessed in DBeaver and pgAdmin.

In all other rare cases where I have seen the "missing chunk number 0 for toast value X in pg_toast_Y" type error, the generalization processing already failed somewhere half-way, never at the end when CLUSTER is initiated. The error therefore seems weird to occur at that point.

I have now been able to reproduce this twice, with two tiny database's filled with just Liechtenstein data, while the Planet database is still in the stage of CLUSTERing the huge building table.

Obviously, it will be interesting to see what happens if the CLUSTER operation on the Planet database is finished, but this will still take many hours, especially since other large tables need to be CLUSTERed in the same database, and I do not want to break off this process right now after already gotten this far.

I will report back once it is finished, but does anyone have a clue why this might happen?

The system has plenty of free space by the way (4x2TB NVMe), and the database runs on a professional workstation with ECC RAM, and used RAM also doesn't seem an issue.

Marco

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Alexander Lakhin
Дата:
Сообщение: Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17823: Generated columns not always updated correctly