Discussion: "not related" code blocks prevent removal of dead rows when using VACUUM, and this kills the performance

Hi

We have a master code block that starts small, tiny operations which create a table and insert data into that table from many threads.

Nothing is done in the master code itself; we follow an orchestration pattern, where the master just sends a message about what to do, and the work is done in other database connections, not in the connections used by the master code.

In the master code I add a sleep after the CRUD operations are done, to make it easier to test. The test table does not change in the rest of this master code (in real life more happens in the master code, of course).

Then we start testing VACUUM and run a very simple SQL test in another window.

We can now show that the query takes "3343.794 ms" rather than "0.123 ms", which is what we get when we are able to remove the dead rows and run a new ANALYZE.

The problem is that as long as the master code is active, we cannot remove all the dead rows, and that is what seems to be killing the performance.
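The symptom is visible directly in the VACUUM output. A sketch (`test_table` is a stand-in for the table the worker connections actually create):

```sql
-- in the testing window, while the master is still sleeping
VACUUM (VERBOSE) test_table;
-- while the long-running master session exists, the per-table report
-- contains something like "... dead row versions cannot be removed yet,
-- oldest xmin: ...": the rows are dead, but VACUUM may not reclaim them
```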

By active I mean hanging on pg_sleep; remember that this master has not created the test table or inserted any data into it itself.

Is this the expected behavior?

Is it possible to get around this problem in any way?

In this note you will find a detailed description and a simple standalone test script: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/-/issues/67#note_1779300212

I have tested on "PostgreSQL 14.10 (Homebrew) on aarch64-apple-darwin23.0.0, compiled by Apple clang version 15.0.0 (clang-1500.0.40.1), 64-bit" and "PostgreSQL 12.6 (Ubuntu 12.6-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit"

Thanks.

Lars

On Mon, 2024-02-19 at 16:14 +0000, Lars Aksel Opsahl wrote:
> Then we start testing VACUUM and run a very simple SQL test in another window.
>
> We can now show that the query takes "3343.794 ms" rather than "0.123 ms",
> which is what we get when we are able to remove the dead rows and run a new
> ANALYZE.
>
> The problem is that as long as the master code is active, we cannot remove
> all the dead rows, and that is what seems to be killing the performance.
>
> By active I mean hanging on pg_sleep; remember that this master has not
> created the test table or inserted any data into it itself.
>
> Is this the expected behavior?

It is not entirely clear what you are doing, but it seems like you are holding
a database transaction open, and yes, then it is expected behavior that
VACUUM cannot clean up dead rows in the table.

Make sure that your database transactions are short.
Don't use table or row locks to synchronize application threads.
What you could use to synchronize your application threads are advisory locks;
they are not tied to a database transaction.
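A minimal sketch of that pattern with session-level advisory locks (the lock key 42 is an arbitrary application-chosen number):

```sql
-- session A (orchestrator): take a session-level advisory lock.
-- It is held until pg_advisory_unlock() or until the connection closes,
-- independent of any transaction, so no snapshot needs to stay open
-- while the workers run.
SELECT pg_advisory_lock(42);

-- session B (worker): blocks here until the orchestrator releases the key
SELECT pg_advisory_lock(42);

-- session A, when the work is done:
SELECT pg_advisory_unlock(42);
```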

Yours,
Laurenz Albe




From: Laurenz Albe <laurenz.albe@cybertec.at>
>
>It is not entirely clear what you are doing, but it seems like you are holding
>a database transaction open, and yes, then it is expected behavior that
>VACUUM cannot clean up dead rows in the table.
>
>Make sure that your database transactions are short.
>Don't use table or row locks to synchronize application threads.
>What you could use to synchronize your application threads are advisory locks,
>they are not tied to a database transaction.
>

Hi

The details are here: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/-/issues/67#note_1779300212 and
here is also a reference to the test script that shows the problem: https://gitlab.com/nibioopensource/resolve-overlap-and-gap/uploads/9a0988b50f05386ec9d91d6600bb03ec/test_issue_67.sql

I am not taking any locks; I just do plain CRUD operations.

The key is that the master code does not create any table or insert any rows; that is done by many short operations, as you suggested.

But even though the master code does no operations against the test table, it is blocking the removal of dead rows.

If this is expected behavior, it means that any long-running transaction will block the removal of dead rows for all tables visible in the database, and that seems like a problem or weakness of PostgreSQL.

While writing this, I was thinking that maybe I can get around the problem by making the table not visible to the master code, but that makes it very complicated for me.

Thanks.

Lars


From: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>
>
>If this is expected behavior, it means that any long-running transaction will
>block the removal of dead rows for all tables visible in the database, and
>that seems like a problem or weakness of PostgreSQL.
>
Hi

I have now tested running the master (orchestration) code as user joe.

In the master code I connect back as user lop, create the test table test_null, and insert data in many tiny operations.

User joe, who runs the long-running operation, does not know anything about the table test_null and does not have any grants on it.

Access to test_null is not granted to public either.

The problem is the same: the long-running transaction of user joe kills the performance on a table that joe has no access to and knows nothing about.

If this is expected behavior, it means that any user on the database who runs a long-running SQL statement, even one that never inserts any data, can kill the performance for any other user in the database.

So applications like QGIS, which seem to keep connections open for a while, can then also kill the performance for any other user in the database.

Having PostgreSQL work like this also makes it very difficult to debug performance issues, because a problem may just be a side effect of a completely unrelated SQL statement.

So I hope this is not the case, and that I have done something wrong, or that there are some parameters that can be adjusted to get around this problem.
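For completeness, there are parameters that limit the damage, although none of them make the behavior itself go away. A sketch (the values are examples, not recommendations):

```sql
-- cancel sessions that sit idle inside an open transaction
-- (this does not help against an actively running pg_sleep, though)
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- (PostgreSQL 9.6-16, needs a server restart) allow VACUUM to remove
-- dead rows older than the threshold even while old snapshots exist,
-- at the cost of "snapshot too old" errors in long-running queries
ALTER SYSTEM SET old_snapshot_threshold = '10min';
```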

Thanks

Lars

On Tue, 2024-02-20 at 05:46 +0000, Lars Aksel Opsahl wrote:
> If this is expected behavior, it means that any user on the database who runs
> a long-running SQL statement, even one that never inserts any data, can kill
> the performance for any other user in the database.

Yes, that is the case.  A long running query will hold a snapshot, and no data
visible in that snapshot can be deleted.

That can cause bloat, which can impact performance.

> So applications like QGIS, which seem to keep connections open for a while,
> can then also kill the performance for any other user in the database.

No, that is not a problem.  Keeping *connections* open is a good thing. It is
keeping data modifying transactions, cursors or long-running queries open
that constitutes a problem.
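One way to find the session that is holding back cleanup is to look at backend_xmin in pg_stat_activity; a sketch:

```sql
-- any backend with a non-null backend_xmin pins that snapshot's horizon;
-- VACUUM cannot remove row versions still visible to the oldest of them
SELECT pid, usename, state, backend_xmin, xact_start,
       now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;
```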

Yours,
Laurenz Albe




From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Tuesday, February 20, 2024 8:29 AM
>On Tue, 2024-02-20 at 05:46 +0000, Lars Aksel Opsahl wrote:
>> If this is expected behavior, it means that any user on the database who runs
>> a long-running SQL statement, even one that never inserts any data, can kill
>> the performance for any other user in the database.
>
>Yes, that is the case.  A long running query will hold a snapshot, and no data
>visible in that snapshot can be deleted.
>
>That can cause bloat, which can impact performance.
>

Hi

Thanks for the chat; it seems like I have finally found a solution that works for this test code.

Adding commits, like here /uploads/031b350bc1f65752b013ee4ae5ae64a3/test_issue_67_with_commit.sql, to the master code, even when there is nothing to commit, seems to solve the problem. That makes sense based on what you say, because the master code then gets a new snapshot and releases the old one.
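A minimal sketch of that pattern, assuming the master is a PL/pgSQL procedure (PostgreSQL 11+) and a hypothetical dispatch step:

```sql
CREATE PROCEDURE master_loop() LANGUAGE plpgsql AS $$
BEGIN
  FOR i IN 1 .. 100 LOOP
    -- dispatch work to the worker connections here (hypothetical step)
    COMMIT;              -- even with nothing to commit, this releases
                         -- the current snapshot before the wait
    PERFORM pg_sleep(1); -- the wait now pins at most one second of history
  END LOOP;
END;
$$;
CALL master_loop();  -- must not be called inside an explicit transaction
                     -- block, or the COMMIT inside the procedure fails
```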

The reason I like to use psql for the master/orchestration code, and not C/Python/Bash and so on, is to make it simpler to use, code, and test.

Lars