Thread: Death postgres


Death postgres

From
Marc Millas
Date:
Hi,

postgres 14.2 on Linux redhat

temp_file_limit set around 210 GB.

a select query with 2 left joins has crashed the server (OOM killer) after the postgres disk usage grew from 15 TB to 16 TB.

What are the cases where postgres may grow without respecting temp_file_limit?

thanks,




Marc MILLAS
Senior Architect
+33607850334

Re: Death postgres

From
Adrian Klaver
Date:
On 5/5/23 18:14, Marc Millas wrote:
> Hi,
> 
> postgres 14.2 on Linux redhat
> 
> temp_file_limit set around 210 GB.
> 
> a select request with 2 left join have crashed the server (oom killer) 
> after the postgres disk occupation did grow from 15TB to 16 TB.

The result of EXPLAIN <the query> would be helpful.

> 
> What are the cases where postgres may grow without caring about 
> temp_file_limit ?
> 
> thanks,
> 
> 
> 
> 
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com <http://www.mokadb.com>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Death postgres

From
"Peter J. Holzer"
Date:
On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.

temp_file_limit limits the space a process may use on disk while the OOM
killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Death postgres

From
Marc Millas
Date:


On Sat, 6 May 2023 at 09:46, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.

temp_file_limit limits the space a process may use on disk while the OOM
killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

        hp
It's clear that the OOM killer is triggered by RAM and temp_file_limit is a disk thing... 
But the sudden growth in disk space usage and RAM happened at exactly the same time, with only one user connected and only one query running... 

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Death postgres

From
Marc Millas
Date:


On Sat, 6 May 2023 at 06:18, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/5/23 18:14, Marc Millas wrote:
> Hi,
>
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer)
> after the postgres disk occupation did grow from 15TB to 16 TB.

The result of EXPLAIN <the query> would be helpful.
Sure! 
But one of the tables looks "inaccessible" since then. 
I.e. even an explain select * from the_table didn't return and had to be killed with Ctrl-C.
>
> What are the cases where postgres may grow without caring about
> temp_file_limit ?
>
> thanks,
>
>
>
>
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com <http://www.mokadb.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Death postgres

From
Ron
Date:
On 5/6/23 07:19, Marc Millas wrote:


On Sat, 6 May 2023 at 09:46, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.

"15TB" and "16TB" are pretty low-resolution.  For example, 15.4TB rounds down to 15TB, while 15.6TB rounds up to 16TB, while they are in fact only 200GB apart.

Heck, even 15.4TB and 15.6TB are low-resolution.  temp_file_limit may actually be working.
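For what it's worth, here is a sketch of how the actual temp-file usage could be measured more precisely (assuming the database is named db; pg_ls_tmpdir() needs superuser or pg_monitor membership):

db=# -- cumulative temporary-file usage since the last statistics reset
db=# select datname, temp_files, pg_size_pretty(temp_bytes) from pg_stat_database where datname = 'db';
db=# -- temp files currently on disk for the default tablespace
db=# select name, pg_size_pretty(size) from pg_ls_tmpdir();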


temp_file_limit limits the space a process may use on disk while the OOM
killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

        hp
It's clear that the OOM killer is triggered by RAM and temp_file_limit is a disk thing... 
But the sudden growth in disk space usage and RAM happened at exactly the same time, with only one user connected and only one query running...

If your question is about temp_file_limit, don't distract us with OOM issues.

--
Born in Arizona, moved to Babylonia.

Re: Death postgres

From
Marc Millas
Date:


On Sat, 6 May 2023 at 15:15, Ron <ronljohnsonjr@gmail.com> wrote:
On 5/6/23 07:19, Marc Millas wrote:


On Sat, 6 May 2023 at 09:46, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.

"15TB" and "16TB" are pretty low-resolution.  For example, 15.4TB rounds down to 15TB, while 15.6TB rounds up to 16TB, while they are in fact only 200GB apart.

Heck, even 15.4TB and 15.6TB are low-resolution.  temp_file_limit may actually be working.

It was... 15.2 and became 16.3...


temp_file_limit limits the space a process may use on disk while the OOM
killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

        hp
It's clear that the OOM killer is triggered by RAM and temp_file_limit is a disk thing... 
But the sudden growth in disk space usage and RAM happened at exactly the same time, with only one user connected and only one query running...

If your question is about temp_file_limit, don't distract us with OOM issues.

--
Born in Arizona, moved to Babylonia.

Re: Death postgres

From
Marc Millas
Date:


On Sat, 6 May 2023 at 15:15, Ron <ronljohnsonjr@gmail.com> wrote:
On 5/6/23 07:19, Marc Millas wrote:


On Sat, 6 May 2023 at 09:46, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.

"15TB" and "16TB" are pretty low-resolution.  For example, 15.4TB rounds down to 15TB, while 15.6TB rounds up to 16TB, while they are in fact only 200GB apart.

Heck, even 15.4TB and 15.6TB are low-resolution.  temp_file_limit may actually be working.


temp_file_limit limits the space a process may use on disk while the OOM
killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

        hp
It's clear that the OOM killer is triggered by RAM and temp_file_limit is a disk thing... 
But the sudden growth in disk space usage and RAM happened at exactly the same time, with only one user connected and only one query running...

If your question is about temp_file_limit, don't distract us with OOM issues.
My question is how postgres can use disk space without respecting temp_file_limit. The OOM info is a hint about the context since, as said, one select generated both things.

--
Born in Arizona, moved to Babylonia.

Re: Death postgres

From
Ron
Date:
On 5/6/23 08:52, Marc Millas wrote:

On Sat, 6 May 2023 at 15:15, Ron <ronljohnsonjr@gmail.com> wrote:

[snip]
If your question is about temp_file_limit, don't distract us with OOM issues.
My question is how postgres can use disk space without respecting temp_file_limit. The OOM info is a hint about the context since, as said, one select generated both things.

It's a distraction to lead with "OOM killed my process".  Evidence of this is that all the respondents have talked about memory, not disk space.

--
Born in Arizona, moved to Babylonia.

Re: Death postgres

From
Thomas Guyot
Date:
On 2023-05-05 21:14, Marc Millas wrote:
> Hi,
>
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) 
> after the postgres disk occupation did grow from 15TB to 16 TB.
>
> What are the cases where postgres may grow without caring about 
> temp_file_limit ?
>
> thanks,
>

Some OSes, like IIRC RHEL 9, now default to tmpfs for /tmp - if your temp 
files are written to a tmpfs then it may very well trigger the OOM 
killer, because the temp files use up RAM.

Check the filesystem type of your temp file's location.
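For example (a sketch; the path assumes the default layout, where temp files for the 
default tablespace live under <data_directory>/base/pgsql_tmp):

db=# show data_directory;
db=# show temp_tablespaces;
db=# \! df -T /path/to/data_directory/base/pgsql_tmp

If the Type column shows tmpfs, those "temp" files are actually sitting in RAM.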

--
Thomas



Re: Death postgres

From
Adrian Klaver
Date:
On 5/6/23 05:25, Marc Millas wrote:
> 
> 
> On Sat, 6 May 2023 at 06:18, Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 5/5/23 18:14, Marc Millas wrote:
>      > Hi,
>      >
>      > postgres 14.2 on Linux redhat
>      >
>      > temp_file_limit set around 210 GB.
>      >
>      > a select request with 2 left join have crashed the server (oom
>     killer)
>      > after the postgres disk occupation did grow from 15TB to 16 TB.
> 
>     The result of EXPLAIN <the query> would be helpful.
>     Sure!
> 
> But. One of the table looks "inaccessible" since.
> Ie. Even explain select * from the_table didnt answer and must be killed 
> by control c

When you restarted the server were there any warnings shown?

Using psql can you \d <the_table>?

Can you select from any other table in the database?





-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Death postgres

From
Marc Millas
Date:


On Sat, 6 May 2023 at 18:11, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/6/23 05:25, Marc Millas wrote:
>
>
> On Sat, 6 May 2023 at 06:18, Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 5/5/23 18:14, Marc Millas wrote:
>      > Hi,
>      >
>      > postgres 14.2 on Linux redhat
>      >
>      > temp_file_limit set around 210 GB.
>      >
>      > a select request with 2 left join have crashed the server (oom
>     killer)
>      > after the postgres disk occupation did grow from 15TB to 16 TB.
>
>     The result of EXPLAIN <the query> would be helpful.
>     Sure!
>
> But. One of the table looks "inaccessible" since.
> Ie. Even explain select * from the_table didnt answer and must be killed
> by control c

When you restarted the server were there any warnings shown?
Sadly, I cannot. It will be done next Tuesday. 

Using psql can you \d <the_table>?
Yes, and no problem checking pg_statistic, ... 

Can you select from any other table in the database?
Yes





--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Death postgres

From
Adrian Klaver
Date:
On 5/6/23 10:13, Marc Millas wrote:
> 
> 

>     When you restarted the server where there any warnings shown?
> 
> Sadly, I cannot. Will be done next tuesday.

Cannot do what:

1) Get to the log to see if there are warnings?

2) Restart the server?

Your original post said the server crashed.

If that was the case how can you do any of the below without restarting it?

> 
> 
>     Using psql can you \d <the_table>?
> 
> Yes, and no pb to check pg_statistic,...
> 
> 
>     Can you select from any other table in the database?
> 
> Yes
> 
> 
> 
> 
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Death postgres

From
Marc Millas
Date:


On Sat, 6 May 2023 at 21:46, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/6/23 10:13, Marc Millas wrote:
>
>

>     When you restarted the server where there any warnings shown?
>
> Sadly, I cannot. Will be done next tuesday.

Cannot do what:

1) Get to the log to see if there are warnings?

2) Restart the server?

Your original post said the server crashed.

If that was the case how can you do any of the below without restarting it?
The infrastructure is managed by another company. The DB is within a container. 
So if I run a pg_ctl restart, the container... vanishes. 
So for each major thing, I must ask... by mail.. 
>
>
>     Using psql can you \d <the_table>?
>
> Yes, and no pb to check pg_statistic,...
>
>
>     Can you select from any other table in the database?
>
> Yes
>
>
>
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Death postgres

From
Adrian Klaver
Date:
On 5/7/23 04:46, Marc Millas wrote:
> 
> 
> On Sat, 6 May 2023 at 21:46, Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 5/6/23 10:13, Marc Millas wrote:
>      >
>      >
> 
>      >     When you restarted the server where there any warnings shown?
>      >
>      > Sadly, I cannot. Will be done next tuesday.
> 
>     Cannot do what:
> 
>     1) Get to the log to see if there are warnings?
> 
>     2) Restart the server?
> 
>     Your original post said the server crashed.
> 
>     If that was the case how can you do any of the below without
>     restarting it?
>     Infrastructure is managed by another company. The db is within a
>     container.
> 
> So.. If I run a pg_ctl restart, the container... Vanished.
> So for each main thing, I must ask... By mail..

1) The question remains: if the container crashes when you run pg_ctl, 
how can you do the following?:


     Using psql can you \d <the_table>?

Yes, and no pb to check pg_statistic,...


     Can you select from any other table in the database?

Yes


2) You need to provide a more detailed explanation of the setup that the 
Postgres server is running in.

a) The host OS and version.

b) The container specifications.

    1) What container software and version?

    2) 'Hardware' specifications.

c) Some indication of the size of the data sets you are querying. Also 
the query you used.

> 
>      >
>      >
>      >     Using psql can you \d <the_table>?
>      >
>      > Yes, and no pb to check pg_statistic,...
>      >
>      >
>      >     Can you select from any other table in the database?
>      >
>      > Yes
>      >
>      >
>      >
>      >
>      >
>      >
>      >     --
>      >     Adrian Klaver
>      > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>     <mailto:adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>
>      >
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Death postgres

From
Laurenz Albe
Date:
On Sat, 2023-05-06 at 03:14 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) after the postgres
> disk occupation did grow from 15TB to 16 TB.
>
> What are the cases where postgres may grow without caring about temp_file_limit ?

That's too little information for a decent answer.
One obvious answer is: if it is not writing temporary files.
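One way to verify whether temporary files are being written at all is log_temp_files
(a sketch; changing it requires superuser or the corresponding SET privilege):

SET log_temp_files = 0;        -- log every temporary file, with its size, when it is removed
EXPLAIN ANALYZE <the query>;

If nothing about temporary files shows up in the server log, the disk growth is coming
from something that temp_file_limit does not cover.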

Yours,
Laurenz Albe



Re: Death postgres

From
Marc Millas
Date:
On Sun, May 7, 2023 at 8:42 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sat, 2023-05-06 at 03:14 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
>
> temp_file_limit set around 210 GB.
>
> a select request with 2 left join have crashed the server (oom killer) after the postgres
> disk occupation did grow from 15TB to 16 TB.
>
> What are the cases where postgres may grow without caring about temp_file_limit ?

That's too little information for a decent answer.
One obvious answer is: if it is not writing temporary files.

Yours,
Laurenz Albe

Logical ...

so here is a little more info:

db=# analyze myschema.table_a;
ANALYZE

db=# with ta as(select 'myschema' as s, 'table_a' as t), p as (select * from information_schema.columns, ta where table_schema=ta.s and table_name=ta.t),  tableid as(select relid from pg_stat_user_tables, ta where schemaname=ta.s and relname=ta.t) select staattnum, column_name, stanullfrac, stadistinct from tableid, pg_statistic join p on p.ordinal_position=staattnum where starelid=tableid.relid order by staattnum;
 staattnum | column_name  | stanullfrac | stadistinct
-----------+--------------+-------------+-------------
         1 | col_ne     |           0 |  -0.6100224
         2 | col_brg    |      0.0208 |           6
         3 | col_ano    |           0 |      447302
         4 | col_ine    |           0 |  -0.5341927
         5 | col_cha    |           0 |          11
(5 rows)

db=# select count(*) from myschema.table_a;
  count  
----------
 13080776
(1 row)

db=# select count(distinct col_ano) from myschema.table_a;
  count  
----------
 10149937
(1 row)


// the stats look somewhat absurd, as analyze guesses 20 times fewer distinct values than a select distinct finds on column col_ano...
db=# analyze myschema.table_b;
ANALYZE
db=# with ta as(select 'myschema' as s, 'table_b' as t), p as (select * from information_schema.columns, ta where table_schema=ta.s and table_name=ta.t),  tableid as(select relid from pg_stat_user_tables, ta where schemaname=ta.s and relname=ta.t) select staattnum, column_name, stanullfrac, stadistinct from tableid, pg_statistic join p on p.ordinal_position=staattnum where starelid=tableid.relid order by staattnum;
 staattnum |   column_name   | stanullfrac | stadistinct
-----------+-----------------+-------------+-------------
         1 | col_nerg      |           0 |      161828
         2 | col_ibi       |           0 |      362161
         3 | col_imi       |           0 |      381023
         4 | col_ipi       |           0 |      391915
         5 | col_ne        |           0 | -0.53864235
         6 | col_ano       |           0 |      482225
(6 rows)

db=# select count(*) from myschema.table_b;
  count  
----------
 14811485
(1 row)

db=# select count(distinct col_ano) from myschema.table_b;
  count  
----------
 10149937
(1 row)

//same remark
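// for reference, a sketch of the same numbers taken from the pg_stats view, which is a bit more direct (same schema and table names as above):

db=# select attname, null_frac, n_distinct from pg_stats where schemaname = 'myschema' and tablename = 'table_a' order by attname;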


db=# explain  select distinct t1.col_ine,  case when t2.col_ibi is null then t3.col_ibi else t2.col_ibi end  from myschema.table_a t1 left join myschema.table_b t2 on t1.col_ano=t2.col_ano Left join myschema.table_b t3 on t1.NUM_ENQ=t3.NUM_ENQ;
                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
   ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
         Workers Planned: 5
         ->  Sort  (cost=72377462162.94..72888223577.87 rows=204304565973 width=97)
               Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN t3.col_ibi ELSE t2.col_ibi END)
               ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
                     Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
                     ->  Parallel Hash Left Join  (cost=300803.38..582295.38 rows=4857277 width=52)
                           Hash Cond: ((t1.col_ne)::text = (t3.col_ne)::text)
                           ->  Parallel Seq Scan on table_a t1  (cost=0.00..184052.35 rows=2616335 width=53)
                           ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=31)
                                 ->  Parallel Seq Scan on table_b t3  (cost=0.00..243466.06 rows=2965306 width=31)
                     ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=34)
                           ->  Parallel Seq Scan on table_b t2  (cost=0.00..243466.06 rows=2965306 width=34)
 JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(17 rows)

//so.. the planner guesses that those 2 joins will generate 1000 billion rows...
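// a sketch of how one could check whether a few values dominate the join key (pg_stats also exposes the most-common-value lists):

db=# select most_common_vals, most_common_freqs from pg_stats where schemaname = 'myschema' and tablename = 'table_a' and attname = 'col_ano';
db=# select most_common_vals, most_common_freqs from pg_stats where schemaname = 'myschema' and tablename = 'table_b' and attname = 'col_ano';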

//so, I try to change stats

db=# alter table myschema.table_a alter column col_ano set (n_distinct=-0.8);
ALTER TABLE
db=# analyze myschema.table_a;
ANALYZE
db=# with ta as(select 'myschema' as s, 'table_a' as t), p as (select * from information_schema.columns, ta where table_schema=ta.s and table_name=ta.t),  tableid as(select relid from pg_stat_user_tables, ta where schemaname=ta.s and relname=ta.t) select staattnum, column_name, stanullfrac, stadistinct from tableid, pg_statistic join p on p.ordinal_position=staattnum where starelid=tableid.relid order by staattnum;
 staattnum | column_name  | stanullfrac | stadistinct
-----------+--------------+-------------+-------------
         1 | col_ne     |           0 |  -0.6694066
         2 | col_brg    |      0.0224 |           6
         3 | col_ano    |           0 |        -0.8
         4 | col_ine    |           0 |   -0.542651
         5 | col_cha    |           0 |          12
(5 rows)

db=# alter table myschema.table_b alter column col_ano set (n_distinct=-0.8);
ALTER TABLE
db=# analyze myschema.table_b;
ANALYZE
db=# with ta as(select 'myschema' as s, 'table_b' as t), p as (select * from information_schema.columns, ta where table_schema=ta.s and table_name=ta.t),  tableid as(select relid from pg_stat_user_tables, ta where schemaname=ta.s and relname=ta.t) select staattnum, column_name, stanullfrac, stadistinct from tableid, pg_statistic join p on p.ordinal_position=staattnum where starelid=tableid.relid order by staattnum;
 staattnum |   column_name   | stanullfrac | stadistinct
-----------+-----------------+-------------+-------------
         1 | col_nerg    |           0 |      166151
         2 | col_ibi     |           0 |      374699
         3 | col_imi     |           0 |      394296
         4 | col_ipi     |           0 |      404515
         5 | col_ne      |           0 | -0.47193906
         6 | col_ano     |           0 |        -0.8
(6 rows)

//doing the explain of the joins again

db=# explain  select distinct t1.col_ine,  case when t2.col_ibi is null then t3.col_ibi else t2.col_ibi end  from myschema.table_a t1 left join myschema.table_b t2 on t1.col_ano=t2.col_ano Left join myschema.table_b t3 on t1.NUM_ENQ=t3.NUM_ENQ;
                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=62824833042.98..174732492390.53 rows=888686328950 width=97)
   ->  Gather Merge  (cost=62824833042.98..170289060745.78 rows=888686328950 width=97)
         Workers Planned: 5
         ->  Sort  (cost=62824832042.91..63269175207.38 rows=177737265790 width=97)
               Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN t3.col_ibi ELSE t2.col_ibi END)
               ->  Parallel Hash Left Join  (cost=604274.78..1148041043.61 rows=177737265790 width=97)
                     Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
                     ->  Parallel Hash Left Join  (cost=300691.39..581280.67 rows=4424044 width=53)
                           Hash Cond: ((t1.col_ne)::text = (t3.col_ne)::text)
                           ->  Parallel Seq Scan on table_a t1  (cost=0.00..184049.73 rows=2616073 width=54)
                           ->  Parallel Hash  (cost=243427.84..243427.84 rows=2961484 width=31)
                                 ->  Parallel Seq Scan on table_b t3  (cost=0.00..243427.84 rows=2961484 width=31)
                     ->  Parallel Hash  (cost=243427.84..243427.84 rows=2961484 width=34)
                           ->  Parallel Seq Scan on table_b t2  (cost=0.00..243427.84 rows=2961484 width=34)
 JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(17 rows)

//almost equivalent.
//temp_file_limit is set to 210 GB.
// if I run the query, or explain analyze it (same thing), the disk occupation grows by 1.1 TB, and the RAM occupation grows until the OOM killer fires
//BTW, if I simplify the select to keep only the first join, the number of estimated rows is still ""huge"" and I didn't try to execute it.

thanks
 

Re: Death postgres

From
Laurenz Albe
Date:
On Wed, 2023-05-10 at 16:35 +0200, Marc Millas wrote:
> > > postgres 14.2 on Linux redhat
> > >
> > > temp_file_limit set around 210 GB.
> > >
> > > a select request with 2 left join have crashed the server (oom killer) after the postgres
> > > disk occupation did grow from 15TB to 16 TB.
> > >
> > > What are the cases where postgres may grow without caring about temp_file_limit ?
> >
> > That's too little information for a decent answer.
> > One obvious answer is: if it is not writing temporary files.
>
> so here is a little more info:
>
>                                                         QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
>          Workers Planned: 5
>          ->  Sort  (cost=72377462162.94..72888223577.87 rows=204304565973 width=97)
>                Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN t3.col_ibi ELSE t2.col_ibi END)
>                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
>                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
>                      ->  Parallel Hash Left Join  (cost=300803.38..582295.38 rows=4857277 width=52)
>                            Hash Cond: ((t1.col_ne)::text = (t3.col_ne)::text)
>                            ->  Parallel Seq Scan on table_a t1  (cost=0.00..184052.35 rows=2616335 width=53)
>                            ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=31)
>                                  ->  Parallel Seq Scan on table_b t3  (cost=0.00..243466.06 rows=2965306 width=31)
>                      ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=34)
>                            ->  Parallel Seq Scan on table_b t2  (cost=0.00..243466.06 rows=2965306 width=34)
>  JIT:
>    Functions: 19
>    Options: Inlining true, Optimization true, Expressions true, Deforming true
> (17 rows)

Perhaps parallel query drives you OOM.  Does the problem also happen if
"max_parallel_workers_per_gather" is set to 0?

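Something like this, just in the session that runs the query, would be enough to test
that (a sketch; <the query> stands for your full statement):

SET max_parallel_workers_per_gather = 0;
EXPLAIN <the query>;   -- the Gather Merge / Parallel nodes should disappear from the plan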
Yours,
Laurenz Albe



Re: Death postgres

From
"Peter J. Holzer"
Date:
On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
...
>                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
>                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
...
>
> //so.. the planner guess that those 2 join will generate 1000 billions rows...

Are some of the col_ano values very frequent? If say the value 42 occurs
1 million times in both table_a and table_b, the join will create 1
trillion rows for that value alone. That doesn't explain the crash or the
disk usage, but it would explain the crazy cost (and would probably be a
hint that this query is unlikely to finish in any reasonable time).
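A sketch of how to check that directly (adjust schema and table names as needed):

SELECT col_ano, count(*) AS cnt
FROM myschema.table_a
GROUP BY col_ano
ORDER BY cnt DESC
LIMIT 10;

and the same for table_b; multiplying the two counts of a value that appears in both
tables gives the number of join rows that value alone will produce.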

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Death postgres

From
Marc Millas
Date:

On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
...
>                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
>                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
...
>
> //so.. the planner guess that those 2 join will generate 1000 billions rows...

Are some of the col_ano values very frequent? If say the value 42 occurs
1 million times in both table_a and table_b, the join will create 1
trillion rows for that value alone. That doesn't explain the crash or the
disk usage, but it would explain the crazy cost (and would probably be a
hint that this query is unlikely to finish in any reasonable time).

        hp

good guess, even if a bit surprising: there is one (and only one) "value" which fits your supposition: NULL
750000 in each table, which perfectly fits the planner's row estimate.
One question: what is postgres doing when it plans to hash 1000 billion rows?
Did postgres create an appropriate "space" to handle those 1000 billion hash values?
 thanks, 
MM
--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Death postgres

From
"Peter J. Holzer"
Date:
On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
>     On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>     >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>     >    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
>     ...
>     >                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
>     >                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
>     ...
>     >
>     > //so.. the planner guess that those 2 join will generate 1000 billions rows...
>
>     Are some of the col_ano values very frequent? If say the value 42 occurs
>     1 million times in both table_a and table_b, the join will create 1
>     trillion rows for that value alone. That doesn't explain the crash or the
>     disk usage, but it would explain the crazy cost (and would probably be a
>     hint that this query is unlikely to finish in any reasonable time).
>
>
> good guess, even if a bit surprising: there is one (and only one) "value" which
> fit your supposition: NULL

But NULL doesn't equal NULL, so that would result in only one row in the
left join. So that's not it.
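A tiny illustration (a sketch):

SELECT NULL = NULL;                 -- yields NULL, i.e. not true
SELECT count(*)
FROM (VALUES (NULL::text)) a(x)
LEFT JOIN (VALUES (NULL::text), (NULL::text)) b(x) ON a.x = b.x;
-- returns 1: the left row is kept, but it matches nothing

so NULL join keys never multiply rows in an equality join.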

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Death postgres

From
Marc Millas
Date:




On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
>     On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>     >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>     >    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
>     ...
>     >                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
>     >                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
>     ...
>     >
>     > //so.. the planner guess that those 2 join will generate 1000 billions rows...
>
>     Are some of the col_ano values very frequent? If say the value 42 occurs
>     1 million times in both table_a and table_b, the join will create 1
>     trillion rows for that value alone. That doesn't explain the crash or the
>     disk usage, but it would explain the crazy cost (and would probably be a
>     hint that this query is unlikely to finish in any reasonable time).
>
>
> good guess, even if a bit surprising: there is one (and only one) "value" which
> fit your supposition: NULL

But NULL doesn't equal NULL, so that would result in only one row in the
left join. So that's not it.

if so... how ??? 

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Death postgres

From
Marc Millas
Date:



On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
>     On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>     >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>     >    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
>     ...
>     >                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
>     >                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
>     ...
>     >
>     > //so.. the planner guess that those 2 join will generate 1000 billions rows...
>
>     Are some of the col_ano values very frequent? If say the value 42 occurs
>     1 million times in both table_a and table_b, the join will create 1
>     trillion rows for that value alone. That doesn't explain the crash or the
>     disk usage, but it would explain the crazy cost (and would probably be a
>     hint that this query is unlikely to finish in any reasonable time).
>
>
> good guess, even if a bit surprising: there is one (and only one) "value" which
> fit your supposition: NULL

But NULL doesn't equal NULL, so that would result in only one row in the
left join. So that's not it.


so, apo...

the 750000 lines in each table are not NULLs but '' empty varchars, which, obviously, is not the same thing.
And that does indeed generate around 500 billion lines for the left join.
So, no planner or statistics problem. Apologies for the time wasted.
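(If those '' keys are not supposed to match each other, one way to avoid the blow-up is to make them behave like NULL with nullif(); a sketch of the original select, changed only in the t2 join condition:

select distinct t1.col_ine,
       case when t2.col_ibi is null then t3.col_ibi else t2.col_ibi end
from myschema.table_a t1
left join myschema.table_b t2 on nullif(t1.col_ano, '') = nullif(t2.col_ano, '')
left join myschema.table_b t3 on t1.NUM_ENQ = t3.NUM_ENQ;

each row with col_ano = '' is then kept once, like the NULL case discussed above, instead of cross-joining.)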
Back to the initial problem:
if, with temp_file_limit set to 210 GB, I try to run the select * from table_a left join table_b on the col_a (which contains the 750000 '' in both tables)
then postgres does crash, killed by the OOM killer, after having taken 1.1 TB of additional disk space. 
The explain plan guesses 512 planned partitions. (Obviously, I cannot provide an explain analyze...)

To my understanding, before postgres 13, hash aggregate could eat RAM without limit in such circumstances.
But in 14.2??
(I know, 14.8 is up...)


        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"



Marc MILLAS
Senior Architect
+33607850334


 

Re: Death postgres

From
"Peter J. Holzer"
Date:
On 2023-05-11 21:27:57 +0200, Marc Millas wrote:
> the 750000 lines in each tables are not NULLs but '' empty varchar, which,
> obviously is not the same thing.
> and which perfectly generates 500 billions lines for the left join.
> So, no planner or statistics pbs. apologies for the time wasted.

No problem. Glad to have solved that puzzle.

> Back to the initial pb: if, with temp_file_limit positioned to 210 GB,
> I try to run the select * from table_a left join table_b on the col_a
> (which contains the 750000 '' on both tables)
> then postgres do crash, killed by oom, after having taken 1.1 TB of additional
> disk space. 

My guess is that the amount of parallelism is the problem.

work_mem is a per-node limit. Even a single process can use a multiple of
work_mem if the query contains nested nodes (which almost every query
does, but most nodes don't need much memory). With 5 parallel workers,
the total consumption will be 5 times that. So to prevent the OOM
condition you would need to reduce work_mem or max_parallel_workers (at
least for this query).

The description of temp_file_limit says "...the maximum amount of disk
space that *a process* can use...". So with 5 workers that's 210*5 =
1050 GB total. Again, you may want to reduce either temp_file_limit or
max_parallel_workers.
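For example, just for the session running that query (a sketch; temp_file_limit itself
can only be changed by a superuser or a role with the corresponding SET privilege):

SET max_parallel_workers_per_gather = 0;   -- a single backend, so each per-process limit is also the total
SET work_mem = '10MB';
SET temp_file_limit = '210GB';

With parallelism off, the worst case is one temp_file_limit worth of temp space rather
than one per participating process.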

> to my understanding, before postgres 13, hash aggregate did eat RAM limitless
> in such circumstances.
> but in 14.2 ??
> (I know, 14.8 is up...)

Maybe the older version of postgres didn't use as many workers for that
query (or maybe didn't parallelize it at all)?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Death postgres

From
Marc Millas
Date:



On Fri, May 12, 2023 at 8:31 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-05-11 21:27:57 +0200, Marc Millas wrote:
> the 750000 lines in each tables are not NULLs but '' empty varchar, which,
> obviously is not the same thing.
> and which perfectly generates 500 billions lines for the left join.
> So, no planner or statistics pbs. apologies for the time wasted.

No problem. Glad to have solved that puzzle.

> Back to the initial pb: if, with temp_file_limit positioned to 210 GB,
> I try to run the select * from table_a left join table_b on the col_a
> (which contains the 750000 '' on both tables)
> then postgres do crash, killed by oom, after having taken 1.1 TB of additional
> disk space. 

My guess is that the amount of parallelism is the problem.

work_mem is a per-node limit. Even a single process can use a multiple of
work_mem if the query contains nested nodes (which almost every query
does, but most nodes don't need much memory). With 5 parallel workers,
the total consumption will be 5 times that. So to prevent the OOM
condition you would need to reduce work_mem or max_parallel_workers (at
least for this query).

We have more than 100 GB of RAM and only 1 user, with one query running.
work_mem is set to 10 MB. For the OOM killer to fire because of work_mem, it would mean that for one query with 2 left joins, postgres needs more than 10,000 work_mem buffers.
To me, that seems difficult to believe. But that postgres may need that RAM space for hashing or some other similar purpose seems more probable.
No? 

The description temp_file_limit says "...the maximum amount of disk
space that *a process* can use...". So with 5 workers that's 210*5 =
1050 GB total. Again, you may want to reduce either temp_file_limit or
max_parallel_workers.
Yes, but if so, we should have had the query canceled due to temp_file_limit, which was not the case. 

> to my understanding, before postgres 13, hash aggregate did eat RAM limitless
> in such circumstances.
> but in 14.2 ??
> (I know, 14.8 is up...)

Maybe the older version of postgres didn't use as many workers for that
query (or maybe not parallelize it at all)?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Death postgres

From
"Peter J. Holzer"
Date:
On 2023-05-12 17:41:37 +0200, Marc Millas wrote:
> On Fri, May 12, 2023 at 8:31 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     My guess is that the amount of parallelism is the problem.
>
>     work_mem is a per-node limit. Even a single process can use a multiple of
>     work_mem if the query contains nested nodes (which almost every query
>     does, but most nodes don't need much memory). With 5 parallel workers,
>     the total consumption will be 5 times that. So to prevent the OOM
>     condition you would need to reduce work_mem or max_parallel_workers (at
>     least for this query).
>
>
> we have more than 100GB RAM and only 1 user, with one request running.
> work_mem is set to 10MB.

I don't remember you mentioning that before, so I had to guess.

>   for oom to kill due to work_mem it means that for one
> request with 2 left join, postgres needs more than 10.000 work_mem buffers.
> to me, it seems difficult to believe.

Yes. work_mem is unlikely to be the culprit here. It must be something
else which is eating the RAM. But I think it's hard to reproduce that
with the information you've given us this far.

> but that postgres may need that RAM space for hashing or whatever
> other similar purpose seems more probable. no ? 

That should adhere to the work_mem limit, too.

I'm not a core developer (actually not a Postgres developer at all, just
a developer of other stuff which happens to use Postgres), but I
remember that there was an issue some years ago where a hash table could
grow much larger than estimated by the planner. That issue was fixed,
but maybe you've run into a similar case?


>     The description temp_file_limit says "...the maximum amount of disk
>     space that *a process* can use...". So with 5 workers that's 210*5 =
>     1050 GB total. Again, you may want to reduce either temp_file_limit or
>     max_parallel_workers.
>
> Yes, but if so, we may have had a request canceled due to temp_file limit,
> which was not the case. 

Well, you've said that the disk space grew by "1.1 TB". I understand
that to mean "something between 1.05 TB and 1.15 TB", not exactly
1100000000000 Bytes (or 1209462790554 Bytes or 1100213452800 Bytes,
depending on how you define "TB"). 1050 GB is within that range, so you
might just have missed that limit.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
