Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
| From | Joshua D. Drake |
|---|---|
| Subject | Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise |
| Date | |
| Msg-id | 0ec4c002-a219-942d-a686-0804c826c3e0@commandprompt.com |
| In response to | Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise (Sokolov Yura <y.sokolov@postgrespro.ru>) |
| Responses | Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise |
| List | pgsql-hackers |
Hello,
I changed the test to run for 6 hours at a time regardless of the number of
transactions. I also changed the du command to look only at the database
directory (previously WAL was included in the number). This is the clearest
indication of the problem I have been able to produce.
Again, this is with 128 clients and 500 warehouses. The first test is a
clean run: everything dropped, vacuumed, etc. Each subsequent test just
starts the benchmark again, so we get break points along the way.
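As a side note on the measurement: the same number can be cross-checked from
inside the database, and pg_database_size() conveniently excludes WAL since
that lives outside the per-database directory. Just a sketch, assuming you are
connected to the benchmark database:

-- size of the data files for the connected database only; WAL is not counted
select pg_size_pretty(pg_database_size(current_database()));

The relevant settings for these runs: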
-------------------------------------+------------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 12
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 10
 autovacuum_vacuum_cost_delay        | 0
 autovacuum_vacuum_cost_limit        | 5000
 autovacuum_vacuum_scale_factor      | 0.1
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1
 max_wal_size                        | 640
 checkpoint_timeout                  | 86400
 checkpoint_completion_target        | 0.5
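Anyone who wants to compare against their own setup should be able to pull
the same listing with something along these lines (a sketch, not necessarily
how the output above was produced):

select name, setting
  from pg_settings
 where name like 'autovacuum%'
    or name in ('max_wal_size', 'checkpoint_timeout', 'checkpoint_completion_target');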
Starting base metric
50G /srv/main/base
Test 1:
90G /srv/main/base
TPS: 838
Test 2:
121G /srv/main/base
TPS: 725
Test 3:
146G /srv/main/base
TPS: 642
Test 4:
171G /srv/main/base
TPS: 549
Test 5:
189G /srv/main/base
TPS: 489
Test 6:
208G /srv/main/base
TPS: 454
As you can see, even with aggressive vacuuming, over a period of 36 hours the
base directory grows from 50G to 208G and TPS drops from 838 to 454: life gets
increasingly miserable.
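One thing worth watching between runs is whether the dead tuple counts keep
climbing and when autovacuum last reached the hot tables. A minimal sketch,
assuming the statistics collector is enabled (it is by default):

select relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
  from pg_stat_user_tables
 order by n_dead_tup desc
 limit 10;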
The largest table is:

postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
 pg_size_pretty
----------------
 148 GB
(1 row)
postgres=# \d bmsql_order_line
              Table "public.bmsql_order_line"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 ol_w_id        | integer                     | not null
 ol_d_id        | integer                     | not null
 ol_o_id        | integer                     | not null
 ol_number      | integer                     | not null
 ol_i_id        | integer                     | not null
 ol_delivery_d  | timestamp without time zone |
 ol_amount      | numeric(6,2)                |
 ol_supply_w_id | integer                     |
 ol_quantity    | integer                     |
 ol_dist_info   | character(24)               |
Indexes:
    "bmsql_order_line_pkey" PRIMARY KEY, btree (ol_w_id, ol_d_id, ol_o_id, ol_number)
Foreign-key constraints:
    "ol_order_fkey" FOREIGN KEY (ol_w_id, ol_d_id, ol_o_id) REFERENCES bmsql_oorder(o_w_id, o_d_id, o_id)
    "ol_stock_fkey" FOREIGN KEY (ol_supply_w_id, ol_i_id) REFERENCES bmsql_stock(s_w_id, s_i_id)
With the PK being:

postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
 pg_size_pretty
----------------
 48 GB
(1 row)
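To put an actual number on how much of that 148 GB / 48 GB is dead space
rather than live rows, the pgstattuple contrib module can be used. This is
only a sketch, it assumes the module is available, and it does a full scan,
so it is not cheap on a table this size:

create extension if not exists pgstattuple;

-- dead_tuple_percent / free_percent for the heap
select * from pgstattuple('bmsql_order_line');

-- avg_leaf_density / leaf_fragmentation for the btree
select * from pgstatindex('bmsql_order_line_pkey');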
I tried to see how much data we are dealing with here:

postgres=# select count(*) from bmsql_order_line;
   count
-----------
 910324839
(1 row)

Time: 503965.767 ms
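As an aside, for anyone who wants a row count without paying for an
eight-minute scan, the planner's estimate is usually close enough. A sketch;
it is only as fresh as the last ANALYZE or VACUUM:

select reltuples::bigint from pg_class where relname = 'bmsql_order_line';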
And just to show that we were pushing to get these numbers:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.38    0.00    2.20    1.98    0.00   93.44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            2027.40       239.99         0.05       1199          0
sda               0.80         0.00         0.01          0          0
So we have 910M rows, and it took about 8.4 minutes to count them at roughly
240 MB/s.
I know this is a lot of data and, as I said previously, I am happy to let
anyone look at it. However, we clearly have something deeper to look into.
Thanks in advance,
JD
--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****