Обсуждение: [ADMIN] Autovacuum after bulk data insert(millions!)

Поиск
Список
Период
Сортировка

[ADMIN] Autovacuum after bulk data insert(millions!)

От
Günce Kaya
Дата:
Hi all,

We have a database that is used for CRM. During day there are many process like bulk loading large amounts of data(not thousand like millions!). And all day we observe auto vacuum in server status. It takes long time. I'm curious what happens if we disable autovacuum for a database that process bulk insert? After insert data a table, what autovacuum will do for that table?

I've read some article like; "The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.
However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data." 

The reason why I'm asking, millions of data is inserting to CRM database all day. What is your advice for that scenario? Should I disable autovacuum during bulk process or disable autovacuum totally and autovacuum tables in a some period?

Regards,
Gunce Kaya

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Fabrízio de Royes Mello
Дата:


2017-10-17 4:59 GMT-02:00 Günce Kaya <guncekaya14@gmail.com>:
>
> Hi all,
>
> We have a database that is used for CRM. During day there are many process like bulk loading large amounts of data(not thousand like millions!). And all day we observe auto vacuum in server status. It takes long time. I'm curious what happens if we disable autovacuum for a database that process bulk insert? After insert data a table, what autovacuum will do for that table?
>
> I've read some article like; "The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.
> However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data."
>
> The reason why I'm asking, millions of data is inserting to CRM database all day. What is your advice for that scenario? Should I disable autovacuum during bulk process or disable autovacuum totally and autovacuum tables in a some period?
>

One option is:
1) Disable autovacuum in target tables
2) Run bulk load process
3) Run VACUUM ANALYZE manually
4) Enable autovacuum in target tables

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Günce Kaya
Дата:
Hello Fabrízio,

Thank you for your advice but I'm not asking for only a specific table and not looking for temporary solution. Previous post may not clear to much and I'd explain again. 

So autovacuum process takes long time for that database. I don't know how can I estimate timing for a vacuum after insert millions of bulk data to a table. So there are many table inserting bulk data daily. This solution not for only a table. I'm looking a solution for a database. Data volume of this database is around 2GB so many transactions in a day. 

This is only suggestion and I'm really curious because can not estimate what do you think about that, what if disable autovacuum and vacuum all tables in every month?

Regards,
Gunce

On Tue, Oct 17, 2017 at 2:16 PM, Fabrízio de Royes Mello <fabrizio@timbira.com.br> wrote:


2017-10-17 4:59 GMT-02:00 Günce Kaya <guncekaya14@gmail.com>:
>
> Hi all,
>
> We have a database that is used for CRM. During day there are many process like bulk loading large amounts of data(not thousand like millions!). And all day we observe auto vacuum in server status. It takes long time. I'm curious what happens if we disable autovacuum for a database that process bulk insert? After insert data a table, what autovacuum will do for that table?
>
> I've read some article like; "The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.
> However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data."
>
> The reason why I'm asking, millions of data is inserting to CRM database all day. What is your advice for that scenario? Should I disable autovacuum during bulk process or disable autovacuum totally and autovacuum tables in a some period?
>

One option is:
1) Disable autovacuum in target tables
2) Run bulk load process
3) Run VACUUM ANALYZE manually
4) Enable autovacuum in target tables

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



--
Gunce Kaya

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Thomas Markus
Дата:
Hi,

Am 17.10.17 um 13:37 schrieb Günce Kaya:
Hello Fabrízio,

Thank you for your advice but I'm not asking for only a specific table and not looking for temporary solution. Previous post may not clear to much and I'd explain again. 

So autovacuum process takes long time for that database. I don't know how can I estimate timing for a vacuum after insert millions of bulk data to a table. So there are many table inserting bulk data daily. This solution not for only a table. I'm looking a solution for a database. Data volume of this database is around 2GB so many transactions in a day. 

This is only suggestion and I'm really curious because can not estimate what do you think about that, what if disable autovacuum and vacuum all tables in every month?

Regards,
Gunce
works same for a database, so it depends on your usecase

* disable autovacuum for instance
* run 'vacuumdb -a -z' manually/in cron

Regards
Thomas

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
MichaelDBA
Дата:
Another thing to consider here is whether there are queries running against these tables shortly after the bulk inserts are done. Visibility checks can become expensive if these recent tuple inserts are not vacuumed right away.

Please give more context to the SQL workload for a normal day.  Are there multiple bulk inserts?  How often? Any significant window where they occur?  How many queries and how often related to these bulked up tables?

Also, you may end up choosing a strategy that involves making autovacuum toggle between being more or less aggressive based upon knowledge of the SQL workload.

Regards,
Michael Vitale

Tuesday, October 17, 2017 7:48 AM
Hi,

Am 17.10.17 um 13:37 schrieb Günce Kaya:
works same for a database, so it depends on your usecase

* disable autovacuum for instance
* run 'vacuumdb -a -z' manually/in cron

Regards
Thomas

Tuesday, October 17, 2017 7:37 AM
Hello Fabrízio,

Thank you for your advice but I'm not asking for only a specific table and not looking for temporary solution. Previous post may not clear to much and I'd explain again. 

So autovacuum process takes long time for that database. I don't know how can I estimate timing for a vacuum after insert millions of bulk data to a table. So there are many table inserting bulk data daily. This solution not for only a table. I'm looking a solution for a database. Data volume of this database is around 2GB so many transactions in a day. 

This is only suggestion and I'm really curious because can not estimate what do you think about that, what if disable autovacuum and vacuum all tables in every month?

Regards,
Gunce




--
Gunce Kaya

Tuesday, October 17, 2017 7:16 AM


2017-10-17 4:59 GMT-02:00 Günce Kaya <guncekaya14@gmail.com>:
>
> Hi all,
>
> We have a database that is used for CRM. During day there are many process like bulk loading large amounts of data(not thousand like millions!). And all day we observe auto vacuum in server status. It takes long time. I'm curious what happens if we disable autovacuum for a database that process bulk insert? After insert data a table, what autovacuum will do for that table?
>
> I've read some article like; "The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.
> However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data."
>
> The reason why I'm asking, millions of data is inserting to CRM database all day. What is your advice for that scenario? Should I disable autovacuum during bulk process or disable autovacuum totally and autovacuum tables in a some period?
>

One option is:
1) Disable autovacuum in target tables
2) Run bulk load process
3) Run VACUUM ANALYZE manually
4) Enable autovacuum in target tables

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Tuesday, October 17, 2017 2:59 AM
Hi all,

We have a database that is used for CRM. During day there are many process like bulk loading large amounts of data(not thousand like millions!). And all day we observe auto vacuum in server status. It takes long time. I'm curious what happens if we disable autovacuum for a database that process bulk insert? After insert data a table, what autovacuum will do for that table?

I've read some article like; "The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.
However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data." 

The reason why I'm asking, millions of data is inserting to CRM database all day. What is your advice for that scenario? Should I disable autovacuum during bulk process or disable autovacuum totally and autovacuum tables in a some period?

Regards,
Gunce Kaya


Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Scott Marlowe
Дата:
On Tue, Oct 17, 2017 at 12:59 AM, Günce Kaya <guncekaya14@gmail.com> wrote:
> Hi all,
>
> We have a database that is used for CRM. During day there are many process
> like bulk loading large amounts of data(not thousand like millions!). And
> all day we observe auto vacuum in server status. It takes long time. I'm
> curious what happens if we disable autovacuum for a database that process
> bulk insert? After insert data a table, what autovacuum will do for that
> table?
>
> I've read some article like; "The autovacuum process takes care of several
> maintenance chores inside your database that you really need. Generally, if
> you think you need to turn regular vacuuming off because it's taking too
> much time or resources, that means you're doing it wrong. The answer to
> almost all vacuuming problems is to vacuum more often, not less, so that
> each individual vacuum operation has less to clean up.
> However, it's acceptable to disable autovacuum for short periods of time,
> for instance when bulk loading large amounts of data."
>
> The reason why I'm asking, millions of data is inserting to CRM database all
> day. What is your advice for that scenario? Should I disable autovacuum
> during bulk process or disable autovacuum totally and autovacuum tables in a
> some period?

I'm not sure the exact problem you're trying to solve here. Do you
want vacuum to finish more quickly? Is autovacuum impacting
performance of the cluster? Or is autovacuum kicking in partway
through the bulk load and causing that to be slow?

If you have a fast IO subsystem and you just want autovacuum to finish
more quickly, then look at making it more aggressive by lowering the
autovacuum_vacuum_cost_delay from the default 20ms to something lower
like 5 or 10ms.

If autovacuum is kicking in part way and affecting performance then
you can look at disabling autovacuum on those tables and then
vacuuming after the bulk upload. Set things like vacuum_cost_delay to
keep vacuum from taking up too much IO. Note that the default for
vacuum_cost_delay is 0, while the default for
autovacuum_vacuum_cost_delay is 20ms. These two values are worlds
apart in terms of impact on the IO of your db server.

Of the cuff I'd guess what you want is for autovacuum to keep up, and
it's taking too long. In that instance, just try lowering
autovacuum_vacuum_cost_delay from 20ms down to 10ms at first and check
both to see if autovacuum is flooding your IO and if autovacuum now
finishes "soon enough" for your needs.


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Günce Kaya
Дата:
Hi Scott,

Thank you for your response. I sent this post because I want to make sure as to whether we need continuous autovacuum exactly for that database. I read your answer and consider on it well. I'm gonna keep your suggest "disabling autovacuum on those tables and then vacuuming after the bulk upload" on my mind this fastest way for specific tables.

The problem is not only about autovacuum is taking long time, I've read some documents about autovacuum, the point is that disable autovacuum and vacuuming monthly even after millions of bulk data load to a database is reasonable or not? I'm not insist on vacuuming monthly just I'm confusing about that.

Regards,
Gunce

On Tue, Oct 17, 2017 at 6:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Tue, Oct 17, 2017 at 12:59 AM, Günce Kaya <guncekaya14@gmail.com> wrote:
> Hi all,
>
> We have a database that is used for CRM. During day there are many process
> like bulk loading large amounts of data(not thousand like millions!). And
> all day we observe auto vacuum in server status. It takes long time. I'm
> curious what happens if we disable autovacuum for a database that process
> bulk insert? After insert data a table, what autovacuum will do for that
> table?
>
> I've read some article like; "The autovacuum process takes care of several
> maintenance chores inside your database that you really need. Generally, if
> you think you need to turn regular vacuuming off because it's taking too
> much time or resources, that means you're doing it wrong. The answer to
> almost all vacuuming problems is to vacuum more often, not less, so that
> each individual vacuum operation has less to clean up.
> However, it's acceptable to disable autovacuum for short periods of time,
> for instance when bulk loading large amounts of data."
>
> The reason why I'm asking, millions of data is inserting to CRM database all
> day. What is your advice for that scenario? Should I disable autovacuum
> during bulk process or disable autovacuum totally and autovacuum tables in a
> some period?

I'm not sure the exact problem you're trying to solve here. Do you
want vacuum to finish more quickly? Is autovacuum impacting
performance of the cluster? Or is autovacuum kicking in partway
through the bulk load and causing that to be slow?

If you have a fast IO subsystem and you just want autovacuum to finish
more quickly, then look at making it more aggressive by lowering the
autovacuum_vacuum_cost_delay from the default 20ms to something lower
like 5 or 10ms.

If autovacuum is kicking in part way and affecting performance then
you can look at disabling autovacuum on those tables and then
vacuuming after the bulk upload. Set things like vacuum_cost_delay to
keep vacuum from taking up too much IO. Note that the default for
vacuum_cost_delay is 0, while the default for
autovacuum_vacuum_cost_delay is 20ms. These two values are worlds
apart in terms of impact on the IO of your db server.

Of the cuff I'd guess what you want is for autovacuum to keep up, and
it's taking too long. In that instance, just try lowering
autovacuum_vacuum_cost_delay from 20ms down to 10ms at first and check
both to see if autovacuum is flooding your IO and if autovacuum now
finishes "soon enough" for your needs.



--
Gunce Kaya

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Igor Neyman
Дата:

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Günce Kaya
Sent: Tuesday, October 17, 2017 2:57 PM
To: Scott Marlowe <scott.marlowe@gmail.com>
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Autovacuum after bulk data insert(millions!)

 

Hi Scott,

 

Thank you for your response. I sent this post because I want to make sure as to whether we need continuous autovacuum exactly for that database. I read your answer and consider on it well. I'm gonna keep your suggest "disabling autovacuum on those tables and then vacuuming after the bulk upload" on my mind this fastest way for specific tables.

 

The problem is not only about autovacuum is taking long time, I've read some documents about autovacuum, the point is that disable autovacuum and vacuuming monthly even after millions of bulk data load to a database is reasonable or not? I'm not insist on vacuuming monthly just I'm confusing about that.

 

Regards,

Gunce

 

Gunce,

 

Autovacuum improved a lot in latest PG releases.  Don’t rely on what you read in oldpostings, try/test it yourself.

Disabling autovacuum is rather radical step.  You should do it only if you observe specific problems caused by autovacuum, and even in these cases more often than not tuning autovacuum parameters is a better than disabling it.

Switching from autovacuum to manual vacuuming may cause more problems than solve. You should have very strong reasons to do that.

 

So, first see if there is a problem with autovacuum, then try to tune it, and only if this doesn’t help, try manual vacuuming.

 

Regards,

Igor Neyman

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Günce Kaya
Дата:
Hi Michael,

Inserting bulk data is performed by authorised users and cron all day. There are some multiple bulk inserts and some of them has dblink(It increase IO during bulk insert) and there are around 30 tables feed from this bulk insert. Bulk insert is run only one time for each table in a day. Not insert millions of rows to all these tables but at least insert millions of data to around 10 table like this.

I also agree with you for your last paragraph, I just want to know people how experienced similar scenario or what do you think about that. 

Regards,
Gunce 

On Tue, Oct 17, 2017 at 3:24 PM, MichaelDBA <MichaelDBA@sqlexec.com> wrote:
Another thing to consider here is whether there are queries running against these tables shortly after the bulk inserts are done. Visibility checks can become expensive if these recent tuple inserts are not vacuumed right away.

Please give more context to the SQL workload for a normal day.  Are there multiple bulk inserts?  How often? Any significant window where they occur?  How many queries and how often related to these bulked up tables?

Also, you may end up choosing a strategy that involves making autovacuum toggle between being more or less aggressive based upon knowledge of the SQL workload.

Regards,
Michael Vitale

Tuesday, October 17, 2017 7:48 AM
Hi,

Am 17.10.17 um 13:37 schrieb Günce Kaya:
works same for a database, so it depends on your usecase

* disable autovacuum for instance
* run 'vacuumdb -a -z' manually/in cron

Regards
Thomas

Tuesday, October 17, 2017 7:37 AM
Hello Fabrízio,

Thank you for your advice but I'm not asking for only a specific table and not looking for temporary solution. Previous post may not clear to much and I'd explain again. 

So autovacuum process takes long time for that database. I don't know how can I estimate timing for a vacuum after insert millions of bulk data to a table. So there are many table inserting bulk data daily. This solution not for only a table. I'm looking a solution for a database. Data volume of this database is around 2GB so many transactions in a day. 

This is only suggestion and I'm really curious because can not estimate what do you think about that, what if disable autovacuum and vacuum all tables in every month?

Regards,
Gunce




--
Gunce Kaya

Tuesday, October 17, 2017 7:16 AM


2017-10-17 4:59 GMT-02:00 Günce Kaya <guncekaya14@gmail.com>:
>
> Hi all,
>
> We have a database that is used for CRM. During day there are many process like bulk loading large amounts of data(not thousand like millions!). And all day we observe auto vacuum in server status. It takes long time. I'm curious what happens if we disable autovacuum for a database that process bulk insert? After insert data a table, what autovacuum will do for that table?
>
> I've read some article like; "The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.
> However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data."
>
> The reason why I'm asking, millions of data is inserting to CRM database all day. What is your advice for that scenario? Should I disable autovacuum during bulk process or disable autovacuum totally and autovacuum tables in a some period?
>

One option is:
1) Disable autovacuum in target tables
2) Run bulk load process
3) Run VACUUM ANALYZE manually
4) Enable autovacuum in target tables

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Tuesday, October 17, 2017 2:59 AM
Hi all,

We have a database that is used for CRM. During day there are many process like bulk loading large amounts of data(not thousand like millions!). And all day we observe auto vacuum in server status. It takes long time. I'm curious what happens if we disable autovacuum for a database that process bulk insert? After insert data a table, what autovacuum will do for that table?

I've read some article like; "The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.
However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data." 

The reason why I'm asking, millions of data is inserting to CRM database all day. What is your advice for that scenario? Should I disable autovacuum during bulk process or disable autovacuum totally and autovacuum tables in a some period?

Regards,
Gunce Kaya





--
Gunce Kaya

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Günce Kaya
Дата:
Hi Igor,

My next step for this post is performing suggestions and testing. I also agree with you this is a radical decision but I'm happy for all your advice, at least I know how can proceed now. 

Thank you for all,

Regards,
Gunce

On Tue, Oct 17, 2017 at 10:21 PM, Igor Neyman <ineyman@perceptron.com> wrote:

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Günce Kaya
Sent: Tuesday, October 17, 2017 2:57 PM
To: Scott Marlowe <scott.marlowe@gmail.com>
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Autovacuum after bulk data insert(millions!)

 

Hi Scott,

 

Thank you for your response. I sent this post because I want to make sure as to whether we need continuous autovacuum exactly for that database. I read your answer and consider on it well. I'm gonna keep your suggest "disabling autovacuum on those tables and then vacuuming after the bulk upload" on my mind this fastest way for specific tables.

 

The problem is not only about autovacuum is taking long time, I've read some documents about autovacuum, the point is that disable autovacuum and vacuuming monthly even after millions of bulk data load to a database is reasonable or not? I'm not insist on vacuuming monthly just I'm confusing about that.

 

Regards,

Gunce

 

Gunce,

 

Autovacuum improved a lot in latest PG releases.  Don’t rely on what you read in oldpostings, try/test it yourself.

Disabling autovacuum is rather radical step.  You should do it only if you observe specific problems caused by autovacuum, and even in these cases more often than not tuning autovacuum parameters is a better than disabling it.

Switching from autovacuum to manual vacuuming may cause more problems than solve. You should have very strong reasons to do that.

 

So, first see if there is a problem with autovacuum, then try to tune it, and only if this doesn’t help, try manual vacuuming.

 

Regards,

Igor Neyman




--
Gunce Kaya

Re: [ADMIN] Autovacuum after bulk data insert(millions!)

От
Scott Marlowe
Дата:
On Tue, Oct 17, 2017 at 1:22 PM, Günce Kaya <guncekaya14@gmail.com> wrote:
> Hi Michael,
>
> Inserting bulk data is performed by authorised users and cron all day. There
> are some multiple bulk inserts and some of them has dblink(It increase IO
> during bulk insert) and there are around 30 tables feed from this bulk
> insert. Bulk insert is run only one time for each table in a day. Not insert
> millions of rows to all these tables but at least insert millions of data to
> around 10 table like this.
>
> I also agree with you for your last paragraph, I just want to know people
> how experienced similar scenario or what do you think about that.

Generally speaking making autovacuum more aggressive is the best
option if you're not sure. Waiting to vacuum can cause issues with
bloat getting out of hand, and once a table is bloated the only fix is
a blocking operation like vacuum full or cluster on index.

Autovacuum taking a while to run is fine as long as it's keeping up
with the deleted / dead tuples.  I would recommend installing the
check_postgres script from here: https://bucardo.org/check_postgres/
to keep track of your bloat. As long as it's staying fairly static and
not growing you're likely fine. If it's growing then either make
autovacuum more aggressive or run manual vacuums on the bloated
tables.

The real danger is hitting wrap around. PostgreSQL warns you and runs
autovacuums whether or not they're turned off when it detects that a
table is approaching this point. If you manage to hit wrap around on a
busy database, pgsql will stop accepting connections and force you to
run vacuum in single user mode, which is a pain as it takes down your
whole cluster while you're doing that.


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin