Thread: Disabling/Enabling index before bulk loading
We recently did a test on COPY and found that on large tables (47 million rows, 20GB of raw data) the difference in COPY with 16 indexes and COPY without any index is 1:14. That is, COPY is 14 times slower when data is ingested with all indexes in place, as opposed to running COPY first without indexes and then creating all the indexes.

I googled for earlier postings on this and it looks like this has been asked before too.

This is what I am thinking to do:

1 - Extract the index definitions and save them as SQL somewhere, either a file or a table.
2 - Drop all indexes.
3 - Ingest the data via COPY.
4 - Recreate all indexes saved in (1).

Is there a generic SQL, script, or tool to accomplish (1)?

thanks
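[For step (1), one option is to dump the definitions straight from pg_indexes into a file with psql. A sketch; the schema and table names are placeholders, not from the thread:]

    -- write one CREATE INDEX statement per line to indexes.sql
    \copy (select indexdef || ';' from pg_indexes where schemaname = 'public' and tablename = 'the_table') to 'indexes.sql'

    -- later, after the COPY, replay the file:
    \i indexes.sql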
Ravi Krishna wrote on 10.07.2018 at 16:08:
> We recently did a test on COPY and found that on large tables (47 million rows, 20GB of raw data) the
> difference in COPY with 16 indexes and COPY without any index is 1:14. That is, COPY is 14 times slower
> when data is ingested with all indexes as opposed to COPY first without index and then create all index.
>
> I googled for earlier posting on this and it looks like this has been asked before too.
>
> This is what I am thinking to do:
>
> 1 - Extract index definition and save it as a SQL somewhere, either a file or a table.
> 2 - Drop all indexes.
> 3 - Ingest data via COPY
> 4 - Recreate all indexes saved in (1).
>
> Is there a generic sql or script or tool to accomplish (1).

You can do this with a script like this:

    -- First store the definitions of all the indexes in a table:
    create table index_backup as
    select *
    from pg_indexes
    where schemaname = 'public'
      and tablename = 'the_table';

    -- now drop all the indexes:
    do $$
    declare
      l_rec record;
    begin
      for l_rec in select schemaname, indexname from index_backup loop
        execute format('drop index %I.%I', l_rec.schemaname, l_rec.indexname);
      end loop;
    end;
    $$;

    -- now insert the data
    -- ...

    -- and restore all indexes:
    do $$
    declare
      l_rec record;
    begin
      for l_rec in select indexdef from index_backup loop
        execute l_rec.indexdef;
      end loop;
    end;
    $$;
On 07/10/2018 07:08 AM, Ravi Krishna wrote:
> We recently did a test on COPY and found that on large tables (47 million rows, 20GB of raw data) the
> difference in COPY with 16 indexes and COPY without any index is 1:14. That is, COPY is 14 times slower
> when data is ingested with all indexes as opposed to COPY first without index and then create all index.

Did you include the time to CREATE INDEX after the COPY, or is the 1:14 only for the COPY stage?

> I googled for earlier posting on this and it looks like this has been asked before too.
>
> This is what I am thinking to do:
>
> 1 - Extract index definition and save it as a SQL somewhere, either a file or a table.
> 2 - Drop all indexes.
> 3 - Ingest data via COPY
> 4 - Recreate all indexes saved in (1).
>
> Is there a generic sql or script or tool to accomplish (1).
>
> thanks

--
Adrian Klaver
adrian.klaver@aklaver.com
> Did you include the time to CREATE INDEX after the COPY or is the 1:14 only for the COPY stage?

Yes.

Time taken to load 47 mil rows with all 16 indexes intact: 14+ hrs
Time taken to load the same data after dropping the indexes, loading, and finally recreating the 16 indexes: 1 hr 40 min

Frankly, I am surprised by this staggering difference.
On Tue, Jul 10, 2018 at 1:13 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:
> > Did you include the time to CREATE INDEX after the COPY or is the 1:14 only for the COPY stage?
>
> Yes.
>
> Time taken to load 47 mil rows with all 16 indexes intact: 14+ hrs
> Time taken to load the same after dropping index and then loading and finally creating 16 indexes: 1 hr 40 min
This does not work in RDS. In order to update system catalog tables (pg_index), one needs privileges that are denied in RDS. In RDS terminology, the user must belong to the role rdsadmin, and that role is reserved only for AWS internal users.
On Tue, Jul 10, 2018 at 12:26 PM Ravi Krishna <srkrishna@yahoo.com> wrote:
> This does not work in RDS. In order to update system catalog tables (pg_index), one needs privileges that are
> denied in RDS. In RDS terminology, the user must belong to the role rdsadmin, and that role is reserved only
> for AWS internal users.
Right, but pl/pgsql works fine there, so you can back up and drop all indexes and then restore them using `do $$ ... $$ language plpgsql;`, as Thomas Kellerer already suggested.
One note here: there is some old confusion between the concepts of unique indexes and unique constraints in Postgres (see https://www.postgresql.org/message-id/flat/CANNMO%2BKHkkDg-FBi0_78ADmfLiT9kODmz%2B8m6fR6f5kPL-n_ZQ%40mail.gmail.com#cfb3a9eaed8649d7d24ad7944ccb37cf), so there is a chance that after such a backup/restore procedure you'll end up with a logically correct but physically different state.
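[Because of that, it may be worth backing up unique and primary key constraints separately from plain indexes. A sketch; the schema and table names are placeholders, and this is not part of the original thread:]

    -- generate DDL to recreate primary key, unique, and exclusion constraints;
    -- restoring these via ALTER TABLE also rebuilds their backing indexes
    select format('alter table %I.%I add constraint %I %s;',
                  n.nspname, t.relname, c.conname,
                  pg_get_constraintdef(c.oid)) as ddl
    from pg_constraint c
    join pg_class t on t.oid = c.conrelid
    join pg_namespace n on n.oid = t.relnamespace
    where c.contype in ('p', 'u', 'x')
      and n.nspname = 'public'
      and t.relname = 'the_table';

Dropping these with `alter table ... drop constraint` (rather than `drop index`) and restoring them with the generated statements recreates the constraint itself, not just an index with the same definition.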
Ravi Krishna <srkrishna@yahoo.com> writes:
> We recently did a test on COPY and found that on large tables (47 million rows, 20GB of raw data) the
> difference in COPY with 16 indexes and COPY without any index is 1:14. That is, COPY is 14 times slower
> when data is ingested with all indexes as opposed to COPY first without index and then create all index.

This is very system dependent. On our system, when we tested a similar approach, we found that the time saved by dropping the indexes before the copy was lost when rebuilding the indexes afterwards. In fact, it ended up being slightly slower. I suspect a lot depends on the number and types of indexes you have. Your example had a lot more indexes than ours. We were loading 22 batches with 2.5M records per batch. While copy was significantly faster than transaction-based inserts (minutes vs hours), the difference between indexes and no indexes was measured in minutes. We only had 3 or 4 indexes.

> I googled for earlier posting on this and it looks like this has been asked before too.
>
> This is what I am thinking to do:
>
> 1 - Extract index definition and save it as a SQL somewhere, either a file or a table.
> 2 - Drop all indexes.
> 3 - Ingest data via COPY
> 4 - Recreate all indexes saved in (1).
>
> Is there a generic sql or script or tool to accomplish (1).

We are loading the data via Javascript using the pg and pg-copy-streams modules. It is pretty straightforward to drop the indexes and recreate them afterwards via SQL, so we didn't look for a tool as such. As data is only inserted into this table and only by this process, we also turned off autovacuum for this table, performing vacuum and analyze manually after each load.

Tim
--
Tim Cross
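[The per-table autovacuum arrangement Tim describes can be sketched like this; "public.the_table" is a placeholder name, not from the thread:]

    -- disable autovacuum for this table only (assumes the loading
    -- process is the sole writer, as in Tim's setup)
    alter table public.the_table set (autovacuum_enabled = false);

    -- ... bulk load via COPY ...

    -- then vacuum/analyze manually after each load
    vacuum analyze public.the_table;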
> We recently did a test on COPY and found that on large tables (47 million rows , 20GB of raw data) the
> difference in COPY with 16 indexes...
I am very suspicious of why you need 16 indexes. Are you sure all those indexes are actually being utilized?
Try executing the attached query; you may find some are really not needed.
--
Melvin Davidson
Maj. Database & Exploration Specialist
Universe Exploration Command – UXC
Employment by invitation only!
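[Melvin's attachment is not reproduced in the archive. A typical query for spotting unused indexes, sketched from the standard pg_stat_user_indexes view rather than his actual attachment:]

    -- indexes never scanned since statistics were last reset;
    -- note: indexes backing unique/PK constraints may show idx_scan = 0
    -- and still be needed for enforcement
    select schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) as index_size
    from pg_stat_user_indexes
    where idx_scan = 0
    order by pg_relation_size(indexrelid) desc;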
> I am very suspicious of why you need 16 indexes. Are you sure all those indexes are actually being utilized?
> Try executing the attached query; you may find some are really not needed.
This table has 200+ cols.

That said, it is possible that a few of them may be unnecessary. I will run your script and check that out.

Thanks for the script.