Discussion: Is stats update during COPY IN really a good idea?
We have a TODO item:

	* Update reltuples in COPY

I was just about to go do this when I realized that it may not be such
a hot idea after all.  The problem is that updating pg_class.reltuples
means that concurrent COPY operations will block each other, because
they want to update the same row in pg_class.  You can already see this
happen in CREATE INDEX:

	create table foo(f1 int);
	begin;
	create index fooey on foo(f1);
	-- in another psql do:
	create index fooey2 on foo(f1);
	-- second backend blocks until first xact is committed or rolled back

While this doesn't bother me for CREATE INDEX, it does bother me for
COPY, since people often use COPY to avoid per-tuple INSERT overhead.
It seems pretty likely that this will cause blocking problems for real
applications.  I think that may be a bigger problem than the benefit of
not needing a VACUUM (or, now, ANALYZE) to get the stats updated.

			regards, tom lane
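By analogy with the CREATE INDEX demonstration above, the blocking scenario Tom is worried about could be sketched as the following hypothetical two-session transcript (file paths are placeholders, and it assumes COPY updated pg_class.reltuples as the TODO item proposes — which is exactly the behavior under debate):

```sql
-- Session 1
create table bar(f1 int);
begin;
copy bar from '/tmp/load1.txt';   -- suppose this updated pg_class.reltuples
-- ... transaction deliberately left open ...

-- Session 2
copy bar from '/tmp/load2.txt';   -- would block on the same pg_class row
                                  -- until session 1 commits or rolls back
```

The point of the sketch: because both backends must update the one pg_class row for `bar`, the second COPY waits on the first transaction's row lock, defeating the concurrency that people use COPY to get.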
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > People are using COPY into the same table at the same time?
>
> Yes --- we had a message from someone who was doing that (and running
> into unrelated performance issues) just last week.

OK.

> > My vote is to update pg_class.  The VACUUM takes much more time than
> > the update, and we are only updating the pg_class row, right?
>
> What?  What does VACUUM have to do with this?

You have to VACUUM to get pg_class updated after COPY, right?

> The reason this is a significant issue is that the first COPY could be
> inside a transaction, in which case the lock will persist until that
> transaction commits, which could be awhile.

Oh, I see.  Can we disable the pg_class update if we are in a
multi-statement transaction?

--
  Bruce Momjian                  |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us         |  (610) 853-3000
  + If your life is a hard drive,|  830 Blythe Avenue
  + Christ can be your backup.   |  Drexel Hill, Pennsylvania 19026
> We have a TODO item
>	* Update reltuples in COPY
>
> I was just about to go do this when I realized that it may not be such
> a hot idea after all.  The problem is that updating pg_class.reltuples
> means that concurrent COPY operations will block each other, because
> they want to update the same row in pg_class.  You can already see this
> happen in CREATE INDEX:

People are using COPY into the same table at the same time?

> While this doesn't bother me for CREATE INDEX, it does bother me for
> COPY, since people often use COPY to avoid per-tuple INSERT overhead.
> It seems pretty likely that this will cause blocking problems for real
> applications.  I think that may be a bigger problem than the benefit of
> not needing a VACUUM (or, now, ANALYZE) to get the stats updated.

Oh, well we can either decide to do it or remove the TODO item.  Either
way we win!

My vote is to update pg_class.  The VACUUM takes much more time than the
update, and we are only updating the pg_class row, right?

Can't we just start a new transaction and update the pg_class row, that
way we don't have to open it for writing during the copy.

FYI, I had a 100k deep directory that caused me problems this morning.
Just catching up.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> People are using COPY into the same table at the same time?

Yes --- we had a message from someone who was doing that (and running
into unrelated performance issues) just last week.

> My vote is to update pg_class.  The VACUUM takes much more time than
> the update, and we are only updating the pg_class row, right?

What?  What does VACUUM have to do with this?

The reason this is a significant issue is that the first COPY could be
inside a transaction, in which case the lock will persist until that
transaction commits, which could be awhile.

> Can't we just start a new transaction and update the pg_class row,
> that way we don't have to open it for writing during the copy.

No, we cannot; requiring COPY to happen outside a transaction block is
not acceptable.

			regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > My vote is to update pg_class.  The VACUUM takes much more time than
> > the update, and we are only updating the pg_class row, right?
>>
>> What?  What does VACUUM have to do with this?
>
> You have to VACUUM to get pg_class updated after COPY, right?

But doing this is only interesting if you need to update reltuples in
order to get the planner to generate reasonable plans.  In reality, if
you've added enough data to cause the plans to shift, you probably ought
to do an ANALYZE anyway to update pg_statistic.  Given that ANALYZE is a
lot cheaper than it used to be, I think that the notion of making COPY
do this looks fairly obsolete anyhow.

> Oh, I see.  Can we disable the pg_class update if we are in a
> multi-statement transaction?

Ugh.  Do you really want COPY's behavior to depend on context like that?

If you did want context-dependent behavior, a saner approach would be to
only try to update reltuples if the copy has more than, say, doubled the
old value.  This would be likely to happen in bulk load and unlikely to
happen in concurrent-insertions-that-choose-to-use-COPY.  But I'm not
convinced we need it at all.

			regards, tom lane
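Tom's "only update if the copy more than doubled the old value" heuristic was never spelled out in code; a minimal C sketch of what he seems to mean might look like this (the function name and signature are hypothetical, not actual PostgreSQL source):

```c
#include <stdbool.h>

/* Hypothetical sketch of Tom Lane's suggested heuristic: only touch
 * pg_class.reltuples when this COPY at least doubled the previously
 * recorded row count, i.e. old + copied >= 2 * old.  A bulk load into
 * a small or empty table passes easily; a trickle COPY into a large
 * table (the concurrent case that would cause blocking) does not. */
static bool
copy_should_update_reltuples(double old_reltuples, double tuples_copied)
{
    return tuples_copied > 0 && tuples_copied >= old_reltuples;
}
```

Under this rule a fresh bulk load always qualifies, while small concurrent COPYs into an already-populated table skip the pg_class update entirely and therefore never contend for its row lock.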
> > You have to VACUUM to get pg_class updated after COPY, right?
>
> But doing this is only interesting if you need to update reltuples in
> order to get the planner to generate reasonable plans.  In reality, if
> you've added enough data to cause the plans to shift, you probably ought
> to do an ANALYZE anyway to update pg_statistic.  Given that ANALYZE is a
> lot cheaper than it used to be, I think that the notion of making COPY
> do this looks fairly obsolete anyhow.

Yes, but remember, we are trying to catch ignorant cases, not
experienced people.

> > Oh, I see.  Can we disable the pg_class update if we are in a
> > multi-statement transaction?
>
> Ugh.  Do you really want COPY's behavior to depend on context like that?
>
> If you did want context-dependent behavior, a saner approach would be to
> only try to update reltuples if the copy has more than, say, doubled the
> old value.  This would be likely to happen in bulk load and unlikely to
> happen in concurrent-insertions-that-choose-to-use-COPY.  But I'm not
> convinced we need it at all.

Maybe not.  The COPY/pg_class hack is just to quiet people who have done
COPY and forgotten VACUUM or ANALYZE.  Maybe the user is only performing
a few operations before deleting the table.  Updating pg_class does help
in that case.
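The alternative Tom advocates in this thread — run ANALYZE yourself after a bulk COPY rather than having COPY update the stats — amounts to the following usage pattern (file path is a placeholder):

```sql
copy foo from '/tmp/bulkload.txt';
analyze foo;  -- refreshes pg_class.reltuples and pg_statistic for foo
```

Since ANALYZE samples the table rather than scanning it fully, it is far cheaper than a VACUUM while still giving the planner up-to-date row counts and column statistics.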