Re: VACUUM and ANALYZE Follow-Up
От | Mark Dexter |
---|---|
Тема | Re: VACUUM and ANALYZE Follow-Up |
Дата | |
Msg-id | 5E8F9F5B63726C48836757FE673B584E0121599A@dcimail.dexterchaney.local обсуждение исходный текст |
Ответ на | VACUUM and ANALYZE Follow-Up ("Mark Dexter" <MDEXTER@dexterchaney.com>) |
Ответы |
Re: VACUUM and ANALYZE Follow-Up
(Alvaro Herrera <alvherre@dcc.uchile.cl>)
Re: VACUUM and ANALYZE Follow-Up (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>) |
Список | pgsql-general |
Hmm... it seems that we're maybe not understanding one another here. I'm going to try to be more clear. Below are the questions that were raised and my attemp to answer them clearly. 1. Why run VACUUM on an empty table? This is a good question, but perhaps there is a valid answer. Our production database contains about 1500 tables. It is MUCH more convenient to be able to VACCUM ANALYZE the entire database than to list specific tables. Furthermore, our application uses "work" tables that often will be empty (e.g., at night) but that, at times, will contain a large number of rows. The Postgres documentation says the following: "We recommend that active production databases be vacuumed frequently (at least nightly), in order to remove expired rows.". This is going to be difficult without messing up the performance. 2. Hang on, it's an empty table. Is it supposed to fabricate these statistics out of thin air? Any made up numbers will probably be worse than none at all. Well, that's why I suggested some type of command line option so the user could give it a number of rows to use for the analysis (e.g., ANALYZE MINIMUM 1000). Another point of interest: If I DROP and the CREATE the table, without doing ANALYZE, I get good performance when inserting rows. So whatever assumptions the database is making about a newly-created table appear to be different (and BETTER) than the assumptions made when doing ANALYZE on an empty table. It's not clear to me why this should be. In both cases, you don't really know anything about the table other than at this moment it has zero rows. Obviously, it would be better (at least in this instance) if running ANALYZE on an empty table had the same performance result as using CREATE to make a new (empty) table. Finally, my testing would seem to contradict that any made-up number will be better than none at all. In my testing (inserting 35,000 rows into an empty table), I could only measure two distinct outcomes -- one good and one bad (with a 15X performance difference). I got good performance with any of the folloiwng scenarios: CREATE TABLE, ANALYZE or VACUUM with more than 94 rows in the table. I got bad performance if I did ANALYZE or VACUUM with less than 94 rows in the table. I could not measure any difference between other numbers of rows (between 0 and 35,000). So I don't think in practice it is that sensitive, at least in the simple test case I was doing. Below are two additional questions I have. 3. Is there some benefit to having ANALYZE behave the way it now does on empty or nearly empty tables? Is there a large performance improvement for really small tables (e.g., under 100 rows or under 1000 rows)? Does anyone really care about performance for small tables? 4. Isn't ANALYZE on a totally empty table really a special case? The presumption should be that the table will not remain empty. To optimize the performance assuming that there will be zero (or close to zero) rows seems somewhat pointless. However, there are valid reasons why a table might be empty at the moment in time when the ANALYZE is run. (In our case, we use "work" tables that get cleared at the end of an application process.) And, as mentioned above, it is easier to VACUUM ANALYZE an entire database than it is to list tables individually. 5. Why does DROP / CREATE work better than TRUNCATE / VACUUM in terms of creating a fresh table in which to insert new rows? Is this desirable? In both cases, the optimizer doesn't really know anything about what to expect for the table. But CREATE provides a better starting point for inserts than does VACUUM or ANALYZE, at least in my testing. I am relatively new to Postgres, and I apologize if I'm repeating issues that have been raised before. However, it does seem to me to be an important issue. The easier it is to maintain a Postgres database, the more likely it is to be widely used. Thanks. Mark
В списке pgsql-general по дате отправления: