Re: Performance tuning
От | Richard Huxton |
---|---|
Тема | Re: Performance tuning |
Дата | |
Msg-id | 42355C40.1070207@archonet.com обсуждение исходный текст |
Ответ на | Performance tuning (Jacques Caron <jc@directinfos.com>) |
Список | pgsql-performance |
Jacques Caron wrote: > I'm preparing a set of servers which will eventually need to handle a > high volume of queries (both reads and writes, but most reads are very > simple index-based queries returning a limited set of rows, when not > just one), and I would like to optimize things as much as possible, so I > have a few questions on the exact way PostgreSQL's MVCC works, and how > transactions, updates and vacuuming interact. I hope someone will be > able to point me in the right direction (feel free to give pointers if I > missed the places where this is described). > > From what I understand (and testing confirms it), bundling many queries > in one single transaction is more efficient than having each query be a > separate transaction (like with autocommit on). However, I wonder about > the limits of this: > > - are there any drawbacks to grouping hundreds or thousands of queries > (inserts/updates) over several minutes in one single transaction? Other > than the fact that the inserts/updates will not be visible until > committed, of course. Essentially turning autocommit off, and doing a > commit once in a while. 1. If any locks are held then they will be held for much longer, causing other processes to block. 2. PG needs to be able to roll back the changes - thousands of simple inserts are fine, millions will probably not be. > - does this apply only to inserts/selects/updates or also for selects? > Another way to put this is: does a transaction with only one select > actually have much transaction-related work to do? Or, does a > transaction with only selects actually have any impact anywhere? Does it > really leave a trace anywhere? Again, I understand that selects grouped > in a transaction will not see updates done after the start of the > transaction (unless done by the same process). There are implications if a SELECT has side-effects (I can call a function in a select - that might do anything). > - if during a single transaction several UPDATEs affect the same row, > will MVCC generate as many row versions as there are updates (like would > be the case with autocommit) or will they be grouped into one single row > version? I believe there will be many versions. Certainly for 8.0 that must be the case to support savepoints within a transaction. > Another related issue is that many of the tables are indexed on a date > field, and one process does a lot of updates on "recent" rows (which > lead to many dead tuples), but after that "older" rows tend to remain > pretty much unchanged for quite a while. Other than splitting the tables > into "old" and "recent" tables, is there any way to make vacuum more > efficient? Scanning the whole table for dead tuples when only a small > portion of the table actually has any does not feel like being very > efficient in this situation. Not really. > Other issue: every five minutes or so, I see a noticeable performance > drop as PostgreSQL checkpoints. This is 7.4.3 with pretty lousy > hardware, I know 8.0 with decent hardware and separate disk(s) for > pg_xlog will definitely help, but I really wonder if there is any way to > reduce the amount of work that needs to be done at that point (I'm a > strong believer of fixing software before hardware). I have already > bumped checkpoint_segments to 8, but I'm not quite sure I understand how > this helps (or doesn't help) things. Logs show 3 to 6 "recycled > transaction log file" lines at that time, that seems quite a lot of work > for a load that's still pretty low. Does grouping of more queries in > transactions help with this? Are there other parameters that can affect > things, or is just a matter of how much inserts/updates/deletes are > done, and the amount of data that was changed? You might be better off reducing the number of checkpoint segments, and decreasing the timeout. There is a balance between doing a lot of work in one go, and the overhead of many smaller bursts of activity. > Last point: some of the servers have expandable data (and will be > replicated with slony-I) and will run with fsync off. I have read > conflicting statements as to what exactly this does: some sources > indicate that setting fsync off actually switches off WAL/checkpointing, > others that it just prevents the fsync (or equivalent) system calls. > Since I still see checkpointing in that case, I guess it's not exactly > the former, but I would love to understand more about it. Really, I > would love to be able to set some tables or databases to "go as fast as > you can and don't worry about transactions, MVCC or anything like that", > but I'm not sure that option exists... Setting fsync=false means the sync isn't done, so data might still be cached below PG's level. I'm not sure it's ever going to be possible to mark a table as "ignore transactions" - it would be a lot of work, and means you couldn't guarantee transactions that included that table in any way. -- Richard Huxton Archonet Ltd
В списке pgsql-performance по дате отправления: