Re: Massive performance issues
From: Matthew Sackman
Subject: Re: Massive performance issues
Date:
Msg-id: 20050901211359.GD7131@pongo.lshift.net
In reply to: Re: Massive performance issues (Arjen van der Meijden <acmmailing@tweakers.net>)
Responses: Re: Massive performance issues
           Re: Massive performance issues
List: pgsql-performance
On Thu, Sep 01, 2005 at 10:54:45PM +0200, Arjen van der Meijden wrote:
> On 1-9-2005 19:42, Matthew Sackman wrote:
> > Obviously, to me, this is a problem, I need these queries to be under a
> > second to complete. Is this unreasonable? What can I do to make this "go
> > faster"? I've considered normalising the table but I can't work out
> > whether the slowness is in dereferencing the pointers from the index
> > into the table or in scanning the index in the first place. And
> > normalising the table is going to cause much pain when inserting values
> > and I'm not entirely sure if I see why normalising it should cause a
> > massive performance improvement.
>
> In this case, I think normalising will give a major decrease in on-disk
> table-size of this large table and the indexes you have. If that's the
> case, that in itself will speed up all I/O-bound queries quite a bit.

Well, that's the thing - on the queries where it decides to use the
index it only reads at around 3MB/s and the CPU is maxed out, whereas
when it doesn't use the index, the disk is being read at 60MB/s. So when
it decides to use an index, I don't seem to be I/O bound at all. Or at
least that's the way it seems to me.

> locality_1, _2, city and county can probably be normalised away without
> much problem, but going from varchars to integers will probably save
> you quite a bit of (disk) space.

Sure, that's what I've been considering today.

> But since it won't change the selectivity of indexes, you won't get
> more index scans instead of sequential scans, I suppose.
>
> I think it's not that hard to create a normalised set of tables from
> this data set (using "insert into tablename select distinct ... from
> address" and such, "insert into address_new (..., city) select ...
> (select cityid from cities where city = address.city) from address").
> So it's at least relatively easy to figure out the performance
> improvement from normalising the dataset a bit.
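For what it's worth, the one-off normalisation Arjen describes could be sketched roughly like this (names are illustrative - it assumes the big table is "address" with a varchar "city" column, per the thread; locality_1, locality_2 and county would get the same treatment):

```sql
-- Build the lookup table from the distinct values already present.
CREATE TABLE cities (
    cityid serial PRIMARY KEY,
    city   varchar NOT NULL UNIQUE
);

INSERT INTO cities (city)
    SELECT DISTINCT city FROM address;

-- Swap the varchar column for an integer foreign key.
ALTER TABLE address ADD COLUMN cityid integer;

UPDATE address
    SET cityid = (SELECT cityid FROM cities
                  WHERE cities.city = address.city);

ALTER TABLE address DROP COLUMN city;
ALTER TABLE address
    ADD CONSTRAINT address_cityid_fkey
    FOREIGN KEY (cityid) REFERENCES cities (cityid);
```

A VACUUM FULL (or a dump/reload) afterwards would be needed to actually reclaim the on-disk space the dropped varchar column occupied.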
Yeah, the initial creation isn't too painful, but adding rows to the
address table afterwards gets more painful. However, as I've said
elsewhere, the import isn't the critical path, so I can cope with that
pain, possibly coding around it with a stored procedure and triggers as
suggested.

> If you want to improve your hardware, have a look at the Western
> Digital Raptor-series SATA disks; they are fast, SCSI-like SATA
> drives. You may also have a look at the amount of memory available, to
> allow caching this (entire) table.

Well, I've got 1GB of RAM, but from analysis of its use, a fair amount
isn't being used: about 50% is actually in use by applications, about
half of the rest is cache, and the rest isn't being used. Has this
anything to do with the max_fsm_pages and max_fsm_relations settings?
I've pretty much not touched the configuration and it's the standard
Debian package.

Matthew
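[As an aside, the stored-procedure workaround for inserts mentioned above could look roughly like this - a sketch only, assuming a "cities" lookup table as in the earlier normalisation discussion; the function name is illustrative:]

```sql
-- Look up a city's id, creating the row on first sight, so callers
-- inserting into "address" never have to manage the lookup table.
CREATE OR REPLACE FUNCTION get_cityid(p_city varchar)
RETURNS integer AS $$
DECLARE
    v_id integer;
BEGIN
    SELECT cityid INTO v_id FROM cities WHERE city = p_city;
    IF NOT FOUND THEN
        INSERT INTO cities (city) VALUES (p_city);
        SELECT cityid INTO v_id FROM cities WHERE city = p_city;
    END IF;
    RETURN v_id;
END;
$$ LANGUAGE plpgsql;

-- Inserts into the normalised table then become, e.g.:
-- INSERT INTO address (..., cityid) VALUES (..., get_cityid('London'));
```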