Обсуждение: Some frustrations with admin tasks on PGSQL database
I'm loving the fact that while I am doing some one-time updates to the DB, users can still SELECT away to glory. This is a major boon in comparison to my experience with another major opensource database. However, I am a little frustrated by the amount of time PGSQL takes to complete tasks. Just to accommodate these tasks, my conf file has the following: autovacuum = off wal_buffers=64 checkpoint_segments=1000 checkpoint_timeout=900 fsync = off maintenance_work_mem = 128MB [PS: I will enable fsync after these operations, and decrease the checkpoint_segments.] I have dropped all indexes/indicises on my table, except for the primary key. Still, when I run the query: UPDATE mytable SET mycolumn = lower(mycolumn); This is, at the time of this writing, has taken well over 35 minutes! On a table of a "mere" 6 million rows (quoted from one discussion on this mailing list). I am on a 4GB RAM machine with two Intel Dual Core processors. Albeit this is not a dedicated db server, another comparable FOSS database never took these kinds of times to perform its operations. Suspecting that locking may be the cause of this, I read up on http://www.postgresql.org/docs/8.2/static/explicit-locking.html and found nothing specific that would help a person starting out on the DB to actually do meaningful explicit locking that the UPDATE command does not already do. I am now trying doing something like UPDATE mytable SET mycolumn = lower(mycolumn) WHERE id BETWEEN x AND y ; This is way too laborious and untenable because I want to put the fsync back on as soon as possible; this is a production database! What else can I do to make this go fast enough to be normal!? Penny for any thoughts and tips.
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote: > However, I am a little frustrated by the amount of time PGSQL takes to > complete tasks. Just to accommodate these tasks, my conf file has the > following: > > autovacuum = off > wal_buffers=64 > checkpoint_segments=1000 > checkpoint_timeout=900 > fsync = off > maintenance_work_mem = 128MB I note you did not give either the shared_buffers setting or the work_mem setting, both of which would have a major impact on your problem. > I have dropped all indexes/indicises on my table, except for the > primary key. Still, when I run the query: > > UPDATE mytable SET mycolumn = lower(mycolumn); > > This is, at the time of this writing, has taken well over 35 minutes! > On a table of a "mere" 6 million rows (quoted from one discussion on > this mailing list). How big is the actual table itself (in bytes). > Suspecting that locking may be the cause of this, I read up on > http://www.postgresql.org/docs/8.2/static/explicit-locking.html and > found nothing specific that would help a person starting out on the DB > to actually do meaningful explicit locking that the UPDATE command > does not already do. I doubt it's locking. > What else can I do to make this go fast enough to be normal!? Penny > for any thoughts and tips. Check your I/O throughput. You should be maxing out your disk... Although, with 6million rows, it might even fit in memory. Can you see (in ps) what it's actually doing? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Вложения
On 17/08/07, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote: > > However, I am a little frustrated by the amount of time PGSQL takes to > > complete tasks. Just to accommodate these tasks, my conf file has the > > following: > > > > autovacuum = off > > wal_buffers=64 > > checkpoint_segments=1000 > > checkpoint_timeout=900 > > fsync = off > > maintenance_work_mem = 128MB > > I note you did not give either the shared_buffers setting or the > work_mem setting, both of which would have a major impact on your > problem. Oh, sorry, here are the rest of them: max_connections = 250 shared_buffers = 60000 effective_cache_size = 60000 work_mem = 60000 max_fsm_relations = 1500 max_fsm_pages = 80000 sort_mem = 16348 temp_buffers = 4096 authentication_timeout = 10s ssl = off Are these too low? > > How big is the actual table itself (in bytes). > Where should I find this? I tried "/usr/lib/pgsql/data" and "/var/lib/pgsql/data" but there is no folder that seems to show the data files. TIA!
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote: > I have dropped all indexes/indicises on my table, except for the > primary key. Still, when I run the query: > UPDATE mytable SET mycolumn = lower(mycolumn); can you please check this: select count(*) from mytable; select count(*) from mytable where mycolumn ~ '[A-Z]'; and if the second is lower than first make the update: update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]'; of course if your data contain national characters you have to include them (upper case only) in this regexp. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
On Fri, Aug 17, 2007 at 09:50:42PM +0800, Phoenix Kiula wrote: > > How big is the actual table itself (in bytes). > Where should I find this? select pg_relation_size('mytable'); depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote: > On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote: > > I have dropped all indexes/indicises on my table, except for the > > primary key. Still, when I run the query: > > UPDATE mytable SET mycolumn = lower(mycolumn); > > can you please check this: > > select count(*) from mytable; > select count(*) from mytable where mycolumn ~ '[A-Z]'; > > and if the second is lower than first make the update: > update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]'; > > of course if your data contain national characters you have to include > them (upper case only) in this regexp. Wow, smartest advice of the day! Yes, a lot of our data in that column has dots and numbers (800,000 compared to 6 million), so I wanted to get only to the stuff that was pure alphabets, but just didn't think of how. [Slithers away to get dunce cap]. Thanks much!
On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote: > Wow, smartest advice of the day! Yes, a lot of our data in that column > has dots and numbers (800,000 compared to 6 million), so I wanted to > get only to the stuff that was pure alphabets, but just didn't think > of how. what i really meant was: check how many record will actually be modified - i.e. contain upper-case letters. the fact that some records contain only dots and digits is nice, but i suspect that you also have a lot of record with only lower-case letters in the field. or am i wrong? depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote: > On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote: > > Wow, smartest advice of the day! Yes, a lot of our data in that column > > has dots and numbers (800,000 compared to 6 million), so I wanted to > > get only to the stuff that was pure alphabets, but just didn't think > > of how. > > what i really meant was: check how many record will actually be modified > - i.e. contain upper-case letters. > > the fact that some records contain only dots and digits is nice, but i > suspect that you also have a lot of record with only lower-case letters > in the field. or am i wrong? > You guessed it right. The query makes perfect sense. Only update those records that have uppercase, and therefore require updating! Many thanks.