Discussion: Some frustrations with admin tasks on a PGSQL database
I'm loving the fact that while I am doing some one-time updates to the
DB, users can still SELECT away to glory. This is a major boon in
comparison to my experience with another major open-source database.
However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:
autovacuum = off
wal_buffers=64
checkpoint_segments=1000
checkpoint_timeout=900
fsync = off
maintenance_work_mem = 128MB
[PS: I will enable fsync after these operations, and decrease the
checkpoint_segments.]
I have dropped all indexes on my table, except for the primary key.
Still, when I run the query:
UPDATE mytable SET mycolumn = lower(mycolumn);
At the time of this writing, it has taken well over 35 minutes! And
that's on a table of a "mere" 6 million rows (quoted from one
discussion on this mailing list).
I am on a 4GB RAM machine with two Intel dual-core processors. Although
this is not a dedicated DB server, another comparable FOSS database
never took this long to perform such operations.
Suspecting that locking may be the cause of this, I read up on
http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
found nothing specific that would help a person starting out on the DB
to actually do meaningful explicit locking that the UPDATE command
does not already do.
I am now trying something like
UPDATE mytable SET mycolumn = lower(mycolumn)
WHERE id BETWEEN x AND y ;
This is way too laborious and untenable because I want to put the
fsync back on as soon as possible; this is a production database!
What else can I do to make this go fast enough to be normal!? Penny
for any thoughts and tips.
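For what it's worth, the id-range approach above can be driven from a
small script rather than typed by hand. Here is a minimal sketch, using
Python with sqlite3 purely as a stand-in for the production PostgreSQL
instance (table name, column name, and batch size are hypothetical);
the point is committing after each range so that no single transaction
has to rewrite all 6 million rows at once:

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here, only to illustrate the
# batching pattern; table/column names and BATCH are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany("INSERT INTO mytable (id, mycolumn) VALUES (?, ?)",
                 [(i, f"VALUE{i}") for i in range(1, 10001)])
conn.commit()

BATCH = 1000
max_id = conn.execute("SELECT max(id) FROM mytable").fetchone()[0]

# Update one id range at a time, committing after each batch so each
# transaction stays small.
for start in range(1, max_id + 1, BATCH):
    conn.execute(
        "UPDATE mytable SET mycolumn = lower(mycolumn) "
        "WHERE id BETWEEN ? AND ?",
        (start, start + BATCH - 1),
    )
    conn.commit()

# Count rows that still contain an upper-case letter.
remaining = conn.execute(
    "SELECT count(*) FROM mytable WHERE mycolumn GLOB '*[A-Z]*'"
).fetchone()[0]
print(remaining)  # 0 once every batch has run
```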
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> However, I am a little frustrated by the amount of time PGSQL takes to
> complete tasks. Just to accommodate these tasks, my conf file has the
> following:
>
> autovacuum = off
> wal_buffers=64
> checkpoint_segments=1000
> checkpoint_timeout=900
> fsync = off
> maintenance_work_mem = 128MB

I note you did not give either the shared_buffers setting or the
work_mem setting, both of which would have a major impact on your
problem.

> I have dropped all indexes on my table, except for the primary key.
> Still, when I run the query:
>
> UPDATE mytable SET mycolumn = lower(mycolumn);
>
> At the time of this writing, it has taken well over 35 minutes! And
> that's on a table of a "mere" 6 million rows (quoted from one
> discussion on this mailing list).

How big is the actual table itself (in bytes)?

> Suspecting that locking may be the cause of this, I read up on
> http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
> found nothing specific that would help a person starting out on the DB
> to actually do meaningful explicit locking that the UPDATE command
> does not already do.

I doubt it's locking.

> What else can I do to make this go fast enough to be normal!? Penny
> for any thoughts and tips.

Check your I/O throughput. You should be maxing out your disk...
Although, with 6 million rows, it might even fit in memory. Can you see
(in ps) what it's actually doing?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
On 17/08/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> > However, I am a little frustrated by the amount of time PGSQL takes to
> > complete tasks. Just to accommodate these tasks, my conf file has the
> > following:
> >
> > autovacuum = off
> > wal_buffers=64
> > checkpoint_segments=1000
> > checkpoint_timeout=900
> > fsync = off
> > maintenance_work_mem = 128MB
>
> I note you did not give either the shared_buffers setting or the
> work_mem setting, both of which would have a major impact on your
> problem.
Oh, sorry, here are the rest of them:
max_connections = 250
shared_buffers = 60000
effective_cache_size = 60000
work_mem = 60000
max_fsm_relations = 1500
max_fsm_pages = 80000
sort_mem = 16348
temp_buffers = 4096
authentication_timeout = 10s
ssl = off
Are these too low?
>
> How big is the actual table itself (in bytes).
>
Where should I find this?
I tried "/usr/lib/pgsql/data" and "/var/lib/pgsql/data" but there is
no folder that seems to show the data files.
TIA!
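As an aside on the units in the settings above: in 8.2, a bare
shared_buffers or effective_cache_size number is counted in 8 kB pages
and work_mem in kB, so a quick back-of-envelope conversion (a sketch;
it assumes the default 8 kB block size) puts them at:

```python
# Convert 8.2-style bare integer settings to bytes, assuming the
# default 8 kB block size; the values are from the config above.
BLOCK = 8 * 1024
shared_buffers = 60000 * BLOCK        # pages -> bytes
effective_cache_size = 60000 * BLOCK  # pages -> bytes
work_mem = 60000 * 1024               # kB -> bytes

print(shared_buffers // (1024 * 1024))  # 468 (MB)
print(work_mem // (1024 * 1024))        # 58 (MB) per sort, per backend
```

With max_connections = 250, a 58 MB work_mem can multiply quickly, so
whether these are "too low" depends as much on concurrency as on the
raw numbers.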
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> I have dropped all indexes on my table, except for the primary key.
> Still, when I run the query:
> UPDATE mytable SET mycolumn = lower(mycolumn);

Can you please check this:

select count(*) from mytable;
select count(*) from mytable where mycolumn ~ '[A-Z]';

and if the second is lower than the first, make the update:

update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';

Of course, if your data contain national characters, you have to
include them (upper case only) in this regexp.

depesz
--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)
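To see the effect of that filter, here is a small illustration, with
sqlite3 standing in for PostgreSQL (so the `~ '[A-Z]'` regexp match
becomes a GLOB pattern) and hypothetical sample rows:

```python
import sqlite3

# sqlite3 as a stand-in for PostgreSQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany("INSERT INTO mytable (mycolumn) VALUES (?)",
                 [("MiXeD",), ("already-lower",), ("1234.56",), ("UPPER",)])
conn.commit()

total = conn.execute("SELECT count(*) FROM mytable").fetchone()[0]
# GLOB '*[A-Z]*' plays the role of PostgreSQL's ~ '[A-Z]': rows that
# contain at least one upper-case letter, i.e. rows worth updating.
needs_update = conn.execute(
    "SELECT count(*) FROM mytable WHERE mycolumn GLOB '*[A-Z]*'"
).fetchone()[0]
print(total, needs_update)  # 4 2

# Only touch the rows that actually contain upper-case letters.
if needs_update < total:
    conn.execute(
        "UPDATE mytable SET mycolumn = lower(mycolumn) "
        "WHERE mycolumn GLOB '*[A-Z]*'"
    )
    conn.commit()
```

Rows that are already lower-case (or contain no letters at all) are
never rewritten, which is exactly where the savings come from.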
On Fri, Aug 17, 2007 at 09:50:42PM +0800, Phoenix Kiula wrote:
> > How big is the actual table itself (in bytes).
> Where should I find this?
select pg_relation_size('mytable');
depesz
--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)
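On 8.1 or later, the same call can be made human-readable, and extended
to cover indexes and TOAST data, with the companion functions (an
aside, not from the thread):

```sql
-- Heap only, formatted for humans (available since 8.1):
SELECT pg_size_pretty(pg_relation_size('mytable'));
-- Heap plus indexes and TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('mytable'));
```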
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> > I have dropped all indexes on my table, except for the primary key.
> > Still, when I run the query:
> > UPDATE mytable SET mycolumn = lower(mycolumn);
>
> Can you please check this:
>
> select count(*) from mytable;
> select count(*) from mytable where mycolumn ~ '[A-Z]';
>
> and if the second is lower than the first, make the update:
> update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';
>
> Of course, if your data contain national characters, you have to
> include them (upper case only) in this regexp.

Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 rows compared to 6 million), so I wanted
to get only to the stuff that was pure alphabets, but just didn't think
of how. [Slithers away to get dunce cap]. Thanks much!
On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:
> Wow, smartest advice of the day! Yes, a lot of our data in that column
> has dots and numbers (800,000 rows compared to 6 million), so I wanted
> to get only to the stuff that was pure alphabets, but just didn't
> think of how.

What I really meant was: check how many records will actually be
modified - i.e. how many contain upper-case letters.

The fact that some records contain only dots and digits is nice, but I
suspect that you also have a lot of records with only lower-case
letters in the field. Or am I wrong?

depesz
--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:
> > Wow, smartest advice of the day! Yes, a lot of our data in that
> > column has dots and numbers (800,000 rows compared to 6 million), so
> > I wanted to get only to the stuff that was pure alphabets, but just
> > didn't think of how.
>
> What I really meant was: check how many records will actually be
> modified - i.e. how many contain upper-case letters.
>
> The fact that some records contain only dots and digits is nice, but I
> suspect that you also have a lot of records with only lower-case
> letters in the field. Or am I wrong?

You guessed it right. The query makes perfect sense: only update those
records that have uppercase, and therefore require updating! Many
thanks.