Обсуждение: Some frustrations with admin tasks on PGSQL database

Поиск
Список
Период
Сортировка

Some frustrations with admin tasks on PGSQL database

От
"Phoenix Kiula"
Дата:
I'm loving the fact that while I am doing some one-time updates to the
DB, users can still SELECT away to glory. This is a major boon in
comparison to my experience with another major opensource database.

However, I am a little frustrated by the amount of time PGSQL takes to
complete tasks. Just to accommodate these tasks, my conf file has the
following:

    autovacuum = off
    wal_buffers=64
    checkpoint_segments=1000
    checkpoint_timeout=900
    fsync = off
    maintenance_work_mem = 128MB


[PS: I will enable fsync after these operations, and decrease the
checkpoint_segments.]

I have dropped all indexes/indicises on my table, except for the
primary key. Still, when I run the query:

    UPDATE mytable SET mycolumn = lower(mycolumn);

This is, at the time of this writing, has taken well over 35 minutes!
On a table of a "mere" 6 million rows (quoted from one discussion on
this mailing list).

I am on a 4GB RAM machine with two Intel Dual Core processors. Albeit
this is not a dedicated db server, another comparable FOSS database
never took these kinds of times to perform its operations.

Suspecting that locking may be the cause of this, I read up on
http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
found nothing specific that would help a person starting out on the DB
to actually do meaningful explicit locking that the UPDATE command
does not already do.

I am now trying doing something like

    UPDATE mytable SET mycolumn = lower(mycolumn)
    WHERE id BETWEEN x AND y ;

This is way too laborious and untenable because I want to put the
fsync back on as soon as possible; this is a production database!

What else can I do to make this go fast enough to be normal!? Penny
for any thoughts and tips.

Re: Some frustrations with admin tasks on PGSQL database

От
Martijn van Oosterhout
Дата:
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> However, I am a little frustrated by the amount of time PGSQL takes to
> complete tasks. Just to accommodate these tasks, my conf file has the
> following:
>
>     autovacuum = off
>     wal_buffers=64
>     checkpoint_segments=1000
>     checkpoint_timeout=900
>     fsync = off
>     maintenance_work_mem = 128MB

I note you did not give either the shared_buffers setting or the
work_mem setting, both of which would have a major impact on your
problem.

> I have dropped all indexes/indicises on my table, except for the
> primary key. Still, when I run the query:
>
>     UPDATE mytable SET mycolumn = lower(mycolumn);
>
> This is, at the time of this writing, has taken well over 35 minutes!
> On a table of a "mere" 6 million rows (quoted from one discussion on
> this mailing list).

How big is the actual table itself (in bytes).

> Suspecting that locking may be the cause of this, I read up on
> http://www.postgresql.org/docs/8.2/static/explicit-locking.html and
> found nothing specific that would help a person starting out on the DB
> to actually do meaningful explicit locking that the UPDATE command
> does not already do.

I doubt it's locking.

> What else can I do to make this go fast enough to be normal!? Penny
> for any thoughts and tips.

Check your I/O throughput. You should be maxing out your disk...
Although, with 6million rows, it might even fit in memory. Can you see
(in ps) what it's actually doing?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: Some frustrations with admin tasks on PGSQL database

От
"Phoenix Kiula"
Дата:
On 17/08/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> > However, I am a little frustrated by the amount of time PGSQL takes to
> > complete tasks. Just to accommodate these tasks, my conf file has the
> > following:
> >
> >     autovacuum = off
> >     wal_buffers=64
> >     checkpoint_segments=1000
> >     checkpoint_timeout=900
> >     fsync = off
> >     maintenance_work_mem = 128MB
>
> I note you did not give either the shared_buffers setting or the
> work_mem setting, both of which would have a major impact on your
> problem.



Oh, sorry, here are the rest of them:

    max_connections = 250
    shared_buffers = 60000
    effective_cache_size = 60000
    work_mem = 60000
    max_fsm_relations = 1500
    max_fsm_pages = 80000
    sort_mem = 16348
    temp_buffers = 4096
    authentication_timeout = 10s
    ssl = off


Are these too low?



>
> How big is the actual table itself (in bytes).
>


Where should I find this?

I tried "/usr/lib/pgsql/data" and "/var/lib/pgsql/data" but there is
no folder that seems to show the data files.

TIA!

Re: Some frustrations with admin tasks on PGSQL database

От
hubert depesz lubaczewski
Дата:
On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> I have dropped all indexes/indicises on my table, except for the
> primary key. Still, when I run the query:
>     UPDATE mytable SET mycolumn = lower(mycolumn);

can you please check this:

select count(*) from mytable;
select count(*) from mytable where mycolumn ~ '[A-Z]';

and if the second is lower than first make the update:
update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';

of course if your data contain national characters you have to include
them (upper case only) in this regexp.

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Some frustrations with admin tasks on PGSQL database

От
hubert depesz lubaczewski
Дата:
On Fri, Aug 17, 2007 at 09:50:42PM +0800, Phoenix Kiula wrote:
> > How big is the actual table itself (in bytes).
> Where should I find this?

select pg_relation_size('mytable');

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Some frustrations with admin tasks on PGSQL database

От
"Phoenix Kiula"
Дата:
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> On Fri, Aug 17, 2007 at 07:49:08PM +0800, Phoenix Kiula wrote:
> > I have dropped all indexes/indicises on my table, except for the
> > primary key. Still, when I run the query:
> >     UPDATE mytable SET mycolumn = lower(mycolumn);
>
> can you please check this:
>
> select count(*) from mytable;
> select count(*) from mytable where mycolumn ~ '[A-Z]';
>
> and if the second is lower than first make the update:
> update mytable set mycolumn = lower(mycolumn) where mycolumn ~ '[A-Z]';
>
> of course if your data contain national characters you have to include
> them (upper case only) in this regexp.


Wow, smartest advice of the day! Yes, a lot of our data in that column
has dots and numbers (800,000 compared to 6 million), so I wanted to
get only to the stuff that was pure alphabets, but just didn't think
of how.

[Slithers away to get dunce cap].

Thanks much!

Re: Some frustrations with admin tasks on PGSQL database

От
hubert depesz lubaczewski
Дата:
On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:
> Wow, smartest advice of the day! Yes, a lot of our data in that column
> has dots and numbers (800,000 compared to 6 million), so I wanted to
> get only to the stuff that was pure alphabets, but just didn't think
> of how.

what i really meant was: check how many record will actually be modified
- i.e. contain upper-case letters.

the fact that some records contain only dots and digits is nice, but i
suspect that you also have a lot of record with only lower-case letters
in the field. or am i wrong?

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Some frustrations with admin tasks on PGSQL database

От
"Phoenix Kiula"
Дата:
On 17/08/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> On Fri, Aug 17, 2007 at 10:22:55PM +0800, Phoenix Kiula wrote:
> > Wow, smartest advice of the day! Yes, a lot of our data in that column
> > has dots and numbers (800,000 compared to 6 million), so I wanted to
> > get only to the stuff that was pure alphabets, but just didn't think
> > of how.
>
> what i really meant was: check how many record will actually be modified
> - i.e. contain upper-case letters.
>
> the fact that some records contain only dots and digits is nice, but i
> suspect that you also have a lot of record with only lower-case letters
> in the field. or am i wrong?
>



You guessed it right. The query makes perfect sense. Only update those
records that have uppercase, and therefore require updating! Many
thanks.