Discussion: very, very slow performance


very, very slow performance

From:
"Tena Sakai"
Date:

Hi Everybody,

I am running postgres v8.3.3 on Red Hat Linux (Dell hardware)
with 4 CPUs.  This machine is terribly bogged down and I
would like a bit of help as to what can be done.

For the last 18 or so hours, there have been 24 queries running.
What's odd is that 21 of them are identical queries.  This
happens once in a while (maybe once per month, plus/minus)
because of the data I need to process.  Basically, I fire
up an application for each dataset (there are 21 of them), and
each application gets data out of postgres, transforms the
matrices, and puts them into a form the subsequent processes
can handle.  I know it sounds dumb, but that's how it is for
the foreseeable future.

Here's the query (please read them in fixed-size font, if
you can):

 select subjectid, genotype.markerid, a1.value as allele1,
        a2.value as allele2, genotype.dateCreated,
        genotype.dateReplaced, genotype.ignore,
        genotype.inconsistent
   from genotype, allele a1, allele a2
  where
        allele1id = a1.alleleid
    and
        allele2id = a2.alleleid;

The genotype table mentioned above has over 600,000,000 rows.  As
I mentioned, there are 20 more of them running concurrently.
The 3 other jobs look like:

 SELECT a.markerid,a.type,a.localname,b.ncbibuild,
        a.chromosome,a.geneticposition,b.physicalposition,
        a.strand,a.stdflanks,a.maxflanks,a.datecreated,
        a.datereplaced,a.sourcetablename,a.sourceid,
        b.dbsnprsid,a.ignore,a.gene
   FROM public.marker a, public.snpposition b
  WHERE
        a.ignore= 'N'
    AND a.datecreated <= (timestamp'Wed Oct 29 09:35:54.266 2008')
    AND a.datereplaced > (timestamp'Wed Oct 29 09:35:54.266 2008')
    AND a.localname IN  ('RS10757474','RS7859598','RS6148','RS9792663','RS1541125',
                         'RS10511446','RS10814410','RS12338622','RS875587',
                         'RS1590979', 'RS748786','RS958505','RS12352961',
                          and on and on and on...);


  insert into summarystats
                    select 'Marker by Chromosomes', chromosome, sourcetablename,
                            count(*), null, to_timestamp('2009-02-18 20:29:40.125',
                            'yyyy-mm-dd hh:mi:ss.ms')
                      from marker
                     where ignore = 'N'
                       and datereplaced = '3000-01-01 12:00:00.000'
                       and exists (select 1
                                     from genotype
                                    where genotype.markerid = marker.markerid
                                      and genotype.ignore = 'N'
                                      and genotype.datereplaced = '3000-01-01 12:00:00.000')
                     group by chromosome, sourcetablename;

  COPY public.genotype (genotypeid, subjectid, markerid,
                        allele1id, allele2id, datecreated, datereplaced,
                        ignore, inconsistent, sourcetablename, sourceid)
    TO stdout;

The one in the middle (the insert job), which gets run 6 days
a week, takes anywhere between 30 minutes and 1.5 hours when
the load average is in the neighborhood of 3.

The total memory the machine has is 32 mb and nearly 100%
of it is consumed.  Swap is twice as large as physical memory,
but very little is used.  The load average of the machine when
I am in this fix is somewhere between 25 and 27.

Each postgres process consumes very little CPU time.  The copy
takes maybe 4% (+/-); the rest of them use somewhere between
0.3% and 0.7%.  As for memory, the copy takes 3% to 4% and the
rest take something like 1.7%.

In terms of postgres configuration:
  max_connections = 100
  shared_buffers = 1024MB
  temp_buffers = 128MB
  max_fsm_pages = 153600
  vacuum_cost_delay = 0
  checkpoint_segments = 3
  checkpoint_timeout = 5min
  checkpoint_warning = 30s

I don't think I am doing anything wild...  Am I?

Oh, one more thing: I said that there are 24 queries/jobs
happening, but there are a bunch of them that say <IDLE>
or <IDLE> in transaction, according to the pg_stat_activity
view.

Can anybody suggest anything that I can do to gain speed?
Any help is much appreciated.

Regards,

Tena Sakai
tsakai@gallo.ucsf.edu


Re: very, very slow performance

From:
Scott Marlowe
Date:
On Thu, Feb 19, 2009 at 10:02 PM, Tena Sakai <tsakai@gallo.ucsf.edu> wrote:
> Hi Everybody,
>
> I am running postgres v8.3.3 on Red Hat Linux (Dell hardware)
> with 4 CPUs.  This machine is terribly bogged down and I
> would like a bit of help as to what can be done.
>
> For the last 18 or so hours, there have been 24 queries running.
> What's odd is that 21 of them are identical queries.  This
> happens once in a while (maybe once per month, plus/minus)
> because of the data I need to process.  Basically, I fire
> up an application for each dataset (there are 21 of them), and
> each application gets data out of postgres, transforms the
> matrices, and puts them into a form the subsequent processes
> can handle.  I know it sounds dumb, but that's how it is for
> the foreseeable future.
>
> Here's the query (please read them in fixed-size font, if
> you can):
>
>  select subjectid, genotype.markerid, a1.value as allele1,
>         a2.value as allele2, genotype.dateCreated,
>         genotype.dateReplaced, genotype.ignore,
>         genotype.inconsistent
>    from genotype, allele a1, allele a2
>   where
>         allele1id = a1.alleleid
>     and
>         allele2id = a2.alleleid;
>
> The genotype table mentioned above has over 600,000,000 rows.  As
> I mentioned, there are 20 more of them running concurrently.
> The 3 other jobs look like:

So, you're purposely creating a cross product of 600M rows? How big is
the allele table?  What does explain and (it'll take a while to get
it) explain analyze select ... have to say?

If you're doing the cross product on purpose then it's gonna chew up a
lot of memory.  I'd suggest raising work_mem to a gig or so and
running only as many of these queries at a time as the machine can
handle.  Running 21 of them in parallel is gonna bog down and be
swapping / spilling to disk all over the place.

If you don't mean to have a cross product, then add the missing where
clause / join on syntax to remove the cross product.
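
FWIW, here's the same query written with explicit join on syntax (same
join keys as your original where clause, so it should be equivalent,
just easier to read):

  select subjectid, genotype.markerid, a1.value as allele1,
         a2.value as allele2, genotype.dateCreated,
         genotype.dateReplaced, genotype.ignore,
         genotype.inconsistent
    from genotype
    join allele a1 on genotype.allele1id = a1.alleleid
    join allele a2 on genotype.allele2id = a2.alleleid;
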
> The total memory the machine has is 32 mb and nearly 100%
> of it is consumed.  Swap is twice as large as physical memory,
> but very little is used.  The load average of the machine when
> I am in this fix is somewhere between 25 and 27.

I'll assume that's 32G, which is a pretty common size for db servers
nowadays.  How much is consumed means little, given the tendency of
the kernel to cache.  I'm assuming you're on linux / unix since you
didn't say otherwise, and big windows servers are a rarity right now
for pgsql.  It's good that swap isn't used; it shows there's no pressure
on the memory subsystem.  I assume you mean this is WHILE the queries
are running.

If you HAVE to run that many queries at once on a machine like this,
then you need a big honking RAID array, or a small one at least.
Generally you want as many mirror sets as you'll have parallel queries
running, preferably double with this kind of sequential scan heavy
load.  But you'll still be limited by memory when running these
queries at the same time.

> Each postgres process consumes very little CPU time.  The copy
> takes maybe 4% (+/-); the rest of them use somewhere between
> 0.3% and 0.7%.  As for memory, the copy takes 3% to 4% and the
> rest take something like 1.7%.

What does vmstat 10 300 say while the queries are running?  We're
looking for high wait percentage.  If you've got that you're limited
by the speed of your drives.

> In terms of postgres configuration:
>   max_connections = 100
>   shared_buffers = 1024MB
>   temp_buffers = 128MB
>   max_fsm_pages = 153600
>   vacuum_cost_delay = 0
>   checkpoint_segments = 3
>   checkpoint_timeout = 5min
>   checkpoint_warning = 30s
>
> I don't think I am doing anything wild...  Am I?

Actually, you're doing something mild.  I'd test 2, 4, 6, and 8 gigs of
shared_buffers.  I'd increase work_mem at least for the user running
the biggest queries, those cross products from earlier in the post.
Also, since you seem to have some large updates, I'd increase
checkpoint_segments to something in the 20 to 100 range.
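
Something like this, for instance (the role name here is just a
placeholder; shared_buffers and checkpoint_segments live in
postgresql.conf and need a restart / reload):

  -- per-role setting, picked up on the user's next connection
  ALTER ROLE batch_user SET work_mem = '1GB';

  -- or just for the current session while experimenting
  SET work_mem = '1GB';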

> Oh, one more thing: I said that there are 24 queries/jobs
> happening, but there are a bunch of them that say <IDLE>
> or <IDLE> in transaction, according to the pg_stat_activity
> view.

Oh, then maybe you don't have that many.  Idle connections are OK;
they use up little.  Unless you've got a hundred or so, don't worry.
Idle in transaction, OTOH, is bad.  It basically holds a "lock" on
reusing old rows in the db and can cause bloat.  Generally it's an
app-side error that needs fixing.  Leaving a transaction open
for very long is a bad thing.

I think there were some planner fixes from 8.3.3 to 8.3.5 btw.  Those
might be important.  An update might solve your problems.

Re: very, very slow performance

From:
Emanuel Calvo Franco
Date:
2009/2/20 Tena Sakai <tsakai@gallo.ucsf.edu>:
> Hi Everybody,
>
> Here's the query (please read them in fixed-size font, if
> you can):
>
>  select subjectid, genotype.markerid, a1.value as allele1,
>         a2.value as allele2, genotype.dateCreated,
>         genotype.dateReplaced, genotype.ignore,
>         genotype.inconsistent
>    from genotype, allele a1, allele a2
>   where
>         allele1id = a1.alleleid
>     and
>         allele2id = a2.alleleid;
>
> The genotype table mentioned above has over 600,000,000 rows.  As
> I mentioned, there are 20 more of them running concurrently.
> The 3 other jobs look like:
>

Have you tried partitioned tables?  Different tablespaces?  Setting the
storage to EXTERNAL for the most important columns?  What kind of
indexes do you have?
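
For example, in 8.3 you can partition with inheritance; a minimal
sketch (the markerid ranges are invented, just to show the shape):

  -- children inherit from genotype; CHECK constraints let the planner
  -- skip partitions when constraint_exclusion = on
  CREATE TABLE genotype_p1 (
      CHECK (markerid >= 0 AND markerid < 1000000)
  ) INHERITS (genotype);

  CREATE TABLE genotype_p2 (
      CHECK (markerid >= 1000000 AND markerid < 2000000)
  ) INHERITS (genotype);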

>  SELECT a.markerid,a.type,a.localname,b.ncbibuild,
>         a.chromosome,a.geneticposition,b.physicalposition,
>         a.strand,a.stdflanks,a.maxflanks,a.datecreated,
>         a.datereplaced,a.sourcetablename,a.sourceid,
>         b.dbsnprsid,a.ignore,a.gene
>    FROM public.marker a, public.snpposition b
>   WHERE
>         a.ignore= 'N'
>     AND a.datecreated <= (timestamp'Wed Oct 29 09:35:54.266 2008')
>     AND a.datereplaced > (timestamp'Wed Oct 29 09:35:54.266 2008')
>     AND a.localname IN
> ('RS10757474','RS7859598','RS6148','RS9792663','RS1541125',
>                          'RS10511446','RS10814410','RS12338622','RS875587',
>                          'RS1590979', 'RS748786','RS958505','RS12352961',
>                           and on and on and on...);
>
>
>   insert into summarystats
>                     select 'Marker by Chromosomes', chromosome,
> sourcetablename,
>                             count(*), null, to_timestamp('2009-02-18
> 20:29:40.125',
>                             'yyyy-mm-dd hh:mi:ss.ms')
>                       from marker
>                      where ignore = 'N'
>                        and datereplaced = '3000-01-01 12:00:00.000'
>                        and exists (select 1
>                                      from genotype
>                                     where genotype.markerid =
> marker.markerid
>                                       and genotype.ignore = 'N'
>                                       and genotype.datereplaced =
> '3000-01-01 12:00:00.000')
>                      group by chromosome, sourcetablename;
>
>   COPY public.genotype (genotypeid, subjectid, markerid,
>                         allele1id, allele2id, datecreated, datereplaced,
>                         ignore, inconsistent, sourcetablename, sourceid)
>     TO stdout;
>

Like Scott said, try running EXPLAIN ANALYZE for those queries, then
post the results.


>
> Can anybody suggest anything that I can do to gain speed?
> Any help is much appreciated.
>

I recommend following this thread:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php


> Regards,
>
> Tena Sakai
> tsakai@gallo.ucsf.edu
>
>
>



--
      Emanuel Calvo Franco
        Sumate al ARPUG !
      (www.postgres-arg.org -
         www.arpug.com.ar)
    ArPUG / AOSUG Member
   Postgresql Support & Admin

Re: very, very slow performance

From:
"Tena Sakai"
Date:

Hi Emanuel,

> Have you tried partitioned tables?  Different tablespaces?
> Setting the storage to EXTERNAL for the most important columns?
> What kind of indexes do you have?

I think the table design is OK.
I don't know what "setting the storage to EXTERNAL" means.
Maybe you can explain it to me in lay-person's terms?
I think the indices are OK, but I welcome your comments.

  canon=# \d genotype
                                              Table "public.genotype"
       Column      |            Type             |                           Modifiers                          
  -----------------+-----------------------------+---------------------------------------------------------------
   genotypeid      | integer                     | not null default nextval('genotype_genotypeid_seq'::regclass)
   subjectid       | integer                     |
   markerid        | integer                     |
   allele1id       | integer                     |
   allele2id       | integer                     |
   datecreated     | timestamp without time zone | not null
   datereplaced    | timestamp without time zone | not null
   ignore          | character(1)                | not null default 'N'::bpchar
   inconsistent    | character(1)                | not null default 'N'::bpchar
   sourcetablename | character varying           | not null
   sourceid        | character varying           | not null
  Indexes:
      "genotype_pkey" PRIMARY KEY, btree (genotypeid, datecreated)
      "genotype_genotypeid_idx" btree (genotypeid)
      "genotype_markerid_idx" btree (markerid)
      "genotype_source_idx" btree (sourceid, sourcetablename)
      "genotype_subjectid_idx" btree (subjectid)

  canon=# \d allele
                                           Table "public.allele"
      Column    |            Type             |                         Modifiers                        
  --------------+-----------------------------+-----------------------------------------------------------
   alleleid     | integer                     | not null default nextval('allele_alleleid_seq'::regclass)
   markerid     | integer                     |
 value        | character varying           | not null
   datecreated  | timestamp without time zone | not null
   datereplaced | timestamp without time zone | not null
  Indexes:
      "allele_pkey" PRIMARY KEY, btree (alleleid, datecreated)
      "allele_markerid_idx" btree (markerid)

> try running EXPLAIN ANALYZE for those queries, then
> post the results.

Yes, it has been running for the last 13+ hours and I have no
idea how much longer it might take.

> I recommend following this thread:
> http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php

That's pretty interesting stuff.  I need to experiment with tweaking
the parameters Joshua mentions.  Alas, now is not a good time to do
so.

Regards,

Tena Sakai


Fwd: very, very slow performance

From:
Emanuel Calvo Franco
Date:
2009/2/20 Tena Sakai <tsakai@gallo.ucsf.edu>:
> Hi Emanuel,
>
>> Have you tried partitioned tables?  Different tablespaces?
>> Setting the storage to EXTERNAL for the most important columns?
>> What kind of indexes do you have?
>
> I think the table design is OK.

Sorry, copying to the list.

The table design is OK.  Partitioning is about cutting
tables down in terms of the number of rows.  It can improve
performance, especially when you are talking about that many
rows.

> I don't know what "setting the storage to EXTERNAL" means.

It means the column will be stored separately (out of line),
which sometimes improves performance.
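
A minimal example of what I mean (the column choice is just an
illustration):

  -- store allele.value out of line, without compression
  ALTER TABLE allele ALTER COLUMN value SET STORAGE EXTERNAL;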

> Maybe you can explain it to me in lay-person's terms?
> I think the indices are OK, but I welcome your comments.
>

The indexes may be OK.  What I mean is the TYPE of index.  Maybe
HASH or GiST could be better, depending on the column type.

>   canon=# \d genotype
>                                               Table "public.genotype"
>        Column      |            Type             |                           Modifiers
>   -----------------+-----------------------------+---------------------------------------------------------------
>    genotypeid      | integer                     | not null default nextval('genotype_genotypeid_seq'::regclass)
>    subjectid       | integer                     |
>    markerid        | integer                     |
>    allele1id       | integer                     |
>    allele2id       | integer                     |
>    datecreated     | timestamp without time zone | not null
>    datereplaced    | timestamp without time zone | not null
>    ignore          | character(1)                | not null default 'N'::bpchar
>    inconsistent    | character(1)                | not null default 'N'::bpchar
>    sourcetablename | character varying           | not null
>    sourceid        | character varying           | not null
>   Indexes:
>       "genotype_pkey" PRIMARY KEY, btree (genotypeid, datecreated)
>       "genotype_genotypeid_idx" btree (genotypeid)
>       "genotype_markerid_idx" btree (markerid)
>       "genotype_source_idx" btree (sourceid, sourcetablename)
>       "genotype_subjectid_idx" btree (subjectid)
>
>   canon=# \d allele
>                                            Table "public.allele"
>       Column    |            Type             |                         Modifiers
>   --------------+-----------------------------+-----------------------------------------------------------
>    alleleid     | integer                     | not null default nextval('allele_alleleid_seq'::regclass)
>    markerid     | integer                     |
>    value        | character varying           | not null
>    datecreated  | timestamp without time zone | not null
>    datereplaced | timestamp without time zone | not null
>   Indexes:
>       "allele_pkey" PRIMARY KEY, btree (alleleid, datecreated)
>       "allele_markerid_idx" btree (markerid)
>
>> try running EXPLAIN ANALYZE for those queries, then
>> post the results.
>
> Yes, it has been running for the last 13+ hours and I have no
> idea how much longer it might take.
>

>> I recommend following this thread:
>> http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php
>
> That's pretty interesting stuff.  I need to experiment with tweaking
> the parameters Joshua mentions.  Alas, now is not a good time to do
> so.
>

I'm glad if it's useful.

--
      Emanuel Calvo Franco
        Sumate al ARPUG !
      (www.postgres-arg.org -
         www.arpug.com.ar)
    ArPUG / AOSUG Member
   Postgresql Support & Admin

Re: very, very slow performance

From:
"Tena Sakai"
Date:

Hi Scott,

> What does explain and (it'll take a while to get
> it) explain analyze select ... have to say?

Here's what "explain select..." told me:

  canon=# explain select subjectid, genotype.markerid, a1.value as allele1,
  canon-#           a2.value as allele2, genotype.dateCreated,
  canon-#           genotype.dateReplaced, genotype.ignore,
  canon-#           genotype.inconsistent
  canon-#      from genotype, allele a1, allele a2
  canon-#     where
  canon-#           allele1id = a1.alleleid
  canon-#       and
  canon-#           allele2id = a2.alleleid;
                                       QUERY PLAN                                     
  -------------------------------------------------------------------------------------
   Hash Join  (cost=165264.65..55486119.31 rows=601095277 width=32)
     Hash Cond: (genotype.allele1id = a1.alleleid)
     ->  Hash Join  (cost=82632.33..34731274.54 rows=601095277 width=34)
           Hash Cond: (genotype.allele2id = a2.alleleid)
           ->  Seq Scan on genotype  (cost=0.00..13976429.77 rows=601095277 width=36)
           ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
                 ->  Seq Scan on allele a2  (cost=0.00..42474.59 rows=2447659 width=6)
     ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
           ->  Seq Scan on allele a1  (cost=0.00..42474.59 rows=2447659 width=6)
  (9 rows)
 

And here's what "explain analyze select..." told me:

  canon=#
  canon=# explain analyze select subjectid, genotype.markerid, a1.value as allele1,
  canon-#           a2.value as allele2, genotype.dateCreated,
  canon-#           genotype.dateReplaced, genotype.ignore,
  canon-#           genotype.inconsistent
  canon-#      from genotype, allele a1, allele a2
  canon-#     where
  canon-#           allele1id = a1.alleleid
  canon-#       and
  canon-#           allele2id = a2.alleleid;
                                                                   QUERY PLAN                                                                
  --------------------------------------------------------------------------------------------------------------------------------------------
   Hash Join  (cost=165264.65..55486119.31 rows=601095277 width=32) (actual time=198780.631..67980788.847 rows=601095175 loops=1)
     Hash Cond: (genotype.allele1id = a1.alleleid)
     ->  Hash Join  (cost=82632.33..34731274.54 rows=601095277 width=34) (actual time=107782.354..64802207.979 rows=601095175 loops=1)
           Hash Cond: (genotype.allele2id = a2.alleleid)
           ->  Seq Scan on genotype  (cost=0.00..13976429.77 rows=601095277 width=36) (actual time=0.059..7262971.427 rows=601095175 loops=1)
           ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6) (actual time=105348.936..105348.936 rows=2447675 loops=1)
                 ->  Seq Scan on allele a2  (cost=0.00..42474.59 rows=2447659 width=6) (actual time=0.009..4009.140 rows=2447675 loops=1)
     ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6) (actual time=90998.199..90998.199 rows=2447675 loops=1)
           ->  Seq Scan on allele a1  (cost=0.00..42474.59 rows=2447659 width=6) (actual time=0.013..3983.008 rows=2447675 loops=1)
   Total runtime: 68792886.556 ms
  (10 rows)
 
  canon=#

Can you please comment?

Many thanks.

Regards,

Tena Sakai
tsakai@gallo.ucsf.edu



Re: very, very slow performance

From:
"Uwe C. Schroeder"
Date:
On Friday 20 February 2009, Tena Sakai wrote:
> Hi Scott,
>
> > What does explain and (it'll take a while to get
> > it) explain analyze select ... have to say?
>
> -------------------------------------------------------------------------------------
>  Hash Join  (cost=165264.65..55486119.31 rows=601095277 width=32)
>    Hash Cond: (genotype.allele1id = a1.alleleid)
>    ->  Hash Join  (cost=82632.33..34731274.54 rows=601095277 width=34)
>          Hash Cond: (genotype.allele2id = a2.alleleid)
>          ->  Seq Scan on genotype  (cost=0.00..13976429.77 rows=601095277 width=36)
>          ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
>                ->  Seq Scan on allele a2  (cost=0.00..42474.59 rows=2447659 width=6)
>    ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
>          ->  Seq Scan on allele a1  (cost=0.00..42474.59 rows=2447659 width=6)
> (9 rows)

The above tells you that you don't have indices in place.  Postgres chooses a
seq scan, which, as the name implies, scans all the rows in sequential order.

I'd add indexes on genotype.allele1id and genotype.allele2id,
i.e.
create index gtallele1idx on genotype (allele1id);

create index gtallele2idx on genotype (allele2id);

and also on allele.alleleid:
create index alleleididx on allele (alleleid);

After a "analyze genotype" and "analyze allele" the query should perform much
better. The explain analyze should show you an index scan instead of the seq
scan after that.
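
That is, after creating the indexes above:

analyze genotype;
analyze allele;

-- then re-run the explain analyze and look for index scans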

HTH

Uwe


Re: very, very slow performance

From:
Scott Marlowe
Date:
On Fri, Feb 20, 2009 at 10:45 PM, Uwe C. Schroeder <uwe@oss4u.com> wrote:
>
> On Friday 20 February 2009, Tena Sakai wrote:
>> Hi Scott,
>>
>> > What does explain and (it'll take a while to get
>> > it) explain analyze select ... have to say?
>>
>> -------------------------------------------------------------------------------------
>>  Hash Join  (cost=165264.65..55486119.31 rows=601095277 width=32)
>>    Hash Cond: (genotype.allele1id = a1.alleleid)
>>    ->  Hash Join  (cost=82632.33..34731274.54 rows=601095277 width=34)
>>          Hash Cond: (genotype.allele2id = a2.alleleid)
>>          ->  Seq Scan on genotype  (cost=0.00..13976429.77 rows=601095277 width=36)
>>          ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
>>                ->  Seq Scan on allele a2  (cost=0.00..42474.59 rows=2447659 width=6)
>>    ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
>>          ->  Seq Scan on allele a1  (cost=0.00..42474.59 rows=2447659 width=6)
>> (9 rows)

I was wrong about this query; it is constrained by the where clause.
I much prefer join on syntax, as it's more obvious what's joining to
what.  Pgsql is smart enough to reorder join clauses as long as it's
not constrained by, say, a left join, and even then there's some
wiggle room, I think.  Anyway...

> The above tells you that you don't have indices in place.  Postgres chooses a
> seq scan, which, as the name implies, scans all the rows in sequential order.

Well, he's grabbing everything, so I'm betting an index won't buy you
anything unless everything fits in memory and you set random_page_cost
low enough and shared_buffers and effective_cache_size high enough;
otherwise an index will lose.

However, if you always access the tables in a given order, you can
cluster the tables and get really fast results.  I'd try picking an
index for each table, clustering on that, and adding order bys to
put the result sets into matching clustered-index order for each
joined table.
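
A rough sketch of that idea (the index choices are just a guess, and
keep in mind cluster rewrites the whole table and holds an exclusive
lock while it runs):

  create index gtallele1idx on genotype (allele1id);
  cluster genotype using gtallele1idx;
  cluster allele using allele_pkey;
  analyze genotype;
  analyze allele;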

Re: very, very slow performance

From:
"Tena Sakai"
Date:

Hi Uwe,

I can certainly add indices and run analyze on each
table and look at the performance.  I am going to
upgrade to 8.3.6 (I am at 8.3.3) and then add indices.
It might take a few days before I can update you with the
outcome, but I will keep you posted.

Many thanks.

Regards,

Tena Sakai
tsakai@gallo.ucsf.edu




Re: very, very slow performance

From:
"Tena Sakai"
Date:

Hi Scott,

Thanks for the clustering suggestion.  I have never used
the cluster command and I need to read up before I can
use it.

I have adjusted the postgres parameters per your recommendation.
Work_mem is now 8GB, and checkpoint_segments is raised to 100.
Shared_buffers is still at 1GB because this is bound by the
kernel's SHMMAX, which I cannot alter for maybe a few days.  I am
thinking of setting SHMMAX to 8GB.  I may or may not be able to
move up to 8.3.6 over this weekend.  If not, sometime during the
week.

With all that, hopefully, I can achieve better performance.
More updates to come in a day or two.

Many thanks for your help.

Regards,

Tena Sakai
tsakai@gallo.ucsf.edu



Re: very, very slow performance

From:
Jan-Peter Seifert
Date:
Hello,

> I have adjusted the postgres parameters per your recommendation.
> Work_mem is now 8GB,

GB is just a typo I guess? Otherwise the value is insanely high. See:
http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html

Peter

Re: very, very slow performance

From:
"Tena Sakai"
Date:

Hi Peter,

> GB is just a typo I guess?
No, it isn't a typo.  I meant it.

> Otherwise the value is insanely high.
I kinda agree, but I am in the process of finding an
equilibrium for my application.  (I must admit I am
doing so in a bit of a blind fashion, but that's kinda
where I am at.)

As I look at the top utility's %MEM column while the postgres
processes run under duress, it never goes beyond 1.9.
To me, that's like not using what is available.  When
I see a larger number for %MEM, I will readjust as
necessary.

In a previous correspondence, Scott said I was mild
in terms of postgres parameter tuning.  Maybe I can
get him to say I am doing something WILD?!  ;)

Regards,

Tena Sakai
tsakai@gallo.ucsf.edu



Re: very, very slow performance

From:
Scott Marlowe
Date:
On Sat, Feb 21, 2009 at 3:11 AM, Tena Sakai <tsakai@gallo.ucsf.edu> wrote:
> Hi Scott,
>
> Thanks for the clustering suggestion.  I have never used
> the cluster command and I need to read up before I can
> use it.
>
> I have adjusted the postgres parameters per your recommendation.
> Work_mem is now 8GB, and checkpoint_segments is raised to 100.
> Shared_buffers is still at 1GB because this is bound by the
> kernel's SHMMAX, which I cannot alter for maybe a few days.  I am
> thinking of setting SHMMAX to 8GB.

Yeah, I'd reverse those two and set work_mem to something in the 512MB
range max.  Keep in mind work_mem is per query / per sort.  4 queries
with 3 sorts each = a max allocation of work_mem * 3 * 4...
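
To put rough numbers on it (assuming each of the 21 identical queries
builds two hashes, as in the plan you posted):

  -- at work_mem = 8GB:   21 * 2 * 8GB   = ~336GB, way over 32GB of RAM
  -- at work_mem = 512MB: 21 * 2 * 512MB = ~21GB, which the box can absorb
  SET work_mem = '512MB';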

> I may or may not be able to move up to 8.3.6
> over this weekend.  If not, sometime during the week.

Definitely a good idea.