Re: very, very slow performance

From: Tena Sakai
Subject: Re: very, very slow performance
Msg-id: FE44E0D7EAD2ED4BB2165071DB8E328C0378F774@egcrc-ex01.egcrc.org
In response to: very, very slow performance  ("Tena Sakai" <tsakai@gallo.ucsf.edu>)
Responses: Re: very, very slow performance
List: pgsql-admin

Hi Scott,

> What does explain and (it'll take a while to get
> it) explain analyze select ... have to say?

Here's what "explain select..." told me:

  canon=# explain select subjectid, genotype.markerid, a1.value as allele1,
  canon-#           a2.value as allele2, genotype.dateCreated,
  canon-#           genotype.dateReplaced, genotype.ignore,
  canon-#           genotype.inconsistent
  canon-#      from genotype, allele a1, allele a2
  canon-#     where
  canon-#           allele1id = a1.alleleid
  canon-#       and
  canon-#           allele2id = a2.alleleid;
                                       QUERY PLAN                                     
  -------------------------------------------------------------------------------------
   Hash Join  (cost=165264.65..55486119.31 rows=601095277 width=32)
     Hash Cond: (genotype.allele1id = a1.alleleid)
     ->  Hash Join  (cost=82632.33..34731274.54 rows=601095277 width=34)
           Hash Cond: (genotype.allele2id = a2.alleleid)
           ->  Seq Scan on genotype  (cost=0.00..13976429.77 rows=601095277 width=36)
           ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
                 ->  Seq Scan on allele a2  (cost=0.00..42474.59 rows=2447659 width=6)
     ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6)
           ->  Seq Scan on allele a1  (cost=0.00..42474.59 rows=2447659 width=6)
  (9 rows)
 

and here's what "explain analyze select..." told me.

  canon=#
  canon=# explain analyze select subjectid, genotype.markerid, a1.value as allele1,
  canon-#           a2.value as allele2, genotype.dateCreated,
  canon-#           genotype.dateReplaced, genotype.ignore,
  canon-#           genotype.inconsistent
  canon-#      from genotype, allele a1, allele a2
  canon-#     where
  canon-#           allele1id = a1.alleleid
  canon-#       and
  canon-#           allele2id = a2.alleleid;
                                                                   QUERY PLAN                                                                
  --------------------------------------------------------------------------------------------------------------------------------------------
   Hash Join  (cost=165264.65..55486119.31 rows=601095277 width=32) (actual time=198780.631..67980788.847 rows=601095175 loops=1)
     Hash Cond: (genotype.allele1id = a1.alleleid)
     ->  Hash Join  (cost=82632.33..34731274.54 rows=601095277 width=34) (actual time=107782.354..64802207.979 rows=601095175 loops=1)
           Hash Cond: (genotype.allele2id = a2.alleleid)
           ->  Seq Scan on genotype  (cost=0.00..13976429.77 rows=601095277 width=36) (actual time=0.059..7262971.427 rows=601095175 loops=1)
           ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6) (actual time=105348.936..105348.936 rows=2447675 loops=1)
                 ->  Seq Scan on allele a2  (cost=0.00..42474.59 rows=2447659 width=6) (actual time=0.009..4009.140 rows=2447675 loops=1)
     ->  Hash  (cost=42474.59..42474.59 rows=2447659 width=6) (actual time=90998.199..90998.199 rows=2447675 loops=1)
           ->  Seq Scan on allele a1  (cost=0.00..42474.59 rows=2447659 width=6) (actual time=0.013..3983.008 rows=2447675 loops=1)
   Total runtime: 68792886.556 ms
  (10 rows)
 
  canon=#

Can you please comment?

Many thanks.

Regards,

Tena Sakai
tsakai@gallo.ucsf.edu


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Thu 2/19/2009 9:48 PM
To: Tena Sakai
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] very, very slow performance

On Thu, Feb 19, 2009 at 10:02 PM, Tena Sakai <tsakai@gallo.ucsf.edu> wrote:
> Hi Everybody,
>
> I am running postgres v8.3.3 on Red Hat Linux (Dell hardware)
> with 4 cpu's.  This machine is terribly bogged down and I
> would like a bit of help as to what can be done.
>
> For the last 18 (+/-) hours or so, there have been 24 queries running.
> What's odd is that 21 of them are identical queries.  This
> happens once in a while (maybe once per month, plus or minus)
> because of the data I need to process.  Basically, I fire
> up an application for each data set (there are 21 of them) and
> the application gets data out of postgres, transforms matrices,
> and puts them into the form the subsequent processes can handle.
> I know it sounds dumb, but that's how it is for the foreseeable
> future.
>
> Here's the query (please read them in fixed-size font, if
> you can):
>
>  select subjectid, genotype.markerid, a1.value as allele1,
>         a2.value as allele2, genotype.dateCreated,
>         genotype.dateReplaced, genotype.ignore,
>         genotype.inconsistent
>    from genotype, allele a1, allele a2
>   where
>         allele1id = a1.alleleid
>     and
>         allele2id = a2.alleleid;
>
> The genotype table mentioned above has 600,000,000+ rows.  As
> I mentioned, there are 20 more of these queries running concurrently.
> The 3 other jobs look like:

So, you're purposely creating a cross product of 600M rows? How big is
the allele table?  What does explain and (it'll take a while to get
it) explain analyze select ... have to say?
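
If you're not sure how big the allele table is, something along these
lines will tell you (these functions are built into 8.3):

  -- table plus index size, human readable
  select pg_size_pretty(pg_total_relation_size('allele'))   as allele,
         pg_size_pretty(pg_total_relation_size('genotype')) as genotype;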

If you're doing the cross product on purpose then it's gonna chew up a
lot of memory.  I'd suggest raising work_mem to a gig or so and
running only as many of these queries at a time as the machine can
handle.  Running 21 of them in parallel is gonna bog the machine down
and leave it swapping / spilling to disk all over the place.
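
Something like this at the top of each session that runs the big query
(1GB is a guess; scale it to what the box can actually spare across
however many of these run at once):

  -- session-local setting; only affects this connection
  set work_mem = '1GB';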

If you don't mean to have a cross product, then add the missing where
clause / join on conditions to get rid of it.
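
Written out with explicit joins, the same query would look something
like this (same result, but a dropped condition can't silently turn
into a cross product):

  select subjectid, genotype.markerid, a1.value as allele1,
         a2.value as allele2, genotype.dateCreated,
         genotype.dateReplaced, genotype.ignore,
         genotype.inconsistent
    from genotype
    join allele a1 on genotype.allele1id = a1.alleleid
    join allele a2 on genotype.allele2id = a2.alleleid;
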
> The total memory the machine has is 32 mb and nearly 100%
> of it is consumed.  Swap is twice as large as physical memory,
> but very little is used.  The load average of the machine when
> I am in this fix is somewhere between 25 and 27.

I'll assume that's 32G, which is a pretty common size for db servers
nowadays.  How much of it is consumed means little, given the kernel's
tendency to cache.  I'm assuming you're on linux / unix since you
didn't say otherwise, and big windows servers are a rarity right now
for pgsql.  It's good that swap isn't used; it shows there's no pressure
on the memory subsystem.  I assume you mean this is WHILE the queries
are running.

If you HAVE to run that many queries at once on a machine like this,
then you need a big honking RAID array, or at least a small one.
Generally you want as many mirror sets as you'll have parallel queries
running, preferably double with this kind of sequential scan heavy
load.  But you'll still be limited by memory when running these
queries at the same time.

> Each postgres process consumes so little cpu time.  The copy
> takes maybe 4% (+/-); the rest of them use somewhere between
> 0.3% and 0.7%.  As to memory, the copy takes 3% to 4% and the
> rest takes something like 1.7%.

What does vmstat 10 300 say while the queries are running?  We're
looking for a high I/O wait (wa) percentage.  If you've got that,
you're limited by the speed of your drives.

> In terms of postgres configuration:
>   max_connections = 100
>   shared_buffers = 1024MB
>   temp_buffers = 128MB
>   max_fsm_pages = 153600
>   vacuum_cost_delay = 0
>   checkpoint_segments = 3
>   checkpoint_timeout = 5min
>   checkpoint_warning = 30s
>
> I don't think I am doing anything wild...  Am I?

Actually, you're doing something mild.  I'd test 2, 4, 6, and 8 gig of
shared_buffers.  I'd increase work_mem, at least for the user running
the biggest queries (those cross products earlier in the post).
Also, since you seem to have some large updates, I'd increase
checkpoint_segments to something in the 20 to 100 range.
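
In postgresql.conf terms that'd be something in this neighborhood;
these are starting points to test against your workload, not gospel
(shared_buffers needs a restart to take effect):

  shared_buffers      = 2GB     # try 2, 4, 6 and 8GB and compare
  work_mem            = 256MB   # or bump it per-role / per-session instead
  checkpoint_segments = 30      # anywhere in the 20 to 100 range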

> Oh, one more thing, I said that there are 24 queries/jobs
> happening, but there are a bunch of them that say <IDLE>
> or <IDLE> in transaction, according to the pg_stat_activity
> view.

Oh, then maybe you don't have that many.  Idle connections are ok;
they use up very little.  Unless you've got a hundred or so, don't
worry.  Idle in transaction, OTOH, is bad.  It basically holds a "lock"
on reusing old rows in the db and can cause bloat.  Generally it's an
app / app framework error that needs fixing.  Leaving a transaction
open for very long is a bad thing.
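
You can spot them, and see how long they've been sitting there, with
something like this (column names as they are in 8.3):

  select procpid, usename, xact_start, current_query
    from pg_stat_activity
   where current_query = '<IDLE> in transaction'
   order by xact_start;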

I think there were some planner fixes from 8.3.3 to 8.3.5 btw.  Those
might be important.  An update might solve your problems.
