Discussion: indexes in partitioned tables - again

From:
Samuel Gendler
Date:

I know this problem crops up all the time and I have read what I could
find, but I'm still not finding an answer to my problem.  This is all
postgres 8.3. Yes, I've enabled constraint_exclusion.  Yes, there are
indexes on the partitions, not just on the parent.

I've got a table with 1 month partitions.  As it happens, I've only
got 2 partitions at the moment, one with 12 million rows and the other
with 5 million.  I only discovered all of the caveats surrounding
indexes and partitioned tables when I executed a very simple query,
saw that it took far too long to run, and started looking at what the
query planner did. In this case, I simply want the set of distinct
values for a particular column, across all partitions.  The set of
distinct values is very small (3) and there is an index on the column,
so I'd expect an index scan to return the 3 values almost
instantaneously.  It turns out that when I query the partitions
directly, the planner does an index scan.  When I query the parent
table, I get full table scans instead of merged output from n index
scans.  Even worse, instead of getting the distinct values from each
partition and merging those, it merges each partition in its entirety
and then sorts and uniques, which is pretty much the pathological
execution order.

 I'll give the queries, then the schema, then the various explain outputs.

(parent table) select distinct probe_type_num from
day_scale_radar_performance_fact; (30 seconds)
(partition) select distinct probe_type_num from
day_scale_radar_performance_fact_20100301_0000; (6 seconds)
(partition) select distinct probe_type_num from
day_scale_radar_performance_fact_20100401_0000; (1 second)

(manual union) select distinct probe_type_num from (select distinct
probe_type_num from day_scale_radar_performance_fact_20100301_0000
UNION select distinct probe_type_num from
day_scale_radar_performance_fact_20100401_0000) t2; (7 seconds)
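(For what it's worth, on 8.4 and later a recursive CTE can emulate the
"loose index scan" the planner won't do natively -- a sketch against one
partition, where each step becomes a single index probe; this isn't
available on 8.3:)

    WITH RECURSIVE t AS (
        SELECT min(probe_type_num) AS val
        FROM day_scale_radar_performance_fact_20100301_0000
        UNION ALL
        SELECT (SELECT min(probe_type_num)
                FROM day_scale_radar_performance_fact_20100301_0000
                WHERE probe_type_num > t.val)
        FROM t
        WHERE t.val IS NOT NULL
    )
    SELECT val FROM t WHERE val IS NOT NULL;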

In part, I'm surprised that the index scan takes as long as it does,
since I'd think an index would be able to return the set of keys
relatively quickly.  But that's a secondary issue.

Parent table:
cedexis_v2=# \d day_scale_radar_performance_fact;
       Table "perf_reporting.day_scale_radar_performance_fact"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 count                      | bigint                      | not null
 total_ms                   | bigint                      | not null
 time                       | timestamp without time zone | not null
 market_num                 | integer                     | not null
 country_num                | integer                     | not null
 autosys_num                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_id                | integer                     | not null
 probe_type_num             | integer                     | not null
Indexes:
    "temp1_probe_type_num" btree (probe_type_num)


partition:
cedexis_v2=# \d day_scale_radar_performance_fact_20100301_0000;
Table "perf_reporting.day_scale_radar_performance_fact_20100301_0000"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 count                      | bigint                      | not null
 total_ms                   | bigint                      | not null
 time                       | timestamp without time zone | not null
 market_num                 | integer                     | not null
 country_num                | integer                     | not null
 autosys_num                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_id                | integer                     | not null
 probe_type_num             | integer                     | not null
Indexes:
    "day_scale_radar_performance_fact_20100301_0000_asn" btree (autosys_num)
    "day_scale_radar_performance_fact_20100301_0000_cty" btree (country_num)
    "day_scale_radar_performance_fact_20100301_0000_mkt" btree (market_num)
    "day_scale_radar_performance_fact_20100301_0000_p" btree (provider_id)
    "day_scale_radar_performance_fact_20100301_0000_poc" btree
(provider_owner_customer_id)
    "day_scale_radar_performance_fact_20100301_0000_poz" btree
(provider_owner_zone_id)
    "day_scale_radar_performance_fact_20100301_0000_pt" btree (probe_type_num)
    "day_scale_radar_performance_fact_20100301_0000_time" btree ("time")
Check constraints:
    "day_scale_radar_performance_fact_20100301_0000_time_check" CHECK
("time" >= '2010-03-01 00:00:00'::timestamp without time zone AND
"time" < '2010-04-01 00:00:00'::timestamp without time zone)
Inherits: day_scale_radar_performance_fact

I also tried creating an index on the relevant column in the parent
table, but it had no effect either way.  You can see it in the table
description above.

cedexis_v2=# explain select distinct probe_type_num from
day_scale_radar_performance_fact;

     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1864962.35..1926416.31 rows=200 width=4)
   ->  Sort  (cost=1864962.35..1895689.33 rows=12290793 width=4)
         Sort Key:
perf_reporting.day_scale_radar_performance_fact.probe_type_num
         ->  Result  (cost=0.00..249616.93 rows=12290793 width=4)
               ->  Append  (cost=0.00..249616.93 rows=12290793 width=4)
                     ->  Seq Scan on day_scale_radar_performance_fact
(cost=0.00..19.90 rows=990 width=4)
                     ->  Seq Scan on
day_scale_radar_performance_fact_20100401_0000
day_scale_radar_performance_fact  (cost=0.00..31388.01 rows=1545501
width=4)
                     ->  Seq Scan on
day_scale_radar_performance_fact_20100301_0000
day_scale_radar_performance_fact  (cost=0.00..218209.02 rows=10744302
width=4)


cedexis_v2=# explain select distinct probe_type_num from
day_scale_radar_performance_fact_20100301_0000;

         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..684328.92 rows=3 width=4)
   ->  Index Scan using
day_scale_radar_performance_fact_20100301_0000_pt on
day_scale_radar_performance_fact_20100301_0000  (cost=0.00..657468.16
rows=10744302 width=4)


And this is a lot closer to what I would hope the query planner would do:

cedexis_v2=# explain select distinct probe_type_num from (select
distinct probe_type_num from
day_scale_radar_performance_fact_20100401_0000 union
select distinct probe_type_num from
day_scale_radar_performance_fact_20100301_0000) t2;

                     QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=781113.73..781113.84 rows=6 width=4)
   ->  Unique  (cost=781113.73..781113.76 rows=6 width=4)
         ->  Sort  (cost=781113.73..781113.75 rows=6 width=4)
               Sort Key:
day_scale_radar_performance_fact_20100401_0000.probe_type_num
               ->  Append  (cost=0.00..781113.66 rows=6 width=4)
                     ->  Unique  (cost=0.00..96784.68 rows=3 width=4)
                           ->  Index Scan using
day_scale_radar_performance_fact_20100401_0000_pt on
day_scale_radar_performance_fact_20100401_0000  (cost=0.00..92920.93
rows=1545501 width=4)
                     ->  Unique  (cost=0.00..684328.92 rows=3 width=4)
                           ->  Index Scan using
day_scale_radar_performance_fact_20100301_0000_pt on
day_scale_radar_performance_fact_20100301_0000  (cost=0.00..657468.16
rows=10744302 width=4)

From:
Robert Haas
Date:

On Tue, Apr 6, 2010 at 5:37 PM, Samuel Gendler
<> wrote:
> In part, I'm surprised that the index scan takes as long as it does,
> since I'd think an index would be able to return the set of keys
> relatively quickly.  But that's a secondary issue.

We don't actually have a facility built into the index-scan machinery
to scan for distinct keys.  It's doing a full scan of the index and
then unique-ifying the results afterward.  It produces the right
answers, but it's definitely not as fast as it could be.

The planner is not as smart about partitioned tables as it could be,
either.  A scan of a partitioned table is implemented as an append
node with one member per partition, and the planner isn't very good at
pushing things down through append nodes.

...Robert

From:
Craig James
Date:

Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just
a short time -- it jumps to 10-20 CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike events.
During these spikes, the system is completely unresponsive (you can't even login via ssh).

I managed to capture one such event using top(1) with the "batch" option as a background process.  See output below -
it shows 19 active postgres processes, but I think it missed the bulk of the spike.

For some reason, every postgres backend suddenly decides (is told?) to do something.  When this happens, the system
becomes unusable for anywhere from ten seconds to a minute or so, depending on how much web traffic stacks up behind
this event.  We have two servers, one offline and one public, and they both do this, so it's not caused by actual web
traffic (and the Apache logs don't show any HTTP activity correlated with the spikes).

I thought based on other posts that this might be a background-writer problem, but it's not I/O, it's all CPU as far as
I can tell.

Any ideas where I can look to find what's triggering this?

8 CPUs, 8 GB memory
8-disk RAID10 (10k SATA)
Postgres 8.3.0
Fedora 8, kernel is 2.6.24.4-64.fc8
Diffs from original postgres.conf:

max_connections = 1000
shared_buffers = 2000MB
work_mem = 256MB
max_fsm_pages = 16000000
max_fsm_relations = 625000
synchronous_commit = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
escape_string_warning = off

Thanks,
Craig


top - 11:24:59 up 81 days, 20:27,  4 users,  load average: 0.98, 0.83, 0.92
Tasks: 366 total,  20 running, 346 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.6%us,  1.5%sy,  0.0%ni, 66.3%id,  1.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8194800k total,  8118688k used,    76112k free,       36k buffers
Swap:  2031608k total,   169348k used,  1862260k free,  7313232k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18972 postgres  20   0 2514m  11m 8752 R   11  0.1   0:00.35 postmaster
10618 postgres  20   0 2514m  12m 9456 R    9  0.2   0:00.54 postmaster
10636 postgres  20   0 2514m  11m 9192 R    9  0.1   0:00.45 postmaster
25903 postgres  20   0 2514m  11m 8784 R    9  0.1   0:00.21 postmaster
10626 postgres  20   0 2514m  11m 8716 R    6  0.1   0:00.45 postmaster
10645 postgres  20   0 2514m  12m 9352 R    6  0.2   0:00.42 postmaster
10647 postgres  20   0 2514m  11m 9172 R    6  0.1   0:00.51 postmaster
18502 postgres  20   0 2514m  11m 9016 R    6  0.1   0:00.23 postmaster
10641 postgres  20   0 2514m  12m 9296 R    5  0.2   0:00.36 postmaster
10051 postgres  20   0 2514m  13m  10m R    4  0.2   0:00.70 postmaster
10622 postgres  20   0 2514m  12m 9216 R    4  0.2   0:00.39 postmaster
10640 postgres  20   0 2514m  11m 8592 R    4  0.1   0:00.52 postmaster
18497 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.25 postmaster
18498 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.22 postmaster
10341 postgres  20   0 2514m  13m   9m R    2  0.2   0:00.57 postmaster
10619 postgres  20   0 2514m  12m 9336 R    1  0.2   0:00.38 postmaster
15687 postgres  20   0 2321m  35m  35m R    0  0.4   8:36.12 postmaster



From:
"Joshua D. Drake"
Date:

On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts
> just a short time -- it jumps to 10-20 CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
> events.  During these spikes, the system is completely unresponsive (you can't even login via ssh).
>
> I managed to capture one such event using top(1) with the "batch" option as a background process.  See output below -
> it shows 19 active postgres processes, but I think it missed the bulk of the spike.

What does iostat 5 say during the jump?

Joshua D. Drake



--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.
From:
Craig James
Date:

On 4/7/10 2:40 PM, Joshua D. Drake wrote:
> On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
>> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts
>> just a short time -- it jumps to 10-20 CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
>> events.  During these spikes, the system is completely unresponsive (you can't even login via ssh).
>>
>> I managed to capture one such event using top(1) with the "batch" option as a background process.  See output below -
>> it shows 19 active postgres processes, but I think it missed the bulk of the spike.
>
> What does iostat 5 say during the jump?

It's very hard to say ... I'll have to start a background job to watch for a day or so.  While it's happening, you
can't login, and any open windows become unresponsive.  I'll probably have to run it at high priority using nice(1) to
get any data at all during the event.

Would vmstat be informative?

Thanks,
Craig

From:
Greg Smith
Date:

Craig James wrote:
> I managed to capture one such event using top(1) with the "batch"
> option as a background process.  See output below

You should add "-c" to your batch top capture, then you'll be able to
see what the individual postmaster processes are actually doing when
things get stuck.
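
For example, something along these lines (illustrative values: a
5-second sample interval for one hour):

    top -b -c -d 5 -n 720 > /tmp/top-capture.log &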

> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB
> Mem:   8194800k total,  8118688k used,    76112k free,       36k buffers
> Swap:  2031608k total,   169348k used,  1862260k free,  7313232k cached

These settings appear way too high for a server with 8GB of RAM.  I'm
not sure if max_connections is too large, or if it's work_mem that's too
big, but one or both of them may need to be tuned way down from where
they are now to get your memory usage under control.  Your server might
running out of RAM during the periods where it becomes
unresponsive--that could be the system paging stuff out to swap, which
isn't necessarily a high user of I/O but it will block things.  Not
having any memory used for buffers is never a good sign.
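
As rough arithmetic: max_connections (1000) x work_mem (256MB) allows up
to ~256GB of sort/hash memory in the worst case -- and a single complex
query can use several work_mem allocations at once -- on top of the
2000MB pinned by shared_buffers, all against 8GB of physical RAM.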

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
Tom Lane
Date:

Craig James <> writes:
> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts
> just a short time -- it jumps to 10-20 CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
> events.  During these spikes, the system is completely unresponsive (you can't even login via ssh).
> I managed to capture one such event using top(1) with the "batch" option as a background process.  See output below -
> it shows 19 active postgres processes, but I think it missed the bulk of the spike.

> Any ideas where I can look to find what's triggering this?

> Postgres 8.3.0
           ^^^^^

If it's really 8.3.0, try updating to 8.3.something-recent.  We've fixed
a whole lot of bugs since then.

I have a suspicion that this might be an sinval overrun scenario, in
which case you'd need to update to 8.4 to get a real fix.  But updating
in the 8.3 branch would be cheap and easy.

If it is sinval overrun, it would presumably be triggered by a whole lot
of catalog changes being made at approximately the same time.  Can you
correlate the spikes with anything like that?

            regards, tom lane

From:
"Joshua D. Drake"
Date:

On Wed, 2010-04-07 at 14:45 -0700, Craig James wrote:
> On 4/7/10 2:40 PM, Joshua D. Drake wrote:
> > On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
> >> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts
> >> just a short time -- it jumps to 10-20 CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
> >> events.  During these spikes, the system is completely unresponsive (you can't even login via ssh).
> >>
> >> I managed to capture one such event using top(1) with the "batch" option as a background process.  See output
> >> below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.
> >
> > What does iostat 5 say during the jump?
>
> It's very hard to say ... I'll have to start a background job to watch for a day or so.  While it's happening, you
> can't login, and any open windows become unresponsive.  I'll probably have to run it at high priority using nice(1) to
> get any data at all during the event.

Do you have sar running? Say a sar -A?

>
> Would vmstat be informative?

Yes.

My guess is that it is not CPU, it is IO and your CPU usage is all WAIT
on IO.

To have your CPUs so flooded that they are the cause of an inability to
log in is pretty suspect.

Joshua D. Drake


>
> Thanks,
> Craig
>


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.
From:
David Rees
Date:

On Wed, Apr 7, 2010 at 2:37 PM, Craig James <> wrote:
> Most of the time Postgres runs nicely, but two or three times a day we get a
> huge spike in the CPU load that lasts just a short time -- it jumps to 10-20
> CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
> events.  During these spikes, the system is completely unresponsive (you
> can't even login via ssh).

You need to find out what all those Postgres processes are doing.  You
might try enabling update_process_title and then using ps to figure
out what each instance is doing.  Otherwise, you might try enabling
logging of statements that take more than a certain amount of time to
run (see log_min_duration_statement).
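
For example, in postgresql.conf (illustrative values):

    update_process_title = on           # let ps/top show each backend's current activity
    log_min_duration_statement = 250    # log any statement slower than 250 ms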

> I managed to capture one such event using top(1) with the "batch" option as
> a background process.  See output below - it shows 19 active postgress
> processes, but I think it missed the bulk of the spike.

Looks like it.  The system doesn't appear to be overloaded at all at that point.

> 8 CPUs, 8 GB memory
> 8-disk RAID10 (10k SATA)
> Postgres 8.3.0

Should definitely update to the latest 8.3.10 - 8.3 has a LOT of known bugs.

> Fedora 8, kernel is 2.6.24.4-64.fc8

Wow, that is very old, too.

> Diffs from original postgres.conf:
>
> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB

work_mem is way too high for 1000 connections and 8GB ram.  You could
simply be starting up too many postgres processes and overwhelming the
machine.  Either significantly reduce max_connections or work_mem.

> max_fsm_pages = 16000000
> max_fsm_relations = 625000
> synchronous_commit = off

You are playing with fire here.  You should never turn this off unless
you do not care if your data becomes irrecoverably corrupted.

> top - 11:24:59 up 81 days, 20:27,  4 users,  load average: 0.98, 0.83, 0.92
> Tasks: 366 total,  20 running, 346 sleeping,   0 stopped,   0 zombie
> Cpu(s): 30.6%us,  1.5%sy,  0.0%ni, 66.3%id,  1.5%wa,  0.0%hi,  0.0%si,
>  0.0%st
> Mem:   8194800k total,  8118688k used,    76112k free,       36k buffers
> Swap:  2031608k total,   169348k used,  1862260k free,  7313232k cached

System load looks very much OK given that you have 8 CPUs.

> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 18972 postgres  20   0 2514m  11m 8752 R   11  0.1   0:00.35 postmaster
> 10618 postgres  20   0 2514m  12m 9456 R    9  0.2   0:00.54 postmaster
> 10636 postgres  20   0 2514m  11m 9192 R    9  0.1   0:00.45 postmaster
> 25903 postgres  20   0 2514m  11m 8784 R    9  0.1   0:00.21 postmaster
> 10626 postgres  20   0 2514m  11m 8716 R    6  0.1   0:00.45 postmaster
> 10645 postgres  20   0 2514m  12m 9352 R    6  0.2   0:00.42 postmaster
> 10647 postgres  20   0 2514m  11m 9172 R    6  0.1   0:00.51 postmaster
> 18502 postgres  20   0 2514m  11m 9016 R    6  0.1   0:00.23 postmaster
> 10641 postgres  20   0 2514m  12m 9296 R    5  0.2   0:00.36 postmaster
> 10051 postgres  20   0 2514m  13m  10m R    4  0.2   0:00.70 postmaster
> 10622 postgres  20   0 2514m  12m 9216 R    4  0.2   0:00.39 postmaster
> 10640 postgres  20   0 2514m  11m 8592 R    4  0.1   0:00.52 postmaster
> 18497 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.25 postmaster
> 18498 postgres  20   0 2514m  11m 8804 R    4  0.1   0:00.22 postmaster
> 10341 postgres  20   0 2514m  13m   9m R    2  0.2   0:00.57 postmaster
> 10619 postgres  20   0 2514m  12m 9336 R    1  0.2   0:00.38 postmaster
> 15687 postgres  20   0 2321m  35m  35m R    0  0.4   8:36.12 postmaster

Judging by the amount of CPU time each postmaster has accumulated, they
are all fairly new processes.  How many pg processes of the ~350
currently running are there?

-Dave

From:
Craig James
Date:

On 4/7/10 3:36 PM, Joshua D. Drake wrote:
> On Wed, 2010-04-07 at 14:45 -0700, Craig James wrote:
>> On 4/7/10 2:40 PM, Joshua D. Drake wrote:
>>> On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
>>>> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts
>>>> just a short time -- it jumps to 10-20 CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
>>>> events.  During these spikes, the system is completely unresponsive (you can't even login via ssh).
>>>>
>>>> I managed to capture one such event using top(1) with the "batch" option as a background process.  See output
>>>> below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.
>>>
>>> What does iostat 5 say during the jump?
>>
>> It's very hard to say ... I'll have to start a background job to watch for a day or so.  While it's happening, you
>> can't login, and any open windows become unresponsive.  I'll probably have to run it at high priority using nice(1)
>> to get any data at all during the event.
>
> Do you have sar runing? Say a sar -A ?

No, I don't have it installed.  I'll have a look.  At first glance it looks like a combination of what I can get with
"top -b" and vmstat, but with a single program.

> My guess is that it is not CPU, it is IO and your CPU usage is all WAIT
> on IO.
>
> To have your CPUs so flooded that they are the cause of an inability to
> log in is pretty suspect.

I thought so too, except that I can't login during the flood.  If the CPUs were all doing iowaits, logging in should be
easy.

Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB system fits these symptoms -- if it's having
a swap storm, login is effectively impossible.

Craig

>
> Joshua D. Drake
>
>
>>
>> Thanks,
>> Craig
>>
>
>


From:
Craig James
Date:

On 4/7/10 2:59 PM, Tom Lane wrote:
> Craig James<>  writes:
>> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts
>> just a short time -- it jumps to 10-20 CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike
>> events.  During these spikes, the system is completely unresponsive (you can't even login via ssh).
>> I managed to capture one such event using top(1) with the "batch" option as a background process.  See output below -
>> it shows 19 active postgres processes, but I think it missed the bulk of the spike.
>
>> Any ideas where I can look to find what's triggering this?
>
>> Postgres 8.3.0
>             ^^^^^
>
> If it's really 8.3.0, try updating to 8.3.something-recent.  We've fixed
> a whole lot of bugs since then.

Good advice, I've been meaning to do this, maybe this will be a kick in the pants to motivate me.

> I have a suspicion that this might be an sinval overrun scenario, in
> which case you'd need to update to 8.4 to get a real fix.  But updating
> in the 8.3 branch would be cheap and easy.
>
> If it is sinval overrun, it would presumably be triggered by a whole lot
> of catalog changes being made at approximately the same time.  Can you
> correlate the spikes with anything like that?

Not that I know of.  Just regular web traffic.  On the backup server these events happen occasionally even when there
is little or no web traffic, and nobody logged in doing maintenance.

>
>             regards, tom lane
>


From:
Steve Crawford
Date:

>> ...Can you
>> correlate the spikes with anything like that?
>
> Not that I know of.  Just regular web traffic.  On the backup server
> these events happen occasionally even when there is little or no web
> traffic, and nobody logged in doing maintenance.
What, if anything, are you logging in the PostgreSQL logs? Anything
interesting, there?

Cheers,
Steve


From:
David Rees
Date:

On Wed, Apr 7, 2010 at 3:57 PM, Craig James <> wrote:
> On 4/7/10 3:36 PM, Joshua D. Drake wrote:
>> My guess is that it is not CPU, it is IO and your CPU usage is all WAIT
>> on IO.
>>
>> To have your CPUs so flooded that they are the cause of an inability to
>> log in is pretty suspect.
>
> I thought so too, except that I can't login during the flood.  If the CPUs
> were all doing iowaits, logging in should be easy.

No - logging in with high iowait is much harder than with high CPU
usage, because of the latency of disk access.

> Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB
> system fits these symptoms -- if it's having a swap storm, login is
> effectively impossible.

A swap storm effectively puts the machine into very high iowait time.

-Dave

From:
Tom Lane
Date:

Craig James <> writes:
> On 4/7/10 3:36 PM, Joshua D. Drake wrote:
>> To have your CPUs so flooded that they are the cause of an inability to
>> log in is pretty suspect.

> I thought so too, except that I can't login during the flood.  If the CPUs were all doing iowaits, logging in should
> be easy.

> Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB system fits these symptoms -- if it's
> having a swap storm, login is effectively impossible.

Yeah, but there is also the question of what's causing all the backends
to try to run at the same time.  Oversubscribed memory could well be the
direct cause of the machine getting driven into the ground, but there's
something else going on here too IMO.

Anyway I concur with the advice to lower shared_buffers, and run fewer
backends if possible, to see if that ameliorates the problem.

            regards, tom lane

From:
Robert Haas
Date:

On Wed, Apr 7, 2010 at 6:56 PM, David Rees <> wrote:
>> max_fsm_pages = 16000000
>> max_fsm_relations = 625000
>> synchronous_commit = off
>
> You are playing with fire here.  You should never turn this off unless
> you do not care if your data becomes irrecoverably corrupted.

That is not correct.  Turning off synchronous_commit is sensible if
you don't mind losing the last few transactions on a crash.  What will
corrupt your database is if you turn off fsync.

...Robert

From:
Craig James
Date:

On 4/7/10 5:47 PM, Robert Haas wrote:
> On Wed, Apr 7, 2010 at 6:56 PM, David Rees<>  wrote:
>>> max_fsm_pages = 16000000
>>> max_fsm_relations = 625000
>>> synchronous_commit = off
>>
>> You are playing with fire here.  You should never turn this off unless
>> you do not care if your data becomes irrecoverably corrupted.
>
> That is not correct.  Turning off synchronous_commit is sensible if
> you don't mind losing the last few transactions on a crash.  What will
> corrupt your database is if you turn off fsync.

A bit off the original topic, but ...

I set it this way because I was advised that with a battery-backed RAID controller, this was a safe setting.  Is that
not the case?

Craig

From:
David Rees
Date:

On Wed, Apr 7, 2010 at 7:06 PM, Craig James <> wrote:
> On 4/7/10 5:47 PM, Robert Haas wrote:
>> On Wed, Apr 7, 2010 at 6:56 PM, David Rees<>  wrote:
>>>> synchronous_commit = off
>>>
>>> You are playing with fire here.  You should never turn this off unless
>>> you do not care if your data becomes irrecoverably corrupted.
>>
>> That is not correct.  Turning off synchronous_commit is sensible if
>> you don't mind losing the last few transactions on a crash.  What will
>> corrupt your database is if you turn off fsync.

Whoops, you're right.

> A bit off the original topic, but ...
>
> I set it this way because I was advised that with a battery-backed RAID
> controller, this was a safe setting.  Is that not the case?

Robert has it right - with synchronous_commit off, your database will
always be consistent, but you may lose transactions in the event of a
crash.

Doesn't matter if you have a BBU or not - all the BBU does is give the
controller the ability to acknowledge a write without the data
actually having been written to disk.

According to the documentation, with synchronous_commit off and a
default wal_writer_delay of 200ms, it's possible to lose up to a
maximum of 600ms of data you thought were written to disk.

-Dave

From:
Robert Haas
Date:

On Wed, Apr 7, 2010 at 10:50 PM, David Rees <> wrote:
> On Wed, Apr 7, 2010 at 7:06 PM, Craig James <> wrote:
>> On 4/7/10 5:47 PM, Robert Haas wrote:
>>> On Wed, Apr 7, 2010 at 6:56 PM, David Rees<>  wrote:
>>>>> synchronous_commit = off
>>>>
>>>> You are playing with fire here.  You should never turn this off unless
>>>> you do not care if your data becomes irrecoverably corrupted.
>>>
>>> That is not correct.  Turning off synchronous_commit is sensible if
>>> you don't mind losing the last few transactions on a crash.  What will
>>> corrupt your database is if you turn off fsync.
>
> Whoops, you're right.
>
>> A bit off the original topic, but ...
>>
>> I set it this way because I was advised that with a battery-backed RAID
>> controller, this was a safe setting.  Is that not the case?
>
> Robert has it right - with synchronous_commit off, your database will
> always be consistent, but you may lose transactions in the event of a
> crash.
>
> Doesn't matter if you have a BBU or not - all the BBU does is give the
> controller the ability to acknowledge a write without the data
> actually having been written to disk.
>
> According to the documentation, with synchronous_commit off and a
> default wal_writer_delay of 200ms, it's possible to lose up to a
> maximum of 600ms of data you thought were written to disk.

So, IOW, if you're running a social networking web site and your
database is full of status updates sent by teenagers to other
teenagers, you might judge that turning off synchronous_commit is a
reasonable thing to do, if you need the performance.  If you're
running a bank and your database is full of information on wire
transfers sent and received, not so much.
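
And since synchronous_commit can be set per-transaction, you can mix the
two on one database.  A sketch (the table is hypothetical):

    BEGIN;
    SET LOCAL synchronous_commit = off;  -- only this transaction risks loss on a crash
    INSERT INTO status_updates (user_id, body) VALUES (1, 'hi');
    COMMIT;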

...Robert

From:
Greg Smith
Date:

David Rees wrote:
> You need to find out what all those Postgres processes are doing.  You
> might try enabling update_process_title and then using ps to figure
> out what each instance is doing.

That's what the addition of "-c" to top I suggested does on Linux; it
shows the updated process titles where the command line is in the
default config.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
Alan Hodgson
Date:

On Wednesday 07 April 2010, Craig James <> wrote:
> I thought so too, except that I can't login during the flood.  If the
> CPUs were all doing iowaits, logging in should be easy.

Busying out the drives is about the most reliable way to make logging in
very slow (especially, but not only, if it's due to swapping).

From:
Bruce Momjian
Date:

Craig James wrote:
> On 4/7/10 5:47 PM, Robert Haas wrote:
> > On Wed, Apr 7, 2010 at 6:56 PM, David Rees<>  wrote:
> >>> max_fsm_pages = 16000000
> >>> max_fsm_relations = 625000
> >>> synchronous_commit = off
> >>
> >> You are playing with fire here.  You should never turn this off unless
> >> you do not care if your data becomes irrecoverably corrupted.
> >
> > That is not correct.  Turning off synchronous_commit is sensible if
> > you don't mind losing the last few transactions on a crash.  What will
> > corrupt your database is if you turn off fsync.
>
> A bit off the original topic, but ...
>
> I set it this way because I was advised that with a battery-backed
> RAID controller, this was a safe setting.  Is that not the case?

To get good performance, you can either get a battery-backed RAID
controller or risk losing a few transactions with synchronous_commit =
off.  If you already have a battery-backed RAID controller, there is
little benefit to turning synchronous_commit off, and some major
downsides (possible data loss).

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

From:
Craig James
Date:

I'm reviving this question because I never figured it out.  To summarize: At random intervals anywhere from a few times
per hour to once or twice a day, we see a huge spike in CPU load that essentially brings the system to a halt for up to
a minute or two.  Previous answers focused on "what is it doing", i.e. is it really Postgres or something else?

Now the question has narrowed down to this: what could trigger EVERY postgres backend to do something at the same time?
See the attached output from "top -b", which shows what is happening during one of the CPU spikes.

A little background about our system.  We have roughly 100 FastCGI clients connected at all times that are called on to
generate images from data in the database.  Even though there are a lot of these, they don't do much.  They sit there
most of the time, then they spew out a couple dozen GIF images in about one second as a user gets a new page of data.
Each GIF image requires fetching a single row using a single indexed column, so it's a trivial amount of work for
Postgres.

We also have the "heavy lift" application that does the search.  Typically one or two of these is running at a time,
and takes from a fraction of a second to a few minutes to complete.  In this particular instance, immediately before
this spike, the CPU load was only at about 10% -- a couple users poking around with easy queries.

So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real
load on the server?  Maybe if a backend crashes?  Some other problem?

There's nothing in the server log.

Thanks,
Craig


top - 12:15:09 up 81 days, 21:18,  4 users,  load average: 0.38, 0.38, 0.73
Tasks: 374 total,  95 running, 279 sleeping,   0 stopped,   0 zombie
Cpu(s): 62.5%us,  2.2%sy,  0.0%ni, 34.9%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8194800k total,  7948928k used,   245872k free,       36k buffers
Swap:  2031608k total,   161136k used,  1870472k free,  7129744k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22120 postgres  20   0 2514m  17m  13m R   11  0.2   0:01.02 postmaster
18497 postgres  20   0 2514m  11m 8832 R    6  0.1   0:00.62 postmaster
22962 postgres  20   0 2514m  12m 9548 R    6  0.2   0:00.22 postmaster
24002 postgres  20   0 2514m  11m 8804 R    6  0.1   0:00.15 postmaster
25900 postgres  20   0 2514m  11m 8824 R    6  0.1   0:00.55 postmaster
  8941 postgres  20   0 2324m 6172 4676 R    5  0.1   0:00.32 postmaster
10622 postgres  20   0 2514m  12m 9444 R    5  0.2   0:00.79 postmaster
14021 postgres  20   0 2514m  11m 8548 R    5  0.1   0:00.28 postmaster
14075 postgres  20   0 2514m  11m 8672 R    5  0.1   0:00.27 postmaster
14423 postgres  20   0 2514m  11m 8572 R    5  0.1   0:00.29 postmaster
18896 postgres  20   0 2324m 5644 4204 R    5  0.1   0:00.11 postmaster
18897 postgres  20   0 2514m  12m 9800 R    5  0.2   0:00.27 postmaster
18928 postgres  20   0 2514m  11m 8792 R    5  0.1   0:00.18 postmaster
18973 postgres  20   0 2514m  11m 8792 R    5  0.1   0:00.70 postmaster
22049 postgres  20   0 2514m  17m  14m R    5  0.2   0:01.11 postmaster
22050 postgres  20   0 2514m  16m  13m R    5  0.2   0:01.06 postmaster
22843 postgres  20   0 2514m  12m 9328 R    5  0.2   0:00.20 postmaster
24202 postgres  20   0 2324m 5560 4120 R    5  0.1   0:00.07 postmaster
24388 postgres  20   0 2514m  12m 9380 R    5  0.2   0:00.16 postmaster
25903 postgres  20   0 2514m  11m 8828 R    5  0.1   0:00.55 postmaster
28362 postgres  20   0 2514m  11m 8952 R    5  0.1   0:00.48 postmaster
  5667 postgres  20   0 2324m 6752 5588 R    4  0.1   0:08.93 postmaster
  7531 postgres  20   0 2324m 5452 4008 R    4  0.1   0:03.21 postmaster
  9219 postgres  20   0 2514m  11m 8476 R    4  0.1   0:00.89 postmaster
  9820 postgres  20   0 2514m  12m 9.9m R    4  0.2   0:00.92 postmaster
10050 postgres  20   0 2324m 6172 4676 R    4  0.1   0:00.31 postmaster
10645 postgres  20   0 2514m  12m 9512 R    4  0.2   0:00.72 postmaster
14582 postgres  20   0 2514m  25m  21m R    4  0.3   0:02.10 postmaster
18502 postgres  20   0 2514m  11m 9040 R    4  0.1   0:00.64 postmaster
18972 postgres  20   0 2514m  11m 8792 R    4  0.1   0:00.76 postmaster
18975 postgres  20   0 2514m  11m 8904 R    4  0.1   0:00.63 postmaster
19496 postgres  20   0 2514m  14m  11m R    4  0.2   0:00.44 postmaster
22121 postgres  20   0 2514m  16m  13m R    4  0.2   0:00.81 postmaster
24340 postgres  20   0 2514m  12m 9424 R    4  0.2   0:00.15 postmaster
24483 postgres  20   0 2324m 6008 4536 R    4  0.1   0:00.21 postmaster
25668 postgres  20   0 2514m  16m  13m R    4  0.2   0:00.91 postmaster
26382 postgres  20   0 2514m  11m 8996 R    4  0.1   0:00.50 postmaster
28363 postgres  20   0 2514m  11m 8908 R    4  0.1   0:00.34 postmaster
  9754 postgres  20   0 2514m  11m 8752 R    3  0.1   0:00.29 postmaster
16113 postgres  20   0 2514m  17m  14m R    3  0.2   0:01.10 postmaster
18498 postgres  20   0 2514m  11m 8844 R    3  0.1   0:00.63 postmaster
18500 postgres  20   0 2514m  11m 8812 R    3  0.1   0:00.66 postmaster
22116 postgres  20   0 2514m  17m  13m R    3  0.2   0:01.05 postmaster
22287 postgres  20   0 2324m 6072 4596 R    3  0.1   0:00.24 postmaster
22425 postgres  20   0 2514m  17m  14m R    3  0.2   0:01.02 postmaster
22827 postgres  20   0 2514m  13m  10m R    3  0.2   0:00.43 postmaster
23285 postgres  20   0 2514m  13m  10m R    3  0.2   0:00.40 postmaster
24384 postgres  20   0 2514m  12m 9300 R    3  0.2   0:00.15 postmaster
30501 postgres  20   0 2514m  11m 9012 R    3  0.1   0:00.47 postmaster
  5665 postgres  20   0 2324m 6528 5396 R    2  0.1   0:08.71 postmaster
  5671 postgres  20   0 2324m 6720 5596 R    2  0.1   0:08.73 postmaster
  7428 postgres  20   0 2324m 6176 4928 R    2  0.1   0:07.37 postmaster
  7431 postgres  20   0 2324m 6140 4920 R    2  0.1   0:07.40 postmaster
  7433 postgres  20   0 2324m 6372 4924 R    2  0.1   0:07.29 postmaster
  7525 postgres  20   0 2324m 5468 4024 R    2  0.1   0:03.36 postmaster
  7530 postgres  20   0 2324m 5452 4008 R    2  0.1   0:03.40 postmaster
  7532 postgres  20   0 2324m 5440 3996 R    2  0.1   0:03.23 postmaster
  7533 postgres  20   0 2324m 5484 4040 R    2  0.1   0:03.25 postmaster
  8944 postgres  20   0 2514m  26m  23m R    2  0.3   0:02.16 postmaster
  8946 postgres  20   0 2514m  26m  22m R    2  0.3   0:02.06 postmaster
  9821 postgres  20   0 2514m  12m 9948 R    2  0.2   0:00.93 postmaster
10051 postgres  20   0 2514m  13m  10m R    2  0.2   0:01.03 postmaster
10226 postgres  20   0 2514m  27m  23m R    2  0.3   0:02.24 postmaster
10626 postgres  20   0 2514m  12m 9212 R    2  0.1   0:00.83 postmaster
14580 postgres  20   0 2324m 6120 4632 R    2  0.1   0:00.27 postmaster
16112 postgres  20   0 2514m  18m  14m R    2  0.2   0:01.18 postmaster
19450 postgres  20   0 2324m 6108 4620 R    2  0.1   0:00.22 postmaster
22289 postgres  20   0 2514m  22m  19m R    2  0.3   0:01.66 postmaster
  5663 postgres  20   0 2324m 6700 5576 R    1  0.1   0:08.23 postmaster
  7526 postgres  20   0 2324m 5444 4000 R    1  0.1   0:03.44 postmaster
  7528 postgres  20   0 2324m 5444 4000 R    1  0.1   0:03.44 postmaster
  7529 postgres  20   0 2324m 5420 3976 R    1  0.1   0:03.04 postmaster
  8888 postgres  20   0 2514m  25m  22m R    1  0.3   0:02.01 postmaster
  9622 postgres  20   0 2514m  13m  10m R    1  0.2   0:01.08 postmaster
  9625 postgres  20   0 2514m  13m  10m R    1  0.2   0:01.00 postmaster
14686 postgres  20   0 2324m 6116 4628 R    1  0.1   0:00.30 postmaster
14687 postgres  20   0 2514m  24m  21m R    1  0.3   0:01.95 postmaster
16111 postgres  20   0 2514m  17m  14m R    1  0.2   0:01.01 postmaster
16854 postgres  20   0 2324m 5468 4024 R    1  0.1   0:03.31 postmaster
  5664 postgres  20   0 2324m 6740 5584 R    0  0.1   0:08.45 postmaster
  5666 postgres  20   0 2324m 6744 5584 R    0  0.1   0:08.70 postmaster
  5668 postgres  20   0 2324m 6720 5588 R    0  0.1   0:08.58 postmaster
  5670 postgres  20   0 2324m 6748 5584 R    0  0.1   0:08.99 postmaster
  5672 postgres  20   0 2324m 6764 5596 R    0  0.1   0:08.30 postmaster
  7429 postgres  20   0 2324m 6000 4760 R    0  0.1   0:07.41 postmaster
  7430 postgres  20   0 2324m 6080 4928 R    0  0.1   0:07.09 postmaster
  7463 postgres  20   0 2324m 6412 4928 R    0  0.1   0:07.14 postmaster
  7538 postgres  20   0 2324m 5472 4028 R    0  0.1   0:03.42 postmaster
  8887 postgres  20   0 2324m 6184 4680 R    0  0.1   0:00.23 postmaster
  8942 postgres  20   0 2514m  26m  22m R    0  0.3   0:01.97 postmaster
10636 postgres  20   0 2514m  12m 9380 R    0  0.2   0:00.75 postmaster
10640 postgres  20   0 2514m  11m 9148 R    0  0.1   0:00.75 postmaster
15687 postgres  20   0 2321m  35m  35m R    0  0.4   8:38.38 postmaster


From:
"Joshua D. Drake"
Date:

On Thu, 2010-06-24 at 17:50 -0700, Craig James wrote:
> I'm reviving this question because I never figured it out.  To summarize: At random intervals anywhere from a few
> times per hour to once or twice a day, we see a huge spike in CPU load that essentially brings the system to a halt
> for up to a minute or two.  Previous answers focused on "what is it doing", i.e. is it really Postgres or something else?
>
> Now the question has narrowed down to this: what could trigger EVERY postgres backend to do something at the same
> time?  See the attached output from "top -b", which shows what is happening during one of the CPU spikes.

checkpoint causing IO Wait.

What does sar say about these times?

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
From:
Greg Smith
Date:

Craig James wrote:
> Now the question has narrowed down to this: what could trigger EVERY
> postgres backend to do something at the same time?  See the attached
> output from "top -b", which shows what is happening during one of the
> CPU spikes.

By the way:  you probably want "top -b -c", which will actually show you
what each client is doing via inspecting what it's set its command line to.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
Tom Lane
Date:

Craig James <> writes:
> So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real
> load on the server?  Maybe if a backend crashes?  Some other problem?

sinval queue overflow comes to mind ... although that really shouldn't
happen if there's "no real load" on the server.  What PG version is
this?  Also, the pg_stat_activity view contents when this happens would
probably be more useful to look at than "top" output.

            regards, tom lane

From:
Craig James
Date:

On 6/24/10 9:04 PM, Tom Lane wrote:
> Craig James<>  writes:
>> So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real
>> load on the server?  Maybe if a backend crashes?  Some other problem?
>
> sinval queue overflow comes to mind ... although that really shouldn't
> happen if there's "no real load" on the server.  What PG version is
> this?

8.3.10.  Upgraded based on your advice when I first asked this question.

>   Also, the pg_stat_activity view contents when this happens would
> probably be more useful to look at than "top" output.

I'll try.  It's hard to discover anything because the whole machine is overwhelmed when this happens.  The only way I
got the top(1) output was by running it at high priority as root using nice(1).  I can't do that with a Postgres
backend, but I'll see what I can do.
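
Maybe something like this, run as root (a sketch -- it assumes local
trust auth for the postgres user, and "mydb" is a placeholder):

    while true; do
        ( date; psql -U postgres -d mydb -c \
            "SELECT procpid, usename, waiting, current_query FROM pg_stat_activity;" \
        ) >> /tmp/pg_activity.log
        sleep 5
    done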

Craig

From:
Tom Lane
Date:

Craig James <> writes:
> On 6/24/10 9:04 PM, Tom Lane wrote:
>> sinval queue overflow comes to mind ... although that really shouldn't
>> happen if there's "no real load" on the server.  What PG version is
>> this?

> 8.3.10.  Upgraded based on your advice when I first asked this question.

Any chance of going to 8.4?  If this is what I suspect, you really need
this 8.4 fix:
http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php
which eliminated the thundering-herd behavior that previous releases
exhibit when the sinval queue overflows.

If you're stuck on 8.3 then you are going to have to modify your
application's behavior to eliminate sinval overflows.  If the overall
system load isn't high then I would have to guess that the problem is
some individual sessions sitting "idle in transaction" for long periods,
long enough that a number of DDL operations happen elsewhere.

You could also consider throwing memory at the problem by raising the
sinval queue size.  That'd require a custom build since it's not exposed
as a configurable parameter, but it'd be a one-line patch I think.
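(I believe the constant in question is MAXNUMMESSAGES in
src/include/storage/sinvaladt.h -- a guess at what that one-liner
would look like:

    /* src/include/storage/sinvaladt.h -- hedged guess at the one-line patch */
    #define MAXNUMMESSAGES 16384    /* stock value is 4096 */
)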

Or you could look at using connection pooling so you don't have quite
so many backends ...

            regards, tom lane

From:
Rajesh Kumar Mallah
Date:

Dear Craig,

Also check the possibility of installing sysstat on your system.
It goes a long way in collecting system stats.  You might consider
increasing the frequency of data collection by changing the interval
of the cron job manually in /etc/cron.d/; normally it's */10, but you
can make it */2 for the time being.  The software automatically
maintains historical records of data for 1 month.
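
A sketch of the cron change (the sa1 path varies by distribution):

    # /etc/cron.d/sysstat -- collect every 2 minutes instead of every 10
    */2 * * * * root /usr/lib64/sa/sa1 1 1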
From:
Craig James
Date:

On 6/25/10 7:47 AM, Tom Lane wrote:
> Craig James<>  writes:
>> On 6/24/10 9:04 PM, Tom Lane wrote:
>>> sinval queue overflow comes to mind ... although that really shouldn't
>>> happen if there's "no real load" on the server.  What PG version is
>>> this?
>
>> 8.3.10.  Upgraded based on your advice when I first asked this question.
>
> Any chance of going to 8.4?  If this is what I suspect, you really need
> this 8.4 fix:
> http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php
> which eliminated the thundering-herd behavior that previous releases
> exhibit when the sinval queue overflows.

Yes, there is a chance of upgrading to 8.4.4.  I just bought a new server and it has 8.4.4 on it, but it won't be
online for a while so I can't compare yet.  This may motivate me to upgrade the current servers to 8.4.4 too.  I was
pleased to see that 8.4 has a new upgrade-in-place feature that means we don't have to dump/restore.  That really helps
a lot.

A question about 8.4.4: I've been having problems with bloat.  I thought I'd adjusted the FSM parameters correctly
based on advice I got here, but apparently not.  8.4.4 has removed the configurable FSM parameters completely, which is
very cool.  But ... if I upgrade a bloated database using the upgrade-in-place feature, will 8.4.4 recover the bloat
and return it to the OS, or do I still have to recover the space manually (like vacuum-full/reindex, or cluster, or
copy/drop a table)?

> Or you could look at using connection pooling so you don't have quite
> so many backends ...

I always just assumed that lots of backends would be harmless if each one was doing very little.  If I understand
your explanation, it sounds like that's not entirely true in pre-8.4 releases due to the sinval queue problems.

Thanks,
Craig

From:
"Kevin Grittner"
Date:

Craig James <> wrote:

> I always just assumed that lots of backends would be harmless
> if each one was doing very little.

Even if each is doing very little, if a large number of them happen
to make a request at the same time, you can have problems.  This is
exactly where a connection pool can massively improve both
throughput and response time.  If you can arrange it, you want a
connection pool which will put a limit on active database
transactions and queue requests to start a new transaction until one
of the pending ones finishes.
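
pgbouncer in transaction-pooling mode is one way to get that behavior;
an illustrative config sketch:

    [databases]
    mydb = host=127.0.0.1 dbname=mydb

    [pgbouncer]
    pool_mode = transaction
    max_client_conn = 1000    ; all clients may stay connected
    default_pool_size = 20    ; but only ~20 transactions run at once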

-Kevin

From:
Greg Smith
Date:

Craig James wrote:
> if I upgrade a bloated database using the upgrade-in-place feature,
> will 8.4.4 recover the bloat and return it to the OS, or do I still
> have to recover the space manually (like vacuum-full/reindex, or
> cluster, or copy/drop a table)?

There's no way for an upgrade in place to do anything about bloat.  The
changes in 8.4 reduce the potential sources for new bloat (like running
out of a FSM pages), and the overhead of running VACUUM drops some due
to things like the "Partial VACUUM" changes.  But existing bloated
tables and indexes are moved forward to the new version without any change.
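
Cleaning it out still means something like this, per bloated table (a
sketch -- one-time maintenance, both forms take an exclusive lock, and
the names are illustrative):

    -- rewrite the table in index order and rebuild its indexes
    CLUSTER mytable USING mytable_pkey;
    -- or:
    VACUUM FULL mytable;
    REINDEX TABLE mytable;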

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
Tom Lane
Date:

Craig James <> writes:
> On 6/25/10 7:47 AM, Tom Lane wrote:
>> Any chance of going to 8.4?  If this is what I suspect, you really need
>> this 8.4 fix:
>> http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php
>> which eliminated the thundering-herd behavior that previous releases
>> exhibit when the sinval queue overflows.

> Yes, there is a chance of upgrading to 8.4.4.  I just bought a new server and it has 8.4.4 on it, but it won't be
> online for a while so I can't compare yet.  This may motivate me to upgrade the current servers to 8.4.4 too.  I was
> pleased to see that 8.4 has a new upgrade-in-place feature that means we don't have to dump/restore.  That really
> helps a lot.

I wouldn't put a lot of faith in pg_migrator for an 8.3 to 8.4
conversion ... it might work, but test it on a copy of your DB first.
Possibly it'll actually be recommendable in 9.0.

> A question about 8.4.4: I've been having problems with bloat.  I thought I'd adjusted the FSM parameters correctly
> based on advice I got here, but apparently not.  8.4.4 has removed the configurable FSM parameters completely, which
> is very cool.  But ... if I upgrade a bloated database using the upgrade-in-place feature, will 8.4.4 recover the
> bloat and return it to the OS, or do I still have to recover the space manually (like vacuum-full/reindex, or
> cluster, or copy/drop a table)?

No, an in-place upgrade to 8.4 isn't magically going to fix that.  This
might actually be sufficient reason to stick with the tried&true dump
and reload method, since you're going to have to do something fairly
expensive anyway to clean out the bloat.

            regards, tom lane

From:
Craig James
Date:

On 6/25/10 9:41 AM, Kevin Grittner wrote:
> Craig James<>  wrote:
>
>> I always just assumed that lots of backends that would be harmless
>> if each one was doing very little.
>
> Even if each is doing very little, if a large number of them happen
> to make a request at the same time, you can have problems.  This is
> exactly where a connection pool can massively improve both
> throughput and response time.  If you can arrange it, you want a
> connection pool which will put a limit on active database
> transactions and queue requests to start a new transaction until one
> of the pending ones finishes.

No, that doesn't seem to be the case.  There is no external activity that triggers this huge spike in usage.  It even
happens to our backup server when only one of us is using it to do a single query.  This problem seems to be triggered
by Postgres itself, not by anything external.

Per Tom's suggestion, I think upgrading to 8.4.4 is the answer.  I'll learn more when our new hardware comes into use
with a shiny new 8.4.4 installation.

Craig

From:
Craig James
Date:

I've got a new server and want to make sure it's running well.  Are these pretty decent numbers?

8 cores (2x4 Intel Nehalem 2 GHz)
12 GB memory
12 x 7200 SATA 500 GB disks
3WARE 9650SE-12ML RAID controller with BBU
   WAL on ext2, 2 disks: RAID1 500GB, blocksize=4096
   Database on ext4, 8 disks: RAID10 2TB, stripe size 64K, blocksize=4096
Ubuntu 10.04 LTS (Lucid)
Postgres 8.4.4

pgbench -i -s 100 -U test
pgbench -c 5 -t 20000 -U test
tps = 4903
pgbench -c 10 -t 10000 -U test
tps = 4070
pgbench -c20 -t 5000 -U test
tps = 5789
pgbench -c30 -t 3333 -U test
tps = 6961
pgbench -c40 -t 2500 -U test
tps = 2945

Thanks,
Craig



From:
Scott Marlowe
Date:

On Fri, Jun 25, 2010 at 2:53 PM, Craig James <> wrote:
> I've got a new server and want to make sure it's running well.  Are these
> pretty decent numbers?
>
> 8 cores (2x4 Intel Nehalem 2 GHz)
> 12 GB memory
> 12 x 7200 SATA 500 GB disks
> 3WARE 9650SE-12ML RAID controller with BBU
>  WAL on ext2, 2 disks: RAID1 500GB, blocksize=4096
>  Database on ext4, 8 disks: RAID10 2TB, stripe size 64K, blocksize=4096
> Ubuntu 10.04 LTS (Lucid)
> Postgres 8.4.4
>
> pgbench -i -s 100 -U test
> pgbench -c 5 -t 20000 -U test
> tps = 4903
> pgbench -c 10 -t 10000 -U test
> tps = 4070
> pgbench -c20 -t 5000 -U test
> tps = 5789
> pgbench -c30 -t 3333 -U test
> tps = 6961
> pgbench -c40 -t 2500 -U test
> tps = 2945

Numbers are okay, but you likely need much longer tests to see how
they average out with the bgwriter / checkpoints happening, and keep
track of your IO numbers to see where your dips are.  Once pgbench
runs are producing decent numbers, I usually let them go for several
hours non-stop, sometimes for days during burn-in.  Note that running
pgbench on a machine other than the actual db server is often a good
idea, so you're not measuring how fast pgbench itself can run while
contending with your own database.
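
A longer, time-based run along those lines might look like this (a
sketch; pgbench's -T flag, available in 8.4, runs for a fixed duration
instead of a fixed transaction count, and "dbserver" is a hypothetical
host name):

    # run for an hour from a separate client box, watching server IO in parallel
    pgbench -h dbserver -c 20 -T 3600 -U test test
    # meanwhile on the db server, e.g.: iostat -x 10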

From:
Greg Smith
Date:

Craig James wrote:
> I've got a new server and want to make sure it's running well.

Any changes to the postgresql.conf file?  Generally you need at least a
moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments
(32 or higher) in order for the standard pgbench test to give good results.
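
For example (a minimal postgresql.conf sketch following that guidance,
not values tuned for this particular box):

    shared_buffers = 1GB        # a moderate value so pgbench isn't starved for cache
    checkpoint_segments = 32    # spaces checkpoints out during a write-heavy run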


> pgbench -c20 -t 5000 -U test
> tps = 5789
> pgbench -c30 -t 3333 -U test
> tps = 6961
> pgbench -c40 -t 2500 -U test
> tps = 2945

General numbers are OK, the major drop going from 30 to 40 clients is
larger than it should be.  I'd suggest running the 40 client count one
again to see if that's consistent.  If it is, that may just be pgbench
itself running into a problem.  It doesn't handle high client counts
very well unless you use the 9.0 version that supports multiple pgbench
workers with the "-j" option.
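
With the 9.0 pgbench that would be something like:

    pgbench -c 40 -j 4 -t 2500 -U test
    # -j 4 spreads the 40 clients over 4 worker threads so the client side doesn't bottleneck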

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
Craig James
Date:

On 6/25/10 12:03 PM, Greg Smith wrote:
> Craig James wrote:
>> I've got a new server and want to make sure it's running well.
>
> Any changes to the postgresql.conf file? Generally you need at least a
> moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments
> (32 or higher) in order for the standard pgbench test to give good results.

max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB

For fun I ran it with the installation defaults, and it never got above 1475 TPS.

>> pgbench -c20 -t 5000 -U test
>> tps = 5789
>> pgbench -c30 -t 3333 -U test
>> tps = 6961
>> pgbench -c40 -t 2500 -U test
>> tps = 2945
>
> General numbers are OK, the major drop going from 30 to 40 clients is
> larger than it should be. I'd suggest running the 40 client count one
> again to see if that's consistent.

It is consistent.  When I run pgbench from a different server, I get this:

    pgbench -c40 -t 2500 -U test
    tps = 7999

    pgbench -c100 -t 1000 -U test
    tps = 6693

Craig

From:
Merlin Moncure
Date:

On Mon, Jun 28, 2010 at 1:12 PM, Craig James <> wrote:
> On 6/25/10 12:03 PM, Greg Smith wrote:
>>
>> Craig James wrote:
>>>
>>> I've got a new server and want to make sure it's running well.
>>
>> Any changes to the postgresql.conf file? Generally you need at least a
>> moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments
>> (32 or higher) in order for the standard pgbench test to give good
>> results.
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
>
> For fun I ran it with the installation defaults, and it never got above 1475
> TPS.
>
>>> pgbench -c20 -t 5000 -U test
>>> tps = 5789
>>> pgbench -c30 -t 3333 -U test
>>> tps = 6961
>>> pgbench -c40 -t 2500 -U test
>>> tps = 2945
>>
>> General numbers are OK, the major drop going from 30 to 40 clients is
>> larger than it should be. I'd suggest running the 40 client count one
>> again to see if that's consistent.
>
> It is consistent.  When I run pgbench from a different server, I get this:
>
>   pgbench -c40 -t 2500 -U test
>   tps = 7999
>
>   pgbench -c100 -t 1000 -U test
>   tps = 6693

6k tps over 8 7200 rpm disks is quite good imo.  the synchronous_commit
setting is making that possible.  building a server that could handle
that much was insanely expensive just a few years ago, on relatively
cheap storage.  that's 21m transactions an hour or ~ half a billion
transactions a day (!).  running this kind of load 24x7 on postgres
7.x would have been an enormous headache.   how quickly we forget! :-)

your 'real' write tps, 1475, spread over the 4 disks doing the actual
writing, works out to ~ 370 tps/device.  not bad at all -- the raid
controller is doing a good job (a raw drive might get 200 or so).  I
bet performance will be somewhat worse with a higher scaling factor
(say, 500) because there is less locality of writes -- something to
consider if you expect your database to get really big.
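
Testing that is just a matter of re-initializing at the larger scale and
rerunning (a sketch mirroring Craig's flags; scale 500 builds roughly 50
million pgbench_accounts rows):

    pgbench -i -s 500 -U test
    pgbench -c 40 -t 2500 -U test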

merlin

From:
Greg Smith
Date:

Craig James wrote:
> synchronous_commit = off
> full_page_writes = off

I don't have any numbers handy on how much turning synchronous_commit
and full_page_writes off improves performance on a system with a
battery-backed write cache.  Your numbers are therefore somewhat
inflated relative to similar ones doing a regular sync commit.  Just
something to keep in mind when comparing against other people's results.

Also, just as a general comment, increases in work_mem and
effective_cache_size don't actually do anything to the built-in pgbench
test results.

>> General numbers are OK, the major drop going from 30 to 40 clients is
>> larger than it should be. I'd suggest running the 40 client count one
>> again to see if that's consistent.
>
> It is consistent.  When I run pgbench from a different server, I get
> this:
>
>    pgbench -c40 -t 2500 -U test
>    tps = 7999
>
>    pgbench -c100 -t 1000 -U test
>    tps = 6693

Looks like you're just running into the limitations of the old pgbench
code failing to keep up with high client count loads when run on the
same system as the server.  Nothing to be concerned about -- the fact
that the drop is only small when the pgbench client is remote says
there's not actually a server problem here.

With that sorted out, your system looks in the normal range for the sort
of hardware you're using.  I'm always concerned about the potential
reliability issues that come with async commit and turning off full page
writes though, so you might want to re-test with those turned on and see
if you can live with the results.
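
The re-test amounts to restoring the defaults in postgresql.conf and
reloading; both settings can be changed without a restart:

    synchronous_commit = on    # commits wait for WAL to reach disk
    full_page_writes = on      # protects against torn pages after a crash

followed by a reload, e.g. SELECT pg_reload_conf();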

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
Craig James
Date:

I can query either my PARENT table joined to PRICES, or my VERSION table joined to PRICES, and get an answer in 30-40
msec.  But put the two together and it jumps to 4 seconds.  What am I missing here?  I figured this query would be nearly
instantaneous.  The VERSION.ISOSMILES and PARENT.ISOSMILES columns both have unique indexes.  Instead of using these
indexes, it's doing a full-table scan of both tables, even though there can't possibly be more than one match in each
table.

I guess I could rewrite this as a UNION of the two subqueries, but that seems contrived.

This is PG 8.3.10 on Linux.

Thanks,
Craig


=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
-> from plus p join sample s
->  on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
-> join version vn on (s.version_id = vn.version_id) join parent pn
->  on (s.parent_id = pn.parent_id)
-> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
-> or pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
-> order by price;

  Sort  (cost=71922.00..71922.00 rows=1 width=19) (actual time=4337.114..4337.122 rows=10 loops=1)
    Sort Key: p.price
    Sort Method:  quicksort  Memory: 25kB
    ->  Nested Loop  (cost=18407.53..71921.99 rows=1 width=19) (actual time=1122.685..4337.028 rows=10 loops=1)
          ->  Hash Join  (cost=18407.53..71903.71 rows=4 width=20) (actual time=1122.624..4336.682 rows=7 loops=1)
                Hash Cond: (s.version_id = vn.version_id)
                Join Filter: ((vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text) OR (pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text))
                ->  Hash Join  (cost=8807.15..44470.73 rows=620264 width=54) (actual time=431.501..2541.329 rows=620264 loops=1)
                      Hash Cond: (s.parent_id = pn.parent_id)
                      ->  Seq Scan on sample s  (cost=0.00..21707.64 rows=620264 width=24) (actual time=0.008..471.340 rows=620264 loops=1)
                      ->  Hash  (cost=5335.40..5335.40 rows=277740 width=38) (actual time=431.166..431.166 rows=277740 loops=1)
                            ->  Seq Scan on parent pn  (cost=0.00..5335.40 rows=277740 width=38) (actual time=0.012..195.822 rows=277740 loops=1)
                ->  Hash  (cost=5884.06..5884.06 rows=297306 width=38) (actual time=467.267..467.267 rows=297306 loops=1)
                      ->  Seq Scan on version vn  (cost=0.00..5884.06 rows=297306 width=38) (actual time=0.017..215.285 rows=297306 loops=1)
          ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual time=0.039..0.041 rows=1 loops=7)
                Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
  Total runtime: 4344.222 ms
(17 rows)


If I only query the VERSION table, it's very fast:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
-> from plus p
-> join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
-> join version vn on (s.version_id = vn.version_id)
-> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1' order by price;

  Sort  (cost=45.73..45.74 rows=1 width=19) (actual time=32.438..32.448 rows=10 loops=1)
    Sort Key: p.price
    Sort Method:  quicksort  Memory: 25kB
    ->  Nested Loop  (cost=0.00..45.72 rows=1 width=19) (actual time=32.309..32.411 rows=10 loops=1)
          ->  Nested Loop  (cost=0.00..36.58 rows=2 width=20) (actual time=32.295..32.319 rows=7 loops=1)
                ->  Index Scan using i_version_isosmiles on version vn  (cost=0.00..8.39 rows=1 width=4) (actual time=32.280..32.281 rows=1 loops=1)
                      Index Cond: (isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text)
                ->  Index Scan using i_sample_version_id on sample s  (cost=0.00..28.12 rows=6 width=20) (actual time=0.011..0.024 rows=7 loops=1)
                      Index Cond: (s.version_id = vn.version_id)
          ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual time=0.010..0.011 rows=1 loops=7)
                Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
  Total runtime: 32.528 ms
(12 rows)


Same good performance if I only query the PARENT table:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, pn.parent_id from plus p join sample s on
(p.compound_id = s.compound_id and p.supplier_id = s.supplier_id) join parent pn on (s.parent_id = pn.parent_id) where
pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1' order by price;
                                                                     QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=57.73..57.74 rows=1 width=19) (actual time=43.564..43.564 rows=10 loops=1)
    Sort Key: p.price
    Sort Method:  quicksort  Memory: 25kB
    ->  Nested Loop  (cost=0.00..57.72 rows=1 width=19) (actual time=43.429..43.537 rows=10 loops=1)
          ->  Nested Loop  (cost=0.00..48.58 rows=2 width=20) (actual time=43.407..43.430 rows=7 loops=1)
                ->  Index Scan using i_parent_isosmiles on parent pn  (cost=0.00..8.38 rows=1 width=4) (actual time=27.342..27.343 rows=1 loops=1)
                      Index Cond: (isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text)
                ->  Index Scan using i_sample_parent_id on sample s  (cost=0.00..40.09 rows=9 width=20) (actual time=16.057..16.070 rows=7 loops=1)
                      Index Cond: (s.parent_id = pn.parent_id)
          ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual time=0.010..0.011 rows=1 loops=7)
                Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
  Total runtime: 43.628 ms




x=> \d version
       Table "x.version"
    Column   |  Type   | Modifiers
------------+---------+-----------
  version_id | integer | not null
  parent_id  | integer | not null
  isosmiles  | text    | not null
  coord_2d   | text    |
Indexes:
     "version_pkey" PRIMARY KEY, btree (version_id)
     "i_version_isosmiles" UNIQUE, btree (isosmiles)
     "i_version_parent_id" btree (parent_id)
Foreign-key constraints:
     "fk_parent" FOREIGN KEY (parent_id) REFERENCES parent(parent_id) ON DELETE CASCADE

x=> \d parent
       Table "x.parent"
   Column   |  Type   | Modifiers
-----------+---------+-----------
  parent_id | integer | not null
  isosmiles | text    | not null
  coord_2d  | text    |
Indexes:
     "parent_pkey" PRIMARY KEY, btree (parent_id)
     "i_parent_isosmiles" UNIQUE, btree (isosmiles)

=> \d sample
                                Table "reaxys.sample"
        Column       |  Type   |                      Modifiers
--------------------+---------+-----------------------------------------------------
  sample_id          | integer | not null default nextval('sample_id_seq'::regclass)
  sample_id_src      | integer |
  parent_id          | integer | not null
  version_id         | integer | not null
  supplier_id        | integer | not null
  catalogue_id       | integer | not null
  catalogue_issue_id | integer | not null
  load_id            | integer | not null
  load_file_id       | integer |
  compound_id        | text    | not null
  cas_number         | text    |
  purity             | text    |
  chemical_name      | text    |
  url                | text    |
  price_code         | text    |
  comment            | text    |
  salt_comment       | text    |
Indexes:
     "sample_pkey" PRIMARY KEY, btree (sample_id)
     "i_sample_casno" btree (cas_number)
     "i_sample_catalogue_id" btree (catalogue_id)
     "i_sample_catalogue_issue_id" btree (catalogue_issue_id)
     "i_sample_chem_name" btree (chemical_name)
     "i_sample_compound_id" btree (compound_id)
     "i_sample_load_id" btree (load_id)
     "i_sample_parent_id" btree (parent_id)
     "i_sample_sample_id_src" btree (sample_id_src)
     "i_sample_supplier_id" btree (supplier_id)
     "i_sample_version_id" btree (version_id)
Foreign-key constraints:
     "fk_item" FOREIGN KEY (version_id) REFERENCES version(version_id) ON DELETE CASCADE

From:
Robert Haas
Date:

On Thu, Aug 5, 2010 at 2:34 PM, Craig James <> wrote:
> => explain analyze select p.price, p.amount, p.units, s.catalogue_id,
> vn.version_id
> -> from plus p join sample s
> ->  on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
> -> join version vn on (s.version_id = vn.version_id) join parent pn
> ->  on (s.parent_id = pn.parent_id)
> -> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
> -> or pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
> -> order by price;

Well, you can't evaluate the WHERE clause here until you've joined {s vn pn}.

> If I only query the VERSION table, it's very fast:
>
> x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id,
> vn.version_id
> -> from plus p
> -> join sample s on (p.compound_id = s.compound_id and p.supplier_id =
> s.supplier_id)
> -> join version vn on (s.version_id = vn.version_id)
> -> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1' order by price;

But here you can push the WHERE clause all the way down to the vn
table, and evaluate it right at the get go, which is pretty much
exactly what is happening.

In the first case, you have to join all 297,306 vn rows against s,
because they could be interesting if the other half of the WHERE
clause turns out to hold.  In the second case, you can throw away
297,305 of those 297,306 rows before doing anything else, because
there's no possibility that they can ever be interesting.
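
For illustration, the UNION rewrite Craig mentioned earlier gives the
planner exactly that opportunity in each arm (a sketch against the
schema shown above; untested):

    select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
      from plus p
      join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
      join version vn on (s.version_id = vn.version_id)
     where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
    union
    select p.price, p.amount, p.units, s.catalogue_id, s.version_id
      from plus p
      join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
      join parent pn on (s.parent_id = pn.parent_id)
     where pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
    order by price;

Each arm now has a single-table predicate that can use its unique
isosmiles index, and the final sort only sees the handful of surviving
rows.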

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company