Обсуждение: Slow Postgresql server

От:
Jason Lustig
Дата:

Hello all,

My website has been having issues with our new Linux/PostgreSQL
server being somewhat slow. I have done tests using Apache Benchmark
and for pages that do not connect to Postgres, the speeds are much
faster (334 requests/second v. 1-2 requests/second), so it seems that
Postgres is what's causing the problem and not Apache. I did some
reserach, and it seems that the bottleneck is in fact the hard
drives! Here's an excerpt from vmstat:

procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy id wa st
1  1    140  24780 166636 575144    0    0     0  3900 1462  3299  1
4 49 48  0
0  1    140  24780 166636 575144    0    0     0  3828 1455  3391  0
4 48 48  0
1  1    140  24780 166636 575144    0    0     0  2440  960  2033  0
3 48 48  0
0  1    140  24780 166636 575144    0    0     0  2552 1001  2131  0
2 50 49  0
0  1    140  24780 166636 575144    0    0     0  3188 1233  2755  0
3 49 48  0
0  1    140  24780 166636 575144    0    0     0  2048  868  1812  0
2 49 49  0
0  1    140  24780 166636 575144    0    0     0  2720 1094  2386  0
3 49 49  0

As you can see, almost 50% of the CPU is waiting on I/O. This doesn't
seem like it should be happening, however, since we are using a RAID
1 setup (160+160). We have 1GB ram, and have upped shared_buffers to
13000 and work_mem to 8096. What would cause the computer to only use
such a small percentage of the CPU, with more than half of it waiting
on I/O requests?

Thanks a lot
Jason


От:
Dennis Bjorklund
Дата:

Jason Lustig skrev:
> and work_mem to 8096. What would cause the computer to only use such a
> small percentage of the CPU, with more than half of it waiting on I/O
> requests?

Do your webpages write things to the database on each connect?

Maybe it do a bunch of writes each individually commited? For every
commit pg will wait for the data to be written down to the disk platter
before it move on. So if you do several writes you want to do them in
one transaction so you only need one commit.

/Dennis

От:
Jeff Frost
Дата:

On Wed, 11 Apr 2007, Jason Lustig wrote:

> Hello all,
>
> My website has been having issues with our new Linux/PostgreSQL server being
> somewhat slow. I have done tests using Apache Benchmark and for pages that do
> not connect to Postgres, the speeds are much faster (334 requests/second v.
> 1-2 requests/second), so it seems that Postgres is what's causing the problem
> and not Apache. I did some reserach, and it seems that the bottleneck is in
> fact the hard drives! Here's an excerpt from vmstat:
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
> r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> st
> 1  1    140  24780 166636 575144    0    0     0  3900 1462  3299  1  4 49 48
> 0
> 0  1    140  24780 166636 575144    0    0     0  3828 1455  3391  0  4 48 48
> 0
> 1  1    140  24780 166636 575144    0    0     0  2440  960  2033  0  3 48 48
> 0
> 0  1    140  24780 166636 575144    0    0     0  2552 1001  2131  0  2 50 49
> 0
> 0  1    140  24780 166636 575144    0    0     0  3188 1233  2755  0  3 49 48
> 0
> 0  1    140  24780 166636 575144    0    0     0  2048  868  1812  0  2 49 49
> 0
> 0  1    140  24780 166636 575144    0    0     0  2720 1094  2386  0  3 49 49
> 0
>
> As you can see, almost 50% of the CPU is waiting on I/O. This doesn't seem
> like it should be happening, however, since we are using a RAID 1 setup
> (160+160). We have 1GB ram, and have upped shared_buffers to 13000 and
> work_mem to 8096. What would cause the computer to only use such a small
> percentage of the CPU, with more than half of it waiting on I/O requests?

Well, the simple answer is a slow disk subsystem.  Is it hardware or software
RAID1?  If hardware, what's the RAID controller?  Based on your vmstat output,
I'd guess that this query activity is all writes since I see only blocks out.
Can you identify what the slow queries are?  What version of postgres?  How
large is the database?  Can you post the non-default values in your
postgresql.conf?

I'd suggest you test your disk subsystem to see if it's as performant as you
think with bonnie++.  Here's some output from my RAID1 test server:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pgtest           4G 47090  92 52348  11 30954   6 41838  65 73396   8 255.9  1
                     ------Sequential Create------ --------Random Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  16   894   2 +++++ +++   854   1   817   2 +++++ +++   969 2

So, that's 52MB/sec block writes and 73MB/sec block reads.  That's typical of
a RAID1 on 2 semi-fast SATA drives.

If you're doing writes to the DB on every web page, you might consider playing
with the commit_delay and commit_siblings parameters in the postgresql.conf.
Also, if you're doing multiple inserts as separate transactions, you should
consider batching them up in one transaction.

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

От:
Ron
Дата:

1= RAID 1improves data =intregrity=, not IO performance.
Your HD IO performance is essentially that of 1 160GB HD of whatever
performance one of those HDs have.
(what kind of HDs are they anyway?  For instance 7200rpm 160GB HDs
are not particularly "high performance")
BEST case is streaming IO involving no seeks => ~50 MBps.
You can't get even that as the back end of a website.

2= 1GB of RAM is -small- for a DB server.

You need to buy RAM and HD.

Boost the RAM to 4GB, change pg config parameters appropriately and
see how much it helps.
Non ECC RAM is currently running ~$60-$75 per GB for 1 or 2 GB sticks
ECC RAM prices will be ~ 1.5x - 2x that, $120 - $150 per GB for 1 or
2 GB sticks.
(do !not! buy 4GB sticks unless you have a large budget.  Their price
pr GB is still too high)

If adding RAM helps as much as I suspect it will, find out how big
the "hot" section of your DB is and see if you can buy enough RAM to
make it RAM resident.
If you can do this, it will result in the lowest term DB maintenance.

If you can't do that for whatever reason, the next step is to improve
your HD subsystem.
Cheap RAID cards with enough BB cache to allow writes to be coalesced
into larger streams (reducing seeks) will help, but you fundamentally
you will need more HDs.

RAID 5 is an reasonable option for most website DBs workloads.
To hit the 300MBps speeds attainable by the cheap RAID cards, you are
going to at least 7 HDs (6 HDs * 50MBps ASTR = 300MBps ASTR + the
equivalent of 1 HD gets used for the "R" in RAID).  A minimum of 8
HDs are need for this performance if you want to use RAID 6.
Most tower case (not mini-tower, tower) cases can hold this internally.
Price per MBps of HD is all over the map.  The simplest (but not
necessarily best) option is to buy more of the 160GB HDs you already have.
Optimizing the money spent when buying HDs for a RAID set is a bit
more complicated than doing so for RAM.  Lot's of context dependent
things affect the final decision.

I see you are mailing from Brandeis.  I'm local.  Drop me some
private email at the address I'm posting from if you want and I'll
send you further contact info so we can talk in more detail.

Cheers,
Ron Peacetree


At 06:02 PM 4/11/2007, Jason Lustig wrote:
>Hello all,
>
>My website has been having issues with our new Linux/PostgreSQL
>server being somewhat slow. I have done tests using Apache Benchmark
>and for pages that do not connect to Postgres, the speeds are much
>faster (334 requests/second v. 1-2 requests/second), so it seems that
>Postgres is what's causing the problem and not Apache. I did some
>reserach, and it seems that the bottleneck is in fact the hard
>drives! Here's an excerpt from vmstat:
>
>procs -----------memory---------- ---swap-- -----io---- --system--
>-----cpu------
>r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
>sy id wa st
>1  1    140  24780 166636 575144    0    0     0  3900 1462  3299  1
>4 49 48  0
>0  1    140  24780 166636 575144    0    0     0  3828 1455  3391  0
>4 48 48  0
>1  1    140  24780 166636 575144    0    0     0  2440  960  2033  0
>3 48 48  0
>0  1    140  24780 166636 575144    0    0     0  2552 1001  2131  0
>2 50 49  0
>0  1    140  24780 166636 575144    0    0     0  3188 1233  2755  0
>3 49 48  0
>0  1    140  24780 166636 575144    0    0     0  2048  868  1812  0
>2 49 49  0
>0  1    140  24780 166636 575144    0    0     0  2720 1094  2386  0
>3 49 49  0
>
>As you can see, almost 50% of the CPU is waiting on I/O. This doesn't
>seem like it should be happening, however, since we are using a RAID
>1 setup (160+160). We have 1GB ram, and have upped shared_buffers to
>13000 and work_mem to 8096. What would cause the computer to only use
>such a small percentage of the CPU, with more than half of it waiting
>on I/O requests?
>
>Thanks a lot
>Jason
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to  so that your
>       message can get through to the mailing list cleanly


От:
Guido Neitzer
Дата:

On 12.04.2007, at 07:26, Ron wrote:

> You need to buy RAM and HD.

Before he does that, wouldn't it be more useful, to find out WHY he
has so much IO?

Have I missed that or has nobody suggested finding the slow queries
(when you have much IO on them, they might be slow at least with a
high shared memory setting).

So, my first idea is, to turn on query logging for queries longer
than a xy milliseconds, "explain analyse" these queries and see
wether there are a lot of seq scans involved, which would explain the
high IO.

Just an idea, perhaps I missed that step in that discussion
somewhere ...

But yes, it might also be, that the server is swapping, that's
another thing to find out.

cug

От:
Ron
Дата:

At 10:08 AM 4/12/2007, Guido Neitzer wrote:
>On 12.04.2007, at 07:26, Ron wrote:
>
>>You need to buy RAM and HD.
>
>Before he does that, wouldn't it be more useful, to find out WHY he
>has so much IO?

1= Unless I missed something, the OP described pg being used as a
backend DB for a webserver.

I know the typical IO demands of that scenario better than I sometimes want to.
:-(


2= 1GB of RAM + effectively 1 160GB HD = p*ss poor DB IO support.
~ 1/2 that RAM is going to be used for OS stuff, leaving only ~512MB
of RAM to be used supporting pg.
That RAID 1 set is effectively 1 HD head that all IO requests are
going to contend for.
Even if the HD in question is a 15Krpm screamer, that level of HW
contention has very adverse implications.


Completely agree that at some point the queries need to be examined
(ditto the table schema, etc), but this system is starting off in a
Bad Place for its stated purpose IME.
Some minimum stuff is obvious even w/o spending time looking at
anything beyond the HW config.

Cheers,
Ron Peacetree


>Have I missed that or has nobody suggested finding the slow queries
>(when you have much IO on them, they might be slow at least with a
>high shared memory setting).
>
>So, my first idea is, to turn on query logging for queries longer
>than a xy milliseconds, "explain analyse" these queries and see
>wether there are a lot of seq scans involved, which would explain the
>high IO.
>
>Just an idea, perhaps I missed that step in that discussion
>somewhere ...
>
>But yes, it might also be, that the server is swapping, that's
>another thing to find out.
>
>cug
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq


От:
Guido Neitzer
Дата:

On 12.04.2007, at 08:59, Ron wrote:

> 1= Unless I missed something, the OP described pg being used as a
> backend DB for a webserver.

Yep.

> I know the typical IO demands of that scenario better than I
> sometimes want to.
> :-(

Yep. Same here. ;-)

> 2= 1GB of RAM + effectively 1 160GB HD = p*ss poor DB IO support.

Absolutely right. Depending a little bit on the DB and WebSite layout
and on the actual requirements, but yes - it's not really a kick-ass
machine ...

> Completely agree that at some point the queries need to be examined
> (ditto the table schema, etc), but this system is starting off in a
> Bad Place for its stated purpose IME.
> Some minimum stuff is obvious even w/o spending time looking at
> anything beyond the HW config.

Depends. As I said - if the whole DB fits into the remaining space,
and a lot of website backend DBs do, it might just work out. But this
seems not to be the case - either the site is chewing on seq scans
all the time which will cause I/O or it is bound by the lack of
memory and swaps the whole time ... He has to find out.

cug

От:
Jeff Frost
Дата:

On Thu, 12 Apr 2007, Jason Lustig wrote:

> 0 <-- BM starts here
> 10  0    180 700436  16420  91740    0    0     0   176  278  2923 59 41  0
> 0  0
> 11  0    180 696736  16420  91740    0    0     0     0  254  2904 57 43  0
> 0  0
> 12  0    180 691272  16420  91740    0    0     0     0  255  3043 60 39  1
> 0  0
> 9  0    180 690396  16420  91740    0    0     0     0  254  3078 63 36  2  0
> 0
>
> Obviously, I've turned off logging now but I'd like to get it running again
> (without bogging down the server) so that I can profile the system and find
> out which queries I need to optimize. My logging settings (with unnecessary
> comments taken out) were:

So what did you get in the logs when you had logging turned on?  If you have
the statement logging, perhaps it's worth running through pgfouine to generate
a report.

>
> log_destination = 'syslog'            # Valid values are combinations of
> redirect_stderr = off                   # Enable capturing of stderr into log
> log_min_duration_statement =  0              # -1 is disabled, 0 logs all
> statements
> silent_mode = on                        # DO NOT USE without syslog or
> log_duration = off
> log_line_prefix = 'user=%u,db=%d'                       # Special values:
> log_statement = 'none'                  # none, ddl, mod, all
>

Perhaps you just want to log slow queries > 100ms?  But since you don't seem
to know what queries you're running on each web page, I'd suggest you just
turn on the following and run your benchmark against it, then turn it back
off:

log_duration = on
log_statement = 'all'

Then go grab pgfouine and run the report against the logs to see what queries
are chewing up all your time.

> So you know, we're using Postgres 8.2.3. The database currently is pretty
> small (we're just running a testing database right now with a few megabytes
> of data). No doubt some of our queries are slow, but I was concerned because
> no matter how slow the queries were (at most the worst were taking a couple
> of msecs anyway), I was getting ridiculously slow responses from the server.
> Outside of logging, our only other non-default postgresql.conf items are:
>
> shared_buffers = 13000                  # min 128kB or max_connections*16kB
> work_mem = 8096                         # min 64kB
>
> In terms of the server itself, I think that it uses software raid. How can I
> tell? Our hosting company set it up with the server so I guess I could ask
> them, but is there a program I can run which will tell me the information? I
> also ran bonnie++ and got this output:
>
> Version  1.03       ------Sequential Output------ --Sequential Input-
> --Random-
>                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec
> %CP
> pgtest 2000M 29277  67 33819  15 15446   4 35144  62 48887   5 152.7   0
>                   ------Sequential Create------ --------Random
> Create--------
>                   -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
>             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
> %CP
>                16 17886  77 +++++ +++ +++++ +++ 23258  99 +++++ +++ +++++
> +++
>
> So I'm getting 33MB and 48MB write/read respectively. Is this slow? Is there
> anything I should be doing to optimize our RAID configuration?
>

It's not fast, but at least it's about the same speed as an average IDE drive
from this era.  More disks would help, but since you indicate the DB fits in
RAM with plenty of room to spare, how about you update your
effective_cache_size to something reasonable.  You can use the output of the
'free' command and take the cache number and divide by 8 to get a reasonable
value on linux.  Then turn on logging and run your benchmark.  After that, run
a pgfouine report against the log and post us the explain analyze from your
slow queries.

And if Ron is indeed local, it might be worthwhile to contact him.  Someone
onsite would likely get this taken care of much faster than we can on the
mailing list.

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

От:
Scott Marlowe
Дата:

On Thu, 2007-04-12 at 10:19, Guido Neitzer wrote:
> On 12.04.2007, at 08:59, Ron wrote:

>
> Depends. As I said - if the whole DB fits into the remaining space,
> and a lot of website backend DBs do, it might just work out. But this
> seems not to be the case - either the site is chewing on seq scans
> all the time which will cause I/O or it is bound by the lack of
> memory and swaps the whole time ... He has to find out.

It could also be something as simple as a very bloated data store.

I'd ask the user what vacuum verbose says at the end

От:
Jeff Frost
Дата:

On Thu, 12 Apr 2007, Scott Marlowe wrote:

> On Thu, 2007-04-12 at 10:19, Guido Neitzer wrote:
>> On 12.04.2007, at 08:59, Ron wrote:
>
>>
>> Depends. As I said - if the whole DB fits into the remaining space,
>> and a lot of website backend DBs do, it might just work out. But this
>> seems not to be the case - either the site is chewing on seq scans
>> all the time which will cause I/O or it is bound by the lack of
>> memory and swaps the whole time ... He has to find out.
>
> It could also be something as simple as a very bloated data store.
>
> I'd ask the user what vacuum verbose says at the end

You know, I should answer emails at night...we didn't ask when the last time
the data was vacuumed or analyzed and I believe he indicated that the only
non-default values were memory related, so no autovacuum running.

Jason,

Before you go any further, run 'vacuum analyze;' on your DB if you're not
doing this with regularity and strongly consider enabling autovacuum.

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

От:
Carlos Moreno
Дата:

Jeff Frost wrote:
>
> You know, I should answer emails at night...

Indeed you shouldN'T   ;-)

Carlos
--


От:
Steve
Дата:

Hey there;

On a Postgres 8.2.3 server, I've got a query that is running very slow in
some cases.  With some work, I've determined the 'slow part' of the query.
:)  This is a query on a table with like 10 million rows or something like
that.  encounter_id is an integer and receipt is of type 'date'.

This query runs really slow [minutes] (query and explain below):

select extract(epoch from ds.receipt) from detail_summary ds where
ds.receipt >= '1998-12-30 0:0:0' and
ds.encounter_id in
(8813186,8813187,8813188,8813189,8813190,8813191,8813192,
8813193,8813194,8813195,8813196,8813197,8813198,8813199,
8813200,8813201,8813202,8813203,8813204,8813205,8813206,
8813207,8813208,8813209,8813210,8813211,8813212,8813213,
8813214,8813215,8813216,8813217,8813218,8813219,8813220,
8813221,8813222,8813223,8813224,8813225,8813226,8813227,
8813228,8813229,8813230,8813231,8813232,8813233,8813234,
8813235,8813236,8813237,8813238,8813239,8813240,8813241,
8813242,8813243,8813244,8813245,8813246,8813247,8813248,
8813249,8813250,8813251,8813252,8813253,8813254,8813255,
8813256,8813257,8813258,8813259,8813260,8813261,8813262,
8813263,8813264,8813265,8813266,8813267,8813268,8813269,
8813270,8813271,8813272,8813273,8813274,8813275,8813276,
8813277,8813278,8813279,8813280,8813281,8813282,8813283,
8813284,8815534)

Results in the 'explain' :

  Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
    Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
(2 rows)


Turning enable_seqscan to off results in a slightly more interesting
explain, but an equally slow query.


HOWEVER!  The simple removal of the receipt date paramater results in a
fast query, as such:

select extract(epoch from ds.receipt) from detail_summary ds where
ds.encounter_id in
(8813186,8813187,8813188,8813189,8813190,8813191,8813192,
8813193,8813194,8813195,8813196,8813197,8813198,8813199,
8813200,8813201,8813202,8813203,8813204,8813205,8813206,
8813207,8813208,8813209,8813210,8813211,8813212,8813213,
8813214,8813215,8813216,8813217,8813218,8813219,8813220,
8813221,8813222,8813223,8813224,8813225,8813226,8813227,
8813228,8813229,8813230,8813231,8813232,8813233,8813234,
8813235,8813236,8813237,8813238,8813239,8813240,8813241,
8813242,8813243,8813244,8813245,8813246,8813247,8813248,
8813249,8813250,8813251,8813252,8813253,8813254,8813255,
8813256,8813257,8813258,8813259,8813260,8813261,8813262,
8813263,8813264,8813265,8813266,8813267,8813268,8813269,
8813270,8813271,8813272,8813273,8813274,8813275,8813276,
8813277,8813278,8813279,8813280,8813281,8813282,8813283,
8813284 ,8815534)

This query returns instantly and explains as:

  Bitmap Heap Scan on detail_summary ds  (cost=161.00..14819.81 rows=9963
width=4)
    Recheck Cond: (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
    ->  Bitmap Index Scan on detail_summary_encounter_id_idx
(cost=0.00..160.75 rows=9963 width=0)
          Index Cond: (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))


Any thoughts?  Both encounter_id and receipt date are indexed columns.
I've vacuumed and analyzed the table.  I tried making a combined index of
encounter_id and receipt and it hasn't worked out any better.


Thanks!


Steve

От:
Tom Lane
Дата:

Steve <> writes:
> On a Postgres 8.2.3 server, I've got a query that is running very slow in
> some cases.

Could we see the exact definition of that table and its indexes?
It looks like the planner is missing the bitmap scan for some reason,
but I've not seen a case like that before.

Also, I assume the restriction on receipt date is very nonselective?
It doesn't seem to have changed the estimated rowcount much.

            regards, tom lane

От:
Steve
Дата:

> Could we see the exact definition of that table and its indexes?
> It looks like the planner is missing the bitmap scan for some reason,
> but I've not seen a case like that before.
>
> Also, I assume the restriction on receipt date is very nonselective?
> It doesn't seem to have changed the estimated rowcount much.
>

     This is true -- This particular receipt date is actually quite
meaningless.  It's equivalent to saying 'all receipt dates'.  I don't
think there's even any data that goes back before 2005.

Here's the table and it's indexes.  Before looking, a note; there's
several 'revop' indexes, this is for sorting.  The customer insisted on,
frankly, meaninglessly complicated sorts.  I don't think any of that
matters for our purposes here though :)

         Column         |          Type          |
Modifiers
-----------------------+------------------------+--------------------------------------------------------------------
  detailsummary_id      | integer                | not null default
nextval(('detailsummary_id_seq'::text)::regclass)
  detailgroup_id        | integer                |
  receipt               | date                   |
  batchnum              | integer                |
  encounternum          | integer                |
  procedureseq          | integer                |
  procedurecode         | character varying(5)   |
  wrong_procedurecode   | character varying(5)   |
  batch_id              | integer                |
  encounter_id          | integer                |
  procedure_id          | integer                |
  carrier_id            | integer                |
  product_line          | integer                |
  provider_id           | integer                |
  member_num            | character varying(20)  |
  wrong_member_num      | character varying(20)  |
  member_name           | character varying(40)  |
  patient_control       | character varying(20)  |
  rendering_prov_id     | character varying(15)  |
  rendering_prov_name   | character varying(30)  |
  referring_prov_id     | character varying(15)  |
  referring_prov_name   | character varying(30)  |
  servicedate           | date                   |
  wrong_servicedate     | date                   |
  diagnosis_codes       | character varying(5)[] |
  wrong_diagnosis_codes | character varying(5)[] |
  ffs_charge            | double precision       |
  export_date           | date                   |
  hedis_date            | date                   |
  raps_date             | date                   |
  diagnosis_pointers    | character(1)[]         |
  modifiers             | character(2)[]         |
  units                 | double precision       |
  pos                   | character(2)           |
  isduplicate           | boolean                |
  duplicate_id          | integer                |
  encounter_corrected   | boolean                |
  procedure_corrected   | boolean                |
  numerrors             | integer                |
  encerrors_codes       | integer[]              |
  procerror_code        | integer                |
  error_servicedate     | text                   |
  e_duplicate_id        | integer                |
  ecode_counts          | integer[]              |
  p_record_status       | integer                |
  e_record_status       | integer                |
  e_delete_date         | date                   |
  p_delete_date         | date                   |
  b_record_status       | integer                |
  b_confirmation        | character varying(20)  |
  b_carrier_cobol_id    | character varying(16)  |
  b_provider_cobol_id   | character varying(20)  |
  b_provider_tax_id     | character varying(16)  |
  b_carrier_name        | character varying(50)  |
  b_provider_name       | character varying(50)  |
  b_submitter_file_id   | character varying(40)  |
  e_hist_carrier_id     | integer                |
  p_hist_carrier_id     | integer                |
  e_duplicate_id_orig   | character varying(25)  |
  p_duplicate_id_orig   | character varying(25)  |
  num_procerrors        | integer                |
  num_encerrors         | integer                |
  export_id             | integer                |
  raps_id               | integer                |
  hedis_id              | integer                |
Indexes:
     "detail_summary_b_record_status_idx" btree (b_record_status)
     "detail_summary_batch_id_idx" btree (batch_id)
     "detail_summary_batchnum_idx" btree (batchnum)
     "detail_summary_carrier_id_idx" btree (carrier_id)
     "detail_summary_duplicate_id_idx" btree (duplicate_id)
     "detail_summary_e_record_status_idx" btree (e_record_status)
     "detail_summary_encounter_id_idx" btree (encounter_id)
     "detail_summary_encounternum_idx" btree (encounternum)
     "detail_summary_export_date_idx" btree (export_date)
     "detail_summary_hedis_date_idx" btree (hedis_date)
     "detail_summary_member_name_idx" btree (member_name)
     "detail_summary_member_num_idx" btree (member_num)
     "detail_summary_p_record_status_idx" btree (p_record_status)
     "detail_summary_patient_control_idx" btree (patient_control)
     "detail_summary_procedurecode_idx" btree (procedurecode)
     "detail_summary_product_line_idx" btree (product_line)
     "detail_summary_provider_id_idx" btree (provider_id)
     "detail_summary_raps_date_idx" btree (raps_date)
     "detail_summary_receipt_encounter_idx" btree (receipt, encounter_id)
     "detail_summary_receipt_id_idx" btree (receipt)
     "detail_summary_referrering_prov_id_idx" btree (referring_prov_id)
     "detail_summary_rendering_prov_id_idx" btree (rendering_prov_id)
     "detail_summary_rendering_prov_name_idx" btree (rendering_prov_name)
     "detail_summary_servicedate_idx" btree (servicedate)
     "ds_sort_1" btree (receipt date_revop, carrier_id, batchnum,
encounternum, procedurecode, encounter_id)
     "ds_sort_10" btree (receipt date_revop, carrier_id, batchnum,
encounternum, procedurecode, encounter_id, procedure_id)
     "ed_cbee_norev" btree (export_date, carrier_id, batchnum,
encounternum, encounter_id)
     "ed_cbee_norev_p" btree (export_date, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "ed_cbee_rev" btree (export_date date_revop, carrier_id, batchnum,
encounternum, encounter_id)
     "ed_cbee_rev_p" btree (export_date date_revop, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "mcbe" btree (member_name, carrier_id, batchnum, encounternum,
encounter_id)
     "mcbe_p" btree (member_name, carrier_id, batchnum, encounternum,
encounter_id, procedure_id)
     "mcbe_rev" btree (member_name text_revop, carrier_id, batchnum,
encounternum, encounter_id)
     "mcbe_rev_p" btree (member_name text_revop, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "mcbee_norev" btree (member_num, carrier_id, batchnum, encounternum,
encounter_id)
     "mcbee_norev_p" btree (member_num, carrier_id, batchnum, encounternum,
encounter_id, procedure_id)
     "mcbee_rev" btree (member_num text_revop, carrier_id, batchnum,
encounternum, encounter_id)
     "mcbee_rev_p" btree (member_num text_revop, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "pcbee_norev" btree (patient_control, carrier_id, batchnum,
encounternum, encounter_id)
     "pcbee_norev_p" btree (patient_control, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "pcbee_rev" btree (patient_control text_revop, carrier_id, batchnum,
encounternum, encounter_id)
     "pcbee_rev_p" btree (patient_control text_revop, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "rcbee_norev" btree (receipt, carrier_id, batchnum, encounternum,
encounter_id)
     "rcbee_norev_p" btree (receipt, carrier_id, batchnum, encounternum,
encounter_id, procedure_id)
     "rp_cbee_norev" btree (rendering_prov_name, carrier_id, batchnum,
encounternum, encounter_id)
     "rp_cbee_norev_p" btree (rendering_prov_name, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "rp_cbee_rev" btree (rendering_prov_name text_revop, carrier_id,
batchnum, encounternum, encounter_id)
     "rp_cbee_rev_p" btree (rendering_prov_name text_revop, carrier_id,
batchnum, encounternum, encounter_id, procedure_id)
     "sd_cbee_norev" btree (servicedate, carrier_id, batchnum,
encounternum, encounter_id)
     "sd_cbee_norev_p" btree (servicedate, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "sd_cbee_rev" btree (servicedate date_revop, carrier_id, batchnum,
encounternum, encounter_id)
     "sd_cbee_rev_p" btree (servicedate date_revop, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)
     "testrev" btree (receipt date_revop, carrier_id, batchnum,
encounternum, encounter_id)
     "testrev_p" btree (receipt date_revop, carrier_id, batchnum,
encounternum, encounter_id, procedure_id)


От:
Scott Marlowe
Дата:

On Thu, 2007-04-12 at 16:03, Steve wrote:
> Hey there;
>
> On a Postgres 8.2.3 server, I've got a query that is running very slow in
> some cases.  With some work, I've determined the 'slow part' of the query.
> :)  This is a query on a table with like 10 million rows or something like
> that.  encounter_id is an integer and receipt is of type 'date'.

SNIP

>   Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
>     Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY
>
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
> (2 rows)

How accurate is the row estimate made by the planner?  (explain analyze
to be sure)

От:
Jason Lustig
Дата:

Hi all,

Wow! That's a lot to respond to. Let me go through some of the
ideas... First, I just turned on autovacuum, I forgot to do that. I'm
not seeing a major impact though. Also, I know that it's not optimal
for a dedicated server. It's not just for postgres, it's also got our
apache server on it. We're just getting started and didn't want to
make the major investment right now in getting the most expensive
server we can get. Within the next year, as our traffic grows, we
will most likely upgrade, but for now when we're in the beginning
phases of our project, we're going to work with this server.

In terms of RAID not helping speed-wise (only making an impact in
data integrity) - I was under the impression that even a mirrored
disk set improves speed, because read requests can be sent to either
of the disk controllers. Is this incorrect?

I turned on logging again, only logging queries > 5ms. and it caused
the same problems. I think it might be an issue within the OS's
logging facilities, since it's going through stderr.

Some of the queries are definitely making an impact on the speed. We
are constantly trying to improve performance, and part of that is
reassessing our indexes and denormalizing data where it would help.
We're also doing work with memcached to cache the results of some of
the more expensive operations.

Thanks for all your help guys - it's really fantastic to see the
community here! I've got a lot of database experience (mostly with ms
sql and mysql) but this is my first time doing serious work with
postgres and it's really a great system with great people too.

Jason

On Apr 12, 2007, at 11:35 AM, Jeff Frost wrote:

> On Thu, 12 Apr 2007, Jason Lustig wrote:
>
>> 0 <-- BM starts here
>> 10  0    180 700436  16420  91740    0    0     0   176  278  2923
>> 59 41  0 0  0
>> 11  0    180 696736  16420  91740    0    0     0     0  254  2904
>> 57 43  0 0  0
>> 12  0    180 691272  16420  91740    0    0     0     0  255  3043
>> 60 39  1 0  0
>> 9  0    180 690396  16420  91740    0    0     0     0  254  3078
>> 63 36  2  0 0
>>
>> Obviously, I've turned off logging now but I'd like to get it
>> running again (without bogging down the server) so that I can
>> profile the system and find out which queries I need to optimize.
>> My logging settings (with unnecessary comments taken out) were:
>
> So what did you get in the logs when you had logging turned on?  If
> you have the statement logging, perhaps it's worth running through
> pgfouine to generate a report.
>
>>
>> log_destination = 'syslog'            # Valid values are
>> combinations of
>> redirect_stderr = off                   # Enable capturing of
>> stderr into log
>> log_min_duration_statement =  0              # -1 is disabled, 0
>> logs all statements
>> silent_mode = on                        # DO NOT USE without
>> syslog or
>> log_duration = off
>> log_line_prefix = 'user=%u,db=%d'                       # Special
>> values:
>> log_statement = 'none'                  # none, ddl, mod, all
>>
>
> Perhaps you just want to log slow queries > 100ms?  But since you
> don't seem to know what queries you're running on each web page,
> I'd suggest you just turn on the following and run your benchmark
> against it, then turn it back off:
>
> log_duration = on
> log_statement = 'all'
>
> Then go grab pgfouine and run the report against the logs to see
> what queries are chewing up all your time.
>
>> So you know, we're using Postgres 8.2.3. The database currently is
>> pretty small (we're just running a testing database right now with
>> a few megabytes of data). No doubt some of our queries are slow,
>> but I was concerned because no matter how slow the queries were
>> (at most the worst were taking a couple of msecs anyway), I was
>> getting ridiculously slow responses from the server. Outside of
>> logging, our only other non-default postgresql.conf items are:
>>
>> shared_buffers = 13000                  # min 128kB or
>> max_connections*16kB
>> work_mem = 8096                         # min 64kB
>>
>> In terms of the server itself, I think that it uses software raid.
>> How can I tell? Our hosting company set it up with the server so I
>> guess I could ask them, but is there a program I can run which
>> will tell me the information? I also ran bonnie++ and got this
>> output:
>>
>> Version  1.03       ------Sequential Output------ --Sequential
>> Input- --Random-
>>                   -Per Chr- --Block-- -Rewrite- -Per Chr- --
>> Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %
>> CP  /sec %CP
>> pgtest 2000M 29277  67 33819  15 15446   4 35144  62 48887   5
>> 152.7   0
>>                   ------Sequential Create------ --------Random
>> Create--------
>>                   -Create-- --Read--- -Delete-- -Create-- --
>> Read--- -Delete--
>>             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %
>> CP  /sec %CP
>>                16 17886  77 +++++ +++ +++++ +++ 23258  99 +++++ ++
>> + +++++ +++
>>
>> So I'm getting 33MB and 48MB write/read respectively. Is this
>> slow? Is there anything I should be doing to optimize our RAID
>> configuration?
>>
>
> It's not fast, but at least it's about the same speed as an average
> IDE drive from this era.  More disks would help, but since you
> indicate the DB fits in RAM with plenty of room to spare, how about
> you update your effective_cache_size to something reasonable.  You
> can use the output of the 'free' command and take the cache number
> and divide by 8 to get a reasonable value on linux.  Then turn on
> logging and run your benchmark.  After that, run a pgfouine report
> against the log and post us the explain analyze from your slow
> queries.
>
> And if Ron is indeed local, it might be worthwhile to contact him.
> Someone onsite would likely get this taken care of much faster than
> we can on the mailing list.
>
> --
> Jeff Frost, Owner     <>
> Frost Consulting, LLC     http://www.frostconsultingllc.com/
> Phone: 650-780-7908    FAX: 650-649-1954


От:
Tom Lane
Дата:

Steve <> writes:
> Here's the table and it's indexes.  Before looking, a note; there's
> several 'revop' indexes, this is for sorting.  The customer insisted on,
> frankly, meaninglessly complicated sorts.  I don't think any of that
> matters for our purposes here though :)

Oy vey ... I hope this is a read-mostly table, because having that many
indexes has got to be killing your insert/update performance.

I see that some of the revop indexes might be considered relevant to
this query, so how exactly have you got those opclasses defined?
There's built-in support for reverse sort as of CVS HEAD, but in
existing releases you must have cobbled something together, and I wonder
if that could be a contributing factor ...

            regards, tom lane

От:
Steve
Дата:

>
> Oy vey ... I hope this is a read-mostly table, because having that many
> indexes has got to be killing your insert/update performance.

     Hahaha yeah these are read-only tables.  Nightly inserts/updates.
Takes a few hours, depending on how many records (between 4 and 10
usually).  But during the day, while querying, read only.

> I see that some of the revop indexes might be considered relevant to
> this query, so how exactly have you got those opclasses defined?
> There's built-in support for reverse sort as of CVS HEAD, but in
> existing releases you must have cobbled something together, and I wonder
> if that could be a contributing factor ...

Here's the revops (the c functions are at the bottom):

CREATE FUNCTION ddd_date_revcmp(date, date) RETURNS integer
     AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_date_revcmp'
     LANGUAGE c STRICT;

CREATE FUNCTION ddd_int_revcmp(integer, integer) RETURNS integer
     AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_int_revcmp'
     LANGUAGE c STRICT;

CREATE FUNCTION ddd_text_revcmp(text, text) RETURNS integer
     AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_text_revcmp'
     LANGUAGE c STRICT;

CREATE OPERATOR CLASS date_revop
     FOR TYPE date USING btree AS
     OPERATOR 1 >(date,date) ,
     OPERATOR 2 >=(date,date) ,
     OPERATOR 3 =(date,date) ,
     OPERATOR 4 <=(date,date) ,
     OPERATOR 5 <(date,date) ,
     FUNCTION 1 ddd_date_revcmp(date,date);

CREATE OPERATOR CLASS int4_revop
     FOR TYPE integer USING btree AS
     OPERATOR 1 >(integer,integer) ,
     OPERATOR 2 >=(integer,integer) ,
     OPERATOR 3 =(integer,integer) ,
     OPERATOR 4 <=(integer,integer) ,
     OPERATOR 5 <(integer,integer) ,
     FUNCTION 1 ddd_int_revcmp(integer,integer);

CREATE OPERATOR CLASS text_revop
     FOR TYPE text USING btree AS
     OPERATOR 1 >(text,text) ,
     OPERATOR 2 >=(text,text) ,
     OPERATOR 3 =(text,text) ,
     OPERATOR 4 <=(text,text) ,
     OPERATOR 5 <(text,text) ,
     FUNCTION 1 ddd_text_revcmp(text,text);

Datum   ddd_date_revcmp(PG_FUNCTION_ARGS){
         DateADT         arg1=PG_GETARG_DATEADT(0);
         DateADT         arg2=PG_GETARG_DATEADT(1);

         PG_RETURN_INT32(arg2 - arg1);
}


Datum   ddd_int_revcmp(PG_FUNCTION_ARGS){
         int32           arg1=PG_GETARG_INT32(0);
         int32           arg2=PG_GETARG_INT32(1);

         PG_RETURN_INT32(arg2 - arg1);
}

Datum   ddd_text_revcmp(PG_FUNCTION_ARGS){
         char*           arg1=(char*)VARDATA(PG_GETARG_TEXT_P(0));
         char*           arg2=(char*)VARDATA(PG_GETARG_TEXT_P(1));

         if((*arg1) != (*arg2)){
                 PG_RETURN_INT32(*arg2 - *arg1);
         }else{
                 PG_RETURN_INT32(strcmp(arg2,arg1));
         }
}



От:
Steve
Дата:

>>   Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
>>     Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY
>>
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
>> (2 rows)
>
> How accurate is the row estimate made by the planner?  (explain analyze
> to be sure)
>

Results:

  Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
(actual time=62871.386..257258.249 rows=112 loops=1)
    Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
  Total runtime: 257258.652 ms



От:
Scott Marlowe
Дата:

On Thu, 2007-04-12 at 17:04, Steve wrote:
> >>   Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
> >>     Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY
> >>
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
> >> (2 rows)
> >
> > How accurate is the row estimate made by the planner?  (explain analyze
> > to be sure)
> >
>
> Results:
>
>   Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
> (actual time=62871.386..257258.249 rows=112 loops=1)
>     Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY
>
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
>   Total runtime: 257258.652 ms

So there's a misjudgment of the number of rows returned by a factor of
about 88.  That's pretty big.  Since you had the same number without the
receipt date (I think...) then it's the encounter_id that's not being
counted right.

Try upping the stats target on that column and running analyze again and
see if you get closer to 112 in your analyze or not.

От:
Guido Neitzer
Дата:

On 12.04.2007, at 15:58, Jason Lustig wrote:

> Wow! That's a lot to respond to. Let me go through some of the
> ideas... First, I just turned on autovacuum, I forgot to do that.
> I'm not seeing a major impact though. Also, I know that it's not
> optimal for a dedicated server.

Hmm, why not? Have you recently vacuumed your db manually so it gets
cleaned up? Even a vacuum full might be useful if the db is really
bloated.

> It's not just for postgres, it's also got our apache server on it.
> We're just getting started and didn't want to make the major
> investment right now in getting the most expensive server we can get

Hmmm, but more RAM would definitely make sense, especially in that
szenaria. It really sounds like you machine is swapping to dead.

What does the system say about memory usage?

> Some of the queries are definitely making an impact on the speed.
> We are constantly trying to improve performance, and part of that
> is reassessing our indexes and denormalizing data where it would
> help. We're also doing work with memcached to cache the results of
> some of the more expensive operations.

Hmmm, that kills you even more, as it uses RAM. I really don't think
at the moment that it has something to do with PG itself, but with
not enough memory for what you want to achieve.

What perhaps helps might be connection pooling, so that not so many
processes are created for the requests. It depends on your "middle-
ware" what you can do about that. pg_pool might be an option.

cug



От:
Tom Lane
Дата:

Steve <> writes:
> Datum   ddd_text_revcmp(PG_FUNCTION_ARGS){
>          char*           arg1=(char*)VARDATA(PG_GETARG_TEXT_P(0));
>          char*           arg2=(char*)VARDATA(PG_GETARG_TEXT_P(1));

>          if((*arg1) != (*arg2)){
>                  PG_RETURN_INT32(*arg2 - *arg1);
>          }else{
>                  PG_RETURN_INT32(strcmp(arg2,arg1));
>          }
> }

[ itch... ]  That code is just completely wrong, because the contents
of a TEXT datum aren't guaranteed null-terminated.  It'd be better to
invoke bttextcmp and negate its result.

That's not relevant to your immediate problem, but if you've noticed
any strange behavior with your text_revop indexes, that's the reason...

            regards, tom lane

От:
Steve
Дата:

> [ itch... ]  That code is just completely wrong, because the contents
> of a TEXT datum aren't guaranteed null-terminated.  It'd be better to
> invoke bttextcmp and negate its result.
>
> That's not relevant to your immediate problem, but if you've noticed
> any strange behavior with your text_revop indexes, that's the reason...

     The indexes have all worked, though I'll make the change anyway.
Documentation on how to code these things is pretty sketchy and I believe
I followed an example on the site if I remember right. :/

Thanks for the info though :)


Steve

От:
Steve
Дата:

>
> So there's a misjudgment of the number of rows returned by a factor of
> about 88.  That's pretty big.  Since you had the same number without the
> receipt date (I think...) then it's the encounter_id that's not being
> counted right.
>
> Try upping the stats target on that column and running analyze again and
> see if you get closer to 112 in your analyze or not.
>

     If I max the statistics targets at 1000, I get:

  Seq Scan on detail_summary ds  (cost=0.00..1903030.26 rows=1099 width=4)
    Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))


     Setting it ot 500 makes the estimated rows twice as much.  It
seems to have no effect on anything though, either way. :)


Steve

От:
Tom Lane
Дата:

Scott Marlowe <> writes:
> So there's a misjudgment of the number of rows returned by a factor of
> about 88.  That's pretty big.  Since you had the same number without the
> receipt date (I think...) then it's the encounter_id that's not being
> counted right.

I don't think that's Steve's problem, though.  It's certainly
misestimating, but nonetheless the cost estimate for the seqscan is
1902749.83 versus 14819.81 for the bitmap scan; it should've picked
the bitmap scan anyway.

I tried to duplicate the problem here, without any success; I get

Bitmap Heap Scan on detail_summary ds  (cost=422.01..801.27 rows=100 width=4)
   Recheck Cond: (encounter_id = ANY
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
   Filter: (receipt >= '1998-12-30'::date)
   ->  Bitmap Index Scan on detail_summary_encounter_id_idx  (cost=0.00..421.98 rows=100 width=0)
         Index Cond: (encounter_id = ANY
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

so either this has been fixed by a post-8.2.3 bug fix (which I doubt,
it doesn't seem familiar at all) or there's some additional contributing
factor.  Steve, are you using any nondefault planner parameters?

            regards, tom lane

От:
Steve
Дата:

Here's my planner parameters:

seq_page_cost = 1.0                     # measured on an arbitrary scale
random_page_cost = 1.5                  # same scale as above
cpu_tuple_cost = 0.001                  # same scale as above
cpu_index_tuple_cost = 0.0005           # same scale as above
cpu_operator_cost = 0.00025             # same scale as above
effective_cache_size = 8192MB

default_statistics_target = 100         # range 1-1000


On a machine with 16 gigs of RAM.  I tried to make it skew towards
indexes.  However, even if I force it to use the indexes
(enable_seqscan=off) it doesn't make it any faster really :/

Steve

On Thu, 12 Apr 2007, Tom Lane wrote:

> Scott Marlowe <> writes:
>> So there's a misjudgment of the number of rows returned by a factor of
>> about 88.  That's pretty big.  Since you had the same number without the
>> receipt date (I think...) then it's the encounter_id that's not being
>> counted right.
>
> I don't think that's Steve's problem, though.  It's certainly
> misestimating, but nonetheless the cost estimate for the seqscan is
> 1902749.83 versus 14819.81 for the bitmap scan; it should've picked
> the bitmap scan anyway.
>
> I tried to duplicate the problem here, without any success; I get
>
> Bitmap Heap Scan on detail_summary ds  (cost=422.01..801.27 rows=100 width=4)
>   Recheck Cond: (encounter_id = ANY
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
>   Filter: (receipt >= '1998-12-30'::date)
>   ->  Bitmap Index Scan on detail_summary_encounter_id_idx  (cost=0.00..421.98 rows=100 width=0)
>         Index Cond: (encounter_id = ANY
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
>
> so either this has been fixed by a post-8.2.3 bug fix (which I doubt,
> it doesn't seem familiar at all) or there's some additional contributing
> factor.  Steve, are you using any nondefault planner parameters?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>

От:
Tom Lane
Дата:

Steve <> writes:
> Here's my planner parameters:

I copied all these, and my 8.2.x still likes the bitmap scan a lot
better than the seqscan.  Furthermore, I double-checked the CVS history
and there definitely haven't been any changes in that area in REL8_2
branch since 8.2.3.  So I'm a bit baffled.  Maybe the misbehavior is
platform-specific ... what are you on exactly?  Is there anything
nonstandard about your Postgres installation?

            regards, tom lane

От:
Tom Lane
Дата:

Steve <> writes:
> ... even if I force it to use the indexes
> (enable_seqscan=off) it doesn't make it any faster really :/

Does that change the plan, or do you still get a seqscan?

BTW, how big is this table really (how many rows)?

            regards, tom lane

От:
Steve
Дата:

It's a redhat enterprise machine running AMD x64 processors.

Linux ers3.dddcorp.com 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:13:42 EST
2007 x86_64 x86_64 x86_64 GNU/Linux

It was compiled by me, straight up, nothing weird at all, no odd compiler
options or wahtever :)

So yeah :/ I'm quite baffled as well,

Talk to you later,

Steve


On Thu, 12 Apr 2007, Tom Lane wrote:

> Steve <> writes:
>> Here's my planner parameters:
>
> I copied all these, and my 8.2.x still likes the bitmap scan a lot
> better than the seqscan.  Furthermore, I double-checked the CVS history
> and there definitely haven't been any changes in that area in REL8_2
> branch since 8.2.3.  So I'm a bit baffled.  Maybe the misbehavior is
> platform-specific ... what are you on exactly?  Is there anything
> nonstandard about your Postgres installation?
>
>             regards, tom lane
>

От:
Steve
Дата:

Table size: 16,037,728 rows

With enable_seqscan=off I get:

  Bitmap Heap Scan on detail_summary ds  (cost=4211395.20..4213045.32
rows=1099 width=4)
    Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
    ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx
(cost=0.00..4211395.17 rows=1099 width=0)
          Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id =
ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))


The explain analyze is pending, running it now (however it doens't really
appear to be any faster using this plan).


Steve

On Thu, 12 Apr 2007, Tom Lane wrote:

> Steve <> writes:
>> ... even if I force it to use the indexes
>> (enable_seqscan=off) it doesn't make it any faster really :/
>
> Does that change the plan, or do you still get a seqscan?
>
> BTW, how big is this table really (how many rows)?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>

От:
Steve
Дата:

Here's the explain analyze with seqscan = off:

  Bitmap Heap Scan on detail_summary ds  (cost=4211395.20..4213045.32
rows=1099 width=4) (actual time=121288.825..121305.908 rows=112 loops=1)
    Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
    ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx
(cost=0.00..4211395.17 rows=1099 width=0) (actual
time=121256.681..121256.681 rows=112 loops=1)
          Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id =
ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
  Total runtime: 121306.233 ms


Your other question is answered in the other mail along with the
non-analyze'd query plan :D

Steve

On Thu, 12 Apr 2007, Tom Lane wrote:

> Steve <> writes:
>> ... even if I force it to use the indexes
>> (enable_seqscan=off) it doesn't make it any faster really :/
>
> Does that change the plan, or do you still get a seqscan?
>
> BTW, how big is this table really (how many rows)?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>

От:
Tom Lane
Дата:

Steve <> writes:
> With enable_seqscan=off I get:

>     ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx
> (cost=0.00..4211395.17 rows=1099 width=0)
>           Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id =
> ANY ...

> The explain analyze is pending, running it now (however it doens't really
> appear to be any faster using this plan).

Yeah, that index is nearly useless for this query --- since the receipt
condition isn't really eliminating anything, it'll have to look at every
index entry :-( ... in fact, do so again for each of the IN arms :-( :-(
So it's definitely right not to want to use that plan.  Question is, why
is it seemingly failing to consider the "right" index?

I'm busy setting up my test case on an x86_64 machine right now, but
I rather fear it'll still work just fine for me.  Have you got any
nondefault parameter settings besides the ones you already mentioned?

Another thing that might be interesting, if you haven't got a problem
with exclusive-locking the table for a little bit, is

    BEGIN;
    DROP INDEX each index except detail_summary_encounter_id_idx
    EXPLAIN the problem query
    ROLLBACK;

just to see if it does the right thing when it's not distracted by
all the "wrong" indexes (yeah, I'm grasping at straws here).  If you
set up the above as a SQL script it should only take a second to run.
Please try this with both settings of enable_seqscan --- you don't need
to do "explain analyze" though, we just want to know which plan it picks
and what the cost estimate is.

            regards, tom lane

От:
Steve
Дата:

If the other indexes are removed, with enable_seqscan=on:

  Bitmap Heap Scan on detail_summary ds  (cost=154.10..1804.22 rows=1099
width=4)
    Recheck Cond: (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
    Filter: (receipt >= '1998-12-30'::date)
    ->  Bitmap Index Scan on detail_summary_encounter_id_idx
(cost=0.00..154.07 rows=1099 width=0)
          Index Cond: (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))


With it off:

  Bitmap Heap Scan on detail_summary ds  (cost=154.10..1804.22 rows=1099
width=4)
    Recheck Cond: (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
    Filter: (receipt >= '1998-12-30'::date)
    ->  Bitmap Index Scan on detail_summary_encounter_id_idx
(cost=0.00..154.07 rows=1099 width=0)
          Index Cond: (encounter_id = ANY

('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))


Either way, it runs perfectly fast.  So it looks like the indexes are
confusing this query like you suspected.  Any advise?  This isn't the only
query we run on this table, much as I'd absolutely love to kill off some
indexes to imrpove our nightly load times I can't foul up the other
queries :)


Thank you very much for all your help on this issue, too!

Steve

On Thu, 12 Apr 2007, Tom Lane wrote:

> Steve <> writes:
>> With enable_seqscan=off I get:
>
>>     ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx
>> (cost=0.00..4211395.17 rows=1099 width=0)
>>           Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id =
>> ANY ...
>
>> The explain analyze is pending, running it now (however it doens't really
>> appear to be any faster using this plan).
>
> Yeah, that index is nearly useless for this query --- since the receipt
> condition isn't really eliminating anything, it'll have to look at every
> index entry :-( ... in fact, do so again for each of the IN arms :-( :-(
> So it's definitely right not to want to use that plan.  Question is, why
> is it seemingly failing to consider the "right" index?
>
> I'm busy setting up my test case on an x86_64 machine right now, but
> I rather fear it'll still work just fine for me.  Have you got any
> nondefault parameter settings besides the ones you already mentioned?
>
> Another thing that might be interesting, if you haven't got a problem
> with exclusive-locking the table for a little bit, is
>
>     BEGIN;
>     DROP INDEX each index except detail_summary_encounter_id_idx
>     EXPLAIN the problem query
>     ROLLBACK;
>
> just to see if it does the right thing when it's not distracted by
> all the "wrong" indexes (yeah, I'm grasping at straws here).  If you
> set up the above as a SQL script it should only take a second to run.
> Please try this with both settings of enable_seqscan --- you don't need
> to do "explain analyze" though, we just want to know which plan it picks
> and what the cost estimate is.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>

От:
Tom Lane
Дата:

Steve <> writes:
> Either way, it runs perfectly fast.  So it looks like the indexes are
> confusing this query like you suspected.  Any advise?

Wow --- sometimes grasping at straws pays off.  I was testing here with
just a subset of the indexes to save build time, but I bet that one of
the "irrelevant" ones is affecting this somehow.  Time to re-test.

If you have some time to kill, it might be interesting to vary that
begin/rollback test script to leave one or two other indexes in place,
and see if you can identify exactly which other index(es) get it
confused.

I'm about to go out to dinner with the wife, but will have a closer
look when I get back, or tomorrow morning.  We'll figure this out.

            regards, tom lane

От:
Steve
Дата:

Okay -- I started leaving indexes on one by one.

The explain broke when the detail_summary_receipt_encounter_idx index was
left on (receipt, encounter_id).

Just dropping that index had no effect, but there's a LOT of indexes that
refer to receipt.  So on a hunch I tried dropping all indexes that refer
to receipt date and that worked -- so it's the indexes that contain
receipt date that are teh problem.

For more fun, I tried leaving the index that's just receipt date alone
(index detail_summary_receipt_id_idx) and THAT produced the correct query;
it's all these multi-column queries that are fouling things up, it would
seem!


.... So does this mean I should experiment with dropping those indexes?
I'm not sure if that will result in 'bad things' as there are other
complicated actions like sorts that may go real slow if I drop those
indexes.  BUT I think it'll be easy to convince the customer to drop their
absurdly complicated sorts if I can come back with serious results like
what we've worked out here.


And thanks again -- have a good dinner! :)

Steve


On Thu, 12 Apr 2007, Tom Lane wrote:

> Steve <> writes:
>> Either way, it runs perfectly fast.  So it looks like the indexes are
>> confusing this query like you suspected.  Any advise?
>
> Wow --- sometimes grasping at straws pays off.  I was testing here with
> just a subset of the indexes to save build time, but I bet that one of
> the "irrelevant" ones is affecting this somehow.  Time to re-test.
>
> If you have some time to kill, it might be interesting to vary that
> begin/rollback test script to leave one or two other indexes in place,
> and see if you can identify exactly which other index(es) get it
> confused.
>
> I'm about to go out to dinner with the wife, but will have a closer
> look when I get back, or tomorrow morning.  We'll figure this out.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                http://www.postgresql.org/about/donate
>

От:
Tom Lane
Дата:

Steve <> writes:
> Okay -- I started leaving indexes on one by one.
> ...
> .... So does this mean I should experiment with dropping those indexes?

No, I think this means there's a planner bug to fix.  I haven't quite
scoped out what it is yet, though.

            regards, tom lane

От:
Tom Lane
Дата:

Steve <> writes:
> Just dropping that index had no effect, but there's a LOT of indexes that
> refer to receipt.  So on a hunch I tried dropping all indexes that refer
> to receipt date and that worked -- so it's the indexes that contain
> receipt date that are teh problem.

I'm still not having any luck reproducing the failure here.  Grasping at
straws again, I wonder if it's got something to do with the order in
which the planner examines the indexes --- which is OID order.  Could
you send me the results of

select indexrelid::regclass from pg_index where indrelid = 'detail_summary'::regclass order by indexrelid;

            regards, tom lane

От:
Steve
Дата:

Here you go:

  detail_summary_b_record_status_idx
  detail_summary_batch_id_idx
  detail_summary_batchnum_idx
  detail_summary_carrier_id_idx
  detail_summary_duplicate_id_idx
  detail_summary_e_record_status_idx
  detail_summary_encounter_id_idx
  detail_summary_encounternum_idx
  detail_summary_export_date_idx
  detail_summary_hedis_date_idx
  detail_summary_member_name_idx
  detail_summary_member_num_idx
  detail_summary_p_record_status_idx
  detail_summary_patient_control_idx
  detail_summary_procedurecode_idx
  detail_summary_product_line_idx
  detail_summary_provider_id_idx
  detail_summary_raps_date_idx
  detail_summary_receipt_id_idx
  detail_summary_referrering_prov_id_idx
  detail_summary_rendering_prov_id_idx
  detail_summary_rendering_prov_name_idx
  detail_summary_servicedate_idx
  ds_sort_1
  ds_sort_10
  ed_cbee_norev
  ed_cbee_norev_p
  ed_cbee_rev
  ed_cbee_rev_p
  mcbe
  mcbe_p
  mcbe_rev
  mcbe_rev_p
  mcbee_norev
  mcbee_norev_p
  mcbee_rev
  mcbee_rev_p
  pcbee_norev
  pcbee_norev_p
  pcbee_rev
  pcbee_rev_p
  rcbee_norev
  rcbee_norev_p
  rp_cbee_norev
  rp_cbee_norev_p
  rp_cbee_rev
  rp_cbee_rev_p
  sd_cbee_norev
  sd_cbee_norev_p
  sd_cbee_rev
  sd_cbee_rev_p
  testrev
  testrev_p
  detail_summary_receipt_encounter_idx


On Thu, 12 Apr 2007, Tom Lane wrote:

> Steve <> writes:
>> Just dropping that index had no effect, but there's a LOT of indexes that
>> refer to receipt.  So on a hunch I tried dropping all indexes that refer
>> to receipt date and that worked -- so it's the indexes that contain
>> receipt date that are teh problem.
>
> I'm still not having any luck reproducing the failure here.  Grasping at
> straws again, I wonder if it's got something to do with the order in
> which the planner examines the indexes --- which is OID order.  Could
> you send me the results of
>
> select indexrelid::regclass from pg_index where indrelid = 'detail_summary'::regclass order by indexrelid;
>
>             regards, tom lane
>

От:
Tom Lane
Дата:

Steve <> writes:
> On Thu, 12 Apr 2007, Tom Lane wrote:
>> I'm still not having any luck reproducing the failure here.  Grasping at
>> straws again, I wonder if it's got something to do with the order in
>> which the planner examines the indexes --- which is OID order.  Could
>> you send me the results of

> Here you go:

Nope, still doesn't fail for me.  [ baffled and annoyed... ]  There must
be something about your situation that's relevant but we aren't
recognizing it.  Are you in a position to let me ssh into your machine
and try to debug it?  Or other options like sending me a dump of your
database?   I'm about out of steam for tonight, but let's talk about it
off-list tomorrow.

            regards, tom lane

От:
Tom Lane
Дата:

Steve <> writes:
> [ strange planner misbehavior in 8.2.3 ]

After some off-list investigation (thanks, Steve, for letting me poke
at your machine), the short answer is that the heuristics used by
choose_bitmap_and() suck.  The problem query is like

select ... from ds where
ds.receipt >= '1998-12-30 0:0:0' and
ds.encounter_id in ( ... 100 distinct values ... );

and the table has a truly remarkable variety of indexes on encounter_id,
receipt, and combinations of them with other columns.  The receipt
condition is actually in effect a no-op, because all receipt dates are
later than that, but because ineq_histogram_selectivity doesn't trust
histogram data unreservedly we compute a selectivity of about 0.99997
for it.  That means that the indexes that cover both receipt and
encounter_id are given a selectivity score just fractionally better than
those involving encounter_id alone, and therefore they sort first in
choose_bitmap_and's sort step, and the way that that routine is coded,
only combinations of the very first index with other ones will be
considered for a bitmap heap scan.  So the possibility of using just the
index on encounter_id alone is never considered, even though that
alternative is vastly cheaper than the alternatives that are considered.
(It happens that encounter_id is a low-order column in all the indexes
that include receipt, and so these scans end up covering the whole index
... multiple times even.  The cost estimation is fine --- the thing
knows these are expensive --- what's falling down is the heuristic for
which combinations of indexes to consider using in a bitmap scan.)

The original coding of choose_bitmap_and involved a "fuzzy" comparison
of selectivities, which would have avoided this problem, but we got rid
of that later because it had its own problems.  In fact,
choose_bitmap_and has caused us enough problems that I'm thinking we
need a fundamental rethink of how it works, rather than just marginal
tweaks.  If you haven't looked at this code before, the comments explain
the idea well enough:

/*
 * choose_bitmap_and
 *        Given a nonempty list of bitmap paths, AND them into one path.
 *
 * This is a nontrivial decision since we can legally use any subset of the
 * given path set.  We want to choose a good tradeoff between selectivity
 * and cost of computing the bitmap.
 *
 * The result is either a single one of the inputs, or a BitmapAndPath
 * combining multiple inputs.
 */
...
    /*
     * In theory we should consider every nonempty subset of the given paths.
     * In practice that seems like overkill, given the crude nature of the
     * estimates, not to mention the possible effects of higher-level AND and
     * OR clauses.  As a compromise, we sort the paths by selectivity.  We
     * always take the first, and sequentially add on paths that result in a
     * lower estimated cost.
     *
     * We also make some effort to detect directly redundant input paths, as
     * can happen if there are multiple possibly usable indexes.  (Another way
     * it can happen is that best_inner_indexscan will find the same OR join
     * clauses that create_or_index_quals has pulled OR restriction clauses
     * out of, and then both versions show up as duplicate paths.)  We
     * consider an index redundant if any of its index conditions were already
     * used by earlier indexes.  (We could use predicate_implied_by to have a
     * more intelligent, but much more expensive, check --- but in most cases
     * simple pointer equality should suffice, since after all the index
     * conditions are all coming from the same RestrictInfo lists.)
     *
     * You might think the condition for redundancy should be "all index
     * conditions already used", not "any", but this turns out to be wrong.
     * For example, if we use an index on A, and then come to an index with
     * conditions on A and B, the only way that the second index can be later
     * in the selectivity-order sort is if the condition on B is completely
     * non-selective.  In any case, we'd surely be drastically misestimating
     * the selectivity if we count the same condition twice.
     *
     * We include index predicate conditions in the redundancy test.  Because
     * the test is just for pointer equality and not equal(), the effect is
     * that use of the same partial index in two different AND elements is
     * considered redundant.  (XXX is this too strong?)
     *
     * Note: outputting the selected sub-paths in selectivity order is a good
     * thing even if we weren't using that as part of the selection method,
     * because it makes the short-circuit case in MultiExecBitmapAnd() more
     * likely to apply.
     */


One idea I thought about was to sort by index scan cost, using
selectivity only as a tiebreaker for cost, rather than the other way
around as is currently done.  This seems fairly plausible because
indexscans that are cheaper than other indexscans likely return fewer
rows too, and so selectivity is already accounted for to some extent ---
at least you can't have an enormously worse selectivity at lower cost,
whereas Steve's example proves it doesn't work the other way.  But I'm
worried about breaking the reasoning about redundant indexes that's
mentioned in the comments.

Another alternative that would respond to the immediate problem is to
maintain the current sort order, but as we come to each index, consider
using that one alone, and throw away whatever AND we might have built up
if that one alone beats the AND-so-far.  This seems more conservative,
as it's unlikely to break any cases that work well now, but on the other
hand it feels like plastering another wart atop a structure that's
already rather rickety.

Has anyone got any thoughts about the best way to do this?

            regards, tom lane

От:
Alvaro Herrera
Дата:

Tom Lane wrote:

> One idea I thought about was to sort by index scan cost, using
> selectivity only as a tiebreaker for cost, rather than the other way
> around as is currently done.  This seems fairly plausible because
> indexscans that are cheaper than other indexscans likely return fewer
> rows too, and so selectivity is already accounted for to some extent ---
> at least you can't have an enormously worse selectivity at lower cost,
> whereas Steve's example proves it doesn't work the other way.  But I'm
> worried about breaking the reasoning about redundant indexes that's
> mentioned in the comments.
>
> Another alternative that would respond to the immediate problem is to
> maintain the current sort order, but as we come to each index, consider
> using that one alone, and throw away whatever AND we might have built up
> if that one alone beats the AND-so-far.  This seems more conservative,
> as it's unlikely to break any cases that work well now, but on the other
> hand it feels like plastering another wart atop a structure that's
> already rather rickety.
>
> Has anyone got any thoughts about the best way to do this?

How about doing both: sort the index by index scan cost; then pick the
first index on the list and start adding indexes when they lower the
cost.  When adding each index, consider it by itself against the
already stacked indexes.  If the cost is lower, put this index at the
top of the list, and restart the algorithm (after the sorting step of
course).

I think the concern about condition redundancy should be attacked
separately.  How about just comparing whether they have common prefixes
of conditions?  I admit I don't understand what would happen with
indexes defined like (lower(A), B, C) versus (A, B) for example.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

От:
Tom Lane
Дата:

"Dann Corbit" <> writes:
> Instead of sorting, I suggest the quickselect() algorithm, which is
> O(n).

What for?  Common cases have less than half a dozen entries.  That is
not the place we need to be spending engineering effort --- what we
need to worry about is what's the choice algorithm, not implementation
details.

            regards, tom lane

От:
Tom Lane
Дата:

Alvaro Herrera <> writes:
> I think the concern about condition redundancy should be attacked
> separately.  How about just comparing whether they have common prefixes
> of conditions?  I admit I don't understand what would happen with
> indexes defined like (lower(A), B, C) versus (A, B) for example.

I understand that issue a bit better than I did when I wrote the comment
(so I suppose I better rewrite it).  The $64 reason for rejecting
AND-combinations of indexes that are using some of the same
WHERE-conditions is that if we don't, we effectively double-count the
selectivity of those conditions, causing us to prefer useless
AND-combinations.  An example is "WHERE A > 42 AND B < 100" where we
have an index on A and one on (A,B).  The selectivity calculation
will blindly assume that the selectivities of the two indexes are
independent and thus prefer to BitmapAnd them, when obviously there
is no point in using both.  Ideally we should improve the selectivity
calculation to not get fooled like this, but that seems hard and
probably slow.  So for the moment we have the heuristic that no
WHERE-clause should be used twice in any AND-combination.

Given that we are using that heuristic, it becomes important that
we visit the indexes in the "right order" --- as the code stands,
in the (A) vs (A,B) case it will consider only the first index it
arrives at in the selectivity sort order, because the second will
be rejected on the basis of re-use of the WHERE A > 42 condition.
Sorting by selectivity tends to ensure that we pick the index that
can make use of as many WHERE-clauses as possible.

The idea of considering each index alone fixes the order dependency
for cases where a single index is the best answer, but it doesn't
help much for cases where you really do want a BitmapAnd, only not
one using the index with the individually best selectivity.

We really need a heuristic here --- exhaustive search will be
impractical in cases where there are many indexes, because you'd
be looking at 2^N combinations.  (In Steve's example there are
actually 38 relevant indexes, which is bad database design if
you ask me, but in any case we cannot afford to search through
2^38 possibilities.)  But the one we're using now is too fragile.

Maybe we should use a cutoff similar to the GEQO one: do exhaustive
search if there are less than N relevant indexes, for some N.
But that's not going to help Steve; we still need a smarter heuristic
for what to look for above the cutoff.

            regards, tom lane

От:
Tom Lane
Дата:

Alvaro Herrera <> writes:
> Tom Lane wrote:
>> Has anyone got any thoughts about the best way to do this?

> How about doing both: sort the index by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the
> already stacked indexes.  If the cost is lower, put this index at the
> top of the list, and restart the algorithm (after the sorting step of
> course).

The "restart" part of that bothers me --- it's not entirely clear that
you couldn't get into an infinite loop.  (Imagine that A+B is better
than A alone, so we adopt it, but worse than C alone, so we take C as
the new leader and start over.  Then perhaps C+B is better than C alone
but worse than A alone, so we take A as the leader and start over.
Maybe this is impossible but I'm unsure.)

I looked back at the gdb results I'd gotten from Steve's example and
noticed that for his 38 indexes there were only three distinct index
selectivity values, and the sort step grouped indexes by cost within
those groups.  In hindsight of course this is obvious: the selectivity
depends on the set of WHERE-clauses used, so with two WHERE-clauses
there are three possible selectivities (to within roundoff error anyway)
depending on whether the index uses one or both clauses.  So the
existing algorithm gets one thing right: for any two indexes that make
use of just the same set of WHERE-clauses, it will always take the one
with cheaper scan cost.

Thinking more about this leads me to the following proposal:

1. Explicitly group the indexes according to the subset of
WHERE-conditions (and partial index conditions, if any) they use.
Within each such group, discard all but the cheapest-scan-cost one.

2. Sort the remaining indexes according to scan cost.

3. For each index in order, consider it as a standalone scan, and also
consider adding it on to the AND-group led by each preceding index,
using the same logic as now: reject using any WHERE-condition twice
in a group, and then add on only if the total cost of the AND-group
scan is reduced.

This would be approximately O(N^2) in the number of relevant indexes,
whereas the current scheme is roughly linear (handwaving a bit here
because the number of WHERE-clauses is a factor too).  But that seems
unlikely to be a problem for sane numbers of indexes, unlike the O(2^N)
behavior of an exhaustive search.  It would get rid of (most of) the
order-sensitivity problem, since we would definitely consider the
AND-combination of every pair of combinable indexes.  I can imagine
cases where it wouldn't notice useful three-or-more-way combinations
because the preceding two-way combination didn't win, but they seem
pretty remote possibilities.

Comments?

            regards, tom lane

От:
Robins
Дата:

Hi Tom / Steve,

Am one of the silent readers of performance issues that come up on this list (and are discussed in detail) ... just like this one.

If and when you do come up with a solution, please do post some details about them here... (i say that coz it seems that for obvious reasons, things must have gone off air after tom's last email, and one can understand that). But an analysis, or atleast a pointer may be of help to someone (like me) reading this list.

Thanks
Robins


---------- Forwarded message ----------
From: Tom Lane < >
Date: Apr 13, 2007 10:08 AM
Subject: Re: [PERFORM] Strangely Variable Query Performance
To: Steve <>
Cc: Scott Marlowe < >, PostgreSQL Performance <>

Steve < > writes:
> On Thu, 12 Apr 2007, Tom Lane wrote:
>> I'm still not having any luck reproducing the failure here.  Grasping at
>> straws again, I wonder if it's got something to do with the order in
>> which the planner examines the indexes --- which is OID order.  Could
>> you send me the results of

> Here you go:

Nope, still doesn't fail for me.  [ baffled and annoyed... ]  There must
be something about your situation that's relevant but we aren't
recognizing it.  Are you in a position to let me ssh into your machine
and try to debug it?  Or other options like sending me a dump of your
database?   I'm about out of steam for tonight, but let's talk about it
off-list tomorrow.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match


От:
Tom Lane
Дата:

Robins <> writes:
> If and when you do come up with a solution, please do post some details
> about them here... (i say that coz it seems that for obvious reasons, things
> must have gone off air after tom's last email, and one can understand that).
> But an analysis, or atleast a pointer may be of help to someone (like me)
> reading this list.

Oh, sorry, the subsequent discussion moved over to pgsql-hackers:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00621.php
and -patches:
http://archives.postgresql.org/pgsql-patches/2007-04/msg00374.php

Those are good places to look if a discussion on -bugs or other lists
seems to tail off...

            regards, tom lane

PS: the reason I couldn't reproduce the behavior was just that the dummy
data I was using didn't have the right statistics.

От:
"Robins Tharakan"
Дата:

Hi Tom / Steve,

Am one of the silent readers of performance issues that come up on this list (and are discussed in detail) ... just like this one.

If and when you do come up with a solution, please do post some details about them here... (i say that coz it seems that for obvious reasons, things must have gone off air after tom's last email, and one can understand that). But an analysis, or atleast a pointer may be of help to someone (like me) reading this list.

Thanks
Robins

---------- Forwarded message ----------
From: Tom Lane <>
Date: Apr 13, 2007 10:08 AM
Subject: Re: [PERFORM] Strangely Variable Query Performance
To: Steve <>
Cc: Scott Marlowe < >, PostgreSQL Performance <>

Steve <> writes:
> On Thu, 12 Apr 2007, Tom Lane wrote:
>> I'm still not having any luck reproducing the failure here.  Grasping at
>> straws again, I wonder if it's got something to do with the order in
>> which the planner examines the indexes --- which is OID order.  Could
>> you send me the results of

> Here you go:

Nope, still doesn't fail for me.  [ baffled and annoyed... ]  There must
be something about your situation that's relevant but we aren't
recognizing it.  Are you in a position to let me ssh into your machine
and try to debug it?  Or other options like sending me a dump of your
database?   I'm about out of steam for tonight, but let's talk about it
off-list tomorrow.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match


--
Robins