7.3.1 New install, large queries are slow

From: "Roman Fail"
Date:
I am trying to get a PostgreSQL server into production (moving from MSSQL2K) but having some serious performance
issues.  PostgreSQL is new to me, and I'm only just now comfortable with Linux.  So far I've successfully compiled
postgres from source and migrated all the data from MSSQL.  Postgres is primarily accessed using JDBC.
 
 
I really want to use Postgres for production, but if I can't get better results out of it by the end of the week we are
dropping it forever and going back to MSSQL despite the $$$.  I'm basically at a point where I've got to find help from
the list.  Please help me make this server fly!
 
 
I have a query that does many joins (including two very big tables) which is slow on Postgres.  On PGSQL the query
takes 19 minutes, but only 3 seconds on MSSQL.  The two servers have the same indexes created (including primary key
indexes).  I finally gave up on creating all the foreign keys in Postgres - after 12 hours of 100% CPU.  It's hard for
me to believe that the hardware is the bottleneck - the $20k Postgres server far outclasses the MSSQL server (see below
for stats).  When I ran EXPLAIN ANALYZE for this query the CPU averaged 5%, sar -b showed about 6,000 block reads/sec,
and vmstat had zero swapping.  EXPLAIN results are below, I'm not sure how to interpret them.
 
 
The disks are not reading at max speed during the query - when I ran a VACUUM ANALYZE (after data migration), sar -b
was consistently at 100,000 block reads/sec.  It does not seem like the hardware is holding things back here.  I read
something about 'fsync' recently; would changing that setting apply in this case?
 
 
DATABASE:
'tranheader' table has 2000 tuples, PK index
'batchheader' table has 2.6 million tuples, 5 indexes, FK constraint to tranheader PK
'batchdetail' table has 23 million tuples, 6 indexes, FK constraint to batchheader PK
18 tables with <1000 tuples, most are child tables of batchdetail
All tables have a PK and are normalized 
Large nightly INSERTs (~200000 tuples)
Complex SELECTs all day long
No UPDATEs or DELETEs ever, at least until I run low on storage!
 
I've learned as much as I can absorb from the online docs and archives about performance tuning.  Based on my limited
understanding, I've changed the following settings.  I am totally open to any suggestions, including starting over with
RAID, filesystems, PGSQL.  I would almost consider FreeBSD if it helped a lot, but that would be a stretch given my time
investment in Linux.  This is a brand new machine, so bad hardware is a possibility - but I'm not sure how to go about
determining that.
 
 
*** /etc/sysctl.conf
kernel.shmmni = 4096
kernel.shmall = 32000000
kernel.shmmax = 512000000  
 
*** /usr/local/pgsql/data/postgresql.conf
tcpip_socket=true
shared_buffers = 32768
max_fsm_relations = 10000
max_fsm_pages = 2000000
sort_mem = 8192
 
POSTGRESQL SYSTEM:
Red Hat Linux 8.0, PostgreSQL 7.3.1 (dedicated, besides SSH daemon)
Dell PE6600 Dual Xeon MP 2.0GHz, 2MB L3 cache, HyperThreading enabled
4.0 GB Physical RAM
/dev/sda1: ext3 101MB /boot 
/dev/sda2: ext3  34GB /      (sda is 2 disk RAID-1)
none     : swap 1.8GB
/dev/sdb1: ext3 104GB /usr/local/pgsql/data (sdb is 6 disk RAID-10)
All 8 drives are 36GB, 15k RPM, Ultra160 SCSI
PERC3/DC 128MB RAID controller
 
MSSQL SYSTEM:
Dell PE1650, Dual P3 1.1GHz, 1.5GB RAM
Single 18GB, 15k RPM SCSI drive (no RAID)
Windows 2000 Server SP3, SQL Server 2000 SP2

TIA,
Roman Fail
Sr. Web Application Developer
POS Portal, Inc.

EXPLAIN ANALYZE RESULTS:
Limit  (cost=370518.31..370518.31 rows=1 width=540) (actual time=1168722.18..1168722.20 rows=5 loops=1)
  ->  Sort  (cost=370518.31..370518.31 rows=1 width=540) (actual time=1168722.18..1168722.18 rows=5 loops=1)
        Sort Key: b.batchdate
        ->  Nested Loop  (cost=314181.17..370518.30 rows=1 width=540) (actual time=1148191.12..1168722.09 rows=5 loops=1)
              Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
              ->  Nested Loop  (cost=314181.17..370461.79 rows=1 width=502) (actual time=1148167.55..1168671.80 rows=5 loops=1)
                    Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                    ->  Nested Loop  (cost=314181.17..370429.29 rows=1 width=485) (actual time=1148167.48..1168671.45 rows=5 loops=1)
                          Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                          ->  Nested Loop  (cost=314181.17..370396.79 rows=1 width=476) (actual time=1148167.41..1168671.08 rows=5 loops=1)
                                Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                                ->  Nested Loop  (cost=314181.17..314402.47 rows=1 width=457) (actual time=1139099.39..1139320.79 rows=5 loops=1)
                                      Join Filter: ("outer".cardtypeid = "inner".cardtypeid)
                                      ->  Merge Join  (cost=314181.17..314401.24 rows=1 width=443) (actual time=1138912.13..1139133.00 rows=5 loops=1)
                                            Merge Cond: ("outer".batchid = "inner".batchid)
                                            ->  Sort  (cost=127418.59..127418.59 rows=3 width=150) (actual time=9681.91..9681.93 rows=17 loops=1)
                                                  Sort Key: b.batchid
                                                  ->  Hash Join  (cost=120787.32..127418.56 rows=3 width=150) (actual time=7708.04..9681.83 rows=17 loops=1)
                                                        Hash Cond: ("outer".merchantid = "inner".merchantid)
                                                        ->  Merge Join  (cost=120781.58..125994.80 rows=283597 width=72) (actual time=7655.57..9320.49 rows=213387 loops=1)
                                                              Merge Cond: ("outer".tranheaderid = "inner".tranheaderid)
                                                              ->  Index Scan using tranheader_ix_tranheaderid_idx on tranheader t  (cost=0.00..121.15 rows=1923 width=16) (actual time=0.15..10.86 rows=1923 loops=1)
                                                                    Filter: (clientid = 6)
                                                              ->  Sort  (cost=120781.58..121552.88 rows=308520 width=56) (actual time=7611.75..8162.81 rows=329431 loops=1)
                                                                    Sort Key: b.tranheaderid
                                                                    ->  Seq Scan on batchheader b  (cost=0.00..79587.23 rows=308520 width=56) (actual time=0.90..4186.30 rows=329431 loops=1)
                                                                          Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
                                                        ->  Hash  (cost=5.74..5.74 rows=1 width=78) (actual time=31.39..31.39 rows=0 loops=1)
                                                              ->  Index Scan using merchants_ix_merchid_idx on merchants m  (cost=0.00..5.74 rows=1 width=78) (actual time=31.38..31.38 rows=1 loops=1)
                                                                    Index Cond: (merchid = '701252267'::character varying)
                                            ->  Sort  (cost=186762.59..186872.62 rows=44010 width=293) (actual time=1127828.96..1128725.39 rows=368681 loops=1)
                                                  Sort Key: d.batchid
                                                  ->  Index Scan using batchdetail_ix_tranamount_idx on batchdetail d  (cost=0.00..176768.18 rows=44010 width=293) (actual time=35.48..1104625.54 rows=370307 loops=1)
                                                        Index Cond: ((tranamount >= 500.0) AND (tranamount <= 700.0))
                                      ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=14) (actual time=37.44..37.47 rows=10 loops=5)
                                ->  Seq Scan on purc1 p1  (cost=0.00..44259.70 rows=938770 width=19) (actual time=98.09..4187.32 rows=938770 loops=5)
                          ->  Seq Scan on direct dr  (cost=0.00..20.00 rows=1000 width=9) (actual time=0.00..0.00 rows=0 loops=5)
                    ->  Seq Scan on carrental cr  (cost=0.00..20.00 rows=1000 width=17) (actual time=0.00..0.00 rows=0 loops=5)
              ->  Seq Scan on checks ck  (cost=0.00..40.67 rows=1267 width=38) (actual time=1.03..7.63 rows=1267 loops=5)
Total runtime: 1168881.12 msec

Re: 7.3.1 New install, large queries are slow

From: Tomasz Myrta
Date:
Roman Fail wrote:
<cut>


It looks like your execution time is not a hardware problem, but a query problem.
The query hardly uses indexes at all.  You said that you have a normalized database,
so you should have a lot of explicit joins, which work pretty well on PostgreSQL.

Can you post some examples of your queries?  If that is difficult for you,
at least create one example where you get a "Join Filter" in the EXPLAIN ANALYZE output.

From your EXPLAIN ANALYZE result:
Seq Scan on batchheader b  (cost=0.00..79587.23 rows=308520 width=56)
Can you show what conditions and indexes batchheader has?

Regards,
Tomasz Myrta


Re: 7.3.1 New install, large queries are slow

From: Josh Berkus
Date:
Roman,

First, if this is a dedicated PostgreSQL server, you should try increasing
your shared_buffers to at least 512MB (65536) if not 1GB (double that) and
adjust your shmmax and shmall to match.
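For example (rough numbers only; shared_buffers is counted in 8k blocks, shmmax in bytes):

*** /usr/local/pgsql/data/postgresql.conf
shared_buffers = 65536        # 65536 * 8kB = 512MB

*** /etc/sysctl.conf
kernel.shmmax = 1073741824    # must exceed the size of the whole shared memory segment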

Second, you will probably want to increase your sort_mem as well.  How much
depends on the number of concurrent queries you expect to be running and
their relative complexity.  Give me that information, and I'll offer you
some suggestions.

Your query problem is hopefully relatively easy.  The following clause is 95%
of your query time:

>   ->  Index Scan using batchdetail_ix_tranamount_idx on batchdetail d  (cost=0.00..176768.18 rows=44010 width=293) (actual time=35.48..1104625.54 rows=370307 loops=1)

See the actual time figures?  This one clause is taking 1,104,590 msec!

Now, why?

Well, look at the cost estimate figures in contrast to the actual row count:
estimated rows = 44,010        real rows = 370,307
That's off by a factor of more than 8.  This index scan is obviously very cumbersome
and is slowing the query down.  Probably it should be using a seq scan
instead ... my guess is you haven't run ANALYZE in a while, and the incorrect
row estimate is causing the planner to choose a very slow index scan.

Try running ANALYZE on your database and re-running the query.   Also try
using REINDEX on batchdetail_ix_tranamount_idx .
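That is, something along these lines (ANALYZE with no table name covers the whole database):

ANALYZE;
REINDEX INDEX batchdetail_ix_tranamount_idx;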

Second, this clause near the bottom:

                                ->  Seq Scan on purc1 p1  (cost=0.00..44259.70 rows=938770 width=19) (actual time=98.09..4187.32 rows=938770 loops=5)

... suggests that you could save an additional 4 seconds by figuring out a way
for the criteria on purc1 to use a relevant index -- but only after you've
solved the problem with batchdetail_ix_tranamount_idx.
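If purc1 doesn't already have an index on batchdetailid, one possibility (the index name here is made up) would be:

CREATE INDEX purc1_ix_batchdetailid_idx ON purc1 (batchdetailid);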

Finally, if you really want help, post the query.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: 7.3.1 New install, large queries are slow

From: Stephan Szabo
Date:
On Wed, 15 Jan 2003, Roman Fail wrote:

> The disks are not reading at max speed during the query - when I ran a
> VACUUM ANALYZE (after data migration), sar -b was consistently 100,000
> block reads/sec.  It does not seem like the hardware is holding back
> things here.  I read something about 'fsync' recently, would changing
> that setting apply in this case?

You ran vacuum analyze, but some of the explain still looks suspiciously
like it's using default statistics (dr and cr for example, unless they
really do have 1000 rows).

What are the actual query and table definitions for the query?


Re: 7.3.1 New install, large queries are slow

From: Rod Taylor
Date:
I didn't see the query itself in the message, but it looks to me like
it's poorly formed.  Could you send it?

By quick glance, either you're using a bunch of explicit joins that are
poorly formed (you've made a bad choice in order) or those particular
IDs are really popular.  There are a number of sequential scans that
possibly should be index scans.

> EXPLAIN ANALYZE RESULTS:
> Limit  (cost=370518.31..370518.31 rows=1 width=540) (actual time=1168722.18..1168722.20 rows=5 loops=1)
>   ->  Sort  (cost=370518.31..370518.31 rows=1 width=540) (actual time=1168722.18..1168722.18 rows=5 loops=1)
>         Sort Key: b.batchdate

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachments

Re: 7.3.1 New install, large queries are slow

From: Andrew Sullivan
Date:
On Wed, Jan 15, 2003 at 10:00:04AM -0800, Roman Fail wrote:
> I have a query that does many joins (including two very big tables)
> which is slow on Postgres.  On PGSQL the query takes 19 minutes,

There are three things I can think of right off the bat.

First, the performance of foreign keys is flat-out awful in Postgres.
I suggest avoiding them if you can.
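For example, dropping one looks roughly like this (the table and constraint names here are only placeholders):

ALTER TABLE batchheader DROP CONSTRAINT fk_bh_th;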

Second, ordering joins explicitly (with the JOIN keyword) constrains
the planner, and may select a bad plan.  The explain analyse output
was nice, but I didn't see the query, so I can't tell what the plan
ought to be.

Third, I didn't see any suggestion that you'd moved the WAL onto its
own disk.  That will mostly help when you are under write load; I
guess it's not a problem here, but it's worth keeping in mind.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: 7.3.1 New install, large queries are slow

From: Hannu Krosing
Date:
Roman Fail wrote on Wed, 15.01.2003 at 23:00:
> I am trying to get a PostgreSQL server into production (moving from MSSQL2K) but having some serious performance
> issues.  PostgreSQL is new to me, and I'm only just now comfortable with Linux.  So far I've successfully compiled
> postgres from source and migrated all the data from MSSQL.  Postgres is primarily accessed using JDBC.
>
> I really want to use Postgres for production, but if I can't get better results out of it by the end of the week we
> are dropping it forever and going back to MSSQL despite the $$$.  I'm basically at a point where I've got to find help
> from the list.  Please help me make this server fly!
>
> I have a query that does many joins (including two very big tables) which is slow on Postgres.  On PGSQL the query
> takes 19 minutes, but only 3 seconds on MSSQL.  The two servers have the same indexes created (including primary key
> indexes).  I finally gave up on creating all the foreign keys in Postgres - after 12 hours of 100% CPU.  It's hard for
> me to believe that the hardware is the bottleneck - the $20k Postgres server far outclasses the MSSQL server (see below
> for stats).  When I ran EXPLAIN ANALYZE for this query the CPU averaged 5%, sar -b showed about 6,000 block reads/sec,
> and vmstat had zero swapping.  EXPLAIN results are below, I'm not sure how to interpret them.
>

Two questions:

1) Have you run ANALYZE on this database (after loading the data)?

2) Could you also post the actual query?  It would make interpreting the
EXPLAIN ANALYZE results easier.

--
Hannu Krosing <hannu@tm.ee>

Re: 7.3.1 New install, large queries are slow

From: "Roman Fail"
Date:
Thanks to everyone for the quick replies!  I'm sure that my lack of skill with SQL queries is the main problem.  What's
strange to me is how MSSQL takes my bad queries and makes them look good anyway.  It must have a real smart planner.
 
 
Several changes:  shared_buffers = 131072, sort_mem = 32768, shmmax = 2097152000, shmall = 131072000.  I couldn't find
any info out there on the relationship between shmmax and shmall, so I just preserved the ratio from the RedHat defaults
(1:16).  As far as sort_mem goes, I expect to be running no more than 3 concurrent queries and they will all be just as
complex as this one.  Do you think sort_mem=32768 is a reasonable size?  None of these changes seemed to help speed
things up, however.
 
 
REINDEX INDEX batchdetail_ix_tranamount_idx;    was executed successfully, although it took 15 minutes.
ANALYZE executed in 2 minutes, even though I increased default_statistics_target = 30.  Should I increase it even more?
I don't mind the extra overhead each night if it will make my queries faster.  (Idiot check: I did actually stop and
start the postmaster after changing all these settings).
 
 
Andrew Sullivan wrote:
>First, the performance of foreign keys is flat-out awful in Postgres.
>I suggest avoiding them if you can.

I don't have any problem getting rid of FKs, especially if it might actually help performance.  The nightly data import
is well-defined and should always observe referential integrity, so I guess the db doesn't really need to enforce it.
In MSSQL, adding FKs was supposed to actually benefit SELECT performance.  Is it pretty much universally accepted that I
should drop all my foreign keys?
 

>Second, ordering joins explicitly (with the JOIN keyword) constrains
>the planner, and may select  bad plan.  The explain analyse output
>was nice, but I didn't see the query, so I can't tell what the plan
>maybe ought to be.

I think this is the most likely problem.  I've read through Chapter 10 of the 7.3 docs, but I still don't feel like I
know what would be a good order.  How do you learn this stuff anyway?  Trial and error?
 

>Third, I didn't see any suggestion that you'd moved the WAL onto its
>own disk.  That will mostly help when you are under write load;

I don't think I'm going to bother with moving the WAL....the write load during the day is very, very light (when
queries are run).  Disk I/O is clearly not the limiting factor (yet!).
 
 
So here's the query, and another EXPLAIN ANALYZE to go with it (executed after all setting changes).  The same result
columns and JOINs are performed all day with variations on the WHERE clause; other possible search columns are the ones
that are indexed (see below).  The 4 tables that use LEFT JOIN only sometimes have matching records, hence the OUTER
join.
 
EXPLAIN ANALYZE
SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount, 
d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode, 
m.name AS merchantname, c.cardtype, m.merchid, 
p1.localtaxamount, p1.productidentifier, dr.avsresponse, 
cr.checkoutdate, cr.noshowindicator, ck.checkingacctno, 
ck.abaroutingno, ck.checkno 
FROM tranheader t 
INNER JOIN batchheader b ON t.tranheaderid = b.tranheaderid 
INNER JOIN merchants m ON m.merchantid = b.merchantid 
INNER JOIN batchdetail d ON d.batchid = b.batchid 
INNER JOIN cardtype c ON d.cardtypeid = c.cardtypeid 
LEFT JOIN purc1 p1 ON p1.batchdetailid = d.batchdetailid 
LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid 
LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid 
LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid 
WHERE t.clientid = 6 
AND d.tranamount BETWEEN 500.0 AND 700.0 
AND b.batchdate > '2002-12-15' 
AND m.merchid = '701252267' 
ORDER BY b.batchdate DESC 
LIMIT 50

Limit  (cost=1829972.39..1829972.39 rows=1 width=285) (actual time=1556497.79..1556497.80 rows=5 loops=1)
  ->  Sort  (cost=1829972.39..1829972.39 rows=1 width=285) (actual time=1556497.78..1556497.79 rows=5 loops=1)
        Sort Key: b.batchdate
        ->  Nested Loop  (cost=1771874.32..1829972.38 rows=1 width=285) (actual time=1538783.03..1556486.64 rows=5 loops=1)
              Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
              ->  Nested Loop  (cost=1771874.32..1829915.87 rows=1 width=247) (actual time=1538760.60..1556439.67 rows=5 loops=1)
                    Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                    ->  Nested Loop  (cost=1771874.32..1829915.86 rows=1 width=230) (actual time=1538760.55..1556439.50 rows=5 loops=1)
                          Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                          ->  Nested Loop  (cost=1771874.32..1829915.85 rows=1 width=221) (actual time=1538760.51..1556439.31 rows=5 loops=1)
                                Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                                ->  Nested Loop  (cost=1771874.32..1773863.81 rows=1 width=202) (actual time=1529153.84..1529329.65 rows=5 loops=1)
                                      Join Filter: ("outer".cardtypeid = "inner".cardtypeid)
                                      ->  Merge Join  (cost=1771874.32..1773862.58 rows=1 width=188) (actual time=1529142.55..1529317.99 rows=5 loops=1)
                                            Merge Cond: ("outer".batchid = "inner".batchid)
                                            ->  Sort  (cost=116058.42..116058.43 rows=3 width=118) (actual time=14184.11..14184.14 rows=17 loops=1)
                                                  Sort Key: b.batchid
                                                  ->  Hash Join  (cost=109143.44..116058.39 rows=3 width=118) (actual time=12398.29..14184.03 rows=17 loops=1)
                                                        Hash Cond: ("outer".merchantid = "inner".merchantid)
                                                        ->  Merge Join  (cost=109137.81..114572.94 rows=295957 width=40) (actual time=12359.75..13848.67 rows=213387 loops=1)
                                                              Merge Cond: ("outer".tranheaderid = "inner".tranheaderid)
                                                              ->  Index Scan using tranheader_ix_tranheaderid_idx on tranheader t  (cost=0.00..121.15 rows=1923 width=16) (actual time=0.17..10.91 rows=1923 loops=1)
                                                                    Filter: (clientid = 6)
                                                              ->  Sort  (cost=109137.81..109942.73 rows=321966 width=24) (actual time=12317.83..12848.43 rows=329431 loops=1)
                                                                    Sort Key: b.tranheaderid
                                                                    ->  Seq Scan on batchheader b  (cost=0.00..79683.44 rows=321966 width=24) (actual time=29.93..10422.75 rows=329431 loops=1)
                                                                          Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
                                                        ->  Hash  (cost=5.63..5.63 rows=1 width=78) (actual time=21.06..21.06 rows=0 loops=1)
                                                              ->  Index Scan using merchants_ix_merchid_idx on merchants m  (cost=0.00..5.63 rows=1 width=78) (actual time=21.05..21.05 rows=1 loops=1)
                                                                    Index Cond: (merchid = '701252267'::character varying)
                                            ->  Sort  (cost=1655815.90..1656810.15 rows=397698 width=70) (actual time=1513860.73..1514497.92 rows=368681 loops=1)
                                                  Sort Key: d.batchid
                                                  ->  Index Scan using batchdetail_ix_tranamount_idx on batchdetail d  (cost=0.00..1597522.38 rows=397698 width=70) (actual time=14.05..1505397.17 rows=370307 loops=1)
                                                        Index Cond: ((tranamount >= 500.0) AND (tranamount <= 700.0))
                                      ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=14) (actual time=2.25..2.28 rows=10 loops=5)
                                ->  Seq Scan on purc1 p1  (cost=0.00..44285.35 rows=941335 width=19) (actual time=2.40..3812.43 rows=938770 loops=5)
                          ->  Seq Scan on direct dr  (cost=0.00..0.00 rows=1 width=9) (actual time=0.00..0.00 rows=0 loops=5)
                    ->  Seq Scan on carrental cr  (cost=0.00..0.00 rows=1 width=17) (actual time=0.00..0.00 rows=0 loops=5)
              ->  Seq Scan on checks ck  (cost=0.00..40.67 rows=1267 width=38) (actual time=0.50..7.05 rows=1267 loops=5)
Total runtime: 1556553.76 msec

 
Tomasz Myrta wrote:
>Seq Scan on batchheader b  (cost=0.00..79587.23 rows=308520 width=56)
>Can you write what condition and indexes does batchheader have?
 
batchheader has 2.6 million records:
CREATE TABLE public.batchheader (
  batchid int8 DEFAULT nextval('"batchheader_batchid_key"'::text) NOT NULL, 
  line int4, 
  tranheaderid int4, 
  merchantid int4, 
  batchdate timestamp, 
  merchref char(16), 
  carryindicator char(1), 
  assocno varchar(6), 
  merchbankno char(4), 
  debitcredit char(1), 
  achpostdate timestamp, 
  trancode char(4), 
  netdeposit numeric(18, 4), 
  CONSTRAINT batchheader_ix_batchid_idx UNIQUE (batchid), 
  CONSTRAINT batchheader_pkey PRIMARY KEY (batchid), 
  CONSTRAINT fk_bh_th FOREIGN KEY (tranheaderid) REFERENCES tranheader (tranheaderid) ON DELETE RESTRICT ON UPDATE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE
 
) WITH OIDS;
CREATE UNIQUE INDEX batchheader_ix_batchid_idx ON batchheader USING btree (batchid);
CREATE INDEX batchheader_ix_batchdate_idx ON batchheader USING btree (batchdate);
CREATE INDEX batchheader_ix_merchantid_idx ON batchheader USING btree (merchantid);
CREATE INDEX batchheader_ix_merchref_idx ON batchheader USING btree (merchref);
CREATE INDEX batchheader_ix_netdeposit_idx ON batchheader USING btree (netdeposit);

And here's batchdetail too, just for kicks.  23 million records.
CREATE TABLE public.batchdetail (
  batchdetailid int8 DEFAULT nextval('public.batchdetail_batchdetailid_seq'::text) NOT NULL, 
  line int4, 
  batchid int4, 
  merchno varchar(16), 
  assocno varchar(6), 
  refnumber char(23), 
  trandate timestamp, 
  tranamount numeric(18, 4), 
  netdeposit numeric(18, 4), 
  cardnocfb bytea, 
  bestinterchange char(2), 
  submitinterchange char(2), 
  downgrader1 char(4), 
  downgrader2 char(4), 
  downgrader3_1 char(1), 
  downgrader3_2 char(1), 
  downgrader3_3 char(1), 
  downgrader3_4 char(1), 
  downgrader3_5 char(1), 
  downgrader3_6 char(1), 
  downgrader3_7 char(1), 
  onlineentry char(1), 
  achflag char(1), 
  authsource char(1), 
  cardholderidmeth char(1), 
  catindicator char(1), 
  reimbattribute char(1), 
  motoindicator char(1), 
  authcharind char(1), 
  banknetrefno char(9), 
  banknetauthdate char(6), 
  draftaflag char(1), 
  authcurrencycode char(3), 
  authamount numeric(18, 4), 
  validcode char(4), 
  authresponsecode char(2), 
  debitnetworkid char(3), 
  switchsetindicator char(1), 
  posentrymode char(2), 
  debitcredit char(1), 
  reversalflag char(1), 
  merchantname varchar(25), 
  authno char(6), 
  rejectreason char(4), 
  cardtypeid int4, 
  currencycode char(3), 
  origtranamount numeric(18, 4), 
  foreigncard char(1), 
  carryover char(1), 
  extensionrecord char(2), 
  mcccode char(4), 
  terminalid char(8), 
  submitinterchange3b char(3), 
  purchaseid varchar(25), 
  trancode char(4), 
  CONSTRAINT batchdetail_pkey PRIMARY KEY (batchdetailid)
) WITH OIDS;
CREATE INDEX batchdetail_ix_authno_idx ON batchdetail USING btree (authno);
CREATE INDEX batchdetail_ix_batchdetailid_idx ON batchdetail USING btree (batchdetailid);
CREATE INDEX batchdetail_ix_cardnocfb_idx ON batchdetail USING btree (cardnocfb);
CREATE INDEX batchdetail_ix_posentrymode_idx ON batchdetail USING btree (posentrymode);
CREATE INDEX batchdetail_ix_submitinterchange3b_idx ON batchdetail USING btree (submitinterchange3b);
CREATE INDEX batchdetail_ix_tranamount_idx ON batchdetail USING btree (tranamount);
 
Roman Fail
Sr. Web Application Developer
POS Portal, Inc.
Sacramento, CA
 

 

 


 

 

Re: 7.3.1 New install, large queries are slow

From: Kevin Brown
Date:
Roman Fail wrote:
> I am trying to get a PostgreSQL server into production (moving from
> MSSQL2K) but having some serious performance issues.  PostgreSQL is
> new to me, and I'm only just now comfortable with Linux.  So far
> I've succesfully compiled postgres from source and migrated all the
> data from MSSQL.  Postgres is primarily accessed using JDBC.

[...]

> POSTGRESQL SYSTEM:
> Red Hat Linux 8.0, PostgreSQL 7.3.1 (dedicated, besides SSH daemon)
> Dell PE6600 Dual Xeon MP 2.0GHz, 2MB L3 cache,HyperThreading enabled
> 4.0 GB Physical RAM
> /dev/sda1: ext3 101MB /boot
> /dev/sda2: ext3  34GB /      (sda is 2 disk RAID-1)
> none     : swap 1.8GB
> /dev/sdb1: ext3 104GB /usr/local/pgsql/data (sdb is 6 disk RAID-10)
> All 8 drives are 36GB, 15k RPM, Ultra160 SCSI
> PERC3/DC 128MB RAID controller

Ext3, huh?  Ext3 is a journalling filesystem that is capable of
journalling data as well as metadata.  But if you mount it such that
it journals data, writes will be significantly slower.

The default for ext3 is to do ordered writes: data is written before
the associated metadata transaction commits, but the data itself isn't
journalled.  But because PostgreSQL synchronously writes the
transaction log (using fsync() by default, if I'm not mistaken) and
uses sync() during a savepoint, I would think that ordered writes at
the filesystem level would probably buy you very little in the way of
additional data integrity in the event of a crash.

So if I'm right about that, then you might consider using the
"data=writeback" option to ext3 on the /usr/local/pgsql/data
filesystem.  I'd recommend the default ("data=ordered") for everything
else.
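In /etc/fstab that would look roughly like this (assuming /dev/sdb1 from your layout; untested):

/dev/sdb1   /usr/local/pgsql/data   ext3   defaults,data=writeback   1 2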


That said, I doubt the above change will make the orders of magnitude
difference you're looking for.  But every little bit helps...

You might also consider experimenting with different filesystems, but
others here may be able to chime in with better information on that.


People, please correct me if I'm wrong in my analysis of PostgreSQL on
ext3 above.  If the database on an ext3 filesystem mounted in
writeback mode is subject to corruption upon a crash despite the
efforts PostgreSQL makes to keep things sane, then writeback mode
shouldn't be used!  And clearly it shouldn't be used if it doesn't
make a significant performance difference.





--
Kevin Brown                          kevin@sysexperts.com

Re: 7.3.1 New install, large queries are slow

From: Stephan Szabo
Date:
>  So here's the query, and another EXPLAIN ANALYZE to go with it
> (executed after all setting changes).  The same result columns and
> JOINS are performed all day with variations on the WHERE clause; other
> possible search columns are the ones that are indexed (see below).
> The 4 tables that use LEFT JOIN only sometimes have matching records,
> hence the OUTER join.
>
> EXPLAIN ANALYZE
> SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
> d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
> m.name AS merchantname, c.cardtype, m.merchid,
> p1.localtaxamount, p1.productidentifier, dr.avsresponse,
> cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
> ck.abaroutingno, ck.checkno
> FROM tranheader t
> INNER JOIN batchheader b ON t.tranheaderid = b.tranheaderid
> INNER JOIN merchants m ON m.merchantid = b.merchantid
> INNER JOIN batchdetail d ON d.batchid = b.batchid
> INNER JOIN cardtype c ON d.cardtypeid = c.cardtypeid
> LEFT JOIN purc1 p1 ON p1.batchdetailid = d.batchdetailid
> LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
> LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
> LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid
> WHERE t.clientid = 6
> AND d.tranamount BETWEEN 500.0 AND 700.0
> AND b.batchdate > '2002-12-15'
> AND m.merchid = '701252267'
> ORDER BY b.batchdate DESC
> LIMIT 50

Well, you might get a little help by replacing the FROM with
something like:

FROM tranheader t, batchheader b, merchants m, cardtype c,
batchdetail d
LEFT JOIN purc1 p1 on p1.batchdetailid=d.batchdetailid
LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid

and adding
AND t.tranheaderid=b.tranheaderid
AND m.merchantid=b.merchantid
AND d.batchid=b.batchid
AND c.cardtypeid=d.cardtypeid
to the WHERE conditions.

That should at least allow it to do some small reordering
of the joins.  I don't think that alone is going to do much,
since most of the time seems to be on the scan of d.

What does VACUUM VERBOSE batchdetail give you?  (It'll give
an idea of pages, anyway.)






Re: 7.3.1 New install, large queries are slow

From: Stephan Szabo
Date:
On Wed, 15 Jan 2003, Roman Fail wrote:

> Thanks to everyone for the quick replies!  I'm sure that my lack of
> skill with SQL queries is the main problem.  What's strange to me is
> how MSSQL takes my bad queries and makes them look good anyway.  It
> must have a real smart planner.

As a followup, if you do
set enable_indexscan=off;
before running the explain analyze, what does that give you?


Re: 7.3.1 New install, large queries are slow

From: Tom Lane
Date:
"Roman Fail" <rfail@posportal.com> writes:
> Thanks to everyone for the quick replies!  I'm sure that my lack of
> skill with SQL queries is the main problem.  What's strange to me is
> how MSSQL takes my bad queries and makes them look good anyway.  It
> must have a real smart planner.

I think more likely the issue is that your use of JOIN syntax is forcing
Postgres into a bad plan.  MSSQL probably doesn't assign any semantic
significance to the use of "a JOIN b" syntax as opposed to "FROM a, b"
syntax.  Postgres does.  Whether this is a bug or a feature depends on
your point of view --- but there are folks out there who find it to be
a life-saver.  You can find some explanations at
http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.html
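Schematically, with made-up tables a, b and c, the two forms are:

-- join order is pinned down: a joins to b first, then to c
SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON c.b_id = b.id;

-- planner is free to choose any join order
SELECT * FROM a, b, c WHERE a.id = b.a_id AND c.b_id = b.id;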

> Is it pretty much universally accepted that I should drop all my
> foreign keys?

No.  They don't have any effect on SELECT performance in Postgres.
They will impact update speed, but that's not your complaint (at the
moment).  Don't throw away data integrity protection until you know
you need to.

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

From: "Josh Berkus"
Date:
Tom, Roman,

> I think more likely the issue is that your use of JOIN syntax is
> forcing Postgres into a bad plan.  MSSQL probably doesn't assign any
> semantic significance to the use of "a JOIN b" syntax as opposed to
> "FROM a, b" syntax.

That's correct.  MSSQL will reorder equijoins, even when explicitly
declared.

Hey, Roman, how many records in BatchDetail, anyway?

Josh Berkus

Re: 7.3.1 New install, large queries are slow

From: Kevin Brown
Date:
Tom Lane wrote:
> "Roman Fail" <rfail@posportal.com> writes:
> > Thanks to everyone for the quick replies!  I'm sure that my lack of
> > skill with SQL queries is the main problem.  What's strange to me is
> > how MSSQL takes my bad queries and makes them look good anyway.  It
> > must have a real smart planner.
>
> I think more likely the issue is that your use of JOIN syntax is forcing
> Postgres into a bad plan.  MSSQL probably doesn't assign any semantic
> significance to the use of "a JOIN b" syntax as opposed to "FROM a, b"
> syntax.  Postgres does.  Whether this is a bug or a feature depends on
> your point of view --- but there are folks out there who find it to be
> a life-saver.

Since it *does* depend on one's point of view, would it be possible to
have control over this implemented in a session-defined variable (with
the default in the GUC, of course)?  I wouldn't be surprised if a lot
of people get bitten by this.


--
Kevin Brown                          kevin@sysexperts.com

Re: 7.3.1 New install, large queries are slow

From: Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
> Tom Lane wrote:
>> ... Whether this is a bug or a feature depends on
>> your point of view --- but there are folks out there who find it to be
>> a life-saver.

> Since it *does* depend on one's point of view, would it be possible to
> have control over this implemented in a session-defined variable (with
> the default in the GUC, of course)?

I have no objection to doing that --- anyone care to contribute code to
make it happen?  (I think the trick would be to fold plain-JOIN jointree
entries into FROM-list items in planner.c, somewhere near the code that
hoists sub-SELECTs into the main join tree.  But I haven't tried it, and
have no time to in the near future.)

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

From: "Roman Fail"
Date:
***********************
> Josh Berkus wrote:
> Hey, Roman, how many records in BatchDetail, anyway?

23 million.
 
***********************
> Stephan Szabo wrote:
> What does vacuum verbose batchdetail give you (it'll give an idea of pages anyway)

trans=# VACUUM VERBOSE batchdetail;
INFO:  --Relation public.batchdetail--
INFO:  Pages 1669047: Changed 0, Empty 0; Tup 23316674: Vac 0, Keep 0, UnUsed 0.
        Total CPU 85.36s/9.38u sec elapsed 253.38 sec.
INFO:  --Relation pg_toast.pg_toast_8604247--
INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
trans=#

***********************
At Stephan Szabo and Tom Lane's suggestion, I reorganized the query so the JOIN syntax was only used in the outer
joins.  This did not seem to help at all.  Of note: during this query 'sar -b' showed a consistent 6000 block reads/sec,
CPU was about 2%.
 
 
EXPLAIN ANALYZE
SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
m.name AS merchantname, c.cardtype, m.merchid,
p1.localtaxamount, p1.productidentifier, dr.avsresponse,
cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
ck.abaroutingno, ck.checkno
FROM tranheader t, batchheader b, merchants m, cardtype c, batchdetail d
LEFT JOIN purc1 p1 on p1.batchdetailid=d.batchdetailid
LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid
WHERE t.tranheaderid=b.tranheaderid
AND m.merchantid=b.merchantid
AND d.batchid=b.batchid
AND c.cardtypeid=d.cardtypeid
AND t.clientid = 6
AND d.tranamount BETWEEN 500.0 AND 700.0
AND b.batchdate > '2002-12-15'
AND m.merchid = '701252267'
ORDER BY b.batchdate DESC
LIMIT 50
Limit  (cost=1789105.21..1789105.22 rows=1 width=285) (actual time=1222029.59..1222029.61 rows=5 loops=1)
  ->  Sort  (cost=1789105.21..1789105.22 rows=1 width=285) (actual time=1222029.58..1222029.59 rows=5 loops=1)
        Sort Key: b.batchdate
        ->  Nested Loop  (cost=1787171.22..1789105.20 rows=1 width=285) (actual time=1221815.14..1222019.46 rows=5 loops=1)
              Join Filter: ("inner".tranheaderid = "outer".tranheaderid)
              ->  Nested Loop  (cost=1787171.22..1789026.02 rows=1 width=269) (actual time=1221809.33..1221978.62 rows=5 loops=1)
                    Join Filter: ("inner".cardtypeid = "outer".cardtypeid)
                    ->  Merge Join  (cost=1787171.22..1789024.79 rows=1 width=255) (actual time=1221802.47..1221971.48 rows=5 loops=1)
                          Merge Cond: ("outer".batchid = "inner".batchid)
                          ->  Sort  (cost=476.17..476.18 rows=4 width=102) (actual time=678.05..678.07 rows=17 loops=1)
                                Sort Key: b.batchid
                                ->  Nested Loop  (cost=0.00..476.14 rows=4 width=102) (actual time=161.62..677.95 rows=17 loops=1)
                                      ->  Index Scan using merchants_ix_merchid_idx on merchants m  (cost=0.00..5.65 rows=1 width=78) (actual time=13.87..13.88 rows=1 loops=1)
                                            Index Cond: (merchid = '701252267'::character varying)
                                      ->  Index Scan using batchheader_ix_merchantid_idx on batchheader b  (cost=0.00..470.30 rows=15 width=24) (actual time=147.72..663.94 rows=17 loops=1)
                                            Index Cond: ("outer".merchantid = b.merchantid)
                                            Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
                          ->  Sort  (cost=1786695.05..1787621.82 rows=370710 width=153) (actual time=1220080.34..1220722.19 rows=368681 loops=1)
                                Sort Key: d.batchid
                                ->  Merge Join  (cost=1704191.25..1713674.49 rows=370710 width=153) (actual time=1200184.91..1213352.77 rows=370307 loops=1)
                                      Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                      ->  Merge Join  (cost=1704085.28..1712678.33 rows=370710 width=115) (actual time=1199705.71..1210336.37 rows=370307 loops=1)
                                            Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                            ->  Merge Join  (cost=1704085.27..1711751.54 rows=370710 width=98) (actual time=1199705.65..1208122.73 rows=370307 loops=1)
                                                  Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                                  ->  Merge Join  (cost=1704085.26..1710824.75 rows=370710 width=89) (actual time=1199705.55..1205977.76 rows=370307 loops=1)
                                                        Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                                        ->  Sort  (cost=1543119.01..1544045.79 rows=370710 width=70) (actual time=1181172.79..1181902.77 rows=370307 loops=1)
                                                              Sort Key: d.batchdetailid
                                                              ->  Index Scan using batchdetail_ix_tranamount_idx on batchdetail d  (cost=0.00..1489103.46 rows=370710 width=70) (actual time=14.45..1176074.90 rows=370307 loops=1)
                                                                    Index Cond: ((tranamount >= 500.0) AND (tranamount <= 700.0))
                                                        ->  Sort  (cost=160966.25..163319.59 rows=941335 width=19) (actual time=18532.70..20074.09 rows=938770 loops=1)
                                                              Sort Key: p1.batchdetailid
                                                              ->  Seq Scan on purc1 p1  (cost=0.00..44285.35 rows=941335 width=19) (actual time=9.44..9119.83 rows=938770 loops=1)
                                                  ->  Sort  (cost=0.01..0.02 rows=1 width=9) (actual time=0.08..0.08 rows=0 loops=1)
                                                        Sort Key: dr.batchdetailid
                                                        ->  Seq Scan on direct dr  (cost=0.00..0.00 rows=1 width=9) (actual time=0.01..0.01 rows=0 loops=1)
                                            ->  Sort  (cost=0.01..0.02 rows=1 width=17) (actual time=0.04..0.04 rows=0 loops=1)
                                                  Sort Key: cr.batchdetailid
                                                  ->  Seq Scan on carrental cr  (cost=0.00..0.00 rows=1 width=17) (actual time=0.00..0.00 rows=0 loops=1)
                                      ->  Sort  (cost=105.97..109.13 rows=1267 width=38) (actual time=479.17..480.74 rows=1267 loops=1)
                                            Sort Key: ck.batchdetailid
                                            ->  Seq Scan on checks ck  (cost=0.00..40.67 rows=1267 width=38) (actual time=447.88..475.60 rows=1267 loops=1)
                    ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=14) (actual time=1.37..1.39 rows=10 loops=5)
              ->  Seq Scan on tranheader t  (cost=0.00..55.15 rows=1923 width=16) (actual time=0.01..5.14 rows=1923 loops=5)
                    Filter: (clientid = 6)
Total runtime: 1222157.28 msec

***********************
Just to see what would happen, I executed:
     ALTER TABLE batchdetail ALTER COLUMN tranamount SET STATISTICS 1000;
     ANALYZE;
It seemed to hurt performance if anything.  But the EXPLAIN estimate for rows was much closer to the real value than it
was previously.
 
 
***********************
It seems to me that the big, big isolated problem is the index scan on batchdetail.tranamount.  During this small
query, 'sar -b' showed a consistent 90,000 block reads/sec (contrast with only 6,000 with the larger query's index scan).
'top' shows the CPU at 20% user, 30% system the whole time (contrast with 2% total in the larger query above).  The
results here still seem pretty bad (although not as bad as above), but I still don't know what the bottleneck is.  And
the strange sar stats are confusing me.
 
 
EXPLAIN ANALYZE SELECT * FROM batchdetail WHERE tranamount BETWEEN 300 AND 499;
Seq Scan on batchdetail  (cost=0.00..2018797.11 rows=783291 width=440) (actual time=45.66..283926.58 rows=783687 loops=1)
  Filter: ((tranamount >= 300::numeric) AND (tranamount <= 499::numeric))
Total runtime: 285032.47 msec

 
***********************
> Stephan Szabo wrote:
> As a followup, if you do set enable_indexscan=off;
> before running the explain analyze, what does that give you?

Now this is very interesting: 'sar -b' shows about 95,000 block reads/sec; CPU is at 20% user, 30% system; vmstat shows
no swapping; and the query takes only 5 minutes to execute (which is one-quarter of the time WITH the index scan!!!!).
Obviously the execution plan is pretty different on this one (the query is identical to the larger one above).
 
 
EXPLAIN ANALYZE
SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
m.name AS merchantname, c.cardtype, m.merchid,
p1.localtaxamount, p1.productidentifier, dr.avsresponse,
cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
ck.abaroutingno, ck.checkno
FROM tranheader t, batchheader b, merchants m, cardtype c,
batchdetail d
LEFT JOIN purc1 p1 on p1.batchdetailid=d.batchdetailid
LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid
WHERE t.tranheaderid=b.tranheaderid
AND m.merchantid=b.merchantid
AND d.batchid=b.batchid
AND c.cardtypeid=d.cardtypeid
AND t.clientid = 6
AND d.tranamount BETWEEN 500.0 AND 700.0
AND b.batchdate > '2002-12-15'
AND m.merchid = '701252267'
ORDER BY b.batchdate DESC
LIMIT 50
Limit  (cost=2321460.56..2321460.57 rows=1 width=285) (actual time=308194.57..308194.59 rows=5 loops=1)
  ->  Sort  (cost=2321460.56..2321460.57 rows=1 width=285) (actual time=308194.57..308194.58 rows=5 loops=1)
        Sort Key: b.batchdate
        ->  Nested Loop  (cost=2319526.57..2321460.55 rows=1 width=285) (actual time=307988.56..308194.46 rows=5 loops=1)
              Join Filter: ("inner".tranheaderid = "outer".tranheaderid)
              ->  Nested Loop  (cost=2319526.57..2321381.37 rows=1 width=269) (actual time=307982.80..308153.22 rows=5 loops=1)
                    Join Filter: ("inner".cardtypeid = "outer".cardtypeid)
                    ->  Merge Join  (cost=2319526.57..2321380.14 rows=1 width=255) (actual time=307982.69..308152.82 rows=5 loops=1)
                          Merge Cond: ("outer".batchid = "inner".batchid)
                          ->  Sort  (cost=2316388.70..2317315.47 rows=370710 width=153) (actual time=305976.74..306622.88 rows=368681 loops=1)
                                Sort Key: d.batchid
                                ->  Merge Join  (cost=2233884.90..2243368.15 rows=370710 width=153) (actual time=286452.12..299485.43 rows=370307 loops=1)
                                      Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                      ->  Merge Join  (cost=2233778.93..2242371.98 rows=370710 width=115) (actual time=286428.77..296939.66 rows=370307 loops=1)
                                            Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                            ->  Merge Join  (cost=2233778.92..2241445.19 rows=370710 width=98) (actual time=286428.72..294750.01 rows=370307 loops=1)
                                                  Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                                  ->  Merge Join  (cost=2233778.91..2240518.40 rows=370710 width=89) (actual time=286428.60..292606.56 rows=370307 loops=1)
                                                        Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
                                                        ->  Sort  (cost=2072812.66..2073739.44 rows=370710 width=70) (actual time=269738.34..270470.83 rows=370307 loops=1)
                                                              Sort Key: d.batchdetailid
                                                              ->  Seq Scan on batchdetail d  (cost=0.00..2018797.11 rows=370710 width=70) (actual time=41.66..266568.83 rows=370307 loops=1)
                                                                    Filter: ((tranamount >= 500.0) AND (tranamount <= 700.0))
                                                        ->  Sort  (cost=160966.25..163319.59 rows=941335 width=19) (actual time=16690.20..18202.65 rows=938770 loops=1)
                                                              Sort Key: p1.batchdetailid
                                                              ->  Seq Scan on purc1 p1  (cost=0.00..44285.35 rows=941335 width=19) (actual time=6.88..7779.31 rows=938770 loops=1)
                                                  ->  Sort  (cost=0.01..0.02 rows=1 width=9) (actual time=0.10..0.10 rows=0 loops=1)
                                                        Sort Key: dr.batchdetailid
                                                        ->  Seq Scan on direct dr  (cost=0.00..0.00 rows=1 width=9) (actual time=0.00..0.00 rows=0 loops=1)
                                            ->  Sort  (cost=0.01..0.02 rows=1 width=17) (actual time=0.03..0.03 rows=0 loops=1)
                                                  Sort Key: cr.batchdetailid
                                                  ->  Seq Scan on carrental cr  (cost=0.00..0.00 rows=1 width=17) (actual time=0.00..0.00 rows=0 loops=1)
                                      ->  Sort  (cost=105.97..109.13 rows=1267 width=38) (actual time=23.32..24.89 rows=1267 loops=1)
                                            Sort Key: ck.batchdetailid
                                            ->  Seq Scan on checks ck  (cost=0.00..40.67 rows=1267 width=38) (actual time=6.51..19.59 rows=1267 loops=1)
                          ->  Sort  (cost=3137.87..3137.88 rows=4 width=102) (actual time=954.18..954.20 rows=19 loops=1)
                                Sort Key: b.batchid
                                ->  Nested Loop  (cost=0.00..3137.84 rows=4 width=102) (actual time=236.26..954.04 rows=17 loops=1)
                                      ->  Seq Scan on merchants m  (cost=0.00..2667.35 rows=1 width=78) (actual time=2.48..227.71 rows=1 loops=1)
                                            Filter: (merchid = '701252267'::character varying)
                                      ->  Index Scan using batchheader_ix_merchantid_idx on batchheader b  (cost=0.00..470.30 rows=15 width=24) (actual time=233.75..726.22 rows=17 loops=1)
                                            Index Cond: ("outer".merchantid = b.merchantid)
                                            Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
                    ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=14) (actual time=0.02..0.04 rows=10 loops=5)
              ->  Seq Scan on tranheader t  (cost=0.00..55.15 rows=1923 width=16) (actual time=0.01..5.21 rows=1923 loops=5)
                    Filter: (clientid = 6)
Total runtime: 308323.60 msec

***********************
I hope we can come up with something soon.....it seems this index scan is a big part of the problem.  I'm still really
curious why the disk reads are so few with the index scan.  Let's hope I can get it near the 3 second time for MSSQL by
Friday!
 
Roman Fail
 

Re: 7.3.1 New install, large queries are slow

От
Tomasz Myrta
Дата:
Roman Fail wrote:
> The same result columns and JOINS are performed all day with variations on the WHERE clause;
Are there any WHERE clauses that all of these variations share?
If so, the query can be reordered to contain explicit joins for those clauses and
to let Postgres find the best plan for the other joins.

I know it is not the best solution, but sometimes I prefer to find the best join order myself.
I then create several views returning the same values, but manually ordered for specific WHERE clauses.

Tomasz Myrta



Re: 7.3.1 New install, large queries are slow

От
Kevin Brown
Дата:
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > Tom Lane wrote:
> >> ... Whether this is a bug or a feature depends on
> >> your point of view --- but there are folks out there who find it to be
> >> a life-saver.
>
> > Since it *does* depend on one's point of view, would it be possible to
> > have control over this implemented in a session-defined variable (with
> > the default in the GUC, of course)?
>
> I have no objection to doing that --- anyone care to contribute code to
> make it happen?  (I think the trick would be to fold plain-JOIN jointree
> entries into FROM-list items in planner.c, somewhere near the code that
> hoists sub-SELECTs into the main join tree.  But I haven't tried it, and
> have no time to in the near future.)

I'm looking at the code now (the 7.2.3 code in particular, but I
suspect for this purpose the code is likely to be very similar to the
CVS tip), but it's all completely new to me and the developer
documentation isn't very revealing of the internals.  The optimizer
code (I've been looking especially at make_jointree_rel() and
make_fromexpr_rel()) looks a bit tricky...it'll take me some time to
completely wrap my brain around it.  Any pointers to revealing
documentation would be quite helpful!


--
Kevin Brown                          kevin@sysexperts.com

Re: 7.3.1 New install, large queries are slow

От
Hannu Krosing
Дата:
On Thu, 2003-01-16 at 03:40, Stephan Szabo wrote:
> >  So here's the query, and another EXPLAIN ANALYZE to go with it
> > (executed after all setting changes).  The same result columns and
> > JOINS are performed all day with variations on the WHERE clause; other
> > possible search columns are the ones that are indexed (see below).
> > The 4 tables that use LEFT JOIN only sometimes have matching records,
> > hence the OUTER join.
> >
> > EXPLAIN ANALYZE
> > SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
> > d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
> > m.name AS merchantname, c.cardtype, m.merchid,
> > p1.localtaxamount, p1.productidentifier, dr.avsresponse,
> > cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
> > ck.abaroutingno, ck.checkno
> > FROM tranheader t
> > INNER JOIN batchheader b ON t.tranheaderid = b.tranheaderid
> > INNER JOIN merchants m ON m.merchantid = b.merchantid
> > INNER JOIN batchdetail d ON d.batchid = b.batchid
> > INNER JOIN cardtype c ON d.cardtypeid = c.cardtypeid
> > LEFT JOIN purc1 p1 ON p1.batchdetailid = d.batchdetailid
> > LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
> > LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
> > LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid
> > WHERE t.clientid = 6
> > AND d.tranamount BETWEEN 500.0 AND 700.0

How much of the data in d has tranamount BETWEEN 500.0 AND 700.0?

Do you have an index on d.tranamount ?

> > AND b.batchdate > '2002-12-15'

Again - how much of the data has b.batchdate > '2002-12-15'?

Is there an index on it?

> > AND m.merchid = '701252267'

ditto

> > ORDER BY b.batchdate DESC
> > LIMIT 50

These two together make me think that perhaps

b.batchdate BETWEEN '2002-12-15' AND '2003-12-12'

could be better at making the optimiser see that a reverse index scan on
b.batchdate would be the way to go.
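
A quick way to answer the selectivity questions above is simply to count the matching rows (a throwaway sketch; the counts themselves will be slow on a 23-million-row table):

-- How selective are the two range conditions, relative to the table sizes?
SELECT count(*) FROM batchdetail WHERE tranamount BETWEEN 500.0 AND 700.0;
SELECT count(*) FROM batchdetail;
SELECT count(*) FROM batchheader WHERE batchdate > '2002-12-15';
SELECT count(*) FROM batchheader;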

> Well, you might get a little help by replace the from with

--
Hannu Krosing <hannu@tm.ee>

Re: 7.3.1 New install, large queries are slow

От
Ron Johnson
Дата:
On Thu, 2003-01-16 at 03:03, Roman Fail wrote:
> ***********************
> > Josh Berkus wrote:
> > Hey, Roman, how many records in BatchDetail, anyway?
>
> 23 million.

What are the indexes on batchdetail?

Is there one on batchid and a separate one on tranamount?

If so, what about dropping them and creating a single multi-segment
index on (batchid, tranamount)?  (A constraint can then enforce
uniqueness on batchid.)

> ***********************
> > Stephan Szabo wrote:
> > What does vacuum verbose batchdetail give you (it'll give an idea of pages anyway)
>
> trans=# VACUUM VERBOSE batchdetail;
> INFO:  --Relation public.batchdetail--
> INFO:  Pages 1669047: Changed 0, Empty 0; Tup 23316674: Vac 0, Keep 0, UnUsed 0.
>         Total CPU 85.36s/9.38u sec elapsed 253.38 sec.
> INFO:  --Relation pg_toast.pg_toast_8604247--
> INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
>         Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
> VACUUM
> trans=#
>
> ***********************
> At Stephan Szabo and Tom Lane's suggestion, I reorganized the query
> so the JOIN syntax was only used in the outer joins.  This did not
> seem to help at all.  Of note: during this query 'sar -b' showed a
> consistent 6000 blocks read/sec, CPU was about 2%.
>
> EXPLAIN ANALYZE
> SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
> d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
> m.name AS merchantname, c.cardtype, m.merchid,
> p1.localtaxamount, p1.productidentifier, dr.avsresponse,
> cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
> ck.abaroutingno, ck.checkno
> FROM tranheader t, batchheader b, merchants m, cardtype c, batchdetail d
> LEFT JOIN purc1 p1 on p1.batchdetailid=d.batchdetailid
> LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
> LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
> LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid
> WHERE t.tranheaderid=b.tranheaderid
> AND m.merchantid=b.merchantid
> AND d.batchid=b.batchid
> AND c.cardtypeid=d.cardtypeid
> AND t.clientid = 6
> AND d.tranamount BETWEEN 500.0 AND 700.0
> AND b.batchdate > '2002-12-15'
> AND m.merchid = '701252267'
> ORDER BY b.batchdate DESC
> LIMIT 50
> Limit  (cost=1789105.21..1789105.22 rows=1 width=285) (actual time=1222029.59..1222029.61 rows=5 loops=1)
>   ->  Sort  (cost=1789105.21..1789105.22 rows=1 width=285) (actual time=1222029.58..1222029.59 rows=5 loops=1)
>         Sort Key: b.batchdate
>         ->  Nested Loop  (cost=1787171.22..1789105.20 rows=1 width=285) (actual time=1221815.14..1222019.46 rows=5
loops=1)
>               Join Filter: ("inner".tranheaderid = "outer".tranheaderid)
>               ->  Nested Loop  (cost=1787171.22..1789026.02 rows=1 width=269) (actual time=1221809.33..1221978.62
rows=5loops=1) 
>                     Join Filter: ("inner".cardtypeid = "outer".cardtypeid)
>                     ->  Merge Join  (cost=1787171.22..1789024.79 rows=1 width=255) (actual
time=1221802.47..1221971.48rows=5 loops=1) 
>                           Merge Cond: ("outer".batchid = "inner".batchid)
>                           ->  Sort  (cost=476.17..476.18 rows=4 width=102) (actual time=678.05..678.07 rows=17
loops=1)
>                                 Sort Key: b.batchid
>                                 ->  Nested Loop  (cost=0.00..476.14 rows=4 width=102) (actual time=161.62..677.95
rows=17loops=1) 
>                                       ->  Index Scan using merchants_ix_merchid_idx on merchants m  (cost=0.00..5.65
rows=1width=78) (actual time=13.87..13.88 rows=1 loops=1) 
>                                             Index Cond: (merchid = '701252267'::character varying)
>                                       ->  Index Scan using batchheader_ix_merchantid_idx on batchheader b
(cost=0.00..470.30rows=15 width=24) (actual time=147.72..663.94 rows=17 loops=1) 
>                                             Index Cond: ("outer".merchantid = b.merchantid)
>                                             Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
>                           ->  Sort  (cost=1786695.05..1787621.82 rows=370710 width=153) (actual
time=1220080.34..1220722.19rows=368681 loops=1) 
>                                 Sort Key: d.batchid
>                                 ->  Merge Join  (cost=1704191.25..1713674.49 rows=370710 width=153) (actual
time=1200184.91..1213352.77rows=370307 loops=1) 
>                                       Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                       ->  Merge Join  (cost=1704085.28..1712678.33 rows=370710 width=115) (actual
time=1199705.71..1210336.37rows=370307 loops=1) 
>                                             Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                             ->  Merge Join  (cost=1704085.27..1711751.54 rows=370710 width=98)
(actualtime=1199705.65..1208122.73 rows=370307 loops=1) 
>                                                   Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                                   ->  Merge Join  (cost=1704085.26..1710824.75 rows=370710 width=89)
(actualtime=1199705.55..1205977.76 rows=370307 loops=1) 
>                                                         Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                                         ->  Sort  (cost=1543119.01..1544045.79 rows=370710 width=70)
(actualtime=1181172.79..1181902.77 rows=370307 loops=1) 
>                                                               Sort Key: d.batchdetailid
>                                                               ->  Index Scan using batchdetail_ix_tranamount_idx on
batchdetaild  (cost=0.00..1489103.46 rows=370710 width=70) (actual time=14.45..1176074.90 rows=370307 loops=1) 
>                                                                     Index Cond: ((tranamount >= 500.0) AND
(tranamount<= 700.0)) 
>                                                         ->  Sort  (cost=160966.25..163319.59 rows=941335 width=19)
(actualtime=18532.70..20074.09 rows=938770 loops=1) 
>                                                               Sort Key: p1.batchdetailid
>                                                               ->  Seq Scan on purc1 p1  (cost=0.00..44285.35
rows=941335width=19) (actual time=9.44..9119.83 rows=938770 loops=1) 
>                                                   ->  Sort  (cost=0.01..0.02 rows=1 width=9) (actual time=0.08..0.08
rows=0loops=1) 
>                                                         Sort Key: dr.batchdetailid
>                                                         ->  Seq Scan on direct dr  (cost=0.00..0.00 rows=1 width=9)
(actualtime=0.01..0.01 rows=0 loops=1) 
>                                             ->  Sort  (cost=0.01..0.02 rows=1 width=17) (actual time=0.04..0.04
rows=0loops=1) 
>                                                   Sort Key: cr.batchdetailid
>                                                   ->  Seq Scan on carrental cr  (cost=0.00..0.00 rows=1 width=17)
(actualtime=0.00..0.00 rows=0 loops=1) 
>                                       ->  Sort  (cost=105.97..109.13 rows=1267 width=38) (actual time=479.17..480.74
rows=1267loops=1) 
>                                             Sort Key: ck.batchdetailid
>                                             ->  Seq Scan on checks ck  (cost=0.00..40.67 rows=1267 width=38) (actual
time=447.88..475.60rows=1267 loops=1) 
>                     ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=14) (actual time=1.37..1.39 rows=10
loops=5)
>               ->  Seq Scan on tranheader t  (cost=0.00..55.15 rows=1923 width=16) (actual time=0.01..5.14 rows=1923
loops=5)
>                     Filter: (clientid = 6)
> Total runtime: 1222157.28 msec
>
> ***********************
> Just to see what would happen, I executed:
>      ALTER TABLE batchdetail ALTER COLUMN tranamount SET STATISTICS 1000;
>      ANALYZE;
> It seemed to hurt performance if anything.  But the EXPLAIN estimate
> for rows was much closer to the real value than it was previously.
>
> ***********************
> It seems to me that the big, big isolated problem is the index scan on
> batchdetail.tranamount.  During this small query, 'sar -b' showed
> consistent 90,000 block reads/sec. (contrast with only 6,000 with
> larger query index scan).  'top' shows the CPU is at 20% user, 30%
> system the whole time (contrast with 2% total in larger query above).
> This results here still seem pretty bad (although not as bad as
> above), but I still don't know what is the bottleneck.  And the
> strange sar stats are confusing me.
>
> EXPLAIN ANALYZE SELECT * FROM batchdetail WHERE tranamount BETWEEN 300 AND 499;
> Seq Scan on batchdetail  (cost=0.00..2018797.11 rows=783291 width=440) (actual time=45.66..283926.58 rows=783687
loops=1)
>   Filter: ((tranamount >= 300::numeric) AND (tranamount <= 499::numeric))
> Total runtime: 285032.47 msec
>
>
> ***********************
> > Stephan Szabo wrote:
> > As a followup, if you do set enable_indexscan=off;
> > before running the explain analyze, what does that give you?
>
> Now this is very interesting: 'sar -b' shows about 95,000 block
> reads/sec; CPU is at 20% user 30% system, vmstat shows no swapping,
> query takes only 5 minutes to execute (which is one-quarter of the
> time WITH the index scan!!!!).  Obviously the execution plan is pretty
> different on this one (query is identical the larger one above).
>
> EXPLAIN ANALYZE
> SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
> d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
> m.name AS merchantname, c.cardtype, m.merchid,
> p1.localtaxamount, p1.productidentifier, dr.avsresponse,
> cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
> ck.abaroutingno, ck.checkno
> FROM tranheader t, batchheader b, merchants m, cardtype c,
> batchdetail d
> LEFT JOIN purc1 p1 on p1.batchdetailid=d.batchdetailid
> LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
> LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
> LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid
> WHERE t.tranheaderid=b.tranheaderid
> AND m.merchantid=b.merchantid
> AND d.batchid=b.batchid
> AND c.cardtypeid=d.cardtypeid
> AND t.clientid = 6
> AND d.tranamount BETWEEN 500.0 AND 700.0
> AND b.batchdate > '2002-12-15'
> AND m.merchid = '701252267'
> ORDER BY b.batchdate DESC
> LIMIT 50
> Limit  (cost=2321460.56..2321460.57 rows=1 width=285) (actual time=308194.57..308194.59 rows=5 loops=1)
>   ->  Sort  (cost=2321460.56..2321460.57 rows=1 width=285) (actual time=308194.57..308194.58 rows=5 loops=1)
>         Sort Key: b.batchdate
>         ->  Nested Loop  (cost=2319526.57..2321460.55 rows=1 width=285) (actual time=307988.56..308194.46 rows=5
loops=1)
>               Join Filter: ("inner".tranheaderid = "outer".tranheaderid)
>               ->  Nested Loop  (cost=2319526.57..2321381.37 rows=1 width=269) (actual time=307982.80..308153.22
rows=5loops=1) 
>                     Join Filter: ("inner".cardtypeid = "outer".cardtypeid)
>                     ->  Merge Join  (cost=2319526.57..2321380.14 rows=1 width=255) (actual time=307982.69..308152.82
rows=5loops=1) 
>                           Merge Cond: ("outer".batchid = "inner".batchid)
>                           ->  Sort  (cost=2316388.70..2317315.47 rows=370710 width=153) (actual
time=305976.74..306622.88rows=368681 loops=1) 
>                                 Sort Key: d.batchid
>                                 ->  Merge Join  (cost=2233884.90..2243368.15 rows=370710 width=153) (actual
time=286452.12..299485.43rows=370307 loops=1) 
>                                       Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                       ->  Merge Join  (cost=2233778.93..2242371.98 rows=370710 width=115) (actual
time=286428.77..296939.66rows=370307 loops=1) 
>                                             Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                             ->  Merge Join  (cost=2233778.92..2241445.19 rows=370710 width=98)
(actualtime=286428.72..294750.01 rows=370307 loops=1) 
>                                                   Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                                   ->  Merge Join  (cost=2233778.91..2240518.40 rows=370710 width=89)
(actualtime=286428.60..292606.56 rows=370307 loops=1) 
>                                                         Merge Cond: ("outer".batchdetailid = "inner".batchdetailid)
>                                                         ->  Sort  (cost=2072812.66..2073739.44 rows=370710 width=70)
(actualtime=269738.34..270470.83 rows=370307 loops=1) 
>                                                               Sort Key: d.batchdetailid
>                                                               ->  Seq Scan on batchdetail d  (cost=0.00..2018797.11
rows=370710width=70) (actual time=41.66..266568.83 rows=370307 loops=1) 
>                                                                     Filter: ((tranamount >= 500.0) AND (tranamount <=
700.0))
>                                                         ->  Sort  (cost=160966.25..163319.59 rows=941335 width=19)
(actualtime=16690.20..18202.65 rows=938770 loops=1) 
>                                                               Sort Key: p1.batchdetailid
>                                                               ->  Seq Scan on purc1 p1  (cost=0.00..44285.35
rows=941335width=19) (actual time=6.88..7779.31 rows=938770 loops=1) 
>                                                   ->  Sort  (cost=0.01..0.02 rows=1 width=9) (actual time=0.10..0.10
rows=0loops=1) 
>                                                         Sort Key: dr.batchdetailid
>                                                         ->  Seq Scan on direct dr  (cost=0.00..0.00 rows=1 width=9)
(actualtime=0.00..0.00 rows=0 loops=1) 
>                                             ->  Sort  (cost=0.01..0.02 rows=1 width=17) (actual time=0.03..0.03
rows=0loops=1) 
>                                                   Sort Key: cr.batchdetailid
>                                                   ->  Seq Scan on carrental cr  (cost=0.00..0.00 rows=1 width=17)
(actualtime=0.00..0.00 rows=0 loops=1) 
>                                       ->  Sort  (cost=105.97..109.13 rows=1267 width=38) (actual time=23.32..24.89
rows=1267loops=1) 
>                                             Sort Key: ck.batchdetailid
>                                             ->  Seq Scan on checks ck  (cost=0.00..40.67 rows=1267 width=38) (actual
time=6.51..19.59rows=1267 loops=1) 
>                           ->  Sort  (cost=3137.87..3137.88 rows=4 width=102) (actual time=954.18..954.20 rows=19
loops=1)
>                                 Sort Key: b.batchid
>                                 ->  Nested Loop  (cost=0.00..3137.84 rows=4 width=102) (actual time=236.26..954.04
rows=17loops=1) 
>                                       ->  Seq Scan on merchants m  (cost=0.00..2667.35 rows=1 width=78) (actual
time=2.48..227.71rows=1 loops=1) 
>                                             Filter: (merchid = '701252267'::character varying)
>                                       ->  Index Scan using batchheader_ix_merchantid_idx on batchheader b
(cost=0.00..470.30rows=15 width=24) (actual time=233.75..726.22 rows=17 loops=1) 
>                                             Index Cond: ("outer".merchantid = b.merchantid)
>                                             Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
>                     ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=14) (actual time=0.02..0.04 rows=10
loops=5)
>               ->  Seq Scan on tranheader t  (cost=0.00..55.15 rows=1923 width=16) (actual time=0.01..5.21 rows=1923
loops=5)
>                     Filter: (clientid = 6)
> Total runtime: 308323.60 msec
>
> ***********************
> I hope we can come up with something soon.....it seems this index
> scan is a big part of the problem.  I'm still really curious why the
> disk reads are so few with the index scan.  Let's hope I can get it
> near the 3 second time for MSSQL by Friday!

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "Basically, I got on the plane with a bomb. Basically, I   |
|  tried to ignite it. Basically, yeah, I intended to damage |
|  the plane."                                               |
|    RICHARD REID, who tried to blow up American Airlines    |
|                  Flight 63                                 |
+------------------------------------------------------------+


Re: 7.3.1 New install, large queries are slow

От
Andrew Sullivan
Дата:
On Wed, Jan 15, 2003 at 03:30:55PM -0800, Roman Fail wrote:
> Thanks to everyone for the quick replies!  I'm sure that my lack of skill with SQL queries is the main problem.
> What's strange to me is how MSSQL takes my bad queries and makes them look good anyway.  It must have a real smart
> planner.
> Andrew Sullivan wrote:
> >First, the performance of foreign keys is flat-out awful in Postgres.
> >I suggest avoiding them if you can.
>
> I don't have any problem getting rid of FKs, especially if it might
> actually help performance.  The nightly data import is well-defined

Sorry, I think I sent this too quickly.  FKs make no difference to
SELECT performance, so if you're not doing updates and the like at
the same time as the SELECTs, there's no advantage.  So you should
leave the FKs in place.

> I think this is the most likely problem.  I've read through Chapter
> 10 of the 7.3 docs, but I still don't feel like I know what would
> be a good order.  How do you learn this stuff anyway?  Trial and
> error?

Sorry, but yes.

A
--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: 7.3.1 New install, large queries are slow

От
Rod Taylor
Дата:
On Thu, 2003-01-16 at 07:29, Ron Johnson wrote:
> On Thu, 2003-01-16 at 03:03, Roman Fail wrote:
> > ***********************
> > > Josh Berkus wrote:
> > > Hey, Roman, how many records in BatchDetail, anyway?
> >
> > 23 million.
>
> What are the indexes on batchdetail?
>
> There's one on batchid and a seperate one on tranamount?
>
> If so, what about dropping them and create a single multi-segment
> index on "batchid, tranamount".  (A constraint can then enforce
> uniqueness on batchid.

That's a good step.  Once done, CLUSTER by that index -- it might buy 10 to
20% extra.
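
For illustration, a rough sketch of what that would look like on 7.3 (the composite index name here is made up; whether to also drop the existing single-column tranamount index is a judgment call, since other queries filter on tranamount alone):

CREATE INDEX batchdetail_batchid_tranamount_idx
    ON batchdetail (batchid, tranamount);

-- 7.3-style CLUSTER: physically reorder the table by that index, then refresh statistics.
CLUSTER batchdetail_batchid_tranamount_idx ON batchdetail;
ANALYZE batchdetail;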


--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Вложения

Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
"Roman Fail" <rfail@posportal.com> writes:
> SELECT ...
> FROM tranheader t, batchheader b, merchants m, cardtype c, (batchdetail d
> LEFT JOIN purc1 p1 on p1.batchdetailid=d.batchdetailid
> LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
> LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
> LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid)
> WHERE t.tranheaderid=b.tranheaderid
> AND m.merchantid=b.merchantid
> AND d.batchid=b.batchid
> AND c.cardtypeid=d.cardtypeid
> AND t.clientid = 6
> AND d.tranamount BETWEEN 500.0 AND 700.0
> AND b.batchdate > '2002-12-15'
> AND m.merchid = '701252267'

No no no ... this is even worse than before.  Your big tables are
batchdetail (d) and purc1 (p1).  What you've got to do is arrange the
computation so that those are trimmed to just the interesting records as
soon as possible.  The constraint on d.tranamount helps, but after that
you proceed to join d to p1 *first*, before any of the other constraints
can be applied.  That's a huge join that you then proceed to throw away
most of, as shown by the row counts in the EXPLAIN output.

Note the parentheses I added above to show how the system interprets
your FROM clause.  Since dr,cr,ck are contributing nothing to
elimination of records, you really want them joined last, not first.

What would probably work better is

SELECT ...
FROM
  (SELECT ...
   FROM tranheader t, batchheader b, merchants m, cardtype c, batchdetail d
   WHERE t.tranheaderid=b.tranheaderid
   AND m.merchantid=b.merchantid
   AND d.batchid=b.batchid
   AND c.cardtypeid=d.cardtypeid
   AND t.clientid = 6
   AND d.tranamount BETWEEN 500.0 AND 700.0
   AND b.batchdate > '2002-12-15'
   AND m.merchid = '701252267') ss
  LEFT JOIN purc1 p1 on p1.batchdetailid=ss.batchdetailid
  LEFT JOIN direct dr ON dr.batchdetailid = ss.batchdetailid
  LEFT JOIN carrental cr ON cr.batchdetailid = ss.batchdetailid
  LEFT JOIN checks ck ON ck.batchdetailid = ss.batchdetailid

which lets the system get the useful restrictions applied before it has
to finish expanding out the star query.  Since cardtype isn't
contributing any restrictions, you might think about moving it into the
LEFT JOIN series too (although I think the planner will choose to join
it last in the subselect, anyway).

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
Kevin Brown <kevin@sysexperts.com> writes:
> I'm looking at the code now (the 7.2.3 code in particular, but I
> suspect for this purpose the code is likely to be very similar to the
> CVS tip), but it's all completely new to me and the developer
> documentation isn't very revealing of the internals.  The optimizer
> code (I've been looking especially at make_jointree_rel() and
> make_fromexpr_rel()) looks a bit tricky...it'll take me some time to
> completely wrap my brain around it.  Any pointers to revealing
> documentation would be quite helpful!

src/backend/optimizer/README is a good place to start.

I'd recommend working with CVS tip; there is little point in doing any
nontrivial development in the 7.2 branch.  You'd have to port it forward
anyway.

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
"Charles H. Woloszynski"
Дата:
I was surprised to hear that JOIN syntax constrains the planner.  We
have a policy of using JOIN syntax to describe the table relationships
and WHERE clauses to describe the selection process for our queries.  It
was our understanding that the JOIN syntax was introduced to support
this approach, not to constrain the planner.

Is there any way to tell the planner to consider JOIN syntax as
equivalent to WHERE clauses and not use it to force the planner
down a specific path?  Can we get that added as an option (and then made
available to us JDBC folks as a URL parameter)?  It would make my team
very happy :-).


I think that making this an option would help all those migrating to
Postgres who did not expect that JOINs force the planner down specific
plans.  Is it possible/reasonable to add?

Charlie


Tom Lane wrote:

>"Roman Fail" <rfail@posportal.com> writes:
>
>
>>Thanks to everyone for the quick replies!  I'm sure that my lack of
>>skill with SQL queries is the main problem.  What's strange to me is
>>how MSSQL takes my bad queries and makes them look good anyway.  It
>>must have a real smart planner.
>>
>>
>
>I think more likely the issue is that your use of JOIN syntax is forcing
>Postgres into a bad plan.  MSSQL probably doesn't assign any semantic
>significance to the use of "a JOIN b" syntax as opposed to "FROM a, b"
>syntax.  Postgres does.  Whether this is a bug or a feature depends on
>your point of view --- but there are folks out there who find it to be
>a life-saver.  You can find some explanations at
>http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.html
>
>
>
>>Is it pretty much universally accepted that I should drop all my
>>foreign keys?
>>
>>
>
>No.  They don't have any effect on SELECT performance in Postgres.
>They will impact update speed, but that's not your complaint (at the
>moment).  Don't throw away data integrity protection until you know
>you need to.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>

--


Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com






Re: 7.3.1 New install, large queries are slow

От
Bruce Momjian
Дата:
Is this a TODO item?

---------------------------------------------------------------------------

Charles H. Woloszynski wrote:
> I was surprised to hear that JOIN syntax constrained the planner.  We
> have a policy of using JOIN syntax to describe the table relationships
> and where clauses to describe the selection process for our queries.  It
> was our understanding that the JOIN syntax was introduced to support
> this approach, but not to contrain the planner.
>
> Is there any way to sell the planner to consider JOIN syntax as
> equivalent to WHERE clauses and to not use them to force the planner
> down a specific path?  Can we get that added as an option (and then made
> available to use JDBC folks as a URL parameter).  It would make my team
> very happy :-).
>
>
> I think that making this an option will help all those migrating to
> Postgres who did not expect that JOINs forced the planner down specific
> plans.    Is  it possible/reasonable to add?
>
> Charlie
>
>
> Tom Lane wrote:
>
> >"Roman Fail" <rfail@posportal.com> writes:
> >
> >
> >>Thanks to everyone for the quick replies!  I'm sure that my lack of
> >>skill with SQL queries is the main problem.  What's strange to me is
> >>how MSSQL takes my bad queries and makes them look good anyway.  It
> >>must have a real smart planner.
> >>
> >>
> >
> >I think more likely the issue is that your use of JOIN syntax is forcing
> >Postgres into a bad plan.  MSSQL probably doesn't assign any semantic
> >significance to the use of "a JOIN b" syntax as opposed to "FROM a, b"
> >syntax.  Postgres does.  Whether this is a bug or a feature depends on
> >your point of view --- but there are folks out there who find it to be
> >a life-saver.  You can find some explanations at
> >http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.html
> >
> >
> >
> >>Is it pretty much universally accepted that I should drop all my
> >>foreign keys?
> >>
> >>
> >
> >No.  They don't have any effect on SELECT performance in Postgres.
> >They will impact update speed, but that's not your complaint (at the
> >moment).  Don't throw away data integrity protection until you know
> >you need to.
> >
> >            regards, tom lane
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 4: Don't 'kill -9' the postmaster
> >
> >
>
> --
>
>
> Charles H. Woloszynski
>
> ClearMetrix, Inc.
> 115 Research Drive
> Bethlehem, PA 18015
>
> tel: 610-419-2210 x400
> fax: 240-371-3256
> web: www.clearmetrix.com
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: 7.3.1 New install, large queries are slow

От
Stephan Szabo
Дата:
On Thu, 16 Jan 2003, Roman Fail wrote:

> ***********************

Hmm, I wonder if maybe we're going about things backwards in this
case.  Does the original database have something like EXPLAIN
that'll show what it's doing? Perhaps that'll give an idea.

> > What does vacuum verbose batchdetail give you (it'll give an idea of pages anyway)
>
> trans=# VACUUM VERBOSE batchdetail;
> INFO:  --Relation public.batchdetail--
> INFO:  Pages 1669047: Changed 0, Empty 0; Tup 23316674: Vac 0, Keep 0, UnUsed 0.

So about 12 gigabytes of data, then?


> It seems to me that the big, big isolated problem is the index scan on
> batchdetail.tranamount.  During this small query, 'sar -b' showed
> consistent 90,000 block reads/sec. (contrast with only 6,000 with
> larger query index scan).  'top' shows the CPU is at 20% user, 30%
> system the whole time (contrast with 2% total in larger query above).

Note that in this case below, you've gotten a sequence scan not an
index scan. (similar to setting enable_indexscan=off performance)

> This results here still seem pretty bad (although not as bad as
> above), but I still don't know what is the bottleneck.  And the
> strange sar stats are confusing me.
>
> EXPLAIN ANALYZE SELECT * FROM batchdetail WHERE tranamount BETWEEN 300 AND 499;
> Seq Scan on batchdetail  (cost=0.00..2018797.11 rows=783291 width=440) (actual time=45.66..283926.58 rows=783687
loops=1)
>   Filter: ((tranamount >= 300::numeric) AND (tranamount <= 499::numeric))
> Total runtime: 285032.47 msec

I'd assume that tranamount values are fairly randomly distributed
throughout the table, right?  It takes about 5 minutes for the
system to read the entire table and more for the index scan, so
you're probably reading most of the table randomly and the index
as well.

What values on batchdetail do you use in query where clauses
regularly?  It's possible that occasional clusters would help
if this was the main field you filtered on.  The cluster
itself is time consuming, but it might help make the index
scans actually read fewer pages.
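
One way to see how randomly tranamount is laid out relative to the physical row order is the planner's correlation statistic (assuming ANALYZE has been run; values near +1 or -1 mean an index scan touches mostly-sequential pages, values near 0 mean lots of random seeks):

SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'batchdetail'
  AND attname IN ('tranamount', 'batchid');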


Re: 7.3.1 New install, large queries are slow

От
"Josh Berkus"
Дата:
Roman, Tom:

> No no no ... this is even worse than before.  Your big tables are
> batchdetail (d) and purc1 (p1).  What you've got to do is arrange the
> computation so that those are trimmed to just the interesting records
> as
> soon as possible.

When joining disproportionally large tables, I've also had some success
with the following method:

SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate,
d.tranamount,
d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
m.name AS merchantname, c.cardtype, m.merchid,
p1.localtaxamount, p1.productidentifier, dr.avsresponse,
cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
ck.abaroutingno, ck.checkno
FROM tranheader t
JOIN batchheader b ON (t.tranheaderid = b.tranheaderid AND b.batchdate > '2002-12-15')
JOIN merchants m ON (m.merchantid = b.merchantid AND m.merchid = '701252267')
JOIN batchdetail d ON (d.batchid = b.batchid AND d.tranamount BETWEEN 500 AND 700)
JOIN cardtype c ON d.cardtypeid = c.cardtypeid
LEFT JOIN purc1 p1 ON p1.batchdetailid = d.batchdetailid
LEFT JOIN direct dr ON dr.batchdetailid = d.batchdetailid
LEFT JOIN carrental cr ON cr.batchdetailid = d.batchdetailid
LEFT JOIN checks ck ON ck.batchdetailid = d.batchdetailid
WHERE t.clientid = 6
AND d.tranamount BETWEEN 500.0 AND 700.0
AND b.batchdate > '2002-12-15'
AND m.merchid = '701252267'
ORDER BY b.batchdate DESC
LIMIT 50

This could be re-arranged some, but I think you get the idea ... I've
been able, in some queries, to get the planner to use a better and
faster join strategy by repeating my WHERE conditions in the JOIN
criteria.

-Josh


Re: 7.3.1 New install, large queries are slow

От
"Josh Berkus"
Дата:
Roman,

> > Hey, Roman, how many records in BatchDetail, anyway?
>
> 23 million.

And MSSQL is returning results in 3 seconds?    I find that a bit hard
to believe, unless this query is called repeatedly and that's the
figure for the last call, where the records are being cached.   I'll
have to look at your hardware descriptions again.

> It seems to me that the big, big isolated problem is the index scan
> on batchdetail.tranamount.

Nope.  This was a misimpression caused by batchdetail waiting for a
bunch of other processes to complete.  Sometimes the parallelizing
gives me a wrong impression of what's holding up the query.  Sorry if I
confused you.

> I hope we can come up with something soon.....it seems this index
> scan is a big part of the problem.  I'm still really curious why the
> disk reads are so few with the index scan.  Let's hope I can get it
> near the 3 second time for MSSQL by Friday!

Um, Roman, keep in mind this is a mailing list.   I'm sure that
everyone here is happy to give you the tools to figure out how to fix
things, but only in a DIY fashion, and not on your schedule.

If you have a deadline, you'd better hire some paid query/database
tuning help.  DB Tuning experts .... whether on MSSQL or Postgres ...
run about $250/hour last I checked.

-Josh Berkus

Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
"Josh Berkus" <josh@agliodbs.com> writes:
> This could be re-arranged some, but I think you get the idea ... I've
> been able, in some queries, to get the planner to use a better and
> faster join strategy by repeating my WHERE conditions in the JOIN
> criteria.

Hm.  It shouldn't be necessary to do that --- the planner should be able
to push down the WHERE conditions to the right place without that help.

The list of explicit JOINs as you have here is a good way to proceed
*if* you write the JOINs in an appropriate order for implementation.
I believe the problem with Roman's original query was that he listed
the JOINs in a bad order.  Unfortunately I didn't keep a copy of that
message, and the list archives seem to be a day or more behind...
but at least for these WHERE conditions, it looks like the best bet
would be to join m to b (I'm assuming m.merchid is unique), then to t,
then to d, then add on the others.

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
Stephan Szabo
Дата:
On Thu, 16 Jan 2003, Josh Berkus wrote:

> Roman,
>
> > > Hey, Roman, how many records in BatchDetail, anyway?
> >
> > 23 million.
>
> And MSSQL is returning results in 3 seconds?    I find that a bit hard
> to believe, unless this query is called repeatedly and that's the
> figure for the last call, where the records are being cached.   I'll
> have to look at your hardware descriptions again.
>
> > It seems to me that the big, big isolated problem is the index scan
> > on batchdetail.tranamount.
>
> Nope.  This was a misimpression caused by batchdetail waiting for a
> bunch of other processes to complete.  Sometimes the parallelizing
> gives me a wrong impression of what's holding up the query.  Sorry if I
> confused you.

I'm still not sure that it isn't a big part, given that the time went down
by a factor of about 4 when index scans were disabled and a sequential scan
was done, that a sequential scan over the table with no other tables
joined looked to take about 5 minutes by itself, and that the difference
between that seqscan and the big query was only about 20 seconds when
enable_indexscan was off, unless I'm misreading those results.


Re: 7.3.1 New install, large queries are slow

От
"Josh Berkus"
Дата:
Tom,

> The list of explicit JOINs as you have here is a good way to proceed
> *if* you write the JOINs in an appropriate order for implementation.
> I believe the problem with Roman's original query was that he listed
> the JOINs in a bad order.  Unfortunately I didn't keep a copy of that
> message, and the list archives seem to be a day or more behind...
> but at least for these WHERE conditions, it looks like the best bet
> would to join m to b (I'm assuming m.merchid is unique), then to t,
> then to d, then add on the others.

I realize that I've contributed nothing other than bug reports to the
parser design.  But shouldn't Postgres, given a free hand, figure out
the above automatically?  I'd be embarrassed if MS could one-up us in
query planning anywhere; theirs sucks on sub-selects ....

-Josh Berkus

Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
"Josh Berkus" <josh@agliodbs.com> writes:
>> but at least for these WHERE conditions, it looks like the best bet
>> would to join m to b (I'm assuming m.merchid is unique), then to t,
>> then to d, then add on the others.

> I realize that I've contributed nothing other than bug reports to the
> parser design.  But shouldn't Postgres, given a free hand, figure out
> the above automatically?

I believe it will.  So far I've not seen an EXPLAIN from a query that
was structured to give it a free hand.

As noted elsewhere, the fact that we allow JOIN syntax to constrain the
planner is a real pain if you are accustomed to databases that don't do
that.  On the other hand, it's a real lifesaver for people who need to
pare the planning time for dozen-way joins; it was only a day or two
back in this same mailing list that we last had a discussion about that
end of the problem.  So even though it started out as an implementation
shortcut rather than an intended feature, I'm loath to just disable the
behavior entirely.

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
Stephan Szabo
Дата:
On Wed, 15 Jan 2003, Roman Fail wrote:

I just had new thoughts.

If you make an index on batchdetail(batchid)
does that help?

I realized that it was doing a merge join
to join d and the (t,b,m) combination when it
was expecting 3 rows out of the latter, and
batchid is presumably fairly selective on
the batchdetail table, right? I'd have expected
a nested loop over the id column, but it
doesn't appear you have an index on it in
batchdetail.

Then I realized that batchheader.batchid and
batchdetail.batchid don't even have the same
type, and that's probably something else you'd
need to fix.

> batchheader has 2.6 million records:
> CREATE TABLE public.batchheader (
>   batchid int8 DEFAULT nextval('"batchheader_batchid_key"'::text) NOT NULL,

> And here's batchdetail too, just for kicks.  23 million records.
> CREATE TABLE public.batchdetail (
>   batchid int4,





Re: 7.3.1 New install, large queries are slow

От
"Roman Fail"
Дата:
> Josh Berkus wrote:
> And MSSQL is returning results in 3 seconds?    I find that a bit hard
> to believe, unless this query is called repeatedly and that's the
> figure for the last call, where the records are being cached.   I'll
> have to look at your hardware descriptions again.

Hardware-wise, the Postgres server is a hot rod and MSSQL is a basic vanilla server.  I changed all the WHERE clauses
to radically different values and couldn't get it to take more than 5 seconds on MSSQL.  Most of its cost savings seems
to come from some kind of "Table Spool/Lazy Spool" in its execution plan, which looks to me like it only exists for the
life of the query.  You can read more about this at:
 
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_1m7g.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_7rjg.asp
Maybe there are some good ideas here for Postgres.  Unfortunately, the MSSQL execution plan is displayed graphically,
and I can't figure out a way to get it to text without typing it all.  I could do some screen shots if you really want
to see it.
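
For what it's worth, SQL Server 2000's Query Analyzer can also emit the plan as text rather than graphically, if memory serves; a rough sketch of the incantation:

SET SHOWPLAN_TEXT ON
GO
-- run the query in its own batch here; it is compiled but not executed,
-- and the estimated plan comes back as rows of text
SET SHOWPLAN_TEXT OFF
GO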
 
 
> Stephan Szabo wrote:
> I'd assume that tranamount values are fairly randomly distributed
> throughout the table, right?  It takes about 5 minutes for the
> system to read the entire table and more for the index scan, so
> you're probably reading most of the table randomly and the index
> as well.
> What values on batchdetail do you use in query where clauses regularly? 
 
Yes, tranamount values are randomly distributed.  I don't understand why an index scan would be "random".  Isn't the
whole point of an index to have an ordered reference into the data?  batchdetail has 5 columns that can be in the WHERE
clause, all of which are indexed.  None is more likely than the others to be searched, so a clustered index doesn't make
much sense to me.  The whole thing needs to be fast.
 
 
>> Nope.  This was a misimpression caused by batchdetail waiting for a
>> bunch of other processes to complete.  Sometimes the parallelizing
>> gives me a wrong impression of what's holding up the query.  Sorry if I
>> confused you.
>
>I'm still not sure that it isn't a big part given that the time went down
>by a factor of about 4 when index scans were disabled and a sequence scan
>was done and that a sequence scan over the table with no other tables
>joined looked to take about 5 minutes itself and the difference between
>that seqscan and the big query was only about 20 seconds when
>enable_indexscan was off unless I'm misreading those results.

You are not misreading the results.  There was a huge difference.  Nobody has ever made note of it, but this still
seems very odd to me:
 
*** 'sar -b' during the query
with index scan: 6,000 block reads/sec
with seq scan: 95,000 block reads/sec
 
 
Tom, here is the EXPLAIN for your suggested version of the query with enable_indexscan=on.  I performed the same query
with enable_indexscan=off and the total runtime was *much* better: 296174.60 msec.  By the way, thank you for your detailed
description of how the JOIN order affects the outcome - I understand much better now.
 
EXPLAIN ANALYZE
SELECT ss.batchdate, ss.batchdetailid, ss.bankno, ss.trandate, ss.tranamount,
ss.submitinterchange, ss.authamount, ss.authno, ss.cardtypeid, ss.mcccode,
ss.name AS merchantname, ss.cardtype, ss.merchid,
p1.localtaxamount, p1.productidentifier, dr.avsresponse,
cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
ck.abaroutingno, ck.checkno
FROM
  (SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
       d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
       m.name, c.cardtype, m.merchid
   FROM tranheader t, batchheader b, merchants m, cardtype c, batchdetail d
   WHERE t.tranheaderid=b.tranheaderid
   AND m.merchantid=b.merchantid
   AND d.batchid=b.batchid
   AND c.cardtypeid=d.cardtypeid
   AND t.clientid = 6
   AND d.tranamount BETWEEN 500.0 AND 700.0
   AND b.batchdate > '2002-12-15'
   AND m.merchid = '701252267') ss
  LEFT JOIN purc1 p1 on p1.batchdetailid=ss.batchdetailid
  LEFT JOIN direct dr ON dr.batchdetailid = ss.batchdetailid
  LEFT JOIN carrental cr ON cr.batchdetailid = ss.batchdetailid
  LEFT JOIN checks ck ON ck.batchdetailid = ss.batchdetailid
ORDER BY ss.batchdate DESC
LIMIT 50
Limit  (cost=1601637.75..1601637.75 rows=1 width=285) (actual time=1221606.41..1221606.42 rows=5 loops=1)
  ->  Sort  (cost=1601637.75..1601637.75 rows=1 width=285) (actual time=1221606.40..1221606.41 rows=5 loops=1)
        Sort Key: b.batchdate
        ->  Nested Loop  (cost=1543595.18..1601637.74 rows=1 width=285) (actual time=1204815.02..1221606.27 rows=5 loops=1)
              Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
              ->  Nested Loop  (cost=1543595.18..1601581.23 rows=1 width=247) (actual time=1204792.38..1221560.42 rows=5 loops=1)
                    Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                    ->  Nested Loop  (cost=1543595.18..1601581.22 rows=1 width=230) (actual time=1204792.35..1221560.27 rows=5 loops=1)
                          Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                          ->  Nested Loop  (cost=1543595.18..1601581.21 rows=1 width=221) (actual time=1204792.31..1221560.09 rows=5 loops=1)
                                Join Filter: ("inner".batchdetailid = "outer".batchdetailid)
                                ->  Nested Loop  (cost=1543595.18..1545529.17 rows=1 width=202) (actual time=1195376.48..1195578.86 rows=5 loops=1)
                                      Join Filter: ("inner".tranheaderid = "outer".tranheaderid)
                                      ->  Nested Loop  (cost=1543595.18..1545449.98 rows=1 width=186) (actual time=1195370.72..1195536.53 rows=5 loops=1)
                                            Join Filter: ("inner".cardtypeid = "outer".cardtypeid)
                                            ->  Merge Join  (cost=1543595.18..1545448.76 rows=1 width=172) (actual time=1195311.88..1195477.32 rows=5 loops=1)
                                                  Merge Cond: ("outer".batchid = "inner".batchid)
                                                  ->  Sort  (cost=476.17..476.18 rows=4 width=102) (actual time=30.57..30.59 rows=17 loops=1)
                                                        Sort Key: b.batchid
                                                        ->  Nested Loop  (cost=0.00..476.14 rows=4 width=102) (actual time=25.21..30.47 rows=17 loops=1)
                                                              ->  Index Scan using merchants_ix_merchid_idx on merchants m  (cost=0.00..5.65 rows=1 width=78) (actual time=23.81..23.82 rows=1 loops=1)
                                                                    Index Cond: (merchid = '701252267'::character varying)
                                                              ->  Index Scan using batchheader_ix_merchantid_idx on batchheader b  (cost=0.00..470.30 rows=15 width=24) (actual time=1.38..6.55 rows=17 loops=1)
                                                                    Index Cond: ("outer".merchantid = b.merchantid)
                                                                    Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
                                                  ->  Sort  (cost=1543119.01..1544045.79 rows=370710 width=70) (actual time=1194260.51..1194892.79 rows=368681 loops=1)
                                                        Sort Key: d.batchid
                                                        ->  Index Scan using batchdetail_ix_tranamount_idx on batchdetail d  (cost=0.00..1489103.46 rows=370710 width=70) (actual time=5.26..1186051.44 rows=370307 loops=1)
                                                              Index Cond: ((tranamount >= 500.0) AND (tranamount <= 700.0))
                                            ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=14) (actual time=11.77..11.79 rows=10 loops=5)
                                      ->  Seq Scan on tranheader t  (cost=0.00..55.15 rows=1923 width=16) (actual time=0.02..5.46 rows=1923 loops=5)
                                            Filter: (clientid = 6)
                                ->  Seq Scan on purc1 p1  (cost=0.00..44285.35 rows=941335 width=19) (actual time=10.79..3763.56 rows=938770 loops=5)
                          ->  Seq Scan on direct dr  (cost=0.00..0.00 rows=1 width=9) (actual time=0.00..0.00 rows=0 loops=5)
                    ->  Seq Scan on carrental cr  (cost=0.00..0.00 rows=1 width=17) (actual time=0.00..0.00 rows=0 loops=5)
              ->  Seq Scan on checks ck  (cost=0.00..40.67 rows=1267 width=38) (actual time=0.77..7.15 rows=1267 loops=5)
Total runtime: 1221645.52 msec

 
> Tomasz Myrta wrote:
> Are there any where clauses which all of theses variation have?

Yes.....WHERE clientid = ? will appear in every query.  The others are present based on user input.

 
> Ron Johnson wrote:
> What are the indexes on batchdetail?
> There's one on batchid and a seperate one on tranamount?
> If so, what about dropping them and create a single multi-segment
> index on "batchid, tranamount".  (A constraint can then enforce
> uniqueness on batchid.
 
There is no index on batchid; I think it is a good idea to create one.  Stephan also suggested this.  After I try the
single batchid index, I might try the multi-segment index idea as well.  I'll post results later today.
 
 
> Stephan Szabo wrote:
> Then I realized that batchheader.batchid and
> batchdetail.batchid don't even have the same
> type, and that's probably something else you'd
> need to fix.

Yes, that's a mistake on my part....batchdetail(batchid) should be an int8.  It looks to me like converting this
datatype can't be done with a single ALTER TABLE ALTER COLUMN statement.....so I guess I'll work around it with an ADD,
UPDATE, DROP, and RENAME.
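
A rough sketch of that workaround (the temporary column and index names are illustrative; 7.3 has no ALTER COLUMN ... TYPE, but each of these steps should be supported):

BEGIN;
ALTER TABLE batchdetail ADD COLUMN batchid_new int8;
UPDATE batchdetail SET batchid_new = batchid;
ALTER TABLE batchdetail DROP COLUMN batchid;
ALTER TABLE batchdetail RENAME COLUMN batchid_new TO batchid;
COMMIT;

-- rebuild the index on the new column, then reclaim the dead space left by the UPDATE
CREATE INDEX batchdetail_batchid_idx ON batchdetail (batchid);
VACUUM FULL ANALYZE batchdetail;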
 
 
> Josh Berkus wrote:
> Um, Roman, keep in mind this is a mailing list.   I'm sure that
> everyone here is happy to give you the tools to figure out how to fix
> things, but only in a DIY fashion, and not on your schedule. 

I hate being defensive, but I don't remember saying that I expect anyone to fix my problems for me on my schedule.  *I*
hopethat *I* can get this done by Friday, because otherwise my boss is going to tell me to dump Postgres and install
MSSQLon the server.  I only mention this fact because it's a blow against PostgreSQL's reputation if I have to give up.
There is no pressure on you, and I apologize if something I said sounded like whining.
 
 
I am VERY grateful for the time that all of you have given to this problem.
 
Roman Fail
Sr. Web Application Programmer
POS Portal, Inc.
 

Re: 7.3.1 New install, large queries are slow

От
Stephan Szabo
Дата:
On Thu, 16 Jan 2003, Roman Fail wrote:

> > Stephan Szabo wrote:
> > I'd assume that tranamount values are fairly randomly distributed
> > throughout the table, right?  It takes about 5 minutes for the
> > system to read the entire table and more for the index scan, so
> > you're probably reading most of the table randomly and the index
> > as well.
> > What values on batchdetail do you use in query where clauses regularly?

>  Yes, tranamount values are randomly distributed.  I don't understand
> why an index scan would be "random", isn't the whole point of an index
> to have an ordered reference into the data?  batchdetail has 5 columns
> that can be in the WHERE clause, all of which are indexed.  None is
> more likely than the other to be searched, so a clustered index
> doesn't make much sense to me.  The whole thing needs to be fast.

Yeah, in that case a clustered index doesn't help.
Indexes give you an ordered way to find the rows that meet a condition,
but say you had three rows in your table in this order (note that this is
an amazing oversimplification):
(1,'a')
(2,'b')
(0,'c')

And you want to scan the index for values with the first number between 0
and 2.  It reads the third row, then the first, then the second (to get
the letter associated with each).  Between those reads, it's got to seek back and
forth through the heap file, and the order in which it hits them looks pretty
random (to the kernel).

> > Ron Johnson wrote:
> > What are the indexes on batchdetail?
> > There's one on batchid and a seperate one on tranamount?
> > If so, what about dropping them and create a single multi-segment
> > index on "batchid, tranamount".  (A constraint can then enforce
> > uniqueness on batchid.
>  There is no index on batchid, I think it is a good idea to create
> one.  Stephan also suggested this.  After I try the single batchid
> index, I might try to multi-segment index idea as well.  I'll post
> results later today.

I think we may all have misread the index list to include an index on
batchid. Also you have two indexes on batchdetailid right now (primary key
also creates one) which added to the confusion.

> > Stephan Szabo wrote:
> > Then I realized that batchheader.batchid and
> > batchdetail.batchid don't even have the same
> > type, and that's probably something else you'd
> > need to fix.
>
> Yes, that's a mistake on my part....batchdetail(batchid) should be an
> int8.  It looks to me like converting this datatype can't be done with
> a single ALTER TABLE ALTER COLUMN statement.....so I guess I'll work
> around it with an ADD, UPDATE, DROP, and RENAME.

Don't forget to do a vacuum full in there as well.




Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
"Roman Fail" <rfail@posportal.com> writes:
>   ->  Merge Join  (cost=1543595.18..1545448.76 rows=1 width=172) (actual time=1195311.88..1195477.32 rows=5 loops=1)
>         Merge Cond: ("outer".batchid = "inner".batchid)
>         ->  Sort  (cost=476.17..476.18 rows=4 width=102) (actual time=30.57..30.59 rows=17 loops=1)
>               Sort Key: b.batchid
>               ->  Nested Loop  (cost=0.00..476.14 rows=4 width=102) (actual time=25.21..30.47 rows=17 loops=1)
>                     ->  Index Scan using merchants_ix_merchid_idx on merchants m  (cost=0.00..5.65 rows=1 width=78) (actual time=23.81..23.82 rows=1 loops=1)
>                           Index Cond: (merchid = '701252267'::character varying)
>                     ->  Index Scan using batchheader_ix_merchantid_idx on batchheader b  (cost=0.00..470.30 rows=15 width=24) (actual time=1.38..6.55 rows=17 loops=1)
>                           Index Cond: ("outer".merchantid = b.merchantid)
>                           Filter: (batchdate > '2002-12-15 00:00:00'::timestamp without time zone)
>         ->  Sort  (cost=1543119.01..1544045.79 rows=370710 width=70) (actual time=1194260.51..1194892.79 rows=368681 loops=1)
>               Sort Key: d.batchid
>               ->  Index Scan using batchdetail_ix_tranamount_idx on batchdetail d  (cost=0.00..1489103.46 rows=370710 width=70) (actual time=5.26..1186051.44 rows=370307 loops=1)
>                     Index Cond: ((tranamount >= 500.0) AND (tranamount <= 700.0))

The expensive part of this is clearly the sort and merge of the rows
extracted from batchdetail.  The index on tranamount is not helping
you at all, because the condition (between 500 and 700) isn't very
selective --- it picks up 370000 rows --- and since those rows are
totally randomly scattered in the table, you do a ton of random
seeking.  It's actually faster to scan the table linearly --- that's why
enable_indexscan=off was faster.

However, I'm wondering why the thing picked this plan, when it knew it
would get only a few rows out of the m/b join (estimate 4, actual 17,
not too bad).  I would have expected it to use an inner indexscan on
d.batchid.  Either you've not got an index on d.batchid, or there's a
datatype mismatch that prevents the index from being used.  What are the
datatypes of d.batchid and b.batchid, exactly?  If they're not the same,
either make them the same or add an explicit coercion to the query, like
    WHERE d.batchid = b.batchid::typeof_d_batchid

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
jasiek@klaster.net
Дата:
Tom said:
> datatype mismatch that prevents the index from being used.  What are the
> datatypes of d.batchid and b.batchid, exactly?  If they're not the same,
> either make them the same or add an explicit coercion to the query, like
>     WHERE d.batchid = b.batchid::typeof_d_batchid
>
It can be the source of the problem.  I found in one of Roman's mails that
batchid is declared as int8 in the master table and as int4 in the detail table.
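
Given those two types, the concrete form of the coercion Tom sketched would be (a sketch; casting the int8 side down is only safe while its values actually fit in int4):

    WHERE d.batchid = b.batchid::int4

so that the comparison matches the type of d.batchid and the index on it can be considered.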
Regards,
Tomasz Myrta

Re: 7.3.1 New install, large queries are slow

От
"Roman Fail"
Дата:
Tom and Tomasz:
I have to change the datatype of batchdetail.batchid from int4 to int8.  After over 4 hours, the UPDATE transfer from
the old column to the new one has not yet completed.  After that I still have to build a new index and run VACUUM FULL.  When
that is all done I'll re-run the various queries, including a specific small one that Josh requested.
 
 
Chad Thompson suggested that I add single quotes around the literals in the WHERE clause, which sounded like a great
idea based on his experience.  Unfortunately, it did not make the query any faster.  But read on!
 
 
For kicks, I tried this simple query, which should happen in an instant.  It is the first row in the table.
EXPLAIN ANALYZE select batchdetailid from batchdetail where batchdetailid = 27321;
 Seq Scan on batchdetail  (cost=0.00..1960485.43 rows=1 width=8) (actual time=17.58..264303.76 rows=1 loops=1)
   Filter: (batchdetailid = 27321)
 Total runtime: 264303.87 msec
Does it make sense to do a sequential scan when the primary key index is available?  Even so, it's still a pretty
horrible time given the hardware.
 
 
HOWEVER.....look at this:
EXPLAIN ANALYZE select batchdetailid from batchdetail where batchdetailid = 27321::bigint;
 Index Scan using batchdetail_pkey on batchdetail  (cost=0.00..4.13 rows=1 width=8) (actual time=0.03..0.03 rows=1
loops=1)
   Index Cond: (batchdetailid = 27321::bigint)
 Total runtime: 0.07 msec
 
It sort of feels like a magic moment.  I went back and looked through a lot of the JOIN columns and found that I was
mixing int4 with int8 in a lot of them.  All of these tables (except batchdetail) were migrated using pgAdminII's
migration wizard, so I didn't really give a hard look at all the data types matching up since it has a nice data map (I
used the defaults except for the money type).
 
 
Now I think I'm just going to drop the entire database and reload the data from scratch, making sure that the data
types are mapped exactly right.  Correct me if I'm wrong, but int4 only ranges from negative 2 billion to positive 2
billion.  All the primary keys for my tables would fit in this range with the exception of batchdetail, which could
conceivably grow beyond 2 billion someday (although I'd be archiving a lot of it when it got that big).  Maybe I just
shouldn't worry about it for now and make everything int4 for simplicity.
 
 
I doubt I will accomplish all this on Friday, but I'll give a full report once I get it all reloaded. 
 
> Stephan Szabo wrote:
> Also you have two indexes on batchdetailid right now (primary key
> also creates one) which added to the confusion.

The 7.3.1 docs for CREATE TABLE don't mention anything about automatic index creation for a PRIMARY KEY.  I didn't see
any PK indexes via pgAdminII, so I read this line from the docs and decided to create them separately.
 
    "Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT NULL"
However, this query proves you are right:
trans=# select relname, relpages, indisunique, indisprimary from pg_class, pg_index
trans-# where indexrelid in (37126739, 8604257) and pg_class.oid = pg_index.indexrelid;
 relname                          | relpages | indisunique | indisprimary
----------------------------------+----------+-------------+--------------
 batchdetail_pkey                 |   121850 | t           | t
 batchdetail_ix_batchdetailid_idx |    63934 | f           | f
 
All other columns in the two tables are identical for these two indexes.  So now I've gone through and deleted all of
these duplicate indexes I created (and then a VACUUM FULL).  Perhaps an extra sentence in the docs might prevent someone
else from making the same mistake as I?
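
The cleanup Roman describes is just a DROP INDEX per duplicate, e.g. (using the index name from the query above):

DROP INDEX batchdetail_ix_batchdetailid_idx;
VACUUM FULL batchdetail;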
 
 
*** Current postgresql.conf settings:
tcpip_socket=true
shared_buffers = 131072
max_fsm_relations = 10000
max_fsm_pages = 2000000
sort_mem = 32768
default_statistics_target = 30

Thanks again for all your help!
 
Roman Fail
Sr. Web Application Developer
POS Portal, Inc.
 

Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
"Roman Fail" <rfail@posportal.com> writes:
> shared_buffers = 131072

Yipes!  Try about a tenth that much.  Or less.

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
Stephan Szabo
Дата:
On Thu, 16 Jan 2003, Roman Fail wrote:

> > Stephan Szabo wrote:
> > Also you have two indexes on batchdetailid right now (primary key
> > also creates one) which added to the confusion.
>
> The 7.3.1 docs for CREATE TABLE don't mention anything about automatic
> index creation for a PRIMARY KEY.  I didn't see any PK indexes via
> pgAdminII, so I read this line from the docs and decided to create
> them separately.
>     "Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT NULL"

Right, but the implementation of UNIQUE constraints in postgresql right
now is through a unique index.  That's not necessarily a guarantee for
the future, but for right now you can rely on it.
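
One quick way to see the index that the PRIMARY KEY constraint created (a sketch using the standard pg_indexes view):

SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'batchdetail';

It should list batchdetail_pkey as a CREATE UNIQUE INDEX, even though no explicit CREATE INDEX was ever run for it.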


Re: 7.3.1 New install, large queries are slow

От
"Charles H. Woloszynski"
Дата:
I'd love to see this as a TODO item, but I am hardly one to add to the
list...

Charlie


Bruce Momjian wrote:

>Is this a TODO item?
>
>---------------------------------------------------------------------------
>
>Charles H. Woloszynski wrote:
>
>
>>I was surprised to hear that JOIN syntax constrained the planner.  We
>>have a policy of using JOIN syntax to describe the table relationships
>>and where clauses to describe the selection process for our queries.  It
>>was our understanding that the JOIN syntax was introduced to support
>>this approach, but not to constrain the planner.
>>
>>Is there any way to tell the planner to consider JOIN syntax as
>>equivalent to WHERE clauses and to not use them to force the planner
>>down a specific path?  Can we get that added as an option (and then made
>>available to us JDBC folks as a URL parameter)?  It would make my team
>>very happy :-).
>>
>>
>>I think that making this an option will help all those migrating to
>>Postgres who did not expect that JOINs forced the planner down specific
>>plans.    Is  it possible/reasonable to add?
>>
>>Charlie
>>
>>
>>Tom Lane wrote:
>>
>>
>>
>>>"Roman Fail" <rfail@posportal.com> writes:
>>>
>>>
>>>
>>>
>>>>Thanks to everyone for the quick replies!  I'm sure that my lack of
>>>>skill with SQL queries is the main problem.  What's strange to me is
>>>>how MSSQL takes my bad queries and makes them look good anyway.  It
>>>>must have a real smart planner.
>>>>
>>>>
>>>>
>>>>
>>>I think more likely the issue is that your use of JOIN syntax is forcing
>>>Postgres into a bad plan.  MSSQL probably doesn't assign any semantic
>>>significance to the use of "a JOIN b" syntax as opposed to "FROM a, b"
>>>syntax.  Postgres does.  Whether this is a bug or a feature depends on
>>>your point of view --- but there are folks out there who find it to be
>>>a life-saver.  You can find some explanations at
>>>http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.html
>>>
>>>
>>>
>>>
>>>
>>>>Is it pretty much universally accepted that I should drop all my
>>>>foreign keys?
>>>>
>>>>
>>>>
>>>>
>>>No.  They don't have any effect on SELECT performance in Postgres.
>>>They will impact update speed, but that's not your complaint (at the
>>>moment).  Don't throw away data integrity protection until you know
>>>you need to.
>>>
>>>            regards, tom lane
>>>
>>>---------------------------(end of broadcast)---------------------------
>>>TIP 4: Don't 'kill -9' the postmaster
>>>
>>>
>>>
>>>
>>--
>>
>>
>>Charles H. Woloszynski
>>
>>ClearMetrix, Inc.
>>115 Research Drive
>>Bethlehem, PA 18015
>>
>>tel: 610-419-2210 x400
>>fax: 240-371-3256
>>web: www.clearmetrix.com
>>
>>
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 2: you can get off all lists at once with the unregister command
>>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>
>>
>>
>
>
>

--


Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com






Re: 7.3.1 New install, large queries are slow

От
Jeff
Дата:
On Thu, 16 Jan 2003, Roman Fail wrote:

>
> HOWEVER.....look at this:
> EXPLAIN ANALYZE select batchdetailid from batchdetail where batchdetailid = 27321::bigint;
>  Index Scan using batchdetail_pkey on batchdetail  (cost=0.00..4.13 rows=1 width=8) (actual time=0.03..0.03 rows=1
loops=1)
>    Index Cond: (batchdetailid = 27321::bigint)
>  Total runtime: 0.07 msec
>

We had this happen to us - we had a serial8 column (int8) and our query
was a straightforward where id = 12345, which ran craptacularly.  After
much head banging and cursing I tried where id = '12345' and it
magically worked.  I think the parser is interpreting a "number" to be an
int4 instead of int8.  (instead of quotes you can also cast via
12345::int8 like you did)
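
Concretely, either of these forms sidesteps the problem (table and column names here are just illustrative):

select * from foo where id = '12345';       -- quoted literal is resolved against the column's type
select * from foo where id = 12345::int8;   -- explicit cast, as above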

Perhaps this should go on the TODO - when one side is an int8 and the
other is a literal number assume the number to be int8 instead of int4?

------------------------------------------------------------------------------
Jeff Trout <jeff@jefftrout.com>                  http://www.jefftrout.com/
   Ronald McDonald, with the help of cheese soup,
       controls America from a secret volkswagon hidden in the past
-------------------------------------------------------------------------------



Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
Jeff <threshar@torgo.978.org> writes:
> Perhaps this should go on the TODO - when one side is an int8 and the
> other is a literal number assume the number to be int8 instead of int4?

It's been on TODO for so long that it's buried near the bottom.

* Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
  float4, numeric/decimal too [optimizer]

This behavior interacts with enough other stuff that we can't just
change it willy-nilly.  See many past discussions in the pghackers
archives if you want details.  A recent example of a promising-looking
fix crashing and burning is
http://fts.postgresql.org/db/mw/msg.html?mid=1357121

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
Andrew Sullivan
Дата:
On Fri, Jan 17, 2003 at 09:00:19AM -0500, Jeff wrote:
> Perhaps this should go on the TODO - when one side is an int8 and the
> other is a literal number assume the number to be int8 instead of int4?

Actually, this is a broader problem having to do with type coercion.
There are a couple of TODO items which refer to this, it looks to me,
but in any case there has been _plenty_ of discussion on -general and
-hackers about what's wrong here.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: 7.3.1 New install, large queries are slow

От
"Josh Berkus"
Дата:
Tom,

> > shared_buffers = 131072
>
> Yipes!  Try about a tenth that much.  Or less.

Why?  He has 4GB RAM on the machine.

-Josh Berkus

Strange Join question

От
Noah Silverman
Дата:
Hi,

I have a challenging (for me) SQL question:

Two tables
(Note: these are fictitious, the real tables actually make sense, so no
need to re-design our table structure)

Table 1
id | name | count
------------------------
1 |   foo    |  10
1 |   foo    |  20
2 |  bar     |  100


Table 2
id  | f1  | f2  | t1ref
-----------------------
1  | 10  |  20 | 1
2  | 50  | 40  | 2


The question:

I want to do the following select:
select table2.f1, table1.name from table1, table2 where table1.id =
table2.id and table2.id = 2;

The problem is that I really only need the name from table2 returned
once.  With this query, I get two records back.  Clearly this is
because of the join that I am doing.  Is there a different way to
perform this join, so that I only get back ONE record from table1 that
matches?

Thanks,

-Noah


Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
"Josh Berkus" <josh@agliodbs.com> writes:
>>> shared_buffers = 131072
>>
>> Yipes!  Try about a tenth that much.  Or less.

> Why?  He has 4GB RAM on the machine.

I think a gig of shared buffers is overkill no matter what.

One reason not to crank up shared_buffers "just because you can" is that
there are operations (such as CHECKPOINT) that have to scan through all
the buffers, linearly.  I don't *think* any of these are in
performance-critical paths, but nonetheless you're wasting CPU.  I trust
the kernel to manage a huge number of buffers efficiently more than I
trust Postgres.

There's another issue, which is somewhat platform-dependent; I'm not
sure if it applies to whatever OS Roman is using.  But if you have a
machine where SysV shared memory is not locked into RAM, then a huge
shared buffer arena creates the probability that some of it will be
touched seldom enough that the kernel will decide to swap it out.  When
that happens, you *really* pay through the nose --- a page that you
might have been able to get from kernel cache without incurring I/O will
now certainly cost you I/O to touch.  It's even worse if the buffer
contained a dirty page at the time it was swapped out --- now that page
is going to require being read back in and written out again, a net cost
of three I/Os where there should have been one.  Bottom line is that
shared_buffers should be kept small enough that the space all looks like
a hot spot to the kernel's memory allocation manager.

In short, I believe in keeping shared_buffers relatively small --- one
to ten thousand seems like the right ballpark --- and leaving the kernel
to allocate the rest of RAM as kernel disk cache.

I have been thinking recently about proposing that we change the factory
default shared_buffers to 1000, which if this line of reasoning is
correct would eliminate the need for average installations to tune it.
The reason the default is 64 is that on some older Unixen, the default
SHMMAX is only one meg --- but it's been a long time since our default
shared memory request was less than a meg anyway, because of bloat in
other components of shared memory.  It's probably time to change the
default to something more reasonable from a performance standpoint, and
put some burden on users of older Unixen to either reduce the setting
or fix their SHMMAX parameter.

            regards, tom lane
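
As a rough illustration of that sizing advice for a 4GB machine (the numbers are illustrative, not a recommendation from the thread):

*** postgresql.conf
shared_buffers = 8192        # a few thousand 8K buffers, about 64MB
sort_mem = 8192              # per-sort work memory
# leave the rest of RAM to the kernel's disk cache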

Re: 7.3.1 New install, large queries are slow

От
Bruce Momjian
Дата:
Tom Lane wrote:
> "Josh Berkus" <josh@agliodbs.com> writes:
> >>> shared_buffers = 131072
> >>
> >> Yipes!  Try about a tenth that much.  Or less.
>
> > Why?  He has 4GB RAM on the machine.
>
> I think a gig of shared buffers is overkill no matter what.
>
> One reason not to crank up shared_buffers "just because you can" is that
> there are operations (such as CHECKPOINT) that have to scan through all
> the buffers, linearly.  I don't *think* any of these are in
> performance-critical paths, but nonetheless you're wasting CPU.  I trust
> the kernel to manage a huge number of buffers efficiently more than I
> trust Postgres.
>
> There's another issue, which is somewhat platform-dependent; I'm not
> sure if it applies to whatever OS Roman is using.  But if you have a
> machine where SysV shared memory is not locked into RAM, then a huge
> shared buffer arena creates the probability that some of it will be
> touched seldom enough that the kernel will decide to swap it out.  When
> that happens, you *really* pay through the nose --- a page that you
> might have been able to get from kernel cache without incurring I/O will
> now certainly cost you I/O to touch.  It's even worse if the buffer
> contained a dirty page at the time it was swapped out --- now that page
> is going to require being read back in and written out again, a net cost
> of three I/Os where there should have been one.  Bottom line is that
> shared_buffers should be kept small enough that the space all looks like
> a hot spot to the kernel's memory allocation manager.

Just as a data point, I believe other database systems recommend very
large shared memory areas if a lot of data is being accessed.  I seem to
remember Informix doing that.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: 7.3.1 New install, large queries are slow

От
Andrew Sullivan
Дата:
On Fri, Jan 17, 2003 at 12:33:11PM -0500, Tom Lane wrote:
> One reason not to crank up shared_buffers "just because you can" is that
> there are operations (such as CHECKPOINT) that have to scan through all
> the buffers, linearly.  I don't *think* any of these are in
> performance-critical paths, but nonetheless you're wasting CPU.  I trust
> the kernel to manage a huge number of buffers efficiently more than I
> trust Postgres.

For what it's worth, we have exactly that experience on our Sun
E4500s.  I had machines with 12 gig I was testing on, and I increased
the buffers to 2 Gig, because truss was showing us that some sluggishness
in the system was due to the system call to get a page.  It was
satisfied right away by the kernel's cache, but the system call was
still the most expensive part of the operation.

After we'd increased the shared buffers, however, performance
_degraded_ considerably.  It now spent all its time instead managing
the huge shared buffer, and the cost of that was much worse than the
cost of the system call.

So it is extremely dependent on the efficiency of PostgreSQL's use of
shared memory as compared to the efficiency of the system call.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Just as a data point, I believe other database systems recommend very
> large shared memory areas if a lot of data is being accessed.  I seem to
> remember Informix doing that.

Yeah, but isn't that theory a hangover from pre-Unix operating systems?
In all modern Unixen, you can expect the kernel to make use of any spare
RAM for disk buffer cache --- and that behavior makes it pointless for
Postgres to try to do large amounts of its own buffering.

Having a page in our own buffer instead of kernel buffer saves a context
swap to access the page, but it doesn't save I/O, so the benefit is a
lot less than you might think.  I think there's seriously diminishing
returns in pushing shared_buffers beyond a few thousand, and once you
get to the point where it distorts the kernel's ability to manage
memory for processes, you're really shooting yourself in the foot.

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
Jeff
Дата:
On Fri, 17 Jan 2003, Tom Lane wrote:

>
> Yeah, but isn't that theory a hangover from pre-Unix operating systems?
> In all modern Unixen, you can expect the kernel to make use of any spare
> RAM for disk buffer cache --- and that behavior makes it pointless for
> Postgres to try to do large amounts of its own buffering.
>

Informix, oracle, etc all do raw device access bypassing the kernels
buffering, etc.  So they need heaping gobules of memory to do the same
thing the kernel does.. but since they know the exact patterns of data and
how things will be done they can fine tune their buffer caches to get much
better performance than the kernel (15-20% in informix's case), since the
kernel needs to be a "works good generally" solution.

Probably the desire to crank that up stems from using those other db's.  I
know I used to do that with pgsql.  (Ahh, I crank that setting up through
the roof on informix, I'll do the same with pg.)

perhaps a FAQ entry or comment in the shipped config about it?
I think if people realize it isn't quite the same as what it does in
oracle/informix/etc then they'll be less inclined to cranking it.

------------------------------------------------------------------------------
Jeff Trout <jeff@jefftrout.com>                  http://www.jefftrout.com/
   Ronald McDonald, with the help of cheese soup,
       controls America from a secret volkswagon hidden in the past
-------------------------------------------------------------------------------



Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
Jeff <threshar@torgo.978.org> writes:
> On Fri, 17 Jan 2003, Tom Lane wrote:
>> Yeah, but isn't that theory a hangover from pre-Unix operating systems?

> Informix, oracle, etc all do raw device access bypassing the kernels
> buffering, etc.  So they need heaping gobules of memory to do the same
> thing the kernel does..

D'oh, I believe Jeff's put his finger on it.  You need lotsa RAM if you
are trying to bypass the OS.  But Postgres would like to work with the
OS, not bypass it.

> but since they know the exact patterns of data and
> how things will be done they can fine tune their buffer caches to get much
> better performance than the kernel (15-20% in informix's case) since the
> kernel needs to be a "works good generally"

They go to all that work for 15-20% ???  Remind me not to follow that
primrose path.  I can think of lots of places where we can buy 20% for
less work than implementing (and maintaining) our own raw-device access
layer.

> perhaps a FAQ entry or comment in the shipped config about it?
> I think if people realize it isn't quite the same as what it does in
> oracle/informix/etc then they'll be less inclined to cranking it.

Good thought.  But we do need to set the default to something a tad
more realistic-for-2003 than 64 buffers ...

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
Dennis Björklund
Дата:
On Thu, 16 Jan 2003, Roman Fail wrote:

> It sort of feels like a magic moment.  I went back and looked through a
> lot of the JOIN columns and found that I was mixing int4 with int8 in a
> lot of them.

There is note about it in the docs:

http://www.postgresql.org/idocs/index.php?datatype.html#DATATYPE-INT

I don't know if this is in a FAQ anywhere, but it should be.  I myself have
helped a number of people with this.  Every once in a while someone comes
into the #postgresql irc channel with the exact same problem.
Usually they leave the channel very happy, when their queries take less
than a second instead of minutes.

--
/Dennis


Re: 7.3.1 New install, large queries are slow

От
Andrew Sullivan
Дата:
On Fri, Jan 17, 2003 at 11:49:31PM -0500, Tom Lane wrote:
> Jeff <threshar@torgo.978.org> writes:
> > Informix, oracle, etc all do raw device access bypassing the kernels
> > buffering, etc.  So they need heaping gobules of memory to do the same
> > thing the kernel does..
>
> D'oh, I believe Jeff's put his finger on it.  You need lotsa RAM if you
> are trying to bypass the OS.  But Postgres would like to work with the
> OS, not bypass it.

One of the interesting things I have been playing with on Solaris
recently is the various no-buffer settings you can give to the kernel
for filesystems.  The idea is that you don't have the kernel do the
buffering, and you set your database's shared memory setting
_reeeeal_ high.

As nearly as I can tell, there is again no benefit with PostgreSQL.
I'd also be amazed if this approach is a win for other systems.  But
a lot of DBAs seem to believe that they know better than their
computers which tables are "really" accessed frequently.  I think
they must be smarter than I am: I'd rather trust a system that was
designed to track these things and change the tuning on the fly,
myself.

(To be fair, there are some cases where you have an
infrequently-accessed table which nevertheless is required to be fast
for some reason or other, so you might want to force it to stay in
memory.)

A

----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: 7.3.1 New install, large queries are slow

От
"Shridhar Daithankar"
Дата:
On 17 Jan 2003 at 12:33, Tom Lane wrote:

> "Josh Berkus" <josh@agliodbs.com> writes:
> >>> shared_buffers = 131072
> >>
> >> Yipes!  Try about a tenth that much.  Or less.
>
> > Why?  He has 4GB RAM on the machine.
>
> I think a gig of shared buffers is overkill no matter what.
>
> One reason not to crank up shared_buffers "just because you can" is that
> there are operations (such as CHECKPOINT) that have to scan through all
> the buffers, linearly.  I don't *think* any of these are in
> performance-critical paths, but nonetheless you're wasting CPU.  I trust

Assuming that one knows what he/she is doing, would it help in such cases, i.e.
the linear search thing, to bump up the page size to say 16K/32K?

And that is also the only way to make postgresql use more than a couple of gigs
of RAM, isn't it?

Bye
 Shridhar

--
Arithmetic:    An obscure art no longer practiced in the world's developed
countries.


Re: 7.3.1 New install, large queries are slow

От
Tom Lane
Дата:
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> On 17 Jan 2003 at 12:33, Tom Lane wrote:
>> One reason not to crank up shared_buffers "just because you can" is that
>> there are operations (such as CHECKPOINT) that have to scan through all
>> the buffers, linearly.  I don't *think* any of these are in
>> performance-critical paths, but nonetheless you're wasting CPU.  I trust

> Assuming that one knows what he/she is doing, would it help in such cases i.e.
> the linear search thing, to bump up the page size to say 16K/32K?

You mean increase page size and decrease the number of buffers
proportionately?  It'd save on buffer-management overhead, but
I wouldn't assume there'd be an overall performance gain.  The
system would have to do more I/O per page read or written; which
might be a wash for sequential scans, but I bet it would hurt for
random access.

> and that is also the only way to make postgresql use more than couple of gigs
> of RAM, isn't it?

It seems quite unrelated.  The size of our shared memory segment is
limited by INT_MAX --- chopping it up differently won't change that.

In any case, I think worrying because you can't push shared buffers
above two gigs is completely wrongheaded, for the reasons already
discussed in this thread.  The notion that Postgres can't use more
than two gig because its shared memory is limited to that is
*definitely* wrongheaded.  We can exploit however much memory your
kernel can manage for kernel disk cache.

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
"Shridhar Daithankar"
Дата:
On 20 Jan 2003 at 2:14, Tom Lane wrote:

> "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> > Assuming that one knows what he/she is doing, would it help in such cases i.e.
> > the linear search thing, to bump up the page size to say 16K/32K?
>
> You mean increase page size and decrease the number of buffers
> proportionately?  It'd save on buffer-management overhead, but
> I wouldn't assume there'd be an overall performance gain.  The
> system would have to do more I/O per page read or written; which
> might be a wash for sequential scans, but I bet it would hurt for
> random access.

Right. But it has its own applications. If I am saving huge data blocks like
say gene stuff, I might be better off living with a relatively bigger page
fragmentation.

> > and that is also the only way to make postgresql use more than couple of gigs
> > of RAM, isn't it?
>
> It seems quite unrelated.  The size of our shared memory segment is
> limited by INT_MAX --- chopping it up differently won't change that.

Well, if my page size is doubled, I can get double the amount of shared buffers.
That was the logic, nothing else.

> In any case, I think worrying because you can't push shared buffers
> above two gigs is completely wrongheaded, for the reasons already
> discussed in this thread.  The notion that Postgres can't use more
> than two gig because its shared memory is limited to that is
> *definitely* wrongheaded.  We can exploit however much memory your
> kernel can manage for kernel disk cache.

Well, I agree completely.  However, there are folks and situations which demand
things just because they can be done.  This is just to check out the absolute limit
of what it can manage.


Bye
 Shridhar

--
Bagdikian's Observation:    Trying to be a first-rate reporter on the average
American newspaper    is like trying to play Bach's "St. Matthew Passion" on a
ukelele.


Very large caches (was Re: 7.3.1 New install, large queries are slow)

От
Ron Johnson
Дата:
On Mon, 2003-01-20 at 01:14, Tom Lane wrote:
> "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> > On 17 Jan 2003 at 12:33, Tom Lane wrote:
[snip]
> > and that is also the only way to make postgresql use more than couple of gigs
> > of RAM, isn't it?
>
> It seems quite unrelated.  The size of our shared memory segment is
> limited by INT_MAX --- chopping it up differently won't change that.
>
> In any case, I think worrying because you can't push shared buffers
> above two gigs is completely wrongheaded, for the reasons already
> discussed in this thread.  The notion that Postgres can't use more
> than two gig because its shared memory is limited to that is
> *definitely* wrongheaded.  We can exploit however much memory your
> kernel can manage for kernel disk cache.

http://www.redhat.com/services/techsupport/production/GSS_caveat.html
"RAM Limitations on IA32
Red Hat Linux releases based on the 2.4 kernel -- including Red Hat
Linux 7.1, 7.2, 7.3 and Red Hat Linux Advanced Server 2.1 -- support
a maximum of 16GB of RAM."

So if I have some honking "big" Compaq Xeon SMP server w/ 16GB RAM,
and top(1) shows that there is 8GB of buffers, then Pg will be happy
as a pig in the mud?

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "Basically, I got on the plane with a bomb. Basically, I   |
|  tried to ignite it. Basically, yeah, I intended to damage |
|  the plane."                                               |
|    RICHARD REID, who tried to blow up American Airlines    |
|                  Flight 63                                 |
+------------------------------------------------------------+


Re: Very large caches (was Re: 7.3.1 New install, large queries are slow)

От
Tom Lane
Дата:
Ron Johnson <ron.l.johnson@cox.net> writes:
> So if I have some honking "big" Compaq Xeon SMP server w/ 16GB RAM,
> and top(1) shows that there is 8GB of buffers, then Pg will be happy
> as a pig in the mud?

Sounds good to me ...

            regards, tom lane

Re: 7.3.1 New install, large queries are slow

От
"Roman Fail"
Дата:
>> It sort of feels like a magic moment.  I went back and looked through a
>> lot of the JOIN columns and found that I was mixing int4 with int8 in a
>> lot of them.

>There is note about it in the docs:
>http://www.postgresql.org/idocs/index.php?datatype.html#DATATYPE-INT
>
>I don't know if this is in a faq anywhere, but it should be. I myself have
>helped a number of persons with this. Every once in a while there come
>someone in to the #postgresql irc channel with the exact same problem. 
>Usually they leave the channel very happy, when their queries take less
>then a second instead of minutes.
>
>--
>/Dennis

I'm really surprised that this issue doesn't pop up all the time.  As the community grows, I think it will start to.  I
came very, very close to dropping PostgreSQL entirely because of it.  Hopefully the TODO issue on implicit type casting
will move closer to the top of the hackers list.  But I'm just a beggar so I won't pretend to be a chooser.
 
 
Back to my original problems:  I re-created everything from scratch and made sure there are no int8's in my entire
database.  I found a few more places that I could create useful indexes as well.  I didn't get to test it over the
weekend, but today I played with it for several hours and could not get the queries to perform much better than last
week.  I was about ready to give up, throw Postgres in the junk pile, and get out the MSSQL CD.
 
 
Luckily, an unrelated post on one of the lists mentioned something about ANALYZE, and I realized that I had forgotten
to run it after all the new data was imported (although I did remember a VACUUM FULL).  After running ANALYZE, I started
getting amazing results.....like a query that took 20 minutes last week was taking only 6 milliseconds now.  That kicks
the MSSQL server's ass all over the map (as I had originally expected it would!!!).
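
For anyone following the thread, the missing step was simply the statistics pass; VACUUM FULL reclaims space but does not gather planner statistics by itself:

ANALYZE;
-- or, combining both passes:
VACUUM FULL ANALYZE;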
 
 
So things are working pretty good now....and it looks like the whole problem was the data type mismatch issue.  I hate
to point fingers, but the pgAdminII Migration Wizard forces all your primary keys to be int8 even if you set the Type
Map to int4.  The second time through I recognized this and did a pg_dump so I could switch everything to int4.  Now I'm
going to write some minor mods in my Java programs for PGSQL-syntax compatibility, and will hopefully have the
PostgreSQL server in production shortly.
 
 
THANK YOU to everyone on pgsql-performance for all your help.  You are the reason that I'll be a long term member of
the Postgres community.  I hope that I can assist someone else out in the future.
 
 
Roman Fail
Sr. Web Application Developer
POS Portal, Inc.
 
 

Re: 7.3.1 New install, large queries are slow

От
"Josh Berkus"
Дата:
Roman,

> I'm really surprised that this issue doesn't pop up all the time.  As
> the community grows, I think it will start to.

Actually, in the general sense of intelligent casting, the issue *does*
come up all the time.   Unfortunately, this is one of those issues that
requires both an inspired solution and program-wide overhaul work to
fix.   In fact, in the FTP archives you can find an alternate version of
Postgres (7.1 I think) where someone tried to fix the "stupid casting"
issue and succeeded in making Postgres crash and burn instead.

> Luckily, an unrelated post on one of the lists mentioned something
> about ANALYZE, and I realized that I had forgotten to run it after
> all the new data was imported (although I did remember a VACUUM
> FULL).  After running ANALYZE, I started getting amazing
> results.....like a query that took 20 minutes last week was taking
> only 6 milliseconds now.  That kicks the MSSQL server's ass all over
> the map (as I had originally expected it would!!!).

That's great!

> So things are working pretty good now....and it looks like the whole
> problem was the data type mismatch issue.  I hate to point fingers,
> but the pgAdminII Migration Wizard forces all your primary keys to be
> int8 even if you set the Type Map to int4.

So?  Send Dave Page (address at pgadmin.postgresql.org) a quick note
documenting the problem.  I'm sure he'll patch it, or at least fix it
for PGAdmin III.

> THANK YOU to everyone on pgsql-performance for all your help.  You
> are the reason that I'll be a long term member of the Postgres
> community.  I hope that I can assist someone else out in the future.

You're welcome!   If you can get your boss to authorize it, the
Advocacy page (advocacy.postgresql.org) could use some more business
testimonials.

-Josh Berkus


Re: 7.3.1 New install, large queries are slow

От
"Roman Fail"
Дата:
> Jochem van Dieten wrote:
> Just out of curiosity and for archiving purposes, could you post the new
> EXPLAIN ANALYZE output to the list?

To reiterate, the batchdetail table is 24 million rows, batchheader is 2.7 million, and purc1 is 1 million.  The rest
are 2000 rows or less.  I think having the 6-disk RAID-10 devoted to /usr/local/pgsql/data helps out a little here.  I
did try changing the WHERE clauses to radically different values and it was still just as fast.  This is the original
query I was working with (plus suggested modifications from the list):
 

EXPLAIN ANALYZE
SELECT ss.batchdate, ss.batchdetailid, ss.bankno, ss.trandate, ss.tranamount,
ss.submitinterchange, ss.authamount, ss.authno, ss.cardtypeid, ss.mcccode,
ss.name AS merchantname, ss.cardtype, ss.merchid,
p1.localtaxamount, p1.productidentifier, dr.avsresponse,
cr.checkoutdate, cr.noshowindicator, ck.checkingacctno,
ck.abaroutingno, ck.checkno
FROM
  (SELECT b.batchdate, d.batchdetailid, t.bankno, d.trandate, d.tranamount,
       d.submitinterchange, d.authamount, d.authno, d.cardtypeid, d.mcccode,
       m.name, c.cardtype, m.merchid
   FROM tranheader t, batchheader b, merchants m, cardtype c, batchdetail d
   WHERE t.tranheaderid=b.tranheaderid
   AND m.merchantid=b.merchantid
   AND d.batchid=b.batchid
   AND c.cardtypeid=d.cardtypeid
   AND t.clientid = 6
   AND d.tranamount BETWEEN 500.0 AND 700.0
   AND b.batchdate > '2002-12-15'
   AND m.merchid = '701252267') ss
  LEFT JOIN purc1 p1 on p1.batchdetailid=ss.batchdetailid
  LEFT JOIN direct dr ON dr.batchdetailid = ss.batchdetailid
  LEFT JOIN carrental cr ON cr.batchdetailid = ss.batchdetailid
  LEFT JOIN checks ck ON ck.batchdetailid = ss.batchdetailid
ORDER BY ss.batchdate DESC
LIMIT 50

Limit  (cost=1351.93..1351.93 rows=1 width=261) (actual time=5.34..5.36 rows=8 loops=1)
  ->  Sort  (cost=1351.93..1351.93 rows=1 width=261) (actual time=5.33..5.34 rows=8 loops=1)
        Sort Key: b.batchdate
        ->  Nested Loop  (cost=0.01..1351.92 rows=1 width=261) (actual time=1.61..5.24 rows=8 loops=1)
              ->  Hash Join  (cost=0.01..1346.99 rows=1 width=223) (actual time=1.58..5.06 rows=8 loops=1)
                    Hash Cond: ("outer".batchdetailid = "inner".batchdetailid)
                    ->  Hash Join  (cost=0.00..1346.98 rows=1 width=210) (actual time=1.21..4.58 rows=8 loops=1)
                          Hash Cond: ("outer".batchdetailid = "inner".batchdetailid)
                          ->  Nested Loop  (cost=0.00..1346.97 rows=1 width=201) (actual time=0.82..4.05 rows=8 loops=1)
                                ->  Nested Loop  (cost=0.00..1343.84 rows=1 width=182) (actual time=0.78..3.82 rows=8 loops=1)
                                      Join Filter: ("inner".cardtypeid = "outer".cardtypeid)
                                      ->  Nested Loop  (cost=0.00..1342.62 rows=1 width=172) (actual time=0.74..3.35 rows=8 loops=1)
                                            ->  Nested Loop  (cost=0.00..539.32 rows=4 width=106) (actual time=0.17..1.61 rows=26 loops=1)
                                                  ->  Nested Loop  (cost=0.00..515.48 rows=5 width=94) (actual time=0.13..1.01 rows=26 loops=1)
                                                        ->  Index Scan using merchants_ix_merchid_idx on merchants m  (cost=0.00..5.65 rows=1 width=78) (actual time=0.07..0.08 rows=1 loops=1)
                                                              Index Cond: (merchid = '701252267'::character varying)
                                                        ->  Index Scan using batchheader_ix_merchantid_idx on batchheader b  (cost=0.00..508.56 rows=20 width=16) (actual time=0.04..0.81 rows=26 loops=1)
                                                              Index Cond: ("outer".merchantid = b.merchantid)
                                                              Filter: (batchdate > '2002-12-15'::date)
                                                  ->  Index Scan using tranheader_pkey on tranheader t  (cost=0.00..5.08 rows=1 width=12) (actual time=0.01..0.01 rows=1 loops=26)
                                                        Index Cond: (t.tranheaderid = "outer".tranheaderid)
                                                        Filter: (clientid = 6)
                                            ->  Index Scan using batchdetail_ix_batchid_idx on batchdetail d  (cost=0.00..186.81 rows=2 width=66) (actual time=0.05..0.06 rows=0 loops=26)
                                                  Index Cond: (d.batchid = "outer".batchid)
                                                  Filter: ((tranamount >= 500.0) AND (tranamount <= 700.0))
                                      ->  Seq Scan on cardtype c  (cost=0.00..1.10 rows=10 width=10) (actual time=0.00..0.03 rows=10 loops=8)
                                ->  Index Scan using purc1_ix_batchdetailid_idx on purc1 p1  (cost=0.00..3.12 rows=1 width=19) (actual time=0.01..0.01 rows=0 loops=8)
                                      Index Cond: (p1.batchdetailid = "outer".batchdetailid)
                          ->  Hash  (cost=0.00..0.00 rows=1 width=9) (actual time=0.00..0.00 rows=0 loops=1)
                                ->  Seq Scan on direct dr  (cost=0.00..0.00 rows=1 width=9) (actual time=0.00..0.00 rows=0 loops=1)
                    ->  Hash  (cost=0.00..0.00 rows=1 width=13) (actual time=0.01..0.01 rows=0 loops=1)
                          ->  Seq Scan on carrental cr  (cost=0.00..0.00 rows=1 width=13) (actual time=0.00..0.00 rows=0 loops=1)
              ->  Index Scan using checks_ix_batchdetailid_idx on checks ck  (cost=0.00..4.92 rows=1 width=38) (actual time=0.01..0.01 rows=0 loops=8)
                    Index Cond: (ck.batchdetailid = "outer".batchdetailid)
Total runtime: 5.89 msec



Re: 7.3.1 New install, large queries are slow

От
Sean Chittenden
Дата:
> >> Yeah, but isn't that theory a hangover from pre-Unix operating systems?
>
> > Informix, oracle, etc all do raw device access bypassing the kernels
> > buffering, etc.  So they need heaping gobules of memory to do the same
> > thing the kernel does..
>
> D'oh, I believe Jeff's put his finger on it.  You need lotsa RAM if you
> are trying to bypass the OS.  But Postgres would like to work with the
> OS, not bypass it.
>
> > but since they know the exact patterns of data and
> > how things will be done they can fine tune their buffer caches to get much
> > better performance than the kernel (15-20% in informix's case) since the
> > kernel needs to be a "works good generally"
>
> They go to all that work for 15-20% ???  Remind me not to follow that
> primrose path.  I can think of lots of places where we can buy 20% for
> less work than implementing (and maintaining) our own raw-device access
> layer.

This is related somewhat to the raw device access discussion.  This is
a quote from Matt Dillion (FreeBSD VM guru) on the topic of disk
caches (Message-Id:
<200301270657.h0R6v2qH071774@apollo.backplane.com>) and a few bits at
the end:


### Begin quote
    Mmmmm.  Basically what it comes down to is that without foreknowledge
    of the data locations being accessed, it is not possible for any
    cache algorithm to adapt to all the myriad ways data might be accessed.
    If you focus the cache on one methodology it will probably perform
    terribly when presented with some other methodology.

    What this means is that for the cases where you *KNOW* how a program
    intends to access a dataset larger then main memory, it is better
    to have the program explicitly cache/not-cache the data under program
    control rather then trying to force the system cache to adapt.

    I'll also be nice and decode some of Terry's Jargon for the rest of
    the readers.

:will result in significant failure of random page replacement to
:result in cache hits; likewise, going to 85% overage will practically
:guarantee an almost 100% failure rate, as cyclical access with random
:replacement is statistically more likely, in aggregate, to replace
:the pages which are there longer (the probability is iterative and
:additive: it's effectively a permutation).

    What Terry is saying is that if you have a dataset that is 2x
    the size of your cache, the cache hit rate on that data with random
    page replacement is NOT going to be 50%.  This is because with random
    page replacement the likelihood of a piece of data being found in
    the cache depends on how long the data has been sitting in the cache.
    The longer the data has been sitting in the cache, the less likely you
    will find it when you need it (because it is more likely to have been
    replaced by the random replacement algorithm over time).

    So, again, the best caching methodology to use in the case where
    you *know* how much data you will be accessing and how you will access
    it is to build the caching directly into your program and not depend
    on system caching at all (for datasets which are far larger then
    main memory).

    This is true of all systems, not just BSD.  This is one reason why
    databases do their own caching (another is so databases know when an
    access will require I/O for scheduling reasons, but that's a different
    story).

    The FreeBSD VM caching system does prevent one process from exhausting
    another process's cached data due to a sequential access, but the
    FreeBSD VM cache does not try to outsmart sequential data accesses to
    datasets which are larger then available cache space because it's an
    insanely difficult (impossible) problem to solve properly without
    foreknowledge of what data elements will be accessed when.

    This isn't to say that we can't improve on what we have now.
    I certainly think we can.  But there is no magic bullet that will
    deal with every situation.

                        -Matt
### End quote

So if there really is only a 15-20% performance gain to be had from
using raw disk access, that 15-20% loss comes from not being able to
tell the OS what to cache, what not to cache, and what order to have
the pages in... which only really matters if there is RAM available to
the kernel to cache, and if the kernel is able to determine what is valuable
to cache in the course of its operations.  Predictive caching by the
OS isn't done because it understands PostgreSQL, but because it
applies a generic algorithm for page hits/misses.

What is interesting after reading this, however, is the prospect of a
15-20% speed up on certain tables that we know are accessed frequently
by implicitly specifying a set of data to be preferred in a user space
cache.  It's impossible for the OS to cache the pages that make the
biggest impact on user visible performance, given the OS has no
understanding of what pages make a big difference on user visible
performance; a user land database process, however, would.

As things stand, it's entirely possible for a set of large queries to
come through and wipe the kernel's cache that smaller queries were
using.  Once a cache misses, the kernel then has to fetch the data
again which could slow down over all number of transactions per
second.  That said, this is something that an in-database scheduler
could avoid by placing a lower priority on larger, more complex
queries with the assumption being that having the smaller queries
continue to process and get in/out is more important than shaving a
few seconds off of a larger query that would deplete the cache used by
the smaller queries.  Oh to be a DBA and being able to make those
decisions instead of the kernel...

Hrm, so two ideas or questions come to mind:

1) On some of my really large read only queries, it would be SUUUPER
   nice to be able to re-nice the process from SQL land to 5, 10, or
   even 20.  IIRC, BSD's VM system is smart enough to prevent lower
   priority jobs from monopolizing the disk cache, which would let the
   smaller faster turn around queries, continue to exist with their
   data in the kernel's disk cache.  (some kind of query complexity
   threshold that results in a reduction of priority or an explicit
   directive to run at a lower priority)

2) Is there any way of specifying that a particular set of tables
   should be kept in RAM or some kind of write through cache?  I know
   data is selected into a backend out of the catalogs, but would it
   be possible to have them kept in memory and only re-read on change
   with some kind of semaphore?  Now that all system tables are in
   their own schemas (pg_catalog and pg_toast), would it be hard to
   set a flag on a change to those tables that would cause the
   postmaster, or children, to re-read them instead of relying on their
   cache?  With copy-on-write forking, this could be pretty efficient
   if the postmaster did this and forked off a copy with the tables
   already in memory instead of on disk.

Just a few ideas/ramblings, hope someone finds them interesting... the
renice function is one that I think I'll spend some time looking into
here shortly actually.  -sc

--
Sean Chittenden

Re: 7.3.1 New install, large queries are slow

От
Curt Sampson
Дата:
On Mon, 27 Jan 2003, Sean Chittenden wrote:

>     The FreeBSD VM caching system does prevent one process from exhausting
>     another process's cached data due to a sequential access, but the
>     FreeBSD VM cache does not try to outsmart sequential data accesses to
>     datasets which are larger then available cache space because it's an
>     insanely difficult (impossible) problem to solve properly without
>     foreknowledge of what data elements will be accessed when.

This is not impossible; Solaris does just this. I'm a little short of
time right now, but I can probably dig up the paper on google if nobody
else finds it.

Also, it is not hard to give the OS foreknowledge of your access
pattern, if you use mmap. Just call madvise and use the MADV_RANDOM,
MADV_SEQUENTIAL, MADV_WILLNEED and MADV_DONTNEED flags. (This is one
of the reasons I think we might see a performance improvement from
switching from regular I/O to mmap I/O.)

You can go back through the archives and see a much fuller discussion of
all of this.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: 7.3.1 New install, large queries are slow

От
Ron Johnson
Дата:
On Mon, 2003-01-27 at 04:34, Curt Sampson wrote:
> On Mon, 27 Jan 2003, Sean Chittenden wrote:
>
> >     The FreeBSD VM caching system does prevent one process from exhausting
> >     another process's cached data due to a sequential access, but the
> >     FreeBSD VM cache does not try to outsmart sequential data accesses to
> >     datasets which are larger then available cache space because it's an
> >     insanely difficult (impossible) problem to solve properly without
> >     foreknowledge of what data elements will be accessed when.
>
> This is not impossible; Solaris does just this. I'm a little short of

Quite.  One way to do it is:
- the OS notices that process X has been sequentially reading thru
  file Y for, say, 3 seconds.
- the OS knows that X is currently at the mid-point of file Y
- OS says, "Hey, I think I'll be a bit more agressive about, when I
  have a bit of time, trying to read Y faster than X is requesting
  it

It wouldn't work well, though, in a client-server DB like Postgres,
which, in a busy multi-user system, is constantly hitting different
parts of different files.

The algorithm, though, is used in the RDBMS Rdb.  It uses the algorithm
above, substituting "process X" for "client X", and passes the agressive
reads of Y on to the OS.  It's a big win when processing a complete
table, like during a CREATE INDEX, or  "SELECT foo, COUNT(*)" where
there's no index on foo.

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "Fear the Penguin!!"                                          |
+---------------------------------------------------------------+


Re: 7.3.1 New install, large queries are slow

От
Bruce Momjian
Дата:
Detecting sequential scan and increasing read-ahead is a standard OS
capability, and most/all do that already.  Solaris has code to detect
when a sequential scan is wiping the cache and adjusting the buffer
frees, called "free-behind."

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: 7.3.1 New install, large queries are slow

From
Ron Johnson
Date:
On Mon, 2003-01-27 at 15:08, Bruce Momjian wrote:
> Detecting sequential scan and increasing read-ahead is a standard OS
> capability, and most/all do that already.  Solaris has code to detect
> when a sequential scan is wiping the cache and adjusting the buffer
> frees, called "free-behind."

Ah, didn't know that.

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "Fear the Penguin!!"                                          |
+---------------------------------------------------------------+


Re: 7.3.1 New install, large queries are slow

From
Curt Sampson
Date:
On Tue, 27 Jan 2003, Ron Johnson wrote:

> [read-ahead detection stuff deleted]
>
> It wouldn't work well, though, in a client-server DB like Postgres,
> which, in a busy multi-user system, is constantly hitting different
> parts of different files.

It works fine there, too: you just do the detection on a
file-descriptor-by-file-descriptor basis, so each client's scan is
tracked separately.

Unfortunately, I don't know of any OSes that detect backwards scans.
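
A toy sketch of that per-descriptor bookkeeping, not taken from any real
kernel: remember the previous request on each descriptor and grow a
prefetch window while the requests stay forward-sequential.  The same
test, mirrored, would catch a backwards scan just as easily.

#include <stdio.h>

#define BLOCK      8192
#define MAX_WINDOW 32            /* cap prefetch at 32 blocks */

struct fd_state {
    long long prev_start;        /* offset of the previous request */
    long long prev_end;          /* prev_start + previous length */
    int forward_hits;            /* consecutive forward-sequential reads */
    int backward_hits;           /* consecutive backward-sequential reads */
};

/* Returns how many blocks to prefetch: >0 ahead, <0 behind, 0 none. */
static int advise(struct fd_state *st, long long off, long long len)
{
    int window = 0;

    if (off == st->prev_end) {                 /* continues forward */
        st->forward_hits++;
        st->backward_hits = 0;
        window = st->forward_hits;
    } else if (off + len == st->prev_start) {  /* continues backward */
        st->backward_hits++;
        st->forward_hits = 0;
        window = -st->backward_hits;
    } else {                                   /* random access: reset */
        st->forward_hits = st->backward_hits = 0;
    }

    st->prev_start = off;
    st->prev_end   = off + len;

    if (window >  MAX_WINDOW) window =  MAX_WINDOW;
    if (window < -MAX_WINDOW) window = -MAX_WINDOW;
    return window;
}

int main(void)
{
    struct fd_state st = { 0, 0, 0, 0 };

    /* Simulate one descriptor scanning blocks 9, 8, 7, ... backwards. */
    for (long long b = 9; b >= 0; b--)
        printf("request block %lld -> prefetch %d block(s)\n",
               b, advise(&st, b * BLOCK, BLOCK));
    return 0;
}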

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC