Обсуждение: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

Поиск

Список

Период

Сортировка

Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

11 января 2011 г., 14:53:33

Hello,
I have noticed that in 9.0.2 EXPLAIN ANALYZE takes considerably longer time to return than the actual query. I never
noticedthis in 8.3. 
Example:
In 9.0.2 i get
dynacom=# EXPLAIN ANALYZE SELECT count(ms.id) from marinerstates ms,mariner m where ms.endtime IS NOT NULL AND
ms.marinerid=m.id;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4716.48..4716.49 rows=1 width=4) (actual time=413.378..413.379 rows=1 loops=1)
   ->  Hash Join  (cost=788.77..4582.04 rows=53775 width=4) (actual time=54.029..330.286 rows=53845 loops=1)
         Hash Cond: (ms.marinerid = m.id)
         ->  Seq Scan on marinerstates ms  (cost=0.00..2583.33 rows=53775 width=8) (actual time=0.005..99.092
rows=53845loops=1) 
               Filter: (endtime IS NOT NULL)
         ->  Hash  (cost=583.90..583.90 rows=16390 width=4) (actual time=54.011..54.011 rows=16390 loops=1)
               Buckets: 2048  Batches: 1  Memory Usage: 577kB
               ->  Seq Scan on mariner m  (cost=0.00..583.90 rows=16390 width=4) (actual time=0.004..27.587 rows=16390
loops=1)
 Total runtime: 413.495 ms

dynacom=# \timing
Timing is on.
dynacom=# SELECT count(ms.id) from marinerstates ms,mariner m where ms.endtime IS NOT NULL AND ms.marinerid=m.id;
 count
-------
 53845
(1 row)

Time: 39,038 ms

So the actual time (39 ms) differs considerably from the EXPLAIN ANALYZE Total runtime (413 ms) (10 times bigger)

In 8.3 i get
postgres@dynacom=# EXPLAIN ANALYZE SELECT count(ms.id) from marinerstates ms,mariner m where ms.endtime IS NOT NULL AND
ms.marinerid=m.id;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5094.13..5094.14 rows=1 width=4) (actual time=84.180..84.180 rows=1 loops=1)
   ->  Hash Join  (cost=869.15..4954.27 rows=55942 width=4) (actual time=11.284..72.630 rows=54168 loops=1)
         Hash Cond: (ms.marinerid = m.id)
         ->  Seq Scan on marinerstates ms  (cost=0.00..2756.50 rows=55942 width=8) (actual time=0.013..25.696
rows=54168loops=1) 
               Filter: (endtime IS NOT NULL)
         ->  Hash  (cost=642.40..642.40 rows=18140 width=4) (actual time=11.256..11.256 rows=16442 loops=1)
               ->  Seq Scan on mariner m  (cost=0.00..642.40 rows=18140 width=4) (actual time=0.007..5.762 rows=16442
loops=1)
 Total runtime: 84.346 ms
(8 rows)

postgres@dynacom=# SELECT count(ms.id) from marinerstates ms,mariner m where ms.endtime IS NOT NULL AND
ms.marinerid=m.id;
 count
-------
 54168
(1 row)

Time: 42.224 ms

In 8.3 the Total runtime is only double the value of the actual time.

Is there anything that can shed some light on this?

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Tom Lane

Дата:

11 января 2011 г., 15:19:22

Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> I have noticed that in 9.0.2 EXPLAIN ANALYZE takes considerably longer time to return than the actual query. I never
noticedthis in 8.3. 

Same hardware?  What it sounds like is you're running 9.0 on a machine
with slow gettimeofday().

            regards, tom lane

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

12 января 2011 г., 10:18:54

Στις Tuesday 11 January 2011 18:19:11 ο/η Tom Lane έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> > I have noticed that in 9.0.2 EXPLAIN ANALYZE takes considerably longer time to return than the actual query. I
nevernoticed this in 8.3. 
>
> Same hardware?  What it sounds like is you're running 9.0 on a machine
> with slow gettimeofday().
>

It is on a different hardware. I have three setups here, i have written a small program to assess gettimeofday.
My setups are as follows
1) Prod machine (thereafter called LINUX_PROD) :
System:    Linux Suse 2.6.16.46-0.12-smp, 16 x Intel Xeon(R) X7350 @ 2.93GHz, 64GB memory
DB: PostgreSQL 8.3.13, shared_buffers=16GB, work_mem=512MB, db size=94GB
2) Dev machine (therafter called FBSD_DEV) :
System : FreeBSD 6.3, Intel(R) Core(TM)2 Duo CPU @ 2.80GHz, 2GB memory
DB: PostgreSQL 8.3.13, shared_buffers=512MB, work_mem=1MB, db size=76GB
3) Test machine (thereafter called FBSD_TEST) :
System: FreeBSD 8.1, 4 x AMD Phenom(tm) 965 @ 3.4 GHz, 8GB memory
DB: PostgreSQL 9.0.2, shared_buffers=5GB, work_mem=512MB, db size=7GB

The FBSD_TEST system,  in the majority of cases seems to outperform LINUX_PROD
(rough impression watching execution times of my app, random benchmarks, nothing too detailed, but it is still a very
fastsystem, 
and i have been testing for over 2 months with this new system, testing 9.0.2 performance, etc..)
However i found two issues with FBSD_TEST: The EXPLAIN ANALYZE performance, and one specific query
which runs considerably slower in FBSD_TEST than both LINUX_PROD and FBSD_DEV.
Regarding gettimeofday (2), i wrote this program :
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main(int argc,char** argv) {
        struct timeval *tp=calloc(1,sizeof(struct timeval));
        int runna;
        for (runna=0;runna<1000000;runna++) {
                int rc=gettimeofday(tp,NULL);
                long micros = tp->tv_sec * 1000000 + tp->tv_usec;
                printf("cur_time=%u micro secs\n",micros);
        }
}

For some reason, LINUX_PROD runs this program much much faster than the two FBDs (FBSD_DEV, FBSD_PROD)
I get these values for
% time ./gettimeofday_test > /dev/null
LINUX_PROD real=0m0.373s, user=0m0.364s, sys=0m0.004s
FBSD_DEV real=4.29s, user=0.526s, sys=3.764s
FBSD_TEST real=1.66s, user=0.260s, sys=1.407s

So at least regarding gettimeofday, and setting linux aside, the slower fbsd (FBSD_DEV) seems considerably slower
than the faster fbsd (FBSD_TEST).

However for the query:

SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm  
where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15'
and 
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and m.id not in
(SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.idand  
msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <= '2007-01-11'
andexists  
(select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id <>
msold.idand  
msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')  and mold.marinertype='Mariner' )
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');
with the EXPLAIN ANALYZE version

i get the following execution times: (with \timing) (note however that FBSD_DEV has a considerably smaller database,
theother two are compareable) 
FBSD_DEV : query : 240,419 ms, EXPLAIN ANALYZE query : Total runtime: 538.468 ms
LINUX_PROD : query : 219.568 ms, EXPLAIN ANALYZE query : Total runtime: 216.926 ms
FBSD_TEST : query : 2587,465 ms, EPXLAIN ANALYZE query : Total runtime: 93711.648 ms
The last numbers seem huge.

Up to now this query was the first operation in which i witnessed FBSD_TEST to be actually slower than LINUX_PROD,
but that could be my fault as a DBA, haven't looked much into EXPLAIN yet.
However the performance of EXPLAIN ANALYZE is very bad any way you see it.

In other queries (the majority of them), FBSD_TEST is slightly faster than LINUX_PROD, e.g
SELECT avg(md.perioddue) from status st,items it,machdefs md WHERE st.id<=3626961 AND st.apptblidval=it.id AND
it.defid=md.defid;
i get:
FBSD_DEV : query : 20166,047 ms, EXPLAIN ANALYZE : Total runtime: 74245.576 ms
LINUX_PROD : query : 5908.960 ms, EXPLAIN ANALYZE : Total runtime: 8847.124 ms
FBSD_TEST : query : 4190,873 ms, EXPLAIN ANALYZE : Total runtime: 30334.086 ms

Here we see that although FBSD_TEST is the faster of the 3, its EXPLAIN ANALYZE performance
is nowehere near the plain query.

What could i do to have a better understanding of why this is happening?
Thanx

>             regards, tom lane
>

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Tom Lane

Дата:

12 января 2011 г., 14:08:04

Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> Regarding gettimeofday (2), i wrote this program :
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/time.h>
> int main(int argc,char** argv) {
>         struct timeval *tp=calloc(1,sizeof(struct timeval));
>         int runna;
>         for (runna=0;runna<1000000;runna++) {
>                 int rc=gettimeofday(tp,NULL);
>                 long micros = tp->tv_sec * 1000000 + tp->tv_usec;
>                 printf("cur_time=%u micro secs\n",micros);
>         }
> }

Right offhand I'd wonder whether that was more bound by gettimeofday or
by printf.  Please try it without printf in the loop.

> 3) Test machine (thereafter called FBSD_TEST) :
> System: FreeBSD 8.1, 4 x AMD Phenom(tm) 965 @ 3.4 GHz, 8GB memory
> DB: PostgreSQL 9.0.2, shared_buffers=5GB, work_mem=512MB, db size=7GB

The shared_buffers setting seems way out of line compared to actual
memory on this machine.  Maybe it's swapping.

> i get the following execution times: (with \timing) (note however that FBSD_DEV has a considerably smaller database,
theother two are compareable) 
> FBSD_DEV : query : 240,419 ms, EXPLAIN ANALYZE query : Total runtime: 538.468 ms
> LINUX_PROD : query : 219.568 ms, EXPLAIN ANALYZE query : Total runtime: 216.926 ms
> FBSD_TEST : query : 2587,465 ms, EPXLAIN ANALYZE query : Total runtime: 93711.648 ms
> The last numbers seem huge.

Are they in fact the same query plans in each case?

            regards, tom lane

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

12 января 2011 г., 15:17:24

Στις Wednesday 12 January 2011 17:07:53 ο/η Tom Lane έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> > Regarding gettimeofday (2), i wrote this program :
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <sys/time.h>
> > int main(int argc,char** argv) {
> >         struct timeval *tp=calloc(1,sizeof(struct timeval));
> >         int runna;
> >         for (runna=0;runna<1000000;runna++) {
> >                 int rc=gettimeofday(tp,NULL);
> >                 long micros = tp->tv_sec * 1000000 + tp->tv_usec;
> >                 printf("cur_time=%u micro secs\n",micros);
> >         }
> > }
>
> Right offhand I'd wonder whether that was more bound by gettimeofday or
> by printf.  Please try it without printf in the loop.
>

Changed that to smth like: micros_total = micros_total + (double) micros;
instead of the printf to beat any compiler optimization, and still linux runs at light speed:
FBSD_TEST : user 0.089s, sys 1.4s
FBSD_DEV : user 0.183s, sys 3.8s
LINUX_PROD : user 0.168s, sys 0s
(regarding that gettimeofday is a syscall in FreeBSD, and that sys time is 0 for linux, makes me think some
optimizationis being done) 

> > 3) Test machine (thereafter called FBSD_TEST) :
> > System: FreeBSD 8.1, 4 x AMD Phenom(tm) 965 @ 3.4 GHz, 8GB memory
> > DB: PostgreSQL 9.0.2, shared_buffers=5GB, work_mem=512MB, db size=7GB
>
> The shared_buffers setting seems way out of line compared to actual
> memory on this machine.  Maybe it's swapping.
>

I tried with what pgtune suggests (1920MB) and i get same results.
For the simple query: SELECT avg(md.perioddue) from status st,items it,machdefs md WHERE st.id<=3626961 AND
st.apptblidval=it.idAND it.defid=md.defid; 
still FBSD_TEST runs faster than LINUX_PROD smth like (3.5 secs VS 5.8 secs)

> > i get the following execution times: (with \timing) (note however that FBSD_DEV has a considerably smaller
database,the other two are compareable) 
> > FBSD_DEV : query : 240,419 ms, EXPLAIN ANALYZE query : Total runtime: 538.468 ms
> > LINUX_PROD : query : 219.568 ms, EXPLAIN ANALYZE query : Total runtime: 216.926 ms
> > FBSD_TEST : query : 2587,465 ms, EPXLAIN ANALYZE query : Total runtime: 93711.648 ms
> > The last numbers seem huge.
>
> Are they in fact the same query plans in each case?

The query plans seem to differ. A lot of seq scans in the FBSD case.
I attach the query plans for LINUX_PROD, FBSD_TEST (the times did not change noticeably after the new shared_buffers
setting)

>
>             regards, tom lane
>



--
Achilleas Mantzios

Вложения

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Tom Lane

Дата:

12 января 2011 г., 15:45:31

Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> ���� Wednesday 12 January 2011 17:07:53 �/� Tom Lane ������:
>> Right offhand I'd wonder whether that was more bound by gettimeofday or
>> by printf.  Please try it without printf in the loop.

> Changed that to smth like: micros_total = micros_total + (double) micros;
> instead of the printf to beat any compiler optimization, and still linux runs at light speed:
> FBSD_TEST : user 0.089s, sys 1.4s
> FBSD_DEV : user 0.183s, sys 3.8s
> LINUX_PROD : user 0.168s, sys 0s

Well, there's your problem all right: the FBSD_DEV system takes 22X
longer to execute gettimeofday() than the LINUX_PROD system.  The
particular plan that 9.0 is choosing is especially vulnerable to this
because it involves a whole lot of calls of the Materialize plan node:

>                                    ->  Nested Loop  (cost=3390.49..9185.73 rows=1 width=4) (actual
time=279.916..93280.499rows=1824 loops=1) 
>                                          Join Filter: (msold.marinerid = mold.id)
>                                          ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1957.89
rows=15914width=4) (actual time=0.009..38.449 rows=15914 loops=1) 
>                                                Filter: ((marinertype)::text = 'Mariner'::text)
>                                          ->  Materialize  (cost=3390.49..6989.13 rows=1 width=8) (actual
time=0.013..2.881rows=1888 loops=15914) 

You've got 1888 * 15914 * 2 gettimeofday() calls needed to count up the
time spent in Materialize, which at 3.8usec apiece adds up to ... hmm
... I get 228 seconds, which is more than the reported runtime.  So I
think there's something wrong with your measurement of the gettimeofday
cost.  But I wouldn't be a bit surprised to hear that the true cost of
gettimeofday on that machine is a microsecond or so, in which case all
of the EXPLAIN ANALYZE time bloat is indeed due to that.

This problem has been discussed repeatedly on the pgsql-performance
list, which is where your question really belongs, not here.  The short
of it is that cheap motherboards tend to provide cheap clock hardware
that takes about a microsecond to read.  It also helps to be running
an OS that is able to read the clock in userspace without invoking a
kernel call.  It looks like your Linux box is winning on both counts
compared to your BSD boxes.

            regards, tom lane

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

"Mark Felder"

Дата:

12 января 2011 г., 16:43:38

On Wed, 12 Jan 2011 10:45:20 -0600, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> The short of it is that cheap motherboards tend to provide cheap clock
> hardware that takes about a microsecond to read.

I'm not sure how gettimeofday() works, but is there a slight chance this
could be alleviated by choosing a different kern.timecounter.hardware
option?

kern.timecounter.choice: TSC(800) HPET(900) ACPI-fast(1000) i8254(0)
dummy(-1000000)
kern.timecounter.hardware: ACPI-fast

Or does the kernel timer stuff have nothing to do with gettimeofday()?

Regards,

Mark

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Alex Hunsaker

Дата:

12 января 2011 г., 18:17:29

2011/1/12 Achilleas Mantzios <achill@matrix.gatewaynet.com>:

>
> Changed that to smth like: micros_total = micros_total + (double) micros;
> instead of the printf to beat any compiler optimization, and still linux runs at light speed:
> FBSD_TEST : user 0.089s, sys 1.4s
> FBSD_DEV : user 0.183s, sys 3.8s
> LINUX_PROD : user 0.168s, sys 0s
> (regarding that gettimeofday is a syscall in FreeBSD, and that sys time is 0 for linux, makes me think some
optimizationis being done) 

At least on x86_64, linux will use a vsyscall (v for virtual) for
gettimeofday which does not actually need to enter/exit the kernel.
That probably explains a large part of the speed difference here.

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

13 января 2011 г., 09:18:04

Στις Wednesday 12 January 2011 21:17:00 ο/η Alex Hunsaker έγραψε:
> 2011/1/12 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
>
> >
> > Changed that to smth like: micros_total = micros_total + (double) micros;
> > instead of the printf to beat any compiler optimization, and still linux runs at light speed:
> > FBSD_TEST : user 0.089s, sys 1.4s
> > FBSD_DEV : user 0.183s, sys 3.8s
> > LINUX_PROD : user 0.168s, sys 0s
> > (regarding that gettimeofday is a syscall in FreeBSD, and that sys time is 0 for linux, makes me think some
optimizationis being done) 
>
> At least on x86_64, linux will use a vsyscall (v for virtual) for
> gettimeofday which does not actually need to enter/exit the kernel.
> That probably explains a large part of the speed difference here.
>

Thanx, thats the ARCH (x86_64) of our linux system.


--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

13 января 2011 г., 09:35:22

Στις Wednesday 12 January 2011 19:17:28 ο/η Mark Felder έγραψε:
> On Wed, 12 Jan 2011 10:45:20 -0600, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > The short of it is that cheap motherboards tend to provide cheap clock
> > hardware that takes about a microsecond to read.
>
> I'm not sure how gettimeofday() works, but is there a slight chance this
> could be alleviated by choosing a different kern.timecounter.hardware
> option?
>
> kern.timecounter.choice: TSC(800) HPET(900) ACPI-fast(1000) i8254(0)
> dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
>
> Or does the kernel timer stuff have nothing to do with gettimeofday()?
Thanx,
Actually setting sysctl kern.timecounter.hardware=TSC
and making the program look like: (increased the loop from 1,000,000 to 100,000,000)

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main(int argc,char** argv) {
        struct timeval *tp=calloc(1,sizeof(struct timeval));
        int runna;
        double micros_total=0;
        for (runna=0;runna<100000000;runna++) {
                int rc=gettimeofday(tp,NULL);
                long micros = tp->tv_sec * 1000000 + tp->tv_usec;
                micros_total = micros_total + (double) micros;
        }
        printf("micros_total=%f\n",micros_total);
}

I get in LINUX_PROD
time ./gettimeofday_test
micros_total=129491418242412709937152.000000

real    0m16.665s
user    0m16.621s
sys     0m0.000s

and in FBSD_TEST
time ./gettimeofday_test
micros_total=129491437864563859521536.000000

real    0m15.506s
user    0m0.517s
sys     0m1.161s

So, the FBSD_TEST case although still involves some kernel overhead, it is faster than LINUX_PROD
(but i am puzzled as to where those 15.5 secs come from since 1.16+
>
>
> Regards,
>
>
> Mark
>



--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

13 января 2011 г., 11:13:32

Στις Wednesday 12 January 2011 18:45:20 ο/η Tom Lane έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> > Στις Wednesday 12 January 2011 17:07:53 ο/η Tom Lane έγραψε:
> >> Right offhand I'd wonder whether that was more bound by gettimeofday or
> >> by printf.  Please try it without printf in the loop.
>
> > Changed that to smth like: micros_total = micros_total + (double) micros;
> > instead of the printf to beat any compiler optimization, and still linux runs at light speed:
> > FBSD_TEST : user 0.089s, sys 1.4s
> > FBSD_DEV : user 0.183s, sys 3.8s
> > LINUX_PROD : user 0.168s, sys 0s
>
> Well, there's your problem all right: the FBSD_DEV system takes 22X
> longer to execute gettimeofday() than the LINUX_PROD system.  The
> particular plan that 9.0 is choosing is especially vulnerable to this
> because it involves a whole lot of calls of the Materialize plan node:
>
> >                                    ->  Nested Loop  (cost=3390.49..9185.73 rows=1 width=4) (actual
time=279.916..93280.499rows=1824 loops=1) 
> >                                          Join Filter: (msold.marinerid = mold.id)
> >                                          ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1957.89
rows=15914width=4) (actual time=0.009..38.449 rows=15914 loops=1) 
> >                                                Filter: ((marinertype)::text = 'Mariner'::text)
> >                                          ->  Materialize  (cost=3390.49..6989.13 rows=1 width=8) (actual
time=0.013..2.881rows=1888 loops=15914) 
>
> You've got 1888 * 15914 * 2 gettimeofday() calls needed to count up the
> time spent in Materialize, which at 3.8usec apiece adds up to ... hmm
> ... I get 228 seconds, which is more than the reported runtime.  So I
> think there's something wrong with your measurement of the gettimeofday
> cost.  But I wouldn't be a bit surprised to hear that the true cost of
> gettimeofday on that machine is a microsecond or so, in which case all
> of the EXPLAIN ANALYZE time bloat is indeed due to that.
>
> This problem has been discussed repeatedly on the pgsql-performance
> list, which is where your question really belongs, not here.  The short
> of it is that cheap motherboards tend to provide cheap clock hardware
> that takes about a microsecond to read.  It also helps to be running
> an OS that is able to read the clock in userspace without invoking a
> kernel call.  It looks like your Linux box is winning on both counts
> compared to your BSD boxes.

Just comparing the two FBSD boxes on equal terms gives crazy results, so
I'll try to leave LINUX out of the equation for a while, and also i'll try to leave the
difference in DB sizes out of the equation so what i did was make a dump of the small DB
on the small (slow) FBSD box (FBSD_DEV) and load it on the bigger (fast) FBSD box (FBSD_TEST).
I set : sysctl kern.timecounter.hardware=TSC on both systems
and now the below program

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main(int argc,char** argv) {
        struct timeval *tp=calloc(1,sizeof(struct timeval));
        int runna;
        long double micros_total=0;
        for (runna=0;runna<1000000;runna++) {
                int rc=gettimeofday(tp,NULL);
                long micros = tp->tv_sec * 1000000 + tp->tv_usec;
                micros_total = micros_total + (long double) micros;
        }
        printf("micros_total=%LE\n",micros_total);
}

% time ./gettimeofday_test gives

FBSD_DEV (slow) : 0.127u 0.231s 0:00.35 100.0%    5+192k 0+0io 0pf+0w
FBSD_TEST (fast) : 0.002u 0.014s 0:00.15 6.6%      84+24192k 0+0io 0pf+0w

Database on the fast system, as i said is exactly the same as in the slow one.

The query "SELECT avg(md.perioddue) from status st,items it,machdefs md WHERE st.apptblidval=it.id AND
it.defid=md.defid;"
gives with \timing

FBSD_DEV (slow) : Time: 28747,374 ms, EXPLAIN ANALYZE: Total runtime: 31816.648 ms
FBSD_TEST (fast) : Time: 3153,372 ms, EXPLAIN ANALYZE: Total runtime: 5701.942 ms

(the fast system is indeed a lot faster on this one (by 10x))

However the query (known from the initial mail as well)
SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm  
where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15'
and 
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and m.id not in
   (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.id 
      and msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <=
'2007-01-11'and exists  
         (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and
msold2.id<> msold.id and  
           msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')  and
mold.marinertype='Mariner')  
       order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'')
gives:

FBSD_DEV (slow) : Time : 246,032 ms  EXPLAIN ANALYZE: Total runtime: 274.425 ms
FBSD_TEST (fast) : Time: 2314,914 ms EXPLAIN ANALYZE: Total runtime: 10438.061 ms

This query runs much much slower in the case of FBSD_TEST.

The explain analyze for this query are:

FBSD_DEV:

Unique  (cost=633677.56..633700.48 rows=1834 width=23) (actual time=270.184..271.042 rows=603 loops=1)
   ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23) (actual time=270.182..270.440 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 53kB
         ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23) (actual time=194.815..269.294 rows=603 loops=1)
               Hash Cond: (ms.vslid = vsl.id)
               ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27) (actual time=194.444..268.140 rows=603
loops=1)
                     Hash Cond: (ms.marinerid = m.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual
time=0.012..70.281rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= ' 
2006-07-15'::date))
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=194.391..194.391 rows=12832
loops=1)
                           ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103
width=23)(actual time=162.607..184.282 rows= 
12832 loops=1)
                                 Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.064..161.099
rows=1454loops=1) 
                                         ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual
time=0.062..159.091rows=1835 loops=1) 
                                               ->  Nested Loop  (cost=0.00..627027.98 rows=1865 width=4) (actual
time=0.056..147.464rows=1876 loops=1) 
                                                     ->  Index Scan using marinerstates_marinerid on marinerstates
msold (cost=0.00..626316.07 rows=2299 wid 
th=8) (actual time=0.050..138.100 rows=1876 loops=1)
                                                           Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND ((COAL 
ESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan))
                                                           SubPlan
                                                             ->  Bitmap Heap Scan on marinerstates msold2
(cost=4.28..12.11rows=1 width=0) (actual time=0.0 
08..0.008 rows=1 loops=2625)
                                                                   Recheck Cond: ((marinerid = $0) AND (starttime <
$2))
                                                                   Filter: ((id <> $1) AND ((state)::text =
'Active'::text)AND (($2 - endtime) <= '1 year 6 
mons'::interval))
                                                                   ->  Bitmap Index Scan on
marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width= 
0) (actual time=0.005..0.005 rows=6 loops=2625)
                                                                         Index Cond: ((marinerid = $0) AND (starttime <
$2))
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.30rows=1 width=4) (actual time=0.002 
..0.003 rows=1 loops=1876)
                                                           Index Cond: (vslold.id = msold.vslid)
                                               ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..0.92
rows=1width=4) (actual time=0.004..0.004 
rows=1 loops=1876)
                                                     Index Cond: (mold.id = msold.marinerid)
                                                     Filter: ((mold.marinertype)::text = 'Mariner'::text)
               ->  Hash  (cost=17.81..17.81 rows=281 width=4) (actual time=0.363..0.363 rows=272 loops=1)
                     ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4) (actual time=0.009..0.199
rows=272loops=1) 
 Total runtime: 271.466 ms

FBSD_TEST:

Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10455.286..10455.635 rows=603 loops=1)
   ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10455.285..10455.390 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character
varying)),m.id 
         Sort Method:  quicksort  Memory: 71kB
         ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10418.032..10454.399 rows=603 loops=1)
               Hash Cond: (ms.marinerid = m.id)
               ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.199..34.986 rows=2625 loops=1)
                     Hash Cond: (ms.vslid = vsl.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 rows=4590 width=8) (actual
time=0.015..33.507rows=2625 loops=1) 
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND
((COALESCE(endtime,now()))::date >= ' 
2006-07-15'::date))
                     ->  Hash  (cost=16.72..16.72 rows=272 width=4) (actual time=0.179..0.179 rows=272 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 10kB
                           ->  Seq Scan on vessels vsl  (cost=0.00..16.72 rows=272 width=4) (actual time=0.003..0.099
rows=272loops=1) 
               ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10417.803..10417.803 rows=12832
loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 702kB
                     ->  Seq Scan on mariner m  (cost=7614.86..8172.57 rows=7143 width=23) (actual
time=10402.282..10412.840rows=12832 loops=1) 
                           Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
                           SubPlan 1
                             ->  Unique  (cost=2768.00..7614.86 rows=1 width=4) (actual time=85.291..10401.289
rows=1454loops=1) 
                                   ->  Nested Loop  (cost=2768.00..7614.86 rows=1 width=4) (actual
time=85.290..10400.399rows=1835 loops=1) 
                                         Join Filter: (msold.marinerid = mold.id)
                                         ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1728.60
rows=14286width=4) (actual time=0.007..13.53 
1 rows=14286 loops=1)
                                               Filter: ((marinertype)::text = 'Mariner'::text)
                                         ->  Materialize  (cost=2768.00..5671.97 rows=1 width=8) (actual
time=0.003..0.329rows=1876 loops=14286) 
                                               ->  Nested Loop  (cost=2768.00..5671.96 rows=1 width=8) (actual
time=37.502..83.326rows=1876 loops=1) 
                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12)
(actualtime=37.491..79.434 rows=1876 loops= 
1)
                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime
<msold.starttime) AND ((msold.startti 
me - msold2.endtime) <= '1 year 6 mons'::interval))
                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32
rows=4590width=20) (actual time=0.004..3 
4.281 rows=2625 loops=1)
                                                                 Filter: (((state)::text = 'Active'::text) AND
((starttime)::date<= '2007-01-11'::date) AND 
((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24)
(actualtime=37.446..37.446 rows=41250 loops=1) 
                                                                 Buckets: 8192  Batches: 1  Memory Usage: 2246kB
->  Seq Scan on marinerstates msold2  (cost=0.00..2251.66 rows=41307 width=24) (actual time=0.002..23.350 rows=41250
loops=1)
                                                                       Filter: ((state)::text = 'Active'::text)
                                                     ->  Index Scan using vessels_pkey on vessels vslold
(cost=0.00..0.28rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876) 
                                                           Index Cond: (vslold.id = msold.vslid)
 Total runtime: 10456.092 ms

So the questions remain,
a) Why do we have "EXPLAIN ANALYZE" in FBSD_TEST (fast) taking an order of magnitude slower than the actual query.
b) Why this particular query is so slow on this generally much faster system.

PS

I agree that this is a lot "performance" related, but the "EXPLAIN ANALYZE" part just dont seem right.
Thats why i posted on admin in the first place.

>
>             regards, tom lane
>



--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

13 января 2011 г., 11:27:38

Στις Thursday 13 January 2011 14:13:21 ο/η Achilleas Mantzios έγραψε:

> SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm  
> where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >=
'2006-07-15'and  
> ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and m.id not in
>    (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.id 
>       and msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <=
'2007-01-11'and exists  
>          (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and
msold2.id<> msold.id and  
>            msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')  and
mold.marinertype='Mariner')  
>        order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'')

Rewriting the query with NOT EXISTS (a practice quite common pre 8.3, IIRC) as:

SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm  
where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15'
and 
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and  NOT EXISTS
                  (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where
mold.id=msold.marineridand  
                  msold.vslid=vslold.id and msold.state='Active' and coalesce(msold.endtime,now())::date >=
'2006-07-15'and  
                  msold.starttime::date <= '2007-01-11' and exists
                      (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and
                      msold2.state='Active' and msold2.id <> msold.id and msold2.starttime<msold.starttime AND
                      (msold.starttime-msold2.endtime)<='18 months')  and mold.marinertype='Mariner' AND mold.id=m.id)
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');

Is fast.

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

13 января 2011 г., 13:13:41

Στις Wednesday 12 January 2011 18:45:20 ο/η Tom Lane έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> > Στις Wednesday 12 January 2011 17:07:53 ο/η Tom Lane έγραψε:
> >> Right offhand I'd wonder whether that was more bound by gettimeofday or
> >> by printf.  Please try it without printf in the loop.
>
> > Changed that to smth like: micros_total = micros_total + (double) micros;
> > instead of the printf to beat any compiler optimization, and still linux runs at light speed:
> > FBSD_TEST : user 0.089s, sys 1.4s
> > FBSD_DEV : user 0.183s, sys 3.8s
> > LINUX_PROD : user 0.168s, sys 0s
>
> Well, there's your problem all right: the FBSD_DEV system takes 22X
> longer to execute gettimeofday() than the LINUX_PROD system.  The
> particular plan that 9.0 is choosing is especially vulnerable to this
> because it involves a whole lot of calls of the Materialize plan node:
>

You are absolutely right!
Changing kern.timecounter.hardware=TSC (from ACPI-fast) *solved* the slow explain analyze effect.
Now i get reasonable EXPLAIN ANALYZE times!

 It also helps to be running
> an OS that is able to read the clock in userspace without invoking a
> kernel call.  It looks like your Linux box is winning on both counts
> compared to your BSD boxes.

Next step, if i am lucky tonight, i'll set up a new linux box with 9.0.2 and load
the same small (coming from FBSD_DEV) database.

So my problem was twofold : a) EXPLAIN ANALYZE performance, b) the particular slow query.

It seems there is only b) to solve. (see the previous email)

A lot of thanx (but stay tuned ;)

>
>             regards, tom lane
>



--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

14 января 2011 г., 12:53:49

Στις Thursday 13 January 2011 16:13:23 ο/η Achilleas Mantzios έγραψε:
>
> Next step, if i am lucky tonight, i'll set up a new linux box with 9.0.2 and load
> the same small (coming from FBSD_DEV) database.
>
> So my problem was twofold : a) EXPLAIN ANALYZE performance, b) the particular slow query.
>
> It seems there is only b) to solve. (see the previous email)
>
> A lot of thanx (but stay tuned ;)

Linux behaves the same way. I setup postgersql-9.0.2 in a fairly new debian (Debian GNU/Linux 5.0)
(2x AMD athlon @2.2GZ, 4GB Mem) (lets call this LINUX_TEST)
with the exact same small db as currently in FBSD_TEST test (derived from FBSD_DEV)
and the times i get for this particular query (known from the previous posts) are bad
(and of course much worse than the faster FBSD_TEST, which is normal since LINUX_TEST
is a slower/older machine)
The plans are identical for the two 9.0.2 boxes. (FBSD_TEST, LINUX_TEST)
In the LINUX_TEST i have left only postgres running to decrease the uncontrolled consumption of resources
and reduce the probability of swapping (i.e. gave all machine to postgresql)

The query is (i rewrite it here)
SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm where  
m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15' and
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and m.id not in
   (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.idand  
    msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <=
'2007-01-11'and  
    exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and
msold2.id<> msold.id and  
               msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')
    and mold.marinertype='Mariner' )
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');

LINUX_TEST : query : Time: 6685.910 ms , EXPLAIN ANALYZE : Total runtime: 111169.959 ms
FBSD_TEST : query : Time: 2284,918 ms , EXPLAIN ANALYZE : Total runtime: 10432.405 ms

And for the NOT EXISTS version:
SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm where  
m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15' and
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and NOT EXISTS
   (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.idand  
   msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <=
'2007-01-11'and  
   exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and
msold2.id<> msold.id and  
              msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')
   and mold.marinertype='Mariner' AND mold.id=m.id)
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');

LINUX_TEST : query : Time: 771.730 ms , EXPLAIN ANALYZE : Total runtime: 1248.320 ms
FBSD_TEST : query : Time: 139,568 ms , EXPLAIN ANALYZE : Total runtime: 173.524 ms

So, it seems that the original "not in" version of the query suffers in postgresql 9.0.2

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

"Kevin Grittner"

Дата:

14 января 2011 г., 13:41:29

Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

> and NOT EXISTS
>    (SELECT distinct mold.id from marinerstates

Does that do better if you leave out the DISTINCT keyword?  Maybe
use * instead of the column name?

-Kevin

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

14 января 2011 г., 14:57:17

Στις Friday 14 January 2011 16:41:18 γράψατε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
> > and NOT EXISTS
> >    (SELECT distinct mold.id from marinerstates
>
> Does that do better if you leave out the DISTINCT keyword?  Maybe
> use * instead of the column name?

Its already really fast, it doesn't matter. Anyways just tried without the distinct
...NOT EXISTS (SELECT 1 from ...
and its the same speed. The other form (NOT IN) is the problem.

>
> -Kevin
>

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

14 января 2011 г., 15:02:39

Στις Friday 14 January 2011 16:41:18 γράψατε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
> > and NOT EXISTS
> >    (SELECT distinct mold.id from marinerstates
>
> Does that do better if you leave out the DISTINCT keyword?  Maybe
> use * instead of the column name?

Its already really fast, it doesn't matter. Anyways just tried without the distinct
...NOT EXISTS (SELECT 1 from ...
and its the same speed. The other form (NOT IN) is the problem.

>
> -Kevin
>

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

14 января 2011 г., 15:03:01

Στις Friday 14 January 2011 16:41:18 γράψατε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
> > and NOT EXISTS
> >    (SELECT distinct mold.id from marinerstates
>
> Does that do better if you leave out the DISTINCT keyword?  Maybe
> use * instead of the column name?

Its already really fast, it doesn't matter. Anyways just tried without the distinct
...NOT EXISTS (SELECT 1 from ...
and its the same (high) speed. The other form (NOT IN) is the problem (in the case of postgresql-9.0.2).
(in 8.3 both queries are fast)

>
> -Kevin
>

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

"Kevin Grittner"

Дата:

14 января 2011 г., 19:07:09

Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

> The other form (NOT IN) is the problem (in the case of
> postgresql-9.0.2).  (in 8.3 both queries are fast)

So running EXPLAIN ANALYZE for this plan on the 9.0 version checks
the time in more places than the 8.3 version.  For a plan that's an
order of magnitude slower than the plan used by the recommended
version of the query.  Have you looked at what options you have for
getting a faster clock reading on this "slower/older machine"?  Is
there something you're asking for as a change to the PostgreSQL
product?

-Kevin

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

14 января 2011 г., 19:55:58

Στις Friday 14 January 2011 22:06:56 ο/η Kevin Grittner έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
> > The other form (NOT IN) is the problem (in the case of
> > postgresql-9.0.2).  (in 8.3 both queries are fast)
>
> So running EXPLAIN ANALYZE for this plan on the 9.0 version checks
> the time in more places than the 8.3 version.  For a plan that's an
> order of magnitude slower than the plan used by the recommended
> version of the query.  Have you looked at what options you have for
> getting a faster clock reading on this "slower/older machine"?

Kevin, momentarily, lets forget about the slow EXPLAIN ANALYZE part.
Also lets forget about the fact that the LINUX_TEST running 9.0.2 is a "slower/older" machine.
(its definetely "faster/newer" than FBSD_DEV)
The reason i got into trouble setting a new linux box (the 4th box in the test) just for this problem was to show that
the thing is not FreeBSD related.

We have clearly a query that runs fine in two boxes (FreeBSD - FBSD_DEV, Linux LINUX_PROD) running PostgreSQL 8.3.13
but runs a lot slower in two other boxes (FreeBSD - FBSD_TEST, Linux LINUX_TEST) running PostgreSQL 9.0.2

It is true that the thread might have to move over to pgsql-performance, if it there was no 8.3.13 involved,
in other words if we had only the old "EXISTS" vs "IN" situation.

Here we have smth which runs fast in 8.3.13 (over a variety of OS/HW) but slow in 9.0.2 (over a variety of OS/HW).

> Is
> there something you're asking for as a change to the PostgreSQL
> product?
>
What i am asking is ways to understand what is going on, and what to expect when i will have to actually perform the
production
upgrade to 9.0.2
> -Kevin
>

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

14 января 2011 г., 19:59:24

Στις Friday 14 January 2011 22:06:56 ο/η Kevin Grittner έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
> > The other form (NOT IN) is the problem (in the case of
> > postgresql-9.0.2).  (in 8.3 both queries are fast)
>
> So running EXPLAIN ANALYZE for this plan on the 9.0 version checks
> the time in more places than the 8.3 version.  For a plan that's an
> order of magnitude slower than the plan used by the recommended
> version of the query.  Have you looked at what options you have for
> getting a faster clock reading on this "slower/older machine"?

Kevin, momentarily, lets forget about the slow EXPLAIN ANALYZE part.
Also lets forget about the fact that the LINUX_TEST running 9.0.2 is a "slower/older" machine.
(its definetely "faster/newer" than FBSD_DEV)
The reason i got into trouble setting a new linux box (the 4th box in the test) just for this problem was to show that
the thing is not FreeBSD related.

We have clearly a query that runs fine in two boxes (FreeBSD - FBSD_DEV, Linux LINUX_PROD) running PostgreSQL 8.3.13
but runs a lot slower in two other boxes (FreeBSD - FBSD_TEST, Linux LINUX_TEST) running PostgreSQL 9.0.2

It is true that the thread might have to move over to pgsql-performance, if it there was no 8.3.13 involved,
in other words if we had only the old "EXISTS" vs "IN" situation.

Here we have smth which runs fast in 8.3.13 (over a variety of OS/HW) but slow in 9.0.2 (over a variety of OS/HW).

> Is
> there something you're asking for as a change to the PostgreSQL
> product?
>
What i am asking is ways to understand what is going on, and what to expect when i will have to actually perform the
production
upgrade to 9.0.2
> -Kevin
>

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

"Kevin Grittner"

Дата:

14 января 2011 г., 20:20:43

Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

> What i am asking is ways to understand what is going on, and what
> to expect when i will have to actually perform the production
> upgrade to 9.0.2

Sorry to have focused on the wrong issues from your latest post.
Here's what I suggest to try to get you the best information.

(1)  Abandon this thread, start a new one on the performance list
with a different subject line.

(2)  Present the results of running one particular query on the two
different PostgreSQL versions on the same machine, preferably with
identical data, and after running VACUUM ANALYZE on the databases.

(3)  Review this page and include the related information -- schema,
configuration, hardware, etc.:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

On that list, with the above information, you will probably get
exactly what you seem to want at this point in the discussion, which
has nothing to do with the subject line of this thread.  A lot of
people pick which messages to read based on the subject line, so
that's probably working against you at the moment.

-Kevin

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

14 января 2011 г., 21:18:54

Στις Friday 14 January 2011 23:20:25 ο/η Kevin Grittner έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
> > What i am asking is ways to understand what is going on, and what
> > to expect when i will have to actually perform the production
> > upgrade to 9.0.2
>
> Sorry to have focused on the wrong issues from your latest post.
> Here's what I suggest to try to get you the best information.
>
> (1)  Abandon this thread, start a new one on the performance list
> with a different subject line.

I agree about the subject line. But whouldn't -bugs be a more apropriate place than -performance?
Pre 8.3 (e.g. 7.4), EXISTS used to run faster than IN, this was fixed in 8.3 (which was IMO a milestone in postgresql
performance), 
should not that be the case with 9.0 as well?

>
>
> -Kevin
>

--
Achilleas Mantzios

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

"Kevin Grittner"

Дата:

14 января 2011 г., 21:36:21

Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

> I agree about the subject line. But whouldn't -bugs be a more
> apropriate place than -performance?

Is it returning incorrect results, or is it not running as fast as
you would like?

-Kevin

Re: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

От

Achilleas Mantzios

Дата:

17 января 2011 г., 14:39:39

Στις Saturday 15 January 2011 00:35:44 ο/η Kevin Grittner έγραψε:
> Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
> > I agree about the subject line. But whouldn't -bugs be a more
> > apropriate place than -performance?
>
> Is it returning incorrect results, or is it not running as fast as
> you would like?
alright i opened a thread in -performance.
>
> -Kevin
>



--
Achilleas Mantzios

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Postgresql 9.0.2 explain analyze very slow (10x), compared to actual query time

Вложения