Thread: Major performance problem after upgrade from 8.3 to 8.4


Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
Hello,

I just upgraded via pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4, but
I'm having major performance problems with a query with many left joins.
The problem is that the estimated costs are now astronomically high (they
were fine in 8.3). ANALYZE has been run, and the indexes are of course there.

   ->  Merge Left Join
(cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
width=16)
         Merge Cond: (l.id = d2000903.fk_id)

Details with execution plan can be found at:
http://www.wiesinger.com/tmp/pg_perf_84.txt

I know that the data model is key/value pairs but it worked well in 8.3. I
need this flexibility.

Any ideas?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: Major performance problem after upgrade from 8.3 to 8.4

From: Scott Marlowe
Date:
On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> Hello,
>
> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but I'm
> having major performance problems with a query with many left joins. Problem
> is that costs are now very, very, very high (was ok in 8.3). Analyze has
> been done. Indexes are of course there.
>
>  ->  Merge Left Join
>
(cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
> width=16)
>        Merge Cond: (l.id = d2000903.fk_id)

Wow!  Other than an incredibly high cost AND row estimate, was the
query plan the same on 8.3 or different?

> Details with execution plan can be found at:
> http://www.wiesinger.com/tmp/pg_perf_84.txt

What's up with the "(actual time=.. rows= loops=) " in the explain analyze?

> I know that the data model is key/value pairs but it worked well in 8.3. I
> need this flexibility.
>
> Any ideas?

Not really.  I would like an explain analyze from both 8.3 and 8.4.
Are they tuned the same, things like work mem and default stats
target?

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Andreas Kretschmer
Date:
Gerhard Wiesinger <lists@wiesinger.com> wrote:

> I know that the data model is key/value pairs but it worked well in 8.3.
> I need this flexibility.
>
> Any ideas?

If I understand the query correctly it's a pivot table, right?

If yes, and if I were you, I would try to rewrite this query to
something like:

select
  timestamp,
  sum (case when keyid = 1 then value else 0 end) as Raumsolltemperatur,
  ...
from
  log
group by
  timestamp;


Assuming you can read German:
http://www.pg-forum.de/h-ufig-gestellte-fragen-faq/4067-faq-zeilen-zu-spalten.html



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Mon, 30 Aug 2010, Scott Marlowe wrote:

> On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> Hello,
>>
>> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but I'm
>> having major performance problems with a query with many left joins. Problem
>> is that costs are now very, very, very high (was ok in 8.3). Analyze has
>> been done. Indexes are of course there.
>>
>>  ->  Merge Left Join
>>
(cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
>> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
>> width=16)
>>        Merge Cond: (l.id = d2000903.fk_id)
>
> Wow!  Other than an incredibly high cost AND row estimate, was the
> query plan the same on 8.3 or different?
>
>> Details with execution plan can be found at:
>> http://www.wiesinger.com/tmp/pg_perf_84.txt
>
> What's up with the "(actual time=.. rows= loops=) " in the explain analyze?

What do you mean exactly? Is something missing?
I ran it not with psql but with a GUI program.
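For what it's worth, the "(actual time=... rows=... loops=)" figures Scott asks about are produced by EXPLAIN ANALYZE itself; a minimal sketch of capturing them from psql (the view name below is a placeholder, not the real one from the plan):

```sql
-- Hypothetical: running the statement under EXPLAIN ANALYZE from psql
-- prints estimated AND actual rows/time for every plan node.
-- "log_flat_view" is a stand-in for the real flattening view.
EXPLAIN ANALYZE
SELECT *
FROM log_flat_view
WHERE datetime >= now() - interval '1 day';
```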

>
>> I know that the data model is key/value pairs but it worked well in 8.3. I
>> need this flexibility.
>>
>> Any ideas?
>
> Not really.  I would like an explain analyze from both 8.3 and 8.4.
> Are they tuned the same, things like work mem and default stats
> target?

I don't have an 8.3 version running anymore, but I do have an OLD version
of a nearly identical query plan (the sort was missing due to performance
issues and is now done in a view; maybe some more JOINs were added as
well, but none of that affects the basic shape of the query plan):
http://www.wiesinger.com/tmp/pg_perf.txt

Tuning: yes, on the same machine with the same parameters (I manually
diffed against the old config and re-applied the changed parameters).

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: Major performance problem after upgrade from 8.3 to 8.4

From: Scott Marlowe
Date:
On Mon, Aug 30, 2010 at 1:25 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> On Mon, 30 Aug 2010, Scott Marlowe wrote:
>
>> On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but
>>> I'm
>>> having major performance problems with a query with many left joins.
>>> Problem
>>> is that costs are now very, very, very high (was ok in 8.3). Analyze has
>>> been done. Indexes are of course there.
>>>
>>>  ->  Merge Left Join
>>>
>>>
(cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
>>>
>>> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
>>> width=16)
>>>        Merge Cond: (l.id = d2000903.fk_id)
>>
>> Wow!  Other than an incredibly high cost AND row estimate, was the
>> query plan the same on 8.3 or different?
>>
>>> Details with execution plan can be found at:
>>> http://www.wiesinger.com/tmp/pg_perf_84.txt
>>
>> What's up with the "(actual time=.. rows= loops=) " in the explain
>> analyze?
>
> What do you mean exactly? missing?
> I did it not with psql but with a GUI program.

Never mind, that was an artifact at http://explain.depesz.com/s/KyU, not
your fault.  Sorry.

>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>> I
>>> need this flexibility.
>>>
>>> Any ideas?
>>
>> Not really.  I would like an explain analyze from both 8.3 and 8.4.
>> Are they tuned the same, things like work mem and default stats
>> target?
>
> I don't have a 8.3 version running anymore. But I'm havin an OLD version of
> a nearly exactly query plan (The sort was missing due to performance issues
> and it done now in a view, maybe also some more JOINS are added, but all
> that doesn't have impacts on the basic principle of the query plan):
> http://www.wiesinger.com/tmp/pg_perf.txt
>
> Tuning: Yes, on same machine with same parameters (manual diff on old config
> and added manually the parameters again).

How long does the query take to run in 8.4?  Do you have an explain
analyze of that?  I'm still thinking that some change in the query
planner might be seeing all those left joins and coming up with some
non-linear value for row estimation.  What's default stats target set
to in that db?

--
To understand recursion, one must first understand recursion.

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Pavel Stehule
Date:
Hello

2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>
>> I know that the data model is key/value pairs but it worked well in 8.3.
>> I need this flexibility.
>>
>> Any ideas?
>
> If i understand the query correctly it's a pivot-table, right?
>

no - it's just an EAV table on very large data :(

Regards

Pavel Stehule

> If yes, and if i where you, i would try to rewrite this query, to
> something like:
>
> select
>  timestamp,
>  sum (case when keyid = 1 then value else 0 end) as Raumsolltemperatur,
>  ...
> from
>  log
> group by
>  timestamp;
>
>
> Assuming you can read a german text:
> http://www.pg-forum.de/h-ufig-gestellte-fragen-faq/4067-faq-zeilen-zu-spalten.html
>
>
>
> Andreas
> --
> Really, I'm not out to destroy Microsoft. That will just be a completely
> unintentional side effect.                              (Linus Torvalds)
> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
> Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Mon, 30 Aug 2010, Andreas Kretschmer wrote:

> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>
>> I know that the data model is key/value pairs but it worked well in 8.3.
>> I need this flexibility.
>>
>> Any ideas?
>
> If i understand the query correctly it's a pivot-table, right?
>

The view flattens the key/value structure (for extensible technical
measurement values) into a flat structure with many columns. The point of
the view is that I get a flat structure, against which aggregation
queries are easy to write.

The query itself is just an aggregating query over that flat structure.
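As a sketch of what such a flattening view typically looks like (column names and key IDs here are illustrative assumptions, not the actual definition from the thread):

```sql
-- Hypothetical reconstruction: one LEFT JOIN against log_details per
-- measurement key; UNIQUE(fk_id, fk_keyid) means each join matches at
-- most one row.
CREATE VIEW log_flat AS
SELECT l.datetime,
       d1.value AS Raumsolltemperatur,  -- assumes fk_keyid = 1
       d2.value AS Aussentemperatur     -- assumes fk_keyid = 2
FROM log l
LEFT JOIN log_details d1 ON d1.fk_id = l.id AND d1.fk_keyid = 1
LEFT JOIN log_details d2 ON d2.fk_id = l.id AND d2.fk_keyid = 2;
```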

Any ideas for better optimizations?

> If yes, and if i where you, i would try to rewrite this query, to
> something like:
>
> select
>  timestamp,
>  sum (case when keyid = 1 then value else 0 end) as Raumsolltemperatur,
>  ...
> from
>  log
> group by
>  timestamp;
>

I will try that. But what I don't understand: the query was really fast
in 8.3, even with a 24-hour timeframe.

>
> Assuming you can read a german text:
> http://www.pg-forum.de/h-ufig-gestellte-fragen-faq/4067-faq-zeilen-zu-spalten.html

Hopefully yes :-) after that Fedora/PostgreSQL nightmare upgrade night (a
parallel Fedora upgrade stalled the whole machine, since that query was
executed by Nagios, I guess, hundreds of times once it became very slow
in 8.4). The machine was pingable but nothing else worked, even on the
console :-( RAID rebuild and all the other fallout :-(

I had planned the upgrade, but I didn't expect this query plan
instability :-(

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Mon, 30 Aug 2010, Scott Marlowe wrote:

> On Mon, Aug 30, 2010 at 1:25 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> On Mon, 30 Aug 2010, Scott Marlowe wrote:
>>
>>> On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but
>>>> I'm
>>>> having major performance problems with a query with many left joins.
>>>> Problem
>>>> is that costs are now very, very, very high (was ok in 8.3). Analyze has
>>>> been done. Indexes are of course there.
>>>>
>>>>  ->  Merge Left Join
>>>>
>>>>
(cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
>>>>
>>>> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
>>>> width=16)
>>>>        Merge Cond: (l.id = d2000903.fk_id)
>>>
>>> Wow!  Other than an incredibly high cost AND row estimate, was the
>>> query plan the same on 8.3 or different?
>>>
>>>> Details with execution plan can be found at:
>>>> http://www.wiesinger.com/tmp/pg_perf_84.txt
>>>
>>> What's up with the "(actual time=.. rows= loops=) " in the explain
>>> analyze?
>>
>> What do you mean exactly? missing?
>> I did it not with psql but with a GUI program.
>
> Nevermind, that was an artifact at http://explain.depesz.com/s/KyU not
> your fault.  Sorry.
>
>>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>>> I
>>>> need this flexibility.
>>>>
>>>> Any ideas?
>>>
>>> Not really.  I would like an explain analyze from both 8.3 and 8.4.
>>> Are they tuned the same, things like work mem and default stats
>>> target?
>>
>> I don't have a 8.3 version running anymore. But I'm havin an OLD version of
>> a nearly exactly query plan (The sort was missing due to performance issues
>> and it done now in a view, maybe also some more JOINS are added, but all
>> that doesn't have impacts on the basic principle of the query plan):
>> http://www.wiesinger.com/tmp/pg_perf.txt
>>
>> Tuning: Yes, on same machine with same parameters (manual diff on old config
>> and added manually the parameters again).
>
> How long does the query take to run in 8.4?  Do you have an explain
> analyze of that?  I'm still thinking that some change in the query
> planner might be seeing all those left joins and coming up with some
> non-linear value for row estimation.  What's default stats target set
> to in that db?

In the config it is at the default value:
#default_statistics_target = 100        # range 1-10000

How can I find that out?
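The effective value can be checked from any SQL session (standard catalog queries; nothing installation-specific assumed):

```sql
-- Current value of the planner statistics target:
SHOW default_statistics_target;

-- Per-column overrides, if any (attstattarget = -1 means "use the default"):
SELECT attrelid::regclass AS tbl, attname, attstattarget
FROM pg_attribute
WHERE attstattarget > 0;
```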

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Mon, 30 Aug 2010, Pavel Stehule wrote:

> Hello
>
> 2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
>> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>
>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>> I need this flexibility.
>>>
>>> Any ideas?
>>
>> If i understand the query correctly it's a pivot-table, right?
>>
>
> no - it's just EAV table on very large data :(

Yes, it is an EAV table, but the queried window is comparatively small
(at most 1 day out of years of data, typically 5 minutes).

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: Major performance problem after upgrade from 8.3 to 8.4

From: Pavel Stehule
Date:
2010/8/30 Gerhard Wiesinger <lists@wiesinger.com>:
> On Mon, 30 Aug 2010, Pavel Stehule wrote:
>
>> Hello
>>
>> 2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
>>>
>>> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>>
>>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>>> I need this flexibility.
>>>>
>>>> Any ideas?
>>>
>>> If i understand the query correctly it's a pivot-table, right?
>>>
>>
>> no - it's just EAV table on very large data :(
>
> Yes, it is an EAV table, but with query space comparable low (Max. 1 day out
> of years, typically 5mins out of years).
>

it is irrelevant - there are repeated seq scans - so you need
partitioning or a classic table - maybe materialized views can help

Pavel


> Thnx.
>
> Ciao,
> Gerhard
>
> --
> http://www.wiesinger.com/
>
>

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Mon, 30 Aug 2010, Pavel Stehule wrote:

> 2010/8/30 Gerhard Wiesinger <lists@wiesinger.com>:
>> On Mon, 30 Aug 2010, Pavel Stehule wrote:
>>
>>> Hello
>>>
>>> 2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
>>>>
>>>> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>>>
>>>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>>>> I need this flexibility.
>>>>>
>>>>> Any ideas?
>>>>
>>>> If i understand the query correctly it's a pivot-table, right?
>>>>
>>>
>>> no - it's just EAV table on very large data :(
>>
>> Yes, it is an EAV table, but with query space comparable low (Max. 1 day out
>> of years, typically 5mins out of years).
>>
>
> it is irelevant - there are repeated seq scans - so you need a
> partitioning or classic table - maybe materialized views can help

I know the drawbacks of an EAV design but I don't want to discuss that. I
want to discuss the major performance decrease from PostgreSQL 8.3
(performance was ok) to PostgreSQL 8.4 (performance is NOT ok).

Any further ideas how I can track this down?
Can someone explain the difference in query plan from an optimizer point
of view?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> I know the drawbacks of an EAV design but I don't want to discuss that. I
> want to discuss the major performance decrease of PostgreSQL 8.3
> (performance was ok) to PostgreSQL 8.4 (performance is NOT ok).

> Any further ideas how I can track this down?
> Can someone explain the difference in query plan from an optimizer point
> of view?

Since you haven't shown us the 8.3 plan, it's kind of hard to speculate ;-)

One thing that jumped out at me was that 8.4 appears to be expecting
multiple matches in each of the left-joined tables, which is why the
total rowcount estimate balloons so fast.  I rather imagine that you are
expecting at most one match in reality, else the query isn't going to
behave nicely.  Is this correct?  Are you *sure* you analyzed all these
tables?  And if that is how the data looks, where is the actual
performance problem?  A bad rowcount estimate isn't in itself going
to kill you.

FWIW, in a similar albeit toy example, I don't see any difference
between the 8.3 and 8.4 plans or cost estimates.

            regards, tom lane
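A toy reproduction along the lines Tom mentions, usable for comparing the 8.3 and 8.4 planners side by side (entirely illustrative; table names and sizes are made up):

```sql
-- Small key/value pair of tables mirroring the log/log_details shape:
CREATE TEMP TABLE t_log (id bigint PRIMARY KEY);
CREATE TEMP TABLE t_det (fk_id bigint, fk_keyid bigint,
                         UNIQUE (fk_id, fk_keyid));
INSERT INTO t_log SELECT generate_series(1, 10000);
INSERT INTO t_det
  SELECT g, k FROM generate_series(1, 10000) g, generate_series(1, 5) k;
ANALYZE t_log;
ANALYZE t_det;

-- Compare the estimated row counts of chained left joins on both servers:
EXPLAIN SELECT l.id
FROM t_log l
LEFT JOIN t_det d1 ON d1.fk_id = l.id AND d1.fk_keyid = 1
LEFT JOIN t_det d2 ON d2.fk_id = l.id AND d2.fk_keyid = 2;
```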

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Mon, 30 Aug 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> I know the drawbacks of an EAV design but I don't want to discuss that. I
>> want to discuss the major performance decrease of PostgreSQL 8.3
>> (performance was ok) to PostgreSQL 8.4 (performance is NOT ok).
>
>> Any further ideas how I can track this down?
>> Can someone explain the difference in query plan from an optimizer point
>> of view?
>
> Since you haven't shown us the 8.3 plan, it's kind of hard to speculate ;-)
>
> One thing that jumped out at me was that 8.4 appears to be expecting
> multiple matches in each of the left-joined tables, which is why the
> total rowcount estimate balloons so fast.  I rather imagine that you are
> expecting at most one match in reality, else the query isn't going to
> behave nicely.  Is this correct?  Are you *sure* you analyzed all these
> tables?  And if that is how the data looks, where is the actual
> performance problem?  A bad rowcount estimate isn't in itself going
> to kill you.
>
> FWIW, in a similar albeit toy example, I don't see any difference
> between the 8.3 and 8.4 plans or cost estimates.

Yes, I'm expecting only one match in reality, and I think PostgreSQL
should also know that from the table definitions and constraints. Long
answer below.

As for the actual performance problem: the query doesn't "end" (finish)
at all in PostgreSQL 8.4.

From the definition:
CREATE TABLE value_types (
   valuetypeid bigint PRIMARY KEY,
   description varchar(256) NOT NULL -- e.g. 'float', 'integer', 'boolean'
);

CREATE TABLE key_description (
   keyid bigint PRIMARY KEY,
   description varchar(256) NOT NULL UNIQUE,
   fk_valuetypeid bigint NOT NULL,
   unit varchar(256) NOT NULL,        -- e.g. '°C'
   FOREIGN KEY(fk_valuetypeid) REFERENCES value_types(valuetypeid) ON DELETE RESTRICT
);
-- ALTER TABLE key_description DROP CONSTRAINT c_key_description_description;
-- ALTER TABLE key_description ADD CONSTRAINT c_key_description_description UNIQUE(description);


CREATE TABLE log (
   id bigserial PRIMARY KEY,
   datetime timestamp with time zone NOT NULL,
   tdate date NOT NULL,
   ttime time with time zone NOT NULL
);

CREATE TABLE log_details (
   fk_id bigint NOT NULL,
   fk_keyid bigint NOT NULL,
   value double precision NOT NULL,
   FOREIGN KEY (fk_id) REFERENCES log(id) ON DELETE CASCADE,
   FOREIGN KEY (fk_keyid) REFERENCES key_description(keyid) ON DELETE RESTRICT,
   CONSTRAINT unique_key_and_id UNIQUE(fk_id, fk_keyid)
);



Therefore keyid is unique, and e.g. d1.fk_keyid is unique. Together with
the UNIQUE(fk_id, fk_keyid) constraint on log_details, fk_id is unique
for a given d1.fk_keyid, i.e. each left join should match at most one row.
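One generic way to double-check that the planner has fresh statistics reflecting that uniqueness (not from the original thread; n_distinct near -1 means "nearly unique"):

```sql
-- Re-analyze the join tables, then inspect the planner's column stats:
ANALYZE log;
ANALYZE log_details;

SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename IN ('log', 'log_details')
  AND attname IN ('id', 'fk_id', 'fk_keyid');
```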

BTW: I still have the old data directory at /var/lib/pgsql-old. Is there
a quick way to set up the old version on a different TCP port to compare
query plans?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> BTW: I have the old data setup. /var/lib/pgsql-old. Is there a fast setup
> with old version on different TCP port possible to compare query plans?

You'll need to reinstall the old executables.  If you put the new
executables in the same directories, it's not going to be easy to
run both in parallel.  If you didn't, then you just need to start
the old postmaster using a different port number.

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Mon, 30 Aug 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> BTW: I have the old data setup. /var/lib/pgsql-old. Is there a fast setup
>> with old version on different TCP port possible to compare query plans?
>
> You'll need to reinstall the old executables.  If you put the new
> executables in the same directories, it's not going to be easy to
> run both in parallel.  If you didn't, then you just need to start
> the old postmaster using a different port number.
>

I tried to get 8.3.11 running again:
# Problems with timezone on startup ("Default" not found)
./configure --with-system-tzdata=/usr/share
cp ./src/backend/postgres /bin/postgres-8.3.11
su -l postgres -c "/bin/postgres-8.3.11 -p 54321 -D /var/lib/pgsql-old/data &" >> /var/lib/pgsql-old/pgstartup.log 2>&1 < /dev/null

The problem is that PostgreSQL doesn't listen, and it consumes a lot of
CPU and disk I/O. 8.3 was shut down cleanly; 8.4 runs in parallel. Could
there be shared buffer conflicts?

   PID USER      PR  NI  VIRT SWAP  RES CODE DATA  SHR S %CPU %MEM    TIME+  COMMAND
  6997 postgres  20   0  113m 112m 1236 3688 1064  712 D 38.7  0.0   0:45.43 /bin/postgres-8.3.11 -p 54321 -D /var/lib/pgsql-old/data

Any ideas?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> Problem is that PostgreSQL doesn't listen and take much CPU and also disk
> I/O. 8.3 was shut down cleanly.

Hm, you sure about that?  What's in the postmaster log?

            regards, tom lane

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Thu, 2 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> Problem is that PostgreSQL doesn't listen and take much CPU and also disk
>> I/O. 8.3 was shut down cleanly.
>
> Hm, you sure about that?  What's in the postmaster log?

That's the strange thing: there is nothing in the stdout/stderr log,
nothing in the pg_log directory, and not even anything in syslog.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Thu, 2 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> Problem is that PostgreSQL doesn't listen and take much CPU and also disk
>> I/O. 8.3 was shut down cleanly.
>
> Hm, you sure about that?  What's in the postmaster log?

BTW: Do I need another postgres user with a different home directory?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Thu, 2 Sep 2010, Tom Lane wrote:
>> Hm, you sure about that?  What's in the postmaster log?

> That's the strange thing: I don't have anything in stdout/stderr log and
> in pg_log directory and even in syslog.

Not even in that pgstartup.log file you sent stderr to?

I have seen cases before where Postgres couldn't log anything because of
SELinux.  If this is a Red Hat based system, look in the kernel log for
AVC messages.  If you see any, then SELinux is probably blocking things
because of the nonstandard directory locations.  Turning it off
temporarily would be the easiest fix, though relabeling the files would
be a better one.

            regards, tom lane

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Fri, 3 Sep 2010, Tom Lane wrote:
> Not even in that pgstartup.log file you sent stderr to?
>
> I have seen cases before where Postgres couldn't log anything because of
> SELinux.  If this is a Red Hat based system, look in the kernel log for
> AVC messages.  If you see any, then SELinux is probably blocking things
> because of the nonstandard directory locations.  Turning it off
> temporarily would be the easiest fix, though relabeling the files would
> be a better one.

SELinux is already disabled.
cat /etc/selinux/config | grep -v "^#"|grep -v "^$"
SELINUXTYPE=targeted
SELINUX=disabled

Yes, the redirected log file is empty too. The kernel log is also empty.

I tried redirecting to a different, new file; on startup I get nothing,
and after killing the process I get:
2010-09-03 16:35:39.177 GMT [2149] @/: LOG:  could not stat "/usr/share/doc/xalan-j2-manual-2.7.0/apidocs": No such file or directory

Any ideas?

BTW: Shared memory can't be an issue, can it?

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> Not even in that pgstartup.log file you sent stderr to?

> Yes, also the redirected log file is empty. Also kernel log is empty.

Huh.  Try strace'ing the process to see what it's doing.

> BTW: Shared memory can't be any issue?

If you're not getting any log messages at all, you've got worse problems
than shared memory.

            regards, tom lane

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> Not even in that pgstartup.log file you sent stderr to?
>
>> Yes, also the redirected log file is empty. Also kernel log is empty.
>
> Huh.  Try strace'ing the process to see what it's doing.


It tries to find something in /usr/share/ ...
OK, I think that comes from the compile-time option:
./configure --with-system-tzdata=/usr/share

Previously I had problems with the timezone on startup ("Default" not
found), so I tried to point it at that directory.

Maybe the timezone issue (it looks for the "Default" timezone file) is
easier to fix ...

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> Huh.  Try strace'ing the process to see what it's doing.

> It tries to find something in /usr/share/ ...
> Ok, I think from the compile time options:
> ./configure --with-system-tzdata=/usr/share

Doh.  I hadn't looked closely at that.  Probably you want
/usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.

            regards, tom lane

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> Huh.  Try strace'ing the process to see what it's doing.
>
>> It tries to find something in /usr/share/ ...
>> Ok, I think from the compile time options:
>> ./configure --with-system-tzdata=/usr/share
>
> Doh.  I hadn't looked closely at that.  Probably you want
> /usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.
>

I had tried the following even before writing to the mailing list, without success:
./configure
./configure --with-system-tzdata=/usr/share/pgsql/timezonesets
./configure --with-system-tzdata=/usr/share/pgsql
./configure --with-system-tzdata=/usr/share
Now I tried without success:
./configure --with-system-tzdata=/usr/share/zoneinfo

With last one I also get:
2010-09-03 19:51:29.079 CEST [27916] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"

Correct directory would be:
ls -l /usr/share/pgsql/timezonesets/Default
-rw-r--r-- 1 root root 29602 2010-05-17 20:07 /usr/share/pgsql/timezonesets/Default

The file looks good, too.

Any ideas?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> Doh.  I hadn't looked closely at that.  Probably you want
>> /usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.

> I tried even before I wrote to the mailinglist without success:
> ./configure

I'd definitely suggest leaving out the --with-system-tzdata option
altogether if you're not certain it works.

> With last one I also get:
> 2010-09-03 19:51:29.079 CEST [27916] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"

This is a different problem; the --with-system-tzdata option wouldn't
affect that.

I think what may be happening here is that a postgres executable expects
to find itself in a full installation tree, ie if it's in /someplace/bin
then the timezone files are in /someplace/share, etc.  Did you do a full
"make install" after building, or did you just copy the postgres
executable?

            regards, tom lane

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger
Date:
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> Doh.  I hadn't looked closely at that.  Probably you want
>>> /usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.
>
>> I tried even before I wrote to the mailinglist without success:
>> ./configure
>
> I'd definitely suggest leaving out the --with-system-tzdata option
> altogether if you're not certain it works.

OK.

>
>> With last one I also get:
>> 2010-09-03 19:51:29.079 CEST [27916] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"
>
> This is a different problem; the --with-system-tzdata option wouldn't
> affect that.
>
> I think what may be happening here is that a postgres executable expects
> to find itself in a full installation tree, ie if it's in /someplace/bin
> then the timezone files are in /someplace/share, etc.  Did you do a full
> "make install" after building, or did you just copy the postgres
> executable?

I just copied it, as discussed in the original mail, to avoid having make
install clobber the production 8.4 RPM installation:
cp ./src/backend/postgres /bin/postgres-8.3.11

cd /bin
ln -s /usr/share/pgsql/timezonesets share
cd tarballdir
./configure
make
cp ./src/backend/postgres /bin/postgres-8.3.11

2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 20:24:53.936 CEST [11753] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"

I previously made the strace and therefore I added the option to configure
to get the right directory.

Any further idea where I should copy the binary or any option or any file
copy for the time zone files?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> I think what may be happening here is that a postgres executable expects
>> to find itself in a full installation tree, ie if it's in /someplace/bin
>> then the timezone files are in /someplace/share, etc.  Did you do a full
>> "make install" after building, or did you just copy the postgres
>> executable?

> I just copied it as discussed in the original mail to avoid that make
> install kills the 8.4 production RPM version:
> cp ./src/backend/postgres /bin/postgres-8.3.11

Definitely not going to work.  Instead, configure with --prefix set
to /someplace/harmless, make, make install, execute from
/someplace/harmless/bin/.

            regards, tom lane

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Gerhard Wiesinger
Date:
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> I think what may be happening here is that a postgres executable expects
>>> to find itself in a full installation tree, ie if it's in /someplace/bin
>>> then the timezone files are in /someplace/share, etc.  Did you do a full
>>> "make install" after building, or did you just copy the postgres
>>> executable?
>
>> I just copied it as discussed in the original mail to avoid that make
>> install kills the 8.4 production RPM version:
>> cp ./src/backend/postgres /bin/postgres-8.3.11
>
> Definitely not going to work.  Instead, configure with --prefix set
> to /someplace/harmless, make, make install, execute from
> /someplace/harmless/bin/.

Thanks, Tom, your support for PostgreSQL is really very, very good.
Install:
./configure --prefix /opt/postgres-8.3
make
make install
su -l postgres -c "/opt/postgres-8.3/bin/postgres -p 54321 -D /var/lib/pgsql-old/data &" >> /var/lib/pgsql-old/pgstartup.log 2>&1 < /dev/null

Back to the original problem:
8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
8.4 query plans: http://www.wiesinger.com/tmp/pg_perf_84.txt

The main difference I saw:
8.3: ->  Nested Loop Left Join  (cost=0.00..1195433.19 rows=67 width=16)
8.4:   ->  Merge Left Join
(cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400 width=16)

Any ideas why? How to fix?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> Back to the original problem:

Finally ;-)

> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
> 8.4 query plans: http://www.wiesinger.com/tmp/pg_perf_84.txt

Hmm.  The 8.3 plan is indeed assuming that the number of rows will stay
constant as we bubble up through the join levels, but AFAICS this is
simply wrong:

          ->  Nested Loop Left Join  (cost=0.00..38028.89 rows=67 width=8)
            ->  Nested Loop Left Join  (cost=0.00..25399.46 rows=67 width=8)
              ->  Nested Loop Left Join  (cost=0.00..12770.04 rows=67 width=8)
                ->  Index Scan using i_log_unique on log l  (cost=0.00..140.61 rows=67 width=8)
                    Index Cond: (datetime >= (now() - '00:01:00'::interval))
                ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..187.39 rows=89 width=8)
                    Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
              ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..187.39 rows=89 width=8)
                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
            ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..187.39 rows=89 width=8)
                Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))

If the log_details indexscans are expected to produce 89 rows per
execution, then surely the join size should go up 89x at each level,
because the join steps themselves don't eliminate anything.

In 8.4 the arithmetic is at least self-consistent:

          ->  Nested Loop Left Join  (cost=0.00..505256.95 rows=57630 width=8)
            ->  Nested Loop Left Join  (cost=0.00..294671.96 rows=6059 width=8)
              ->  Nested Loop Left Join  (cost=0.00..272532.55 rows=637 width=8)
                ->  Index Scan using log_pkey on log l  (cost=0.00..270203.92 rows=67 width=8)
                    Filter: (datetime >= (now() - '00:01:00'::interval))
                ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..34.63 rows=10 width=8)
                    Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
              ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..34.63 rows=10 width=8)
                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
            ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..34.63 rows=10 width=8)
                Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))

The rowcount estimates are apparently a shade less than 10, but they get
rounded off in the display.
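
That self-consistency can be checked arithmetically. A small sketch (the per-scan factor of 9.51 is inferred here by dividing adjacent plan levels; it is not a number printed in the plan):

```python
# Verify that 8.4's row estimates multiply consistently through the
# nested-loop levels: 67 rows from the log scan, times a per-scan
# estimate of just under 10 at each additional left join.
per_scan = 9.51                   # inferred: 6059 / 637 ~= 9.51 (assumption)
plan_levels = [637, 6059, 57630]  # rows= at the three join levels above

rows = 67                         # rows= from the index scan on log
for plan_rows in plan_levels:
    rows *= per_scan              # each left join multiplies the estimate
    assert abs(rows - plan_rows) / plan_rows < 0.01
```

By contrast, the 8.3 plan keeps rows=67 at every level, which is exactly the inconsistency described above.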

I believe the reason for this change is that 8.4's join estimation code
was rewritten so that it wasn't completely bogus for outer joins.  8.3
might have been getting the right answer, but it was for the wrong
reasons.

So the real question to be answered here is why it doesn't think that
each of the unique_key_and_id indexscans produces just a single row, as
you indicated was the case.  The 8.4 estimate is already a factor of
almost 10 closer to reality than 8.3's, but you need another factor of
10.  You might find that increasing the statistics target for the
log_details table helps.

            regards, tom lane

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Gerhard Wiesinger
Date:
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
>> 8.4 query plans: http://www.wiesinger.com/tmp/pg_perf_84.txt
>
> [...]


Ok, Tom, I tried different things (more details are below):
1.) Setting the statistics target to 1000 and 10000 (without success),
still a merge join
2.) Tried to add an index on description to help the planner with
uniqueness (without success)
3.) Forced the planner to use nested loop joins (SUCCESS):
SET enable_hashjoin=false; SET enable_mergejoin=false;
(BTW: How do I use such settings from Java, PHP, and Perl, is there a
command available?)
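
(For reference, a minimal sketch: these planner switches are ordinary session-level SQL statements, so any client library for Java, PHP, or Perl can issue them through its normal statement-execution call before running the query; nothing driver-specific is needed.)

```sql
-- Planner switches are plain session-level SQL; drivers need no special API.
SET enable_hashjoin = off;
SET enable_mergejoin = off;
-- ... run the affected query ...
RESET enable_hashjoin;    -- restore the server defaults afterwards
RESET enable_mergejoin;
```

RESET keeps the override scoped to the problem query instead of the whole session.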

Open questions:
Why does the planner not choose nested loop joins, which should be
optimal in this situation?
Does the planner take into account a.) UNIQUEness and b.) UNIQUEness
combined with NOT NULL?
Any ideas for improvement of the planner?

Details:
-- CREATE UNIQUE INDEX unique_key_and_id ON log_details USING btree (fk_id, fk_keyid)
-- 1000 and 10000 didn't help for better query plan for Nested Loop Left Join, still Merge Left Join
-- Sample with:
-- ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- ANALYZE VERBOSE log_details;
-- Still Merge Join:
--   ->  Merge Left Join  (cost=9102353.88..83786934.25 rows=2726186787 width=16)
--         Merge Cond: (l.id = d2000902.fk_id)
--         ->  Merge Left Join  (cost=8926835.18..40288402.09 rows=972687282 width=24)
--               Merge Cond: (l.id = d2000904.fk_id)
-- Default values again
ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 100;
ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 100;
ANALYZE VERBOSE log_details;

-- Tried to add, WITHOUT SUCCESS (so that the planner could know that description is NOT NULL and UNIQUE):
DROP INDEX IF EXISTS i_key_description_desc;
CREATE UNIQUE INDEX i_key_description_desc ON key_description (description);
-- Therefore the planner should know: keyid is NOT NULL and UNIQUE and there is only one result: (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
-- Therefore from the constraint the planner should know that fk_id is NOT NULL and UNIQUE: CONSTRAINT unique_key_and_id UNIQUE(fk_id, fk_keyid):
-- LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
-- d1.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
-- Does the planner value all those UNIQUEnesses and NOT NULLs?

-- Again back to 8.3 query plan which is fast (319ms):
SET enable_hashjoin=false;
SET enable_mergejoin=false;
--   ->  Nested Loop Left Join  (cost=0.00..22820970510.45 rows=2727492136 width=16)
--         ->  Nested Loop Left Join  (cost=0.00..12810087616.29 rows=973121653 width=24)
--               ->  Nested Loop Left Join  (cost=0.00..9238379092.22 rows=347192844 width=24)

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Gerhard Wiesinger
Date:
Hello,

Any news or ideas regarding this issue?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Sat, 4 Sep 2010, Gerhard Wiesinger wrote:

> [...]

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Merlin Moncure
Date:
On Mon, Sep 13, 2010 at 2:39 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> Hello,
>
> Any news or ideas regarding this issue?

hm.  is retooling the query an option?  specifically, can you try converting

CREATE OR REPLACE VIEW log_entries AS
SELECT
  l.id AS id,
  l.datetime AS datetime,
  l.tdate AS tdate,
  l.ttime AS ttime,
  d1.value  AS Raumsolltemperatur,
  [...]
FROM
  log l
LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
  d1.fk_keyid = (SELECT keyid FROM key_description WHERE description =
'Raumsolltemperatur')
  [...]

to

CREATE OR REPLACE VIEW log_entries AS
SELECT
  l.id AS id,
  l.datetime AS datetime,
  l.tdate AS tdate,
  l.ttime AS ttime,
  (select value from log_details ld join key_description kd on
ld.fk_keyid = kd.keyid where ld.fk_id = l.id and  description =
'Raumsolltemperatur') AS Raumsolltemperatur,
 [...]

(I am not 100% sure I have my head around your query, but I think I do.)
This should get you a guaranteed (although not necessarily 'the best')
plan, with each returned view column being treated independently of
the others (is that what you want?).  Also, if schema changes are under
consideration, you could fold log_details/key_description together using
a natural key and cut out one of the joins.  I can't speak to some of
the more complex planner issues at play, but your query absolutely
screams optimization at the SQL level.
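
A hypothetical sketch of that schema change (the table and constraint names here are invented for illustration): storing the description itself in log_details as a natural key removes the key_description lookup from every join:

```sql
-- Hypothetical: the description becomes the key column in log_details,
-- so no join against key_description is needed at query time.
CREATE TABLE log_details_nk (
    fk_id       bigint       NOT NULL REFERENCES log (id),
    description varchar(255) NOT NULL,      -- natural key (was fk_keyid)
    value       double precision,
    CONSTRAINT unique_id_and_description UNIQUE (fk_id, description)
);
-- the view's left joins then simplify to:
--   LEFT JOIN log_details_nk d1
--     ON l.id = d1.fk_id AND d1.description = 'Raumsolltemperatur'
```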

What I am 100% sure of, is that you can get better performance if you
do a little out of the box thinking here...

merlin

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Gerhard Wiesinger
Date:
Hello Merlin,

Seems to be a feasible approach. One problem might be that when
multiple rows are returned, they are not ordered correctly in each
subselect. Any idea how to solve that?

e.g.
Raumsolltemperatur | Raumisttemperatur
Value from time 1  | Value from time 2
Value from time 2  | Value from time 1

but should be
Raumsolltemperatur | Raumisttemperatur
Value from time 1  | Value from time 1
Value from time 2  | Value from time 2

But that might be solvable by first selecting the keys from the
log_details table and then joining again.
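
A minimal sketch of that idea (assuming the table and column names used earlier in this thread): join log_details once and pivot by description, so all columns for a given log row come from the same pass and stay aligned:

```sql
-- Sketch: one pass over log_details, pivoted with CASE, keeps the
-- per-row values for all descriptions aligned by l.id.
SELECT
    l.id,
    l.datetime,
    MAX(CASE WHEN kd.description = 'Raumsolltemperatur' THEN d.value END) AS raumsolltemperatur,
    MAX(CASE WHEN kd.description = 'Raumtemperatur'     THEN d.value END) AS raumtemperatur
FROM log l
LEFT JOIN log_details d      ON d.fk_id = l.id
LEFT JOIN key_description kd ON kd.keyid = d.fk_keyid
GROUP BY l.id, l.datetime;
```

MAX() is only there to collapse the group; the UNIQUE (fk_id, fk_keyid) constraint guarantees at most one value per description per row.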

I will try it in the evening; I have to think it through in detail.

But thank you for the new approach and opening the mind :-)

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Mon, 13 Sep 2010, Merlin Moncure wrote:

> [...]

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Merlin Moncure
Date:
On Tue, Sep 14, 2010 at 2:07 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> [...]

Using a subquery in that style, select (<subquery>), ..., is limited to
results that return 1 row and 1 column.  I assumed that was the case...
if it isn't in your view, you can always attempt arrays:

CREATE OR REPLACE VIEW log_entries AS
SELECT
 l.id AS id,
 l.datetime AS datetime,
 l.tdate AS tdate,
 l.ttime AS ttime,
 array(select value from log_details ld join key_description kd on
ld.fk_keyid = kd.keyid where ld.fk_id = l.id and  description =
'Raumsolltemperatur' order by XYZ) AS Raumsolltemperatur,
 [...]

Arrays might raise the bar somewhat in terms of dealing with the
returned data, or they might work great.  Some experimentation is in
order.

XYZ being the ordering condition you want.  If that isn't available
inside the join then we need to think about this some more.  We could
probably help more if you could describe the schema in a little more
detail.  This is solvable.

merlin

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Gerhard Wiesinger
Date:
On Tue, 14 Sep 2010, Merlin Moncure wrote:

> [...]

Of course, the subquery is limited to a result set returning 1 row and
1 column. Order is also preserved, because of the join.

Further, I think I found a perfect query plan for the EAV pattern.

First I tried your suggestion, but there was a limitation with O(n^2)
effort (e.g. nested loops=12586 and index scans with loops=12586):

CREATE OR REPLACE VIEW log_entries_test AS
SELECT
   l.id AS id,
   l.datetime AS datetime,
   l.tdate AS tdate,
   l.ttime AS ttime,
   (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Raumsolltemperatur') AS Raumsolltemperatur,
   (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Raumtemperatur') AS Raumtemperatur,
   (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Kesselsolltemperatur') AS Kesselsolltemperatur,
   (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Kesseltemperatur') AS Kesseltemperatur,
....
FROM
   log l
;


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EXPLAIN ANALYZE SELECT * FROM log_entries_test WHERE datetime > now() - INTERVAL '10 days' ORDER BY datetime DESC;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan Backward using i_log_unique on log l  (cost=0.00..140820.12 rows=69 width=32) (actual time=2.848..22812.331 rows=12586 loops=1)
   Index Cond: (datetime > (now() - '10 days'::interval))
   SubPlan 1
     ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.007..0.018 rows=1 loops=12586)
           ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.003..0.013 rows=1 loops=12586)
                 Filter: ((description)::text = 'Raumsolltemperatur'::text)
           ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=12586)
                 Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
   SubPlan 2
     ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.006..0.017 rows=1 loops=12586)
           ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.003..0.013 rows=1 loops=12586)
                 Filter: ((description)::text = 'Raumtemperatur'::text)
           ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                 Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
   SubPlan 3
     ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.005..0.017 rows=1 loops=12586)
           ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.002..0.013 rows=1 loops=12586)
                 Filter: ((description)::text = 'Kesselsolltemperatur'::text)
           ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=12586)
                 Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
   SubPlan 4
     ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.006..0.017 rows=1 loops=12586)
           ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.002..0.013 rows=1 loops=12586)
                 Filter: ((description)::text = 'Kesseltemperatur'::text)
           ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                 Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
   SubPlan 5
     ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.005..0.017 rows=1 loops=12586)
           ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.002..0.014 rows=1 loops=12586)
                 Filter: ((description)::text = 'Speichersolltemperatur'::text)
           ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                 Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
   SubPlan 6
     ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.006..0.017 rows=1 loops=12586)
           ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.003..0.013 rows=1 loops=12586)
                 Filter: ((description)::text = 'Speichertemperatur'::text)
           ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                 Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Therefore I optimized the query further, which can be done in the
following way with another subquery and, IMHO, a perfect query plan. The
subselects also avoid multiple iterations for each of the result rows:

CREATE OR REPLACE VIEW log_entries_test AS
SELECT
   l.id AS id,
   l.datetime AS datetime,
   l.tdate AS tdate,
   l.ttime AS ttime,
   (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')) AS Raumsolltemperatur,
   (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumtemperatur')) AS Raumtemperatur,
   (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Kesselsolltemperatur')) AS Kesselsolltemperatur,
   (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Kesseltemperatur')) AS Kesseltemperatur,
...
FROM
   log l
;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EXPLAIN ANALYZE SELECT * FROM log_entries_test WHERE datetime > now() - INTERVAL '10 days' ORDER BY datetime DESC;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan Backward using i_log_unique on log l  (cost=0.00..140603.99 rows=69 width=32) (actual time=2.588..5602.899 rows=12586 loops=1)
   Index Cond: (datetime > (now() - '10 days'::interval))
   SubPlan 2
     ->  Index Scan using unique_key_and_id on log_details d  (cost=2.38..19.97 rows=1 width=8) (actual time=0.010..0.011 rows=1 loops=12586)
           Index Cond: (($1 = fk_id) AND (fk_keyid = $0))
           InitPlan 1 (returns $0)
             ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8) (actual time=0.015..0.066 rows=1 loops=1)
                   Filter: ((description)::text = 'Raumsolltemperatur'::text)
   SubPlan 4
     ->  Index Scan using unique_key_and_id on log_details d  (cost=2.38..19.97 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=12586)
           Index Cond: (($1 = fk_id) AND (fk_keyid = $2))
           InitPlan 3 (returns $2)
             ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8) (actual time=0.009..0.020 rows=1 loops=1)
                   Filter: ((description)::text = 'Raumtemperatur'::text)
   SubPlan 6
     ->  Index Scan using unique_key_and_id on log_details d  (cost=2.38..19.97 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=12586)
           Index Cond: (($1 = fk_id) AND (fk_keyid = $3))
           InitPlan 5 (returns $3)
             ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8) (actual time=0.005..0.017 rows=1 loops=1)
                   Filter: ((description)::text = 'Kesselsolltemperatur'::text)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

BTW: The schema data is in the links discussed in this thread.

Thnx to all for helping me.

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Merlin Moncure
Date:
On Tue, Sep 14, 2010 at 3:59 PM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> On Tue, 14 Sep 2010, Merlin Moncure wrote:
>
>> On Tue, Sep 14, 2010 at 2:07 AM, Gerhard Wiesinger <lists@wiesinger.com>
>> wrote:
>>>
>>> Hello Merlin,
>>>
>>> Seems to be a feasible approach. One problem might be that when
>>> multiple rows are returned, they are not ordered correctly in each
>>> subselect. Any idea how to solve that?
>>>
>>> e.g.
>>> Raumsolltemperatur | Raumisttemperatur
>>> Value from time 1  | Value from time 2
>>> Value from time 2  | Value from time 1
>>>
>>> but should be
>>> Raumsolltemperatur | Raumisttemperatur
>>> Value from time 1  | Value from time 1
>>> Value from time 2  | Value from time 2
>>>
>>> But that might be solvable by first selecting keys from the log_details
>>> table and then joining again.
>>>
>>> I will try it in the evening and I have to think about in detail.
>>>
>>> But thank you for the new approach and opening the mind :-)
>>
>> Using subquery in that style select (<subquery>), ... is limited to
>> results that return 1 row, 1 column.  I assumed that was the case...if
>> it isn't in your view, you can always attempt arrays:
>>
>> CREATE OR REPLACE VIEW log_entries AS
>> SELECT
>> l.id AS id,
>> l.datetime AS datetime,
>> l.tdate AS tdate,
>> l.ttime AS ttime,
>> array(select value from log_details ld join key_description kd on
>> ld.fk_keyid = kd.keyid where ld.fk_id = l.id and  description =
>> 'Raumsolltemperatur' order by XYZ) AS Raumsolltemperatur,
>> [...]
>>
>> arrays might raise the bar somewhat in terms of dealing with the
>> returned data, or they might work great.  some experimentation is in
>> order.
>>
>> XYZ being the ordering condition you want.  If that isn't available
>> inside the join then we need to think about this some more.  We could
>> probably help more if you could describe the schema in a little more
>> detail.  This is solvable.
>
> Of course, a subquery is limited to a result set returning 1 row and 1 column.
> Also, order is of course preserved because of the join.
>
> Further, I think I found a perfect query plan for the EAV pattern.
>
> First I tried your suggestion, but there were some limitations with O(n^2)
> efforts (e.g. nested loops=12586 and also index scans with loops=12586):
>
> CREATE OR REPLACE VIEW log_entries_test AS
> SELECT
>  l.id AS id,
>  l.datetime AS datetime,
>  l.tdate AS tdate,
>  l.ttime AS ttime,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Raumsolltemperatur') AS
> Raumsolltemperatur,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Raumtemperatur') AS
> Raumtemperatur,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Kesselsolltemperatur')
> AS Kesselsolltemperatur,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Kesseltemperatur') AS
> Kesseltemperatur,
> ....
> FROM
>  log l
> ;
>
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> EXPLAIN ANALYZE SELECT * FROM log_entries_test WHERE datetime > now() -
> INTERVAL '10 days' ORDER BY datetime DESC;
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan Backward using i_log_unique on log l  (cost=0.00..140820.12
> rows=69 width=32) (actual time=2.848..22812.331 rows=12586 loops=1)
>  Index Cond: (datetime > (now() - '10 days'::interval))
>  SubPlan 1
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.007..0.018 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.003..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Raumsolltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.004..0.004 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 2
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.006..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.003..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Raumtemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 3
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.005..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.002..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Kesselsolltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.003..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 4
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.006..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.002..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Kesseltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 5
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.005..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.002..0.014 rows=1 loops=12586)
>                Filter: ((description)::text =
> 'Speichersolltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 6
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.006..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.003..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Speichertemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Therefore I optimized the query further, which can be done in the following
> way with another subquery and, IMHO, a perfect query plan. The subselect also
> avoids multiple iterations for each of the result rows:
>
> CREATE OR REPLACE VIEW log_entries_test AS
> SELECT
>  l.id AS id,
>  l.datetime AS datetime,
>  l.tdate AS tdate,
>  l.ttime AS ttime,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description =
> 'Raumsolltemperatur')) AS Raumsolltemperatur,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description = 'Raumtemperatur')) AS
> Raumtemperatur,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description =
> 'Kesselsolltemperatur')) AS Kesselsolltemperatur,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description = 'Kesseltemperatur'))
> AS Kesseltemperatur,
> ...
> FROM
>  log l
> ;
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> EXPLAIN ANALYZE SELECT * FROM log_entries_test WHERE datetime > now() -
> INTERVAL '10 days' ORDER BY datetime DESC;
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan Backward using i_log_unique on log l  (cost=0.00..140603.99
> rows=69 width=32) (actual time=2.588..5602.899 rows=12586 loops=1)
>  Index Cond: (datetime > (now() - '10 days'::interval))
>  SubPlan 2
>    ->  Index Scan using unique_key_and_id on log_details d
>  (cost=2.38..19.97 rows=1 width=8) (actual time=0.010..0.011 rows=1
> loops=12586)
>          Index Cond: (($1 = fk_id) AND (fk_keyid = $0))
>          InitPlan 1 (returns $0)
>            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8)
> (actual time=0.015..0.066 rows=1 loops=1)
>                  Filter: ((description)::text = 'Raumsolltemperatur'::text)
>  SubPlan 4
>    ->  Index Scan using unique_key_and_id on log_details d
>  (cost=2.38..19.97 rows=1 width=8) (actual time=0.003..0.003 rows=1
> loops=12586)
>          Index Cond: (($1 = fk_id) AND (fk_keyid = $2))
>          InitPlan 3 (returns $2)
>            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8)
> (actual time=0.009..0.020 rows=1 loops=1)
>                  Filter: ((description)::text = 'Raumtemperatur'::text)
>  SubPlan 6
>    ->  Index Scan using unique_key_and_id on log_details d
>  (cost=2.38..19.97 rows=1 width=8) (actual time=0.002..0.003 rows=1
> loops=12586)
>          Index Cond: (($1 = fk_id) AND (fk_keyid = $3))
>          InitPlan 5 (returns $3)
>            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8)
> (actual time=0.005..0.017 rows=1 loops=1)
>                  Filter: ((description)::text =
> 'Kesselsolltemperatur'::text)
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> BTW: The schema data is in the links discussed in this thread.
>
> Thnx to all for helping me.

np -- this felt particularly satisfying for some reason. btw, I think
you have some more low-hanging optimization fruit. I think (although
it would certainly have to be tested) that hiding your attribute
description under keyid is buying you nothing but headaches. If you
used a natural-key style, making description the primary key of
key_description (or unique), and gave log_details a description
column that directly referenced it, your subquery:

(
  SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
  (
    SELECT keyid FROM key_description WHERE description = 'Kesselsolltemperatur'
  )
) AS Kesselsolltemperatur,

would look like this:
(
  SELECT value FROM log_details d WHERE l.id = d.fk_id AND
d.description = 'Kesselsolltemperatur'
) AS Kesselsolltemperatur,

your index on log_details(fk_id, description) is of course fatter, but
quite precise...does require rebuilding your entire dataset however.
food for thought.
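A sketch of what that natural-key layout could look like in DDL (column types are assumptions based on the thread; the index name is reused from the plans above for illustration):

```sql
-- Hypothetical natural-key variant of the EAV schema from the thread:
CREATE TABLE key_description (
    description varchar PRIMARY KEY   -- natural key instead of a surrogate keyid
);

CREATE TABLE log_details (
    fk_id       bigint  NOT NULL REFERENCES log(id),
    description varchar NOT NULL REFERENCES key_description(description),
    value       double precision
);

-- The "fatter, but quite precise" index the suggestion relies on:
CREATE UNIQUE INDEX unique_key_and_id ON log_details (fk_id, description);
```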

merlin

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Gerhard Wiesinger
Date:
On Tue, 14 Sep 2010, Merlin Moncure wrote:
> np -- this felt particularly satisfying for some reason. btw, I think
> you have some more low hanging optimization fruit.  I think (although
> it would certainly have to be tested) hiding your attribute
> description under keyid is buying you nothing but headaches.  If you
> used natural key style, making description primary key of
> key_description (or unique), and had log_details have a description
> column that directly referenced that column, your subquery:
>
> (
>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
>  (
>    SELECT keyid FROM key_description WHERE description = 'Kesselsolltemperatur'
>  )
> ) AS Kesselsolltemperatur,
>
> would look like this:
> (
>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND
> d.description = 'Kesselsolltemperatur'
> ) AS Kesselsolltemperatur,
>
> your index on log_details(fk_id, description) is of course fatter, but
> quite precise...does require rebuilding your entire dataset however.
> food for thought.

I think your suggestion might be slower, because WHERE clauses and
possible JOINs on a BIGINT are much faster (especially when a lot of data
is queried) than on a VARCHAR. With the latest query plan
key_description is only queried once per subselect, which is perfect. I've
also chosen that indirection so that I can change a description without
changing much in the data model or touching all data rows on refactoring.

@Tom: Are you thinking about planner enhancements for such situations,
where JOINs are "converted" to subselects?

BTW: I had a small bug in the queries and in the code: one description
was wrong (one space too many: 'Meldung F4  2. Zeile' => 'Meldung F4 2. Zeile').
With this indirect data model that is very easy to change: change
the view and one line of code. With your suggested data model I would
have to update millions of rows ...
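With the surrogate-key model, such a rename is a one-row update against the small lookup table (a sketch; table and column names are taken from the thread, the string values are illustrative):

```sql
-- Surrogate-key model: the fix touches one row in key_description,
-- because log_details only references the unchanged keyid.
UPDATE key_description
   SET description = 'Meldung F4 2. Zeile'
 WHERE description = 'Meldung F4  2. Zeile';  -- doubled space removed

-- Natural-key model: the same fix would have to rewrite every detail row:
-- UPDATE log_details SET description = 'Meldung F4 2. Zeile'
--  WHERE description = 'Meldung F4  2. Zeile';  -- millions of rows
```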

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Merlin Moncure
Date:
On Wed, Sep 15, 2010 at 2:32 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> On Tue, 14 Sep 2010, Merlin Moncure wrote:
>>
>> np -- this felt particularly satisfying for some reason. btw, I think
>> you have some more low hanging optimization fruit.  I think (although
>> it would certainly have to be tested) hiding your attribute
>> description under keyid is buying you nothing but headaches.  If you
>> used natural key style, making description primary key of
>> key_description (or unique), and had log_details have a description
>> column that directly referenced that column, your subquery:
>>
>> (
>>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
>>  (
>>   SELECT keyid FROM key_description WHERE description =
>> 'Kesselsolltemperatur'
>>  )
>> ) AS Kesselsolltemperatur,
>>
>> would look like this:
>> (
>>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND
>> d.description = 'Kesselsolltemperatur'
>> ) AS Kesselsolltemperatur,
>>
>> your index on log_details(fk_id, description) is of course fatter, but
>> quite precise...does require rebuilding your entire dataset however.
>> food for thought.
>
> I think your suggestion might be slower because the WHERE clause and
> possible JOINS with BIGINT is much faster (especially when a lot of data is
> queried) than with a VARCHAR. With the latest query plan key_description is
> only queried once per subselect which is perfect. I've also chosen that
> indirection that I can change description without changing too much in data
> model and all data rows on refactoring.

You're not joining -- you're filtering (and your assumption that
bigint is always going to be faster is quite debatable depending on
circumstances).  The join is skipped because of the key (yes, it's a
cheap lookup, but w/ 50 columns each doing it, nothing is cheap).

merlin

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Gerhard Wiesinger
Date:
On Wed, 15 Sep 2010, Merlin Moncure wrote:

> On Wed, Sep 15, 2010 at 2:32 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> On Tue, 14 Sep 2010, Merlin Moncure wrote:
>>>
>>> np -- this felt particularly satisfying for some reason. btw, I think
>>> you have some more low hanging optimization fruit.  I think (although
>>> it would certainly have to be tested) hiding your attribute
>>> description under keyid is buying you nothing but headaches.  If you
>>> used natural key style, making description primary key of
>>> key_description (or unique), and had log_details have a description
>>> column that directly referenced that column, your subquery:
>>>
>>> (
>>>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
>>>  (
>>>   SELECT keyid FROM key_description WHERE description =
>>> 'Kesselsolltemperatur'
>>>  )
>>> ) AS Kesselsolltemperatur,
>>>
>>> would look like this:
>>> (
>>>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND
>>> d.description = 'Kesselsolltemperatur'
>>> ) AS Kesselsolltemperatur,
>>>
>>> your index on log_details(fk_id, description) is of course fatter, but
>>> quite precise...does require rebuilding your entire dataset however.
>>> food for thought.
>>
>> I think your suggestion might be slower because the WHERE clause and
>> possible JOINS with BIGINT is much faster (especially when a lot of data is
>> queried) than with a VARCHAR. With the latest query plan key_description is
>> only queried once per subselect which is perfect. I've also chosen that
>> indirection that I can change description without changing too much in data
>> model and all data rows on refactoring.
>
> You're not joining -- you're filtering (and your assumption that
> bigint is always going to be faster is quite debatable depending on
> circumstances).  The join is skipped because of the key (yes, it's
> cheap lookup, but w/50 columns each doing it, nothing is cheap).

I know that I'm not JOINing in that case - as discussed, I meant possible
JOINs in other query scenarios.

BTW: The latest query plan is also optimal in that only the columns
actually used from the view are evaluated. With the fully joined version
all columns were evaluated even when dropped from the result set, e.g.:
SELECT col1, col2 FROM view1; -- equivalent to SELECT * FROM view1;, as col1 and col2 are all the columns in that view
SELECT col1 FROM view1;       -- less effort with subselects when fewer columns are needed; joins have the same "full view" effort here

Ciao,
Gerhard

--
http://www.wiesinger.com/

Re: Major performance problem after upgrade from 8.3 to 8.4

From:
Marc Antonio
Date:
Hi,

I had a similar problem with many left joins. Reading about planner
optimization, I tried editing postgresql.conf: I uncommented the line
join_collapse_limit = 8 and set it to 1, which disables collapsing of
explicit JOIN syntax. My query was taking 2000 s in 8.4 while the same
query took 2 ms in 8.3. Now it's working fast in 8.4.
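This workaround can also be applied per session or per transaction instead of globally in postgresql.conf (a sketch; join_collapse_limit is a real setting with default 8, the query shown is just the view from this thread):

```sql
-- Per session: keep explicit JOIN order, skipping join reordering by the planner.
SET join_collapse_limit = 1;

-- Or scoped to a single transaction, so the default (8) is restored afterwards:
BEGIN;
SET LOCAL join_collapse_limit = 1;
SELECT * FROM log_entries WHERE datetime > now() - INTERVAL '10 days';
COMMIT;
```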

Best regards,

Marc
--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Major-performance-problem-after-upgrade-from-8-3-to-8-4-tp2796390p3329435.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.