Discussion: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running


500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso
Hi,

I'd appreciate it if someone could give me some pointers on this.

PG 8.2.4, 1.4 GHz Centrino (single core), 1.5 GB RAM, 5400 rpm laptop HD

3 main tables which have already been selected "out" into separate tables
(using create table foo as select * from foo_main where x = y)

These test tables contain only a very small subset of the main tables' data
(at most 1k to 10k rows vs. 1.5 million to 7 million rows in the main tables).

Table definitions and the actual query are attached (names have been altered
to protect the innocent).

I've played around with tweaking the postgresql.conf settings per guidance
from jdavis (on IRC) without much (any) improvement. Rewriting the queries to
NOT use subselects (per depesz, also on IRC) likewise yielded nothing
spectacular.

The only thing I noticed is that when the subqueries combine more than
3 tables, PG chokes. With only 3 joined tables per subquery, the results
come out fast, even for 6K rows.

But if the subqueries (each of which, by itself, executes fast and returns
results in 1 to 10 secs) are run independently, their results placed into
temp tables, and those finally joined together using a query such as

select a.a, b.b, c.c
from a
inner join b on (x = x)
left outer join c on (x = y)

then it is also fast.
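A minimal sketch of that workaround (tmp_a/tmp_b/tmp_c, the source names and
the filters are placeholders standing in for the real subqueries):

create temp table tmp_a as select x, a from sub1_src where ttype = 764;  -- placeholder subquery
create temp table tmp_b as select x, b from sub2_src where ttype = 765;  -- placeholder subquery
create temp table tmp_c as select y, c from sub3_src where ttype = 766;  -- placeholder subquery

-- give the planner real statistics on the temp tables before joining them
analyze tmp_a;
analyze tmp_b;
analyze tmp_c;

select a.a, b.b, c.c
from tmp_a a
inner join tmp_b b on (a.x = b.x)
left outer join tmp_c c on (a.x = c.y);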

work_mem = 8MB / 32MB / 128MB (32 MB default in my setup)
effective_cache_size = 128MB / 500MB (500 default)
shared_buffers = 200MB
geqo_threshold = 5 (default 12)
geqo_effort = 2 (default 5)
random_page_cost = 8.0 (default 4)
maintenance_work_mem = 64MB
join_collapse_limit = 1/8/15 (8 default)
from_collapse_limit = 1/8/15 (8 default)
enable_nestloop = off (on by default)
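For what it's worth, all of these except shared_buffers can also be set per
session while testing, instead of editing postgresql.conf each time; a rough
sketch with the same values as above:

set work_mem = '32MB';
set effective_cache_size = '500MB';
set random_page_cost = 8.0;
set geqo_threshold = 5;
set geqo_effort = 2;
set join_collapse_limit = 8;
set from_collapse_limit = 8;
set enable_nestloop = off;
-- explain analyze <the problem query>;
reset all;  -- back to the configured defaults for this session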

Based on current performance, even with a small number of rows in the
individual tables (max 20k), I can't get a result out within 2 hours
(with more than 3 tables joined per subquery), which is making me rethink
PG's usefulness.



BTW, I also tried the 8.2.4 CVS STABLE branch.

Attachments

Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso
Sorry, I sent this just as I was about to go to bed and the EXPLAIN ANALYZE
of the query with 4 tables joined per subquery came out.

So, attaching it here.

On Wed, 2007-09-12 at 00:57 +0800, El-Lotso wrote:
> Hi,
>
> appreciate if someone can have some pointers for this.
>
> PG.8.2.4 1.4G centrino(s/core) 1.5GB ram/5400rpm laptop HD
>
> 3 mail tables which has already been selected "out" into separate tables
> (useing create table foo as select * from foo_main where x=y)
>
> These test tables containing only a very small subset of the main data's
> table (max 1k to 10k rows vs 1.5mill to 7mill rows in the main table)
>
> table definitions and actual query are attached. (names has been altered
> to protect the innocent)
>
> I've played around with some tweaking of the postgres.conf setting per
> guidance from jdavis (in irc) w/o much(any) improvement. Also tried
> re-writing the queries to NOT use subselects (per depesz in irc also)
> also yielded nothing spectacular.
>
> The only thing I noticed was that when the subqueries combine more than
> 3 tables, then PG will choke. If only at 3 joined tables per subquery,
> the results come out fast, even for 6K rows.
>
> but if the subqueries (these subqueries by itself, executes fast and
> returns results in 1 to 10secs) were done independently and then placed
> into a temp table, and then finally joined together using a query such
> as
>
> select a.a,b.b,c.c from a inner join b on (x = x) left outer join c on(x
> = y)
>
> then it would also be fast
>
> work_mem = 8MB / 32MB /128MB (32 MB default in my setup)
> effective_Cache_size = 128MB/500MB (500 default)
> shared_buffers = 200MB
> geqo_threshold = 5 (default 12)
> geqo_effort = 2 (default 5)
> ramdom_page_cose = 8.0 (default 4)
> maintenance_work_mem = 64MB
> join_collapse_limit = 1/8/15  (8 default)
> from_collapse_limit = 1/8/15 (8 default)
> enable_nestloop = f (on by default)
>
> based on current performance, even with a small number of rows in the
> individual tables (max 20k), I can't even get a result out in 2 hours.
> (> 3 tables joined per subquery) which is making me re-think of PG's
> useful-ness.
>
>
>
> BTW, I also tried 8.2.4 CVS_STABLE Branch

Attachments

Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: Tom Lane
El-Lotso <el.lotso@gmail.com> writes:
> sorry.. I sent this as I was about to go to bed and the explain analyse
> of the query w/ 4 tables joined per subquery came out.

It's those factor-of-1000 misestimates of the join sizes that are
killing you, eg this one:

>               ->  Hash Join  (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1)
>                     Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype))
>                     ->  Seq Scan on ts  (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1)
>                     ->  Hash  (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1)

The single-row-result estimate persuades it to use a nestloop at the
next level up, and then when the output is actually 969 rows, that
means 969 executions of the other side of the upper join.

The two input size estimates are reasonably close to reality, so
the problem seems to be in the estimate of selectivity of the
join condition.  First off, do you have up-to-date statistics
for all the columns being joined here?  It might be that
increasing the statistics targets for those columns would help.
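For example, something along these lines (a sketch only, using the column
names visible in the plan above; repeat for the matching columns on the
other tables):

alter table ts alter column id set statistics 1000;
alter table ts alter column ttype set statistics 1000;
alter table ts alter column start_timestamp set statistics 1000;
analyze ts;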

But what I'm a bit worried about is the idea that the join
conditions are correlated or even outright redundant; the
planner will not know that, and will make an unrealistic
estimate of their combined selectivity.  If that's the
case, you might need to redesign the table schema to
eliminate the redundancy before you'll get good plans.

            regards, tom lane

Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso
On Tue, 2007-09-11 at 14:23 -0400, Tom Lane wrote:
> El-Lotso <el.lotso@gmail.com> writes:
> > sorry.. I sent this as I was about to go to bed and the explain analyse
> > of the query w/ 4 tables joined per subquery came out.
>
> It's those factor-of-1000 misestimates of the join sizes that are
> killing you, eg this one:
>
> >               ->  Hash Join  (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1)
> >                     Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype))
> >                     ->  Seq Scan on ts  (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1)
> >                     ->  Hash  (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1)
>
> The single-row-result estimate persuades it to use a nestloop at the
> next level up, and then when the output is actually 969 rows, that
> means 969 executions of the other side of the upper join.

Yep, that's consistent with what I see for the larger result sets: more rows
= more loops.

>
> The two input size estimates are reasonably close to reality, so
> the problem seems to be in the estimate of selectivity of the
> join condition.  First off, do you have up-to-date statistics
> for all the columns being joined here?  It might be that
> increasing the statistics targets for those columns would help.

I've already upped the stats target to 1000, reindexed, vacuumed, analysed,
etc., but basically nothing has changed. The issue here is mainly that for
each id there are between 2 and 8 hid values.

eg:
table d
seq : 1234567 / code : CED89

table trh
seq : 123456
hid : 0/1/2/3/4/5/6/7

and the problem is also compounded by the different ttypes available, which
is what forces the use of the subqueries.

At the end of the day, this is the output that's desired:

ID    HID
===========
1234567  |0
1234567  |1
1234567  |2
1234567  |3
1234567  |4
1234567  |5
1234567  |6
1234567  |7

The d table has the unique id whereas the other tables have all the subsets.
Like a family tree: it starts at 2 (mom/pop), then goes to the children,
plus the children's grandchildren (pair 1) and the children's grandchildren
(pair 2).

d to trh is a one-to-many relationship.

> But what I'm a bit worried about is the idea that the join
> conditions are correlated or even outright redundant; the
> planner will not know that, and will make an unrealistic
> estimate of their combined selectivity.  If that's the
> case, you might need to redesign the table schema to
> eliminate the redundancy before you'll get good plans.

I'm not sure I understand (actually, I don't) the above comment. I've already
converted them from subqueries to actual joins (collapsed them) and still no
dice.

BTW, this same schema runs fine on SQL Server (which I'm pulling data from
and pumping into PG).

I'm downgrading to 8.1.9 to see if it helps too.

I'd appreciate any pointers at all.


Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso
On Wed, 2007-09-12 at 10:15 +0800, El-Lotso wrote:
> I'm downgrading to 8.1.9 to see if it helps too.

Nope, it doesn't help at all. The estimated-to-actual row ratio at the nested
loop and hash joins is still around 1 to 500. This plan is slightly different
in that PG is choosing seq scans.

Nested Loop Left Join  (cost=2604.28..4135.15 rows=1 width=59) (actual time=249.973..15778.157 rows=528 loops=1)
  Join Filter: ((("inner".id)::text = ("outer".id)::text) AND ("inner".hid = "outer".hid) AND ("inner".seq_time = "outer".seq_time) AND ("inner".seq_date = "outer".seq_date))
  ->  Nested Loop Left Join  (cost=1400.08..2766.23 rows=1 width=67) (actual time=168.375..8002.573 rows=528 loops=1)
        Join Filter: ((("inner".id)::text = ("outer".id)::text) AND ("inner".hid = "outer".hid) AND ("inner".seq_time = "outer".seq_time) AND ("inner".seq_date = "outer".seq_date))
        ->  Hash Join  (cost=127.25..1328.68 rows=1 width=59) (actual time=74.195..84.855 rows=528 loops=1)
              Hash Cond: ((("outer".id)::text = ("inner".id)::text) AND ("outer".ttype = "inner".ttype) AND ("outer".start_timestamp = "inner".start_timestamp))
              ->  Seq Scan on trh  (cost=0.00..1060.18 rows=9416 width=36) (actual time=0.022..53.830 rows=9416 loops=1)
                    Filter: ((ttype = 35) OR (ttype = 75) OR (ttype = 703) OR (ttype = 740) OR (ttype = 764))
              ->  Hash  (cost=125.53..125.53 rows=230 width=63) (actual time=12.487..12.487 rows=192 loops=1)
                    ->  Hash Join  (cost=18.69..125.53 rows=230 width=63) (actual time=11.043..12.007 rows=192 loops=1)
                          Hash Cond: (("outer".id)::text = ("inner".id)::text)
                          ->  Seq Scan on ts  (cost=0.00..87.36 rows=3436 width=40) (actual time=0.003..5.436 rows=3436 loops=1)
                          ->  Hash  (cost=18.57..18.57 rows=48 width=23) (actual time=0.876..0.876 rows=48 loops=1)
                                ->  Seq Scan on d  (cost=0.00..18.57 rows=48 width=23) (actual time=0.019..0.771 rows=48 loops=1)
                                      Filter: ((record_update_date_time >= '2007-08-20 00:00:00'::timestamp without time zone) AND (record_update_date_time <= '2007-09-08 00:00:00'::timestamp without time zone) AND ((code)::text = 'HUA75'::text))
        ->  Hash Join  (cost=1272.83..1437.52 rows=1 width=61) (actual time=11.784..14.216 rows=504 loops=528)
              Hash Cond: ((("outer".id)::text = ("inner".id)::text) AND ("outer".ttype = "inner".ttype) AND ("outer".start_timestamp = "inner".start_timestamp))
              ->  Seq Scan on ts  (cost=0.00..87.36 rows=3436 width=40) (actual time=0.003..5.744 rows=3436 loops=528)
              ->  Hash  (cost=1268.29..1268.29 rows=606 width=59) (actual time=82.783..82.783 rows=504 loops=1)
                    ->  Hash Join  (cost=18.69..1268.29 rows=606 width=59) (actual time=76.454..81.515 rows=504 loops=1)
                          Hash Cond: (("outer".id)::text = ("inner".id)::text)
                          ->  Seq Scan on trh  (cost=0.00..1198.22 rows=9064 width=36) (actual time=0.051..66.555 rows=9064 loops=1)
                                Filter: ((ttype = 69) OR (ttype = 178) OR (ttype = 198) OR (ttype = 704) OR (ttype = 757) OR (ttype = 741) OR (ttype = 765))
                          ->  Hash  (cost=18.57..18.57 rows=48 width=23) (actual time=0.863..0.863 rows=48 loops=1)
                                ->  Seq Scan on d  (cost=0.00..18.57 rows=48 width=23) (actual time=0.019..0.761 rows=48 loops=1)
                                      Filter: ((record_update_date_time >= '2007-08-20 00:00:00'::timestamp without time zone) AND (record_update_date_time <= '2007-09-08 00:00:00'::timestamp without time zone) AND ((code)::text = 'HUA75'::text))
  ->  Hash Join  (cost=1204.20..1368.89 rows=1 width=61) (actual time=11.498..13.941 rows=504 loops=528)
        Hash Cond: ((("outer".id)::text = ("inner".id)::text) AND ("outer".ttype = "inner".ttype) AND ("outer".start_timestamp = "inner".start_timestamp))
        ->  Seq Scan on ts  (cost=0.00..87.36 rows=3436 width=40) (actual time=0.003..5.593 rows=3436 loops=528)
        ->  Hash  (cost=1199.62..1199.62 rows=610 width=59) (actual time=70.186..70.186 rows=504 loops=1)
              ->  Hash Join  (cost=18.69..1199.62 rows=610 width=59) (actual time=64.270..68.886 rows=504 loops=1)
                    Hash Cond: (("outer".id)::text = ("inner".id)::text)
                    ->  Seq Scan on trh  (cost=0.00..1129.20 rows=9128 width=36) (actual time=0.020..54.050 rows=9128 loops=1)
                          Filter: ((ttype = 177) OR (ttype = 197) OR (ttype = 705) OR (ttype = 742) OR (ttype = 758) OR (ttype = 766))
                    ->  Hash  (cost=18.57..18.57 rows=48 width=23) (actual time=1.100..1.100 rows=48 loops=1)
                          ->  Seq Scan on d  (cost=0.00..18.57 rows=48 width=23) (actual time=0.019..0.994 rows=48 loops=1)
                                Filter: ((record_update_date_time >= '2007-08-20 00:00:00'::timestamp without time zone) AND (record_update_date_time <= '2007-09-08 00:00:00'::timestamp without time zone) AND ((code)::text = 'HUA75'::text))
Total runtime: 15779.769 ms

Am I screwed? Is a schema redesign really a necessity? That would be a real
pain, given the rewrite of _all_ the queries, and I couldn't maintain
compatibility in the front-end app between SQL Server and PG.


Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso
On Wed, 2007-09-12 at 10:15 +0800, El-Lotso wrote:
> On Tue, 2007-09-11 at 14:23 -0400, Tom Lane wrote:
> > El-Lotso <el.lotso@gmail.com> writes:
> > > sorry.. I sent this as I was about to go to bed and the explain analyse
> > > of the query w/ 4 tables joined per subquery came out.
> >
> > It's those factor-of-1000 misestimates of the join sizes that are
> > killing you, eg this one:
> >
> > >               ->  Hash Join  (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1)
> > >                     Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype))
> > >                     ->  Seq Scan on ts  (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1)
> > >                     ->  Hash  (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1)
> >
> > The single-row-result estimate persuades it to use a nestloop at the
> > next level up, and then when the output is actually 969 rows, that
> > means 969 executions of the other side of the upper join.
>
> Yep.. that's consistent with the larger results output. more rows = more
> loops


I'm on the verge of giving up... the schema seems simple and yet there are
so many issues with it. Perhaps it's the layout of the data, I don't know.
But based on the ordering/normalisation of the data and the one-to-many
relationship of some tables, this is giving the planner a headache (and me
a bulge on the head from knocking it against the wall).

I've tried multiple variations: subqueries, no subqueries, not joining the
table (but including it as a subquery, which gets rewritten to a join
anyway), exists/not exists, all to no avail.

PG is fast, yes, even with all the nested loops, for up to 48K results
(within 4 minutes), but as soon as I put it into an inner join / left join /
multiple temporary (in-memory) tables it will choke.

select a.a, b.b, c.c
from (select x, y, z from zz) a
inner join b on a.a = b.a
left join (select x, a, z from xx)

then it will choke.

I'm really at my wits end here.


Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: Nis Jørgensen
El-Lotso wrote:

> I'm on the verge of giving up... the schema seems simple and yet there's
> so much issues with it. Perhaps it's the layout of the data, I don't
> know. But based on the ordering/normalisation of the data and the one to
> many relationship of some tables, this is giving the planner a headache
> (and me a bulge on the head from knockin it against the wall)

I think you should look more at the db design, and less on rewriting the
query. Here are some observations:

- Your table structure is quite hard to understand (possibly because you
have changed the names) - if you want help on improving it, you will
need to explain the data to us, and possibly post some sample data.
- You seem to be lacking constraints on the tables. My guess is that
(id,ttype,start_timestamp) is unique in both trh and ts - but I cannot
tell (and neither can the query planner). Foreign key constraints might
help as well. These would also help others to understand your data, and
suggest reformulations of your queries (see the sketch after this list).
- Another guess is that the ttype sets (177,197,705,742,758,766) and
(69,178,198,704,757,741,765) actually indicate some other property, a
common "type" of record, and that only one of each will be present for
an id,start_timestamp combination. This may be related to the repeating
fields issue - if a certain ttype indicates that we are interested in a
certain pber_x field (and possibly that the others are empty).
- You have what looks like repeating fields - pber_x, fval_x, index_x -
in your tables. Fixing this might not improve your query, but might be a
good idea for other reasons.
- seq_date and seq_time seem like they may be redundant - are they
different casts of the same data?
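As a purely speculative sketch of the constraints guessed at above
(constraint names are made up; only add them if the data really satisfies
them):

alter table d   add constraint d_id_key  unique (id);
alter table ts  add constraint ts_key    unique (id, ttype, start_timestamp);
alter table trh add constraint trh_key   unique (id, ttype, start_timestamp);
alter table trh add constraint trh_id_fk foreign key (id) references d (id);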

All speculation. Hope it helps

Nis

Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: Tom Lane
El-Lotso <el.lotso@gmail.com> writes:
> I'm really at my wits end here.

Try to merge the multiple join keys into one, somehow.  I'm not sure why
the planner is overestimating the selectivity of the combined join
conditions, but that's basically where your problem is coming from.

A truly brute-force solution would be "set enable_nestloop = off"
but this is likely to screw performance for other queries.
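If it comes to that, the setting can at least be confined to the problem
query instead of the whole session, along these lines:

begin;
set local enable_nestloop = off;
-- run the problem query here
commit;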

            regards, tom lane

Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso
On Wed, 2007-09-12 at 10:41 -0400, Tom Lane wrote:
> El-Lotso <el.lotso@gmail.com> writes:
> > I'm really at my wits end here.
>
> Try to merge the multiple join keys into one, somehow.  I'm not sure why
> the planner is overestimating the selectivity of the combined join
> conditions, but that's basically where your problem is coming from.

I've tried merging them together. What previously was

INNER JOIN  TS
ON TS.ID = TRH.ID AND
TS.TTYPE = TRH.TTYPE AND
TS.START_TIMESTAMP = TRH.START_TIMESTAMP

has become
inner join TS
on ts.id_ttype_startstamp = trh.id_ttype_startstamp

where id_ttype_startstamp = (id || '-'||ttype || '-' || start_timestamp)
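One way to materialise such a merged key (a sketch only; the index name is
made up, and the column has to be kept up to date by whatever loads the
data):

alter table ts add column id_ttype_startstamp text;
update ts set id_ttype_startstamp = id || '-' || ttype::text || '-' || start_timestamp::text;
create index ts_id_ttype_startstamp_idx on ts (id_ttype_startstamp);
analyze ts;
-- same again for trh/trd, then join on the single merged column as above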

It's working somewhat better, but everything is not as rosy as it should be,
as the planner is still over/underestimating the number of rows.

From the original query:
Nested Loop Left Join  (cost=10612.48..24857.20 rows=1 width=61) (actual time=1177.626..462856.007 rows=750 loops=1)

With the merged join conditions:
Hash Join  (cost=41823.94..45889.49 rows=6101 width=61) (actual time=3019.609..3037.692 rows=750 loops=1)
  Hash Cond: (trd.trd_join_key = ts.ts_join_key)

With the merged join conditions against the main table (3-5 million rows):
Hash Left Join  (cost=80846.38..121112.36 rows=25 width=244) (actual time=5088.437..5457.269 rows=750 loops=1)

Note that it still doesn't really help that much; the estimated row count is
still way off the actual number of rows. On one of the queries, where the
hid field has a subset of 8 values, it's even worse, and it seems like the
merged join condition doesn't help at all.


I'm still trying to merge more join conditions to see if it helps.




> A truly brute-force solution would be "set enable_nestloop = off"
> but this is likely to screw performance for other queries.

I've also tried this... it's not helping much, actually.
As mentioned previously, this is a one-to-many relationship, and somehow PG
just doesn't take that into account.

I'm still not having much luck here. (Playing with a subset of the main
table's data _does_ show some promise, but when querying the main table with
3 million rows, everything grinds to a halt.)





Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: Ow Mun Heng
On Wed, 2007-09-12 at 15:14 +0200, Nis Jørgensen wrote:
> El-Lotso skrev:
>
> > I'm on the verge of giving up... the schema seems simple and yet there's
> > so much issues with it. Perhaps it's the layout of the data, I don't
> > know. But based on the ordering/normalisation of the data and the one to
> > many relationship of some tables, this is giving the planner a headache
> > (and me a bulge on the head from knockin it against the wall)
>
> I think you should look more at the db design, and less on rewriting the
> query. Here are some observations:

I can't help much with the design per se. So...

>
> - Your table structure is quite hard to understand (possibly because you
> have changed the names) - if you want help on improving it, you will
> need to explain the data to us, and possibly post some sample data.

If anyone is willing, I can send some sample data to you off-list.

on the trh table, hid is a subset of data for a particular id.

eg:
PARENT : CHILD 1
PARENT : CHILD 2
PARENT : CHILD 3
PARENT : CHILD 4

uniqueid = merged fields from id / index1 / index2 / start_timestamp (in epoch)
/ phase_id / ttype, which is unique within each table (but not across ALL the tables)


> - You seem to be lacking constraints on the tables. My guess is that
> (id,ttype,start_timestamp) is unique in both trh and ts - but I cannot
> tell (and neither can the query planner). Foreign key constraints might
> help as well. These would also help others to understand your data, and
> suggest reformulations of your queries.

AFAICT, there are no foreign key constraints in the original DB design (and
I'm not even sure how to begin an FK design based on this original design).

The unique_id is as above:
TRH/TRD uniqueid = merged fields from id / index1 / index2 /
start_timestamp (in epoch) / phase_id / ttype

TS uniqueid = merged fields from id / start_timestamp (in epoch) / ttype

The problem with this is that the set of fields over which rows are unique
differs across the tables, so the unique_id is only unique within that one
table; it acts as a primary key so that no dupes exist in that one table.



> - Another guess is that the ttype sets (177,197,705,742,758,766),
> (69,178,198,704,757,741,765) are actually indicating some other property
> a common "type" of record, and that only one of each will be present for
> an id,start_timestamp combination. This may be related to the repeating
> fields issue - if a certain ttype indicates that we are interested in a
> certain pber_x field (and possibly that the others are empty).

yes..

eg:
id     | hid | ttype | start_timestamp  | pber_2 | pber_3 | pber_4
PARENT | 0   |764   | 2007-07-01 00:00 | 4000   | null   | null
PARENT | 0   |765   | 2007-07-01 00:00 | null   | 9000   | null
PARENT | 0   |766   | 2007-07-01 00:00 | null   | null   | 7999
PARENT | 1   |764   | 2007-07-01 00:00 | 4550   | null   | null
PARENT | 1   |765   | 2007-07-01 00:00 | null   | 9220   | null
PARENT | 1   |766   | 2007-07-01 00:00 | null   | null   | 6669


The subqueries are just there to pick out the fields that have a value and
drop the nulls, so that we end up with:

id     | hid | start_timestamp  | pber_2 | pber_3 | pber_4
PARENT | 0 | 2007-07-01 00:00 |  4000 |   9000 | 7999
PARENT | 1 | 2007-07-01 00:00 |  4550 |   9220 | 6669

which is basically just joining the table with itself. But there is a caveat
whereby pber_3 and pber_4 can only be joined together based on the
seq_date/seq_time in the ts table, hence the query:

JOIN1.id = join2.id
and join1.seq_date = join2.seq_date
etc..

But the problem is compounded by the fact that there are numerous hid values
for the head id.
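FWIW, instead of one subquery per ttype group joined back together, the same
pivot can also be written with conditional aggregation. This is only a rough,
hypothetical sketch (it assumes pber_* and hid live in trh and seq_date/
seq_time come from ts, per the examples here, and uses the 764/765/766
group):

select trh.id,
       trh.hid,
       trh.start_timestamp,
       ts.seq_date,
       ts.seq_time,
       max(case when trh.ttype = 764 then trh.pber_2 end) as pber_2,
       max(case when trh.ttype = 765 then trh.pber_3 end) as pber_3,
       max(case when trh.ttype = 766 then trh.pber_4 end) as pber_4
from trh
join ts on ts.id = trh.id
       and ts.ttype = trh.ttype
       and ts.start_timestamp = trh.start_timestamp
group by trh.id, trh.hid, trh.start_timestamp, ts.seq_date, ts.seq_time;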

> - You have what looks like repeating fields - pber_x, fval_x, index_x -
> in your tables. Fixing this might not improve your query, but might be a
> good idea for other reasons.

It's being looked at by another team, to collapse this into something like
this:
ttype | pber
764   | 500
765   | 600
766   | 700

so that there are fewer columns and no null fields. But the query will
remain the same.
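A hypothetical sketch of that collapsed layout, just to show the shape
(column names and types are guesses):

create table trh_collapsed (
    id              varchar   not null,
    hid             integer   not null,
    ttype           integer   not null,
    start_timestamp timestamp not null,
    pber            numeric
);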

> - seq_date and seq_time seems like they may be redundant - are they
> different casts of the same data?

No. They're used to join together the pber_2/3/4 fields, as they may occur
anywhere from a few hours to days apart from each other, but each will be
uniquely identified by the seq_date/time:

eg :

id     | pber_2 | seq_date            | seq_time
PARENT | 400    | 2007-07-01 00:00:00 | 1980-01-01 20:00:00
PARENT | 410    | 2007-07-10 00:00:00 | 1980-01-01 22:00:00

id     | pber_3 | seq_date            | seq_time
PARENT | 900    | 2007-07-01 00:00:00 | 1980-01-01 20:00:00
PARENT | 100    | 2007-07-10 00:00:00 | 1980-01-01 22:00:00

id     | pber_4 | seq_date            | seq_time
PARENT | 10000  | 2007-07-01 00:00:00 | 1980-01-01 20:00:00
PARENT | 999    | 2007-07-10 00:00:00 | 1980-01-01 22:00:00


So, the correct values for the fields when joined together will be of the
form:

id     |start_timestamp  |seq_date            | seq_time           |pber_2 | pber_3 | pber_4
PARENT |2007-07-01 00:00 |2007-07-01 00:00:00 | 1980-01-01 20:00:00| 400   | 900    | 10000
PARENT |2007-07-01 00:00 |2007-07-10 00:00:00 | 1980-01-01 22:00:00| 410   | 100    | 999

(repeating for each hid subset value)



> All speculation. Hope it helps


Anything would help. I'm more or less willing to try anything to make things
faster, or else this project is going down the toilet.