Oddball data distribution giving me planner headaches

Поиск
Список
Период
Сортировка
От Jeff Amiel
Тема Oddball data distribution giving me planner headaches
Дата
Msg-id 1322857179.53033.YahooMailClassic@web65513.mail.ac4.yahoo.com
обсуждение исходный текст
Ответы Re: Oddball data distribution giving me planner headaches
Re: Oddball data distribution giving me planner headaches
Список pgsql-general
Oddball data distribution giving me headaches.

We have a distinct 'customer' table with customer_id, type and name/demographic information.
Assume some 1 million rows in the customer table.

We then have a customer 'relationship' table which simply contains 2 columns…designating parent and child
relationships…butallowing complex hierarchies between customers. 


CREATE TABLE customer_rel
(
  parent_customer integer NOT NULL,
  child_customer integer NOT NULL,
 )

8 million rows in this table.  Oddball distribution.  We have some 8 levels of hierarchy (customer type) represented
withthis table.  Every customer gets an entry where parent/child is themselves…and then for every 'upline'.  At the
highestlevel, we have 'distributors' which have all other customer types underneath them.  Assuming we had some 68
distributors,the entries where THEY are the parent_customer represent nearly a million rows of the 8 million. 

I have extracted a simple case from a larger query that was generating an off-beat plan because of the unexpected
plannerrow-counts being spewed by a low level query. 

explain analyze
select * from customer_rel where parent_customer in (select customer_id from customer where
customer_type='DISTRIBUTOR')


"  Nested Loop  (cost=25429.44..29626.39 rows=931 width=0) (actual time=216.325..1238.091 rows=1025401 loops=1)"
"        ->  HashAggregate  (cost=25429.44..25430.80 rows=136 width=4) (actual time=216.304..216.339 rows=68 loops=1)"
"              ->  Seq Scan on customer  (cost=0.00..25429.10 rows=136 width=4) (actual time=0.018..216.226 rows=68
loops=1)"
"                    Filter: (customer_type = 'DISTRIBUTOR'::bpchar)"
"        ->  Index Scan using rel_parent on customer_rel  (cost=0.00..30.76 rows=7 width=4) (actual time=0.006..8.190
rows=15079loops=68)" 
"              Index Cond: (parent_customer = customer.customer_id)"
"Total runtime: 1514.810 ms"

The fact that the top level nested loop THINKS it only will be returning 931 rows (instead of over 1 million) is the
killerhere…3 orders of magnitude.  The results of this  query are used as part of a bigger query and the screwed up
statsare causing all sorts of havoc upline. 

I'm experimenting in 9.1.0…have set the statistics to 1000 (and 10000) on both columns (parent and child) to little
effect. Have hardcoded the n_distinct on the parent_customer column to be 1,000,000 also…with no effect (doing analyze
oftable after each change) 

Does this oddball data distribution doom me to poor planning forever?
Any other thoughts?


В списке pgsql-general по дате отправления:

Предыдущее
От: "Gauthier, Dave"
Дата:
Сообщение: \dT+ does not give elements ?
Следующее
От: "David Johnston"
Дата:
Сообщение: Re: Oddball data distribution giving me planner headaches