Re: Get master-detail relationship metadata
From: Andy Colson
Subject: Re: Get master-detail relationship metadata
Date:
Msg-id: 4D4AC922.8050904@squeakycode.net
In reply to: Get master-detail relationship metadata (Laszlo Nagy <gandalf@shopzeus.com>)
List: pgsql-performance
On 2/3/2011 5:40 AM, Laszlo Nagy wrote:
>
> Hi All,
>
> I'm working on a client program that iterates over master-detail
> relationships in a loop chain.
>
> Pseudo code:
>
> for row_1 in table_1:
>     table_2 = get_details(row_1, "table2")
>     for row_2 in table_2:
>         row_3 = get_details(row_2, "table3")
>         ... etc.
>         process_data(row_1, row_2, row_3, ...)
>
> My task is to write the "get_details" iterator efficiently. The obvious
> way to do it is to query the details in every get_details() call, but
> that is not efficient. We have relationships where one master has only
> a few details. For 1 million master rows, that would result in millions
> of SQL SELECT commands, degrading performance by orders of magnitude.
> My idea was that the iterator should pre-fetch and cache data for many
> master records at once. get_details() would then use the cached rows,
> reducing the number of SQL SELECT statements needed. I wrote the
> iterator, and it works fine in some cases. For example:
>
> producers = get_rows("producer")
> for producer in producers:
>     products = get_details(producer, "product")
>     for product in products:
>         prices = get_details(product, "prices")
>         for price in prices:
>             process_product_price(producer, product, price)
>
> This works fine if one producer has no more than 1000 products and one
> product has no more than 10 prices. I can easily keep 10,000 records in
> memory. The actual code executes about 15 SQL queries while iterating
> over 1 million rows. Compared to the original "obvious" method,
> performance increased roughly 15-fold.
>
> But sometimes it just doesn't work. If a producer has 1 million
> products, and one product has 100 prices, then it fails, because I
> cannot keep 100 million prices in memory. My program should somehow
> figure out how many rows it will get for one master, and choose between
> the cached and non-cached methods.
>
> So here is the question: is there a way to get this information from
> PostgreSQL itself? I know that the query plan contains this information,
> but I'm not sure how to extract it. Should I run an EXPLAIN command of
> some kind, and parse the result as a string? For example:
>
> EXPLAIN select * from product where producer_id=1008;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Seq Scan on product  (cost=0.00..1018914.74 rows=4727498 width=1400)
>    Filter: (producer_id = 1008)
> (2 rows)
>
> Then I could extract "rows=4727498" to get an idea of how many detail
> rows I'll get for the master.
>
> Is there any better way to do it? And how reliable is this?
>
> Thanks,
>
> Laszlo
>

One way would be to join the master to the detail, and write your code
expecting duplicates:

q = get_rows("select * from product inner join price ... order by productid, priceid")
lastprodid = ''
for x in q:
    prodid = x.prodid
    if prodid != lastprodid:
        # we saw the last row of the previous product; move on to the next product
        lastprodid = prodid
    ... etc

> Is there any better way to do it? And how reliable is this?

It makes the SQL really easy, but the code complex... so pick your poison.

-Andy
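On the row-estimate question: rather than parsing the text output of EXPLAIN, you can ask for EXPLAIN (FORMAT JSON) (available in PostgreSQL 9.0 and later) and read the planner's estimate from the "Plan Rows" field. A minimal sketch, assuming psycopg2 and the product / producer_id names from the example above (adjust to your schema):

import json
import psycopg2

def estimated_detail_rows(conn, producer_id):
    """Return the planner's row estimate for one master's detail rows."""
    with conn.cursor() as cur:
        # EXPLAIN (FORMAT JSON) does not run the query; it only plans it.
        cur.execute(
            "EXPLAIN (FORMAT JSON) SELECT * FROM product WHERE producer_id = %s",
            (producer_id,),
        )
        plan = cur.fetchone()[0]
        if isinstance(plan, str):      # the plan may come back as JSON text
            plan = json.loads(plan)
        return plan[0]["Plan"]["Plan Rows"]

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=shop")   # hypothetical connection string
    if estimated_detail_rows(conn, 1008) <= 10000:
        pass  # small enough: pre-fetch and cache the detail rows
    else:
        pass  # too large: fall back to per-master queries or the join approach

Keep in mind the number is only the planner's estimate, so treat it as a heuristic for choosing a strategy, not as an exact count.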