Query performance issue

From: Jonathan Gray
Subject: Query performance issue
Date: ,
Msg-id: 0f1701c7cdc6$f9cc1a30$ed644e90$@com
Replies: Re: Query performance issue  (Chris)
List: pgsql-performance


We’re experiencing a query performance problem related to the planner and its ability to perform a specific type of merge.


We have created a test case (as attached, or here: http://www3.streamy.com/postgres/indextest.sql) which involves a hypothetical customer ordering system, with customers, orders, and customer groups.


If we want to retrieve a single customer's 10 most recent orders, sorted by date, we can use a double index on (customer, date); Postgres's query planner will use the double index with a backwards index scan on the second indexed column (date).


However, if we want to retrieve a “customer class’s” 10 most recent orders, sorted by date, we are not able to get Postgres to use double indexes.


We have come to the conclusion that the fastest way to accomplish this type of query is to merge, in sorted order, each customer's set of orders (for which we can use the double index).  Using a heap to merge these ordered lists (stopping once we reach the limit) seems to be the most algorithmically efficient approach we have found.  This is implemented in the attachment as a pl/pythonu function.
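For discussion, the heap-merge idea can be sketched in plain Python (outside the database) with the standard library's `heapq.merge`. The customer IDs and date values below are made up for illustration; each list stands in for the date-descending stream a backwards scan of the (customer, date) index would produce for one customer:

```python
import heapq

# Hypothetical stand-in data: each customer's orders, already sorted by
# date descending (the order a backwards index scan would return them in).
# Integers stand in for order dates.
orders_by_customer = {
    1: [9, 7, 2],
    2: [8, 5],
    3: [6, 4, 3, 1],
}

def merge_recent_orders(streams, limit):
    """Lazily merge the sorted per-customer streams with a heap,
    stopping as soon as `limit` rows have been produced."""
    merged = heapq.merge(*streams, reverse=True)
    return [date for _, date in zip(range(limit), merged)]

print(merge_recent_orders(orders_by_customer.values(), 4))  # [9, 8, 7, 6]
```

Because `heapq.merge` is lazy, only as many rows are pulled from each stream as the heap actually needs to emit `limit` results, which is the property that makes this attractive when orders per customer are plentiful but the limit is small.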


Another solution, less algorithmically efficient but faster in practice in many cases, is to fetch up to the full limit of orders from each customer, sort the combined set by date, and return up to the limit.
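That simpler alternative can be sketched the same way (again with made-up stand-in data; each inner list is one customer's date-descending orders):

```python
def top_orders_sort_all(streams, limit):
    """Take up to `limit` orders from each customer's date-descending
    stream, sort the combined pool once, and truncate to `limit`."""
    pool = []
    for dates in streams:
        pool.extend(dates[:limit])  # each stream is already sorted
    pool.sort(reverse=True)
    return pool[:limit]

print(top_orders_sort_all([[9, 7, 2], [8, 5], [6, 4, 3, 1]], 4))  # [9, 8, 7, 6]
```

This does more total work (it always fetches limit rows per customer), but the single sort over a small pool often beats the per-row heap bookkeeping when the number of customers in the group is modest.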


We are no masters of reading query plans, but for straight SQL queries the planner seems to yield two different types of plan.  They are fast in certain cases but break down in our typical use cases, where the number of orders per customer is small compared to the total number of orders across the date range.


We are interested in whether a mechanism internal to Postgres can accomplish this type of merging of indexed columns in sorted order.


If this cannot currently be accomplished (or if there is something we are missing about why it shouldn’t be) we would appreciate any pointers to be able to translate our python heap approach into C functions integrated more closely with Postgres.  The python function incurs large constant costs because of type conversions and repeated queries to the database.


Thanks for any help or direction.


Jonathan Gray / Miguel Simon

Attachments
