If we want to retrieve a single customer's 10 most recent orders, sorted by date, we can use a composite index on (customer, date); Postgres's query planner will use the composite index with a backward index scan on the second indexed column (date).
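For concreteness, the single-customer case looks roughly like this (table and column names are assumed, not taken from our actual schema):

```sql
CREATE INDEX orders_customer_date_idx ON orders (customer_id, order_date);

-- The planner can satisfy this with a backward scan of the index above,
-- so the LIMIT stops the scan after 10 rows:
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC
LIMIT 10;
```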
However, if we want to retrieve the 10 most recent orders across a "customer class" (a set of customers), sorted by date, we have not been able to get Postgres to use the composite index.
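The problematic per-class variant, with the same assumed schema, would be something like:

```sql
-- customer_id is no longer a single constant, and the planner no
-- longer uses the (customer_id, order_date) index to produce rows
-- already in date order:
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.class = 7
ORDER BY o.order_date DESC
LIMIT 10;
```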
We have concluded that the fastest way to run this type of query is to merge, in sorted order, each customer's set of orders (for which the composite index can be used). Merging these ordered lists with a heap, stopping once the limit is reached, is the most algorithmically efficient approach we have found. This is implemented in the attachment as a pl/pythonu function.
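The heap-merge idea can be sketched outside the database with Python's `heapq.merge`. The order tuples and customer ids below are made up for illustration; in the real pl/pythonu function each stream would be a cursor over one customer's index-ordered rows rather than an in-memory list:

```python
import heapq
from itertools import islice

# Toy stand-in for per-customer index scans: each inner list is one
# customer's orders, already sorted by date descending, which is
# exactly what the composite (customer, date) index hands back.
orders_by_customer = [
    [("2023-05-09", "c1"), ("2023-04-01", "c1")],
    [("2023-05-20", "c2"), ("2023-03-15", "c2"), ("2023-01-02", "c2")],
    [("2023-05-10", "c3")],
]

def latest_orders(streams, limit):
    """Heap-merge the per-customer streams (newest first), stop at limit."""
    merged = heapq.merge(*streams, key=lambda order: order[0], reverse=True)
    return list(islice(merged, limit))
```

Because `heapq.merge` is lazy, only about `limit` comparisons past the stream heads are paid for, regardless of how many orders each customer has in total.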
Another, less algorithmically efficient solution, though faster in practice in many cases, is to fetch the full limit of orders from each customer, sort the combined set by date, and return up to the limit.
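For comparison, a sketch of this fetch-then-sort variant, using the same made-up data; `fetch_top` stands in for the fast single-customer composite-index query:

```python
# Made-up per-customer data; in practice fetch_top(customer, limit)
# would run the single-customer (customer, date) index query with LIMIT.
ORDERS = {
    "c1": [("2023-05-09", "c1"), ("2023-04-01", "c1")],
    "c2": [("2023-05-20", "c2"), ("2023-03-15", "c2")],
    "c3": [("2023-05-10", "c3")],
}

def fetch_top(customer, limit):
    return ORDERS[customer][:limit]

def latest_orders_naive(customers, limit):
    # Pull up to `limit` newest orders per customer, then one global
    # sort over (customers * limit) rows, truncated to the limit.
    pool = [order for cust in customers for order in fetch_top(cust, limit)]
    pool.sort(key=lambda order: order[0], reverse=True)
    return pool[:limit]
```

This over-fetches (customers × limit rows instead of limit rows), but it trades the heap bookkeeping for one batched query per customer and a single sort, which can win when per-row round-trip costs dominate.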
We are no masters of reading query plans, but for straight SQL queries the planner seems to produce two different types of plan. Each is fast in certain cases but breaks down in our typical use case, where each customer's orders are sparse compared to the total number of orders across the date range.
We are interested in whether any mechanism internal to Postgres can accomplish this kind of merging of indexed columns in sorted order.
If this cannot currently be accomplished (or if there is something we are missing about why it shouldn't be), we would appreciate any pointers on translating our Python heap approach into C functions integrated more closely with Postgres. The Python function incurs large constant costs from type conversions and repeated queries to the database.