Re: Speeding up a query.

От: Hartman, Matthew
Тема: Re: Speeding up a query.
Дата: ,
Msg-id: 366642367C5B354197A1E0D27BC175BD0225971A@KGHMAIL.KGH.ON.CA
(см: обсуждение, исходный текст)
Ответ на: Re: Speeding up a query.  (Merlin Moncure)
Список: pgsql-performance

Скрыть дерево обсуждения

performance with query  (Alberto Dalmaso, )
 Re: performance with query  (Joshua Tolley, )
 Re: performance with query  ("Kevin Grittner", )
 Re: performance with query  (Alberto Dalmaso, )
  Re: performance with query  (Matthew Wakeling, )
   Re: performance with query  (Alberto Dalmaso, )
    Re: performance with query  (Tom Lane, )
     Re: performance with query  (Alberto Dalmaso, )
      Re: performance with query  (Tom Lane, )
      Re: performance with query (OT)  ("Albe Laurenz", )
    Re: performance with query  ("Kevin Grittner", )
  Re: performance with query  ("Kevin Grittner", )
   Re: performance with query  (Alberto Dalmaso, )
    Re: performance with query  ("Kevin Grittner", )
    Re: performance with query  ("Kevin Grittner", )
     Re: performance with query  (Alberto Dalmaso, )
      Speeding up a query.  ("Hartman, Matthew", )
       Re: Speeding up a query.  (Anthony Presley, )
        Re: Speeding up a query.  ("Hartman, Matthew", )
       Re: Speeding up a query.  ("Albe Laurenz", )
        Re: Speeding up a query.  (Grzegorz Jaśkiewicz, )
       Re: Speeding up a query.  (Alberto Dalmaso, )
        Re: Speeding up a query.  ("Kevin Grittner", )
        Re: Speeding up a query.  (Tom Lane, )
       Re: Speeding up a query.  (Merlin Moncure, )
        Re: Speeding up a query.  ("Hartman, Matthew", )
       Re: Speeding up a query.  ("Hartman, Matthew", )
       Re: Speeding up a query.  (Simon Riggs, )
        Re: Speeding up a query.  ("Hartman, Matthew", )
 Re: performance with query  (Alberto Dalmaso, )
 Re: performance with query  (Alberto Dalmaso, )
  Re: performance with query  ("Kevin Grittner", )
   Re: performance with query  (Alberto Dalmaso, )
    Re: performance with query  ("Kevin Grittner", )
     Re: performance with query  (Alberto Dalmaso, )
      Re: performance with query  ("Kevin Grittner", )
       Re: performance with query  (Tom Lane, )
        Re: performance with query  ("Kevin Grittner", )

Thanks for the replies everyone. I'll try to answer them all in this one email. I will send another email immediately
afterthis with additional details about the query. 

> - Frequently the biggest performance gains can be reached by
>   a (painful) redesign. Can ou change the table structure in a way
>   that makes this query less expensive?

I have considered redesigning the algorithm to accommodate this. As I've said, there's one row per five minute time
slot.Instead, I could represent an interval of time with a row. For example, "start_time" of "08:00" with an "end_time"
of"12:00" or perhaps an interval "duration" of "4 hours". The difficulty becomes in managing separate time requirements
(nursevs unit) for each time slot, and in inserting/updating new rows as pieces of those time slots or intervals are
usedup. Having a row per five minute interval avoids those complications so far. Still, I'd start with 32 rows and
increasethe number, never reaching 3,280.. :) 

> - You have an index on matrix.xxxxx, right?

I have tried indexes on each common join criteria. Usually it's "time,unit", "time,nurse", or "time,unit_scheduled",
"time,nurse_scheduled"(the later two being Booleans). In the first two cases it's made a difference of less than a
second.In the last two, the time actually increases if I add "analyze" statements in after updates are made. 

> - Can you reduce the row count of the two subqueries by adding
>   additional conditions that weed out rows that can be excluded
>   right away?

I use some additional conditions. I'll paste the meat of the query below.

> - Maybe you can gain a little by changing the "select *" to
>   "select id" in both subqueries and adding an additional join
>   with matrix that adds the relevant columns in the end.
>   I don't know the executor, so I don't know if that will help,
>   but it would be a simple thing to test in an experiment.

I wrote the "select *" as simplified, but really, it returns the primary key for that row.

> how far in advance do you schedule?  As far as necessary?

It's done on a per day basis, each day taking 8-12 seconds or so on my workstation. We typically schedule patients as
muchas three to six months in advance. The query already pulls data to a temporary table to avoid having to manage a
massivenumber of rows. 

> How many chairs are there?  How many nurses are there?   This is a
> tricky (read: interesting) problem.

In my current template there are 17 chairs and 7 nurses. Chairs are grouped into pods of 2-4 chairs. Nurses cover one
tomany pods, allowing for a primary nurse per pod as well as floater nurses that cover multiple pods. 




Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294


-----Original Message-----
From: Merlin Moncure [mailto:]
Sent: Wednesday, June 17, 2009 9:09 AM
To: Hartman, Matthew
Cc: 
Subject: Re: [PERFORM] Speeding up a query.

On Tue, Jun 16, 2009 at 2:35 PM, Hartman,
Matthew<> wrote:
> Good afternoon.
>
> I have developed an application to efficiently schedule chemotherapy
> patients at our hospital. The application takes into account several
> resource constraints (available chairs, available nurses, nurse coverage
> assignment to chairs) as well as the chair time and nursing time
> required for a regimen.
>
> The algorithm for packing appointments in respects each constraint and
> typically schedules a day of treatments (30-60) within 9-10 seconds on
> my workstation, down from 27 seconds initially. I would like to get it
> below 5 seconds if possible.
>
> I think what's slowing is down is simply the number of rows and joins.
> The algorithm creates a scheduling matrix with one row per 5 minute
> timeslot, per unit, per nurse assigned to the unit. That translates to
> 3,280 rows for the days I have designed in development (each day can
> change).
>
> To determine the available slots, the algorithm finds the earliest slot
> that has an available chair and a count of the required concurrent
> intervals afterwards. So a 60 minute regimen requires 12 concurrent
> rows. This is accomplished by joining the table on itself. A second
> query is ran for the same range, but with respect to the nurse time and
> an available nurse. Finally, those two are joined against each other.
> Effectively, it is:
>
> Select *
> From   (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) chair,
>        (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) nurse
> Where chair.id = nurse.id
>
> With matrix having 3,280 rows. Ugh.
>
> I have tried various indexes and clustering approachs with little
> success. Any ideas?

how far in advance do you schedule?  As far as necessary?

How many chairs are there?  How many nurses are there?   This is a
tricky (read: interesting) problem.

merlin



В списке pgsql-performance по дате сообщения:

От: Brian Cox
Дата:
Сообщение: very slow selects on a small table
От: Bryce Ewing
Дата:
Сообщение: Re: Index Scan taking long time