Optimizing queries that use multiple tables and many order by columns

Поиск
Список
Период
Сортировка
От Joshua Berry
Тема Optimizing queries that use multiple tables and many order by columns
Дата
Msg-id AANLkTimkoNEDn7bu+4UtwmegdeNibR7vesKRpx5nz=hQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Optimizing queries that use multiple tables and many order by columns  ("Wappler, Robert" <rwappler@ophardt.com>)
Список pgsql-general
Hi Group,

I've never really learned how to optimize queries that join several tables and have order by clauses that specify columns from each table. Is there documentation that could help me optimize and have the proper indexes in place? I've read through the PG Docs Chapter 11 on Indexes yet still lack the needed understanding.

Here's my latest culprit:

select Anl.Priority, Anl.Lab, Anl.Job, JOB.DateIn, JOB.CompanyCode, Anl.SampleName
from analysis anl join job on anl.job = job.job
order by job.companycode, anl.job, anl.lab
limit 10;

Here's the query plan using PG 8.4.4:
Limit  (cost=21990.24..21990.27 rows=10 width=32)
  ->  Sort  (cost=21990.24..22437.69 rows=178979 width=32)
        Sort Key: job.companycode, anl.lab
        ->  Hash Join  (cost=451.20..18122.57 rows=178979 width=32)
              Hash Cond: (anl.job = job.job)
              ->  Seq Scan on analysis anl  (cost=0.00..14091.79 rows=178979 width=23)
              ->  Hash  (cost=287.20..287.20 rows=13120 width=17)
                    ->  Seq Scan on job  (cost=0.00..287.20 rows=13120 width=17)


If I change the above query to only order by one of the tables, I get better results:
select Anl.Priority, Anl.Lab, Anl.Job, JOB.DateIn, JOB.CompanyCode, Anl.SampleName
from analysis anl join job on anl.job = job.job
order by job.companycode --, anl.job, anl.lab
limit 10;
Limit  (cost=0.00..3.65 rows=10 width=32)
  ->  Nested Loop  (cost=0.00..65269.13 rows=178979 width=32)
        ->  Index Scan using job_companycode on job  (cost=0.00..972.67 rows=13120 width=17)
        ->  Index Scan using analysis_job_lab on analysis anl  (cost=0.00..4.63 rows=22 width=23)
              Index Cond: (anl.job = job.job)


Any idea on how I can improve this? In the past I would tend to create a cached copy of the query as a table that would be utilized, but I suspect that there is a better way to go about this. I'm using a system (Clarion) which heavily uses cursors via the ODBC driver (I use the psqlODBC latest version) to get a handful of records at a time, so no actual LIMITs would be used in the production queries; I've added the LIMITs here to try to simulate the performance differences that I find when browsing the data while ordering by the above columns.


Here are the relevant tables and indexes:


CREATE TABLE job
(
  job bigint NOT NULL, -- Job #
  companycode character(4), -- Company Code
  recdbycode character(3), -- Initials of who checked in sample(s)
  datein date, -- Date sample was received
  project character varying, -- Project or Site name
  remarks text, -- Remarks
  --[CONSTRAINTs etc]
)

CREATE INDEX job_companycode
  ON job
  USING btree
  (companycode);

CREATE INDEX job_companycode_job
  ON samples.job
  USING btree
  (companycode, job);


CREATE TABLE analysis
(
  lab bigint NOT NULL, -- Lab number
  job bigint, -- Job number
  sampletype character varying(5), -- General class of sample
  priority character(1), -- Priority level
  samplename character varying, -- Sample or Well name
  CONSTRAINT rel_joblabk_to_jobk FOREIGN KEY (job)
      REFERENCES job (job) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE RESTRICT,
  --[CONSTRAINTs etc]
)

CREATE INDEX analysis_companycode_job_lab
  ON analysis
  USING btree
  (companycode, job, lab);


CREATE INDEX analysis_job_lab
  ON analysis
  USING btree
  (job, lab);


Thanks for any insights and tips you can provide!

Kind Regards,
-Joshua Berry

В списке pgsql-general по дате отправления:

Предыдущее
От: "Wappler, Robert"
Дата:
Сообщение: Re: Feature proposal
Следующее
От: Łukasz Bieniek
Дата:
Сообщение: recovery fail