Re: [PERFORM] Slow query with 3 table joins

Поиск
Список
Период
Сортировка
After about 40 inutes the slow query finally finished and the result of the EXPLAIN plan can be found here:


Thanks,
Alessandro Ferrucci

On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci <alessandroferrucci@gmail.com> wrote:
Hello - I am migrating a current system to PostgreSQL and I am having an issue with a relatively straightforward query being extremely slow.

The following are the definitions of the tables:

CREATE TABLE popt_2017.unit
(
  id serial NOT NULL,
  unit_id text,
  batch_id text,
  create_date timestamp without time zone DEFAULT now(),
  update_date timestamp without time zone,
  CONSTRAINT unit_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE TABLE popt_2017.field
(
  id serial NOT NULL,
  unit_id integer,
  subunit_data_id integer,
  field_name character varying(50),
  page_id character varying(20),
  page_type character varying(20),
  batch_id character varying(20),
  file_name character varying(20),
  data_concept integer,
  "GROUP" integer,
  omr_group integer,
  pres integer,
  reg_data text,
  ocr_conf text,
  ocr_dict text,
  ocr_phon text,
  create_date timestamp without time zone DEFAULT now(),
  update_date timestamp without time zone,
  CONSTRAINT field_pkey PRIMARY KEY (id),
  CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
      REFERENCES popt_2017.subunit (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
      REFERENCES popt_2017.unit (id) MATCH FULL
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
      REFERENCES popt_2017.unit (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE TABLE popt_2017.answer
(
  id serial NOT NULL,
  field_id integer,
  ans_status integer,
  ans text,
  luggage text,
  arec text,
  kfi_partition integer,
  final boolean,
  length integer,
  create_date timestamp without time zone DEFAULT now(),
  update_date timestamp without time zone,
  CONSTRAINT answer_pkey PRIMARY KEY (id),
  CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
      REFERENCES popt_2017.field (id) MATCH FULL
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
      REFERENCES popt_2017.field (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

Below are the index definitions for those tables:

UNIT:
CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);

FIELD:
CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
CREATE INDEX field_subunit_id_idx ON field USING btree (subunit_data_id)
CREATE INDEX field_field_name_idx ON field USING btree (field_name)

ANSWER:
CREATE UNIQUE INDEX answer_pkey ON answer USING btree (id)
CREATE INDEX answer_field_id_idx ON answer USING btree (field_id)
CREATE INDEX answer_ans_idx ON answer USING btree (ans)

The tables each have the following number of rows:

UNIT: 10,315
FIELD: 139,397,965
ANSWER: 3,463,300

The query in question is:

SELECT
   UNIT.ID AS UNIT_ID,
   UNIT.UNIT_ID AS UNIT_UNIT_ID,
   UNIT.BATCH_ID AS UNIT_BATCH_ID,
   UNIT.CREATE_DATE AS UNIT_CREATE_DATE,
   UNIT.UPDATE_DATE AS UNIT_UPDATE_DATE
FROM
   UNIT, FIELD, ANSWER
WHERE
   UNIT.ID=FIELD.UNIT_ID AND
   FIELD.ID=ANSWER.FIELD_ID AND
   FIELD.FIELD_NAME='SHEETS_PRESENT' AND
   ANSWER.ANS='2';

I attempted to run an EXPLAIN (ANALYZE,BUFFERS) and the query has been running for 32 minutes now, So I won't be able to post the results (as I've never been able to get the query to actually finish.

But, if I remove the join to UNIT (and just join FIELD and ANSWER) the resulting query is sufficiently fast, (the first time it ran in roughly 3 seconds), the query as such is:

SELECT * FROM
   ANSWER, FIELD
WHERE
   FIELD.ID=ANSWER.FIELD_ID AND
   FIELD.FIELD_NAME='SHEETS_PRESENT' AND
   ANSWER.ANS='2';

The EXPLAIN ( ANALYZE, BUFFERS ) output of that query can be found here https://explain.depesz.com/s/ueJq

These tables are static for now, so they do not get DELETEs or INSERTS at all and I have run VACUUM ANALYZE on all the affected tables.

I'm running PostgreSQL PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit

I'm running this on RHEL 6.9

On a server with 32 GB of ram, 2 CPUs.

The following are the changes to postgresql.conf that I have made:

shared_buffers = 7871MB
effective_cache_size = 23611MB
work_mem = 1000MB
maintenance_work_mem = 2048MB

I have not changed the autovacuum settings, but since the tables are static for now and I've already ran VACUUM that should not have any effect.

Any assistance that could be provided is greatly appreciated.

Thank you,
Alessandro Ferrucci









--
Signed,
Alessandro Ferrucci

В списке pgsql-performance по дате отправления:

Предыдущее
От: Alessandro Ferrucci
Дата:
Сообщение: Slow query with 3 table joins
Следующее
От: David Rowley
Дата:
Сообщение: Re: [PERFORM] Slow query with 3 table joins