Обсуждение: Have: Seq Scan - Want: Index Scan - what am I doing wrong?

Поиск
Список
Период
Сортировка

Have: Seq Scan - Want: Index Scan - what am I doing wrong?

От
Chris Ruprecht
Дата:
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows.
there is a timestamp field that contains data.
more data gets copied from another database into this database.
How do I make this do an index scan instead?
I did an "analyze audittrailclinical" to no avail.
I tested different indexes - no same behavior.

The query does this:

SELECT
audittrailclinical.pgid,
audittrailclinical.timestamp,
mmuser.logon,
audittrailclinical.entityname,
audittrailclinical.clinicalactivity,
audittrailclinical.audittraileventcode,
account.accountnumber,
patient.dnsortpersonnumber
FROM
public.account,
public.audittrailclinical,
public.encounter,
public.entity,
public.mmuser,
public.patient,
public.patientaccount
WHERE
     audittrailclinical.encountersid = encounter.encountersid
and   audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone
and   audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone
AND  encounter.practiceid = patient.practiceid
AND  encounter.patientid = patient.patientid
AND  encounter.staffid = patient.staffid
AND  entity.entitysid = audittrailclinical.entitysid
AND  mmuser.mmusersid = audittrailclinical.mmusersid
AND  patient.practiceid = patientaccount.practiceid
AND  patient.patientid = patientaccount.patientid
AND  patientaccount.accountsid = account.accountsid
AND  patientaccount.defaultaccount = 'Y'
AND  patient.dnsortpersonnumber = '347450' ;

The query plan says:

"              ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)"
"                    Filter: (("timestamp" >= '2008-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <=
'2012-10-0500:00:00'::timestamp without time zone))" 

which takes forever.

How do I make this do an index scan instead?
I did an "analyze audittrailclinical" to no avail.

the table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
 audittrailid text,
 audittraileventcode text,
 clinicalactivity text,
 eventsuccessful text,
 externalunique text,
 recordstamp timestamp without time zone,
 recorddescription text,
 encountersid integer,
 eventuserlogon text,
 computername text,
 applicationcode text,
 practiceid integer,
 mmusersid integer,
 entitysid integer,
 entityname text,
 "timestamp" timestamp without time zone,
 lastuser integer,
 createstamp timestamp without time zone,
 pgid bigint DEFAULT nextval(('"bravepoint_seq"'::text)::regclass)
)
WITH (
 OIDS=FALSE
);
ALTER TABLE audittrailclinical
 OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
 ON audittrailclinical
 USING btree
 (entitysid , "timestamp" );

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
 ON audittrailclinical
 USING btree
 (audittrailid COLLATE pg_catalog."default" );


Re: Have: Seq Scan - Want: Index Scan - what am I doing wrong?

От
Evgeny Shishkin
Дата:
On Oct 17, 2012, at 3:52 AM, Chris Ruprecht <chris@cdrbill.com> wrote:

> Hi guys,
>
> PG = 9.1.5
> OS = winDOS 2008R8
>
> I have a table that currently has 207 million rows.
> there is a timestamp field that contains data.
> more data gets copied from another database into this database.
> How do I make this do an index scan instead?
> I did an "analyze audittrailclinical" to no avail.
> I tested different indexes - no same behavior.
>
> The query does this:
>
> SELECT
> audittrailclinical.pgid,
> audittrailclinical.timestamp,
> mmuser.logon,
> audittrailclinical.entityname,
> audittrailclinical.clinicalactivity,
> audittrailclinical.audittraileventcode,
> account.accountnumber,
> patient.dnsortpersonnumber
> FROM
> public.account,
> public.audittrailclinical,
> public.encounter,
> public.entity,
> public.mmuser,
> public.patient,
> public.patientaccount
> WHERE
>     audittrailclinical.encountersid = encounter.encountersid
> and   audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone
> and   audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone
> AND  encounter.practiceid = patient.practiceid
> AND  encounter.patientid = patient.patientid
> AND  encounter.staffid = patient.staffid
> AND  entity.entitysid = audittrailclinical.entitysid
> AND  mmuser.mmusersid = audittrailclinical.mmusersid
> AND  patient.practiceid = patientaccount.practiceid
> AND  patient.patientid = patientaccount.patientid
> AND  patientaccount.accountsid = account.accountsid
> AND  patientaccount.defaultaccount = 'Y'
> AND  patient.dnsortpersonnumber = '347450' ;
>
> The query plan says:
>
> "              ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)"
> "                    Filter: (("timestamp" >= '2008-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <=
'2012-10-0500:00:00'::timestamp without time zone))" 
>
> which takes forever.
>

Selecting 5 yours of data is not selective at all, so postgres decides it is cheaper to do seqscan.

Do you have an index on patient.dnsortpersonnumber? Can you post a result from
select count(*) from patient where dnsortpersonnumber = '347450'; ?


> How do I make this do an index scan instead?
> I did an "analyze audittrailclinical" to no avail.
>
> the table definitions are (the createstamp field is empty - I know, bad data):
>
> CREATE TABLE audittrailclinical
> (
> audittrailid text,
> audittraileventcode text,
> clinicalactivity text,
> eventsuccessful text,
> externalunique text,
> recordstamp timestamp without time zone,
> recorddescription text,
> encountersid integer,
> eventuserlogon text,
> computername text,
> applicationcode text,
> practiceid integer,
> mmusersid integer,
> entitysid integer,
> entityname text,
> "timestamp" timestamp without time zone,
> lastuser integer,
> createstamp timestamp without time zone,
> pgid bigint DEFAULT nextval(('"bravepoint_seq"'::text)::regclass)
> )
> WITH (
> OIDS=FALSE
> );
> ALTER TABLE audittrailclinical
> OWNER TO intergy;
> GRANT ALL ON TABLE audittrailclinical TO intergy;
> GRANT SELECT ON TABLE audittrailclinical TO rb;
>
> -- Index: atc_en_time
>
> CREATE INDEX atc_en_time
> ON audittrailclinical
> USING btree
> (entitysid , "timestamp" );
>
> -- Index: atc_id
>
> -- DROP INDEX atc_id;
>
> CREATE INDEX atc_id
> ON audittrailclinical
> USING btree
> (audittrailid COLLATE pg_catalog."default" );
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



Re: Have: Seq Scan - Want: Index Scan - what am I doing wrong?

От
Chris Ruprecht
Дата:
On Oct 16, 2012, at 20:01 , Evgeny Shishkin <itparanoia@gmail.com> wrote:

> Selecting 5 yours of data is not selective at all, so postgres decides it is cheaper to do seqscan.
>
> Do you have an index on patient.dnsortpersonnumber? Can you post a result from
> select count(*) from patient where dnsortpersonnumber = '347450'; ?
>

Yes, there is an index:

"Aggregate  (cost=6427.06..6427.07 rows=1 width=0)"
"  ->  Index Scan using patient_pracsortpatientnumber on patient  (cost=0.00..6427.06 rows=1 width=0)"
"        Index Cond: (dnsortpersonnumber = '347450'::text)"


In fact, all the other criteria is picked using an index. I fear that the >= and <= on the timestamp is causing the
issue.If I do a "=" of just one of them, I get an index scan. But I need to scan the entire range. I get queries like
"giveme everything that was entered into the system for this patient between these two dates". A single date wouldn't
work.

Re: Have: Seq Scan - Want: Index Scan - what am I doing wrong?

От
Bruce Momjian
Дата:
On Tue, Oct 16, 2012 at 08:19:43PM -0400, Chris Ruprecht wrote:
>
> On Oct 16, 2012, at 20:01 , Evgeny Shishkin <itparanoia@gmail.com> wrote:
>
> > Selecting 5 yours of data is not selective at all, so postgres decides it is cheaper to do seqscan.
> >
> > Do you have an index on patient.dnsortpersonnumber? Can you post a result from
> > select count(*) from patient where dnsortpersonnumber = '347450'; ?
> >
>
> Yes, there is an index:
>
> "Aggregate  (cost=6427.06..6427.07 rows=1 width=0)"
> "  ->  Index Scan using patient_pracsortpatientnumber on patient  (cost=0.00..6427.06 rows=1 width=0)"
> "        Index Cond: (dnsortpersonnumber = '347450'::text)"
>
>
> In fact, all the other criteria is picked using an index. I fear that the >= and <= on the timestamp is causing the
issue.If I do a "=" of just one of them, I get an index scan. But I need to scan the entire range. I get queries like
"giveme everything that was entered into the system for this patient between these two dates". A single date wouldn't
work.

Have you read our FAQ on this matter?

    http://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Have: Seq Scan - Want: Index Scan - what am I doing wrong?

От
Chris Ruprecht
Дата:
Thanks Bruce,

I have, and I even thought, I understood it :).

I just ran an explain analyze on another table - and ever since the query plan changed. It's now using the index as
expected.I guess, I have some more reading to do. 

On Oct 16, 2012, at 20:31 , Bruce Momjian <bruce@momjian.us> wrote:

>
> Have you read our FAQ on this matter?
>