Обсуждение: Query optimizing - paradox behave
Hallo !
I want to tune a database. There a many redundant datas in the database
, because of all the relations were consider as n:m relations. But the
most of them are 1:n Relations. So my approach was to cut the
redundancies to get more performance. But .. happens!
The query with the 3 tables is faster than the query with 2 tables.
That is paradox to the Explain output.
And: the real database functions like dbPG95GetIndex and all functions
defined by me are slower.
The whole program is slower than before.
I disabled all the index.(since with index the behavior is the same) The
database pacs ist only restructured. They have the same data. With
database pacs and compare a vacuum was made.
I looked at the user time , since system time is faked because my
testprogram hands over the control to the postmaster and the postmaster
is doing his own work. So I made a lot of tests to get a average
usertime. So escapes will be catched and eliminated.
Here are the tabledescriptions for the original database "compare":
tables i.e.
There is a n:m relationship between patient and study realized with
relpatient_study000 relationtable.
Table "patient"
Attribute | Type | Modifier
----------------------+------------------------+----------
chilioid | character varying(80) |
name | text |
id | character varying(256) |
birthdate | date |
birthtime | time |
sex | character(1) |
medicalrecordlocator | character varying(128) |
Table "study"
Attribute | Type | Modifier
------------------------+------------------------+----------
chilioid | character varying(80) |
instanceuid | character varying(64) |
id | character varying(64) |
studydate | date |
studytime | time |
modality | character varying(2) |
manufacturer | character varying(128) |
referingphysician | text |
description | character varying(128) |
manufacturersmodelname | character varying(128) |
importtime | double precision |
chilisenderid | character varying(80) |
accessionnumber | character varying(64) |
institutionname | character varying(128) |
workflowstate | character varying(8) |
flags | character varying(8) |
performingphysician | character varying(128) |
reportingphysician | character varying(128) |
Table "relpatient_study000"
Attribute | Type | Modifier
-----------+-----------------------+----------
chilioid | character varying(80) |
parentoid | character varying(80) |
childoid | character varying(80) |
parentoid is here the oid of the patient and childoid is here the oid
of the study.
Thats the query with the original database "compare":
time psql -d compare -c "SELECT patient.*,study.* FROM
patient,study,relpatient_study000 r0 WHERE
(patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
patient.name using <" > 3tableni
1.671u 0.130s 0:11.14 16.4% 0+0k 0+0io 208pf+0w
psql -d compare -c "EXPLAIN SELECT patient.*,study.* FROM
patient,study,relpatient_study000 r0 WHERE
(patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
patient.name using <" > 3tableni
NOTICE: QUERY PLAN:
Sort (cost=1135170635.79..1135170635.79 rows=748802386 width=296)
-> Merge Join (cost=1025510.64..1057837.48 rows=748802386 width=296)
-> Sort (cost=1017989.22..1017989.22 rows=2556861 width=96)
-> Merge Join (cost=4287.84..4763.21 rows=2556861
width=96)
-> Sort (cost=990.43..990.43 rows=8725 width=72)
-> Seq Scan on patient (cost=0.00..212.25
rows=8725 width=72)
-> Sort (cost=3297.40..3297.40 rows=29305
width=24)
-> Seq Scan on relpatient_study000 r0
(cost=0.00..774.05 rows=29305 width=24)
-> Sort (cost=7521.42..7521.42 rows=29286 width=200)
-> Seq Scan on study (cost=0.00..1116.86 rows=29286
width=200)
-----------------------------
Thats the query with the new restructured database "pacs":
time psql -d pacs -c "SELECT patient.*,study.* FROM patient,study WHERE
(patient.chiliOID=study.patientOID ) order by patient.name using <" >
2tableni
1.730u 0.120s 0:09.44 19.5% 0+0k 0+0io 208pf+0w
psql -d pacs -c "EXPLAIN SELECT patient.*,study.* FROM patient,study
WHERE (patient.chiliOID=study.patientOID ) order by patient.name using
<" > 2tableni NOTICE: QUERY PLAN:
Sort (cost=2194791.19..2194791.19 rows=2555204 width=284)
-> Merge Join (cost=8978.44..9453.57 rows=2555204 width=284)
-> Sort (cost=990.43..990.43 rows=8725 width=72)
-> Seq Scan on patient (cost=0.00..212.25 rows=8725
width=72)
-> Sort (cost=7988.00..7988.00 rows=29286 width=212)
-> Seq Scan on study (cost=0.00..1236.86 rows=29286
width=212)
Restructured tables i.e.
PAtient-study relationship is 1:n realized with column patientoid in
table study.
Table "patient"
Attribute | Type | Modifier
----------------------+------------------------+----------
chilioid | character varying(80) |
name | text |
id | character varying(256) |
birthdate | date |
birthtime | time |
sex | character(1) |
medicalrecordlocator | character varying(128) |
Table "study"
Attribute | Type | Modifier
------------------------+------------------------+----------
chilioid | character varying(80) |
instanceuid | character varying(64) |
id | character varying(64) |
studydate | date |
studytime | time |
modality | character varying(2) |
manufacturer | character varying(128) |
referingphysician | text |
description | character varying(128) |
manufacturersmodelname | character varying(128) |
importtime | double precision |
chilisenderid | character varying(80) |
accessionnumber | character varying(64) |
institutionname | character varying(128) |
workflowstate | character varying(8) |
flags | character varying(8) |
performingphysician | character varying(128) |
reportingphysician | character varying(128) |
patientoid | character varying(80) |
The times of the processes are escape-eliminated by statistical methods.
I determined that the "compare" database is 8% times faster than the new
restructured "pacs" database.
How can I understand this? Whats my mistake?
Anybody who can make some sugestions on the above will
receive my enthusiastic gratitude
David M. Richter
Вложения
What version are you using? (dbPG95GetIndex?) On Thu, 19 Jul 2001, David M. Richter wrote: > Hallo ! > > I want to tune a database. There a many redundant datas in the database > , because of all the relations were consider as n:m relations. But the > most of them are 1:n Relations. So my approach was to cut the > redundancies to get more performance. But .. happens! > > The query with the 3 tables is faster than the query with 2 tables. > > That is paradox to the Explain output. > And: the real database functions like dbPG95GetIndex and all functions > defined by me are slower.
David,
You will no doubt hear later from the tuning experts on the list.
However, let me save everybody some time by verifying some basics:
1. When you restructured the database, you ran VACUUM ANALYZE on the new
database (pacs)?
2. You said that you "eliminated the indexes" because they weren't
helping performance. Is this right? It seems a little hard to figure
from here.
3. General Advice: If you're concerned about query performance, get rid
of those VARCHAR(80) primary keys and replace them with INT4 or INT8!
The math is easy to do:
If you're processing INT8 keys for 1,000,000 table rows that's
8,000,000 (roughly 8mb) data on disk and data being processed.
If you're processing VARCHAR(80) keys for 1,000,000 table rows, thats
82,000,000 bytes (82 mb) on disk and in ram to be processed.
In theory, you could get a 10-fold increase in JOIN performance by
switching to INT8 keys. In practice, its probably more like double or
triple but that ain't bad, either.
-Josh Berkus
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco
Вложения
Hallo Stephan! Are there several versions of dbPG95GetIndex existing, or did you mention postgres version 7.1.2? With a little help I have killed the Problem!! Yeah, But now I trying to improve the C-code. Do You have any experience with optimizing C-Code. Are there some new miracle-like function, wich improves the speed dramatically? Do You know some database options wich I could change for better performance? Thanks a lot for Your suggestions!! David >What version are you using? (dbPG95GetIndex?)