Обсуждение: Slow performance when querying millions of rows
Hello,
I have a handful of queries that are performing very slowly. I realize that I will be hitting hardware limits at some point, but want to make sure Ive squeezed out every bit of performance I can before calling it quits.
Our database is collecting traffic data at the rate of about 3 million rows a day. The most important columns in the table being queried are timestamp, volume, occupancy, and speed. I have also denormalized the table by adding road name, direction, mile marker, and lane type values to eliminate joins with other tables that contain information about the devices that collect this information. A typical query will involve segments of roadway (i.e. road names, directions, and mile marker bounds) over a certain period of time (e.g. morning rush hour), and will have filters to exclude questionable data such (e.g. speed > 100 MPH). Unfortunately, there are also a few cases in which a user will query data for many full days on all roadways, essentially querying everything for a large period of time. One other thing to note is that we only ever query readings with lane_type = through_lanes, although we are collecting ramp and reversible lane data to facilitate future reporting needs.
Table Metadata:
- Volume typically ranges anywhere from 0 to 20, averages around 4 5. A small percentage of the rows have null volume.
- Occupancy ranges from 1 to 10, averages around 1 or 2
- Speed is about what you would expect, ranging from 30 70 with an average somewhere near the middle
- There are 17 roads
- There are 2 directions per road
- Mile marker ranges vary by roadway, typical ranges are something like 0 to 40 or 257 to 290
- Most (80 to 90% +) of the readings have lane_type = through_lanes
- Size of a daily table is about 360MB, a half month table is 5 to 6 GB
Full Table and Index Schema:
Ive experimented with partitioning using a table per day and 2 tables per month (1st through 15th, 16th to end of month). 2 tables/month was the original approach to keep the number of tables from growing too rapidly, and shows about 3x slower performance. Using daily tables incurs extra planning overhead as expected, but isnt all that bad. Im OK with taking a 1 second planning hit if my overall query time decreases significantly. Furthermore, we will only be storing raw data for about a year and can aggregate old data. This means that I can do daily tables for raw data and larger/fewer tables for older data. The table and index structure is below, which is identical between daily and ½ month tables with a couple of exceptions:
- Daily tables have a fill factor of 100, ½ month tables are default
- Only the 4 column indexes were created for the daily tables since the others never get used
CREATE TABLE vds_detector_data
(
reading_timestamp timestamp without time zone,
vds_id integer,
detector_id integer,
status smallint,
speed numeric(12,9),
volume numeric(12,9),
confidence smallint,
occupancy numeric(12,9),
loadid bigint,
road_name character varying(150),
road_dir character varying(2),
mile_marker numeric(7,2),
lane_number integer,
lane_type character varying(32),
CONSTRAINT vds_detector_vdsid_fkey FOREIGN KEY (vds_id, detector_id)
REFERENCES ref_vds_detector_properties (device_id, detector_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE RESTRICT
)
WITH (
OIDS=FALSE
);
CREATE INDEX vds_detector_data_dir_idx
ON vds_detector_data
USING btree
(road_dir);
CREATE INDEX vds_detector_data_lane_idx
ON vds_detector_data
USING btree
(lane_number);
CREATE INDEX vds_detector_data_mm_idx
ON vds_detector_data
USING btree
(mile_marker);
CREATE INDEX vds_detector_data_occupancy_idx
ON vds_detector_data
USING btree
(occupancy);
CREATE INDEX vds_detector_data_road_idx
ON vds_detector_data
USING btree
(road_name);
CREATE INDEX vds_detector_data_road_ts_mm_dir_idx
ON vds_detector_data
USING btree
(road_name, reading_timestamp, mile_marker, road_dir);
CREATE INDEX vds_detector_data_speed_idx
ON vds_detector_data
USING btree
(speed);
CREATE INDEX vds_detector_data_timestamp_idx
ON vds_detector_data
USING btree
(reading_timestamp);
CREATE INDEX vds_detector_data_ts_road_mm_dir_idx
ON vds_detector_data
USING btree
(reading_timestamp, road_name, mile_marker, road_dir);
CREATE INDEX vds_detector_data_volume_idx
ON vds_detector_data
USING btree
(volume);
EXPLAIN ANALYZE:
Query:
select cast(reading_timestamp as Date) as date, floor(extract(hour from reading_timestamp) / 1.0) * 1.0 as hour, floor(extract(minute from reading_timestamp) / 60) * 60 as min,
count(*), sum(vdd.volume) as totalVolume, sum(vdd.occupancy*vdd.volume)/sum(vdd.volume) as avgOcc, sum(vdd.speed*vdd.volume)/sum(vdd.volume) as avgSpeed,
avg(vdd.confidence) as avgConfidence, min(vdd.detector_id) as detectorId, vdd.vds_id as vdsId
from vds_detector_data vdd
where (vdd.reading_timestamp between '2011-4-01 00:00:00.000' and '2011-04-30 23:59:59.999')
and vdd.volume!=0
and ((road_name='44' and mile_marker between 257.65 and 289.5 and (road_dir='E' or road_dir='W'))
or (road_name='64' and mile_marker between 0.7 and 40.4 and (road_dir='E' or road_dir='W'))
or (road_name='55' and mile_marker between 184.8 and 208.1 and (road_dir='N' or road_dir='S'))
or (road_name='270' and mile_marker between 0.8 and 34.5 and (road_dir='N' or road_dir='S')))
and not(vdd.speed<0.0 or vdd.speed>90.0 or vdd.volume=0.0) and vdd.lane_type in ('through_lanes')
group by date, hour, min, vdd.vds_id, mile_marker
having sum(vdd.volume)!=0
order by vdd.vds_id, mile_marker;
Daily table explain analyze: http://explain.depesz.com/s/iLY
Half month table explain analyze: http://explain.depesz.com/s/Unt
Postgres version:
PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
History:
None, this is a new database and application
Hardware:
- 2 Intel Xeon 2.13GHz processors with 8 cores each
- 8GB RAM
- Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID Controller 512MB Cache
- 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of about 2TB
- Windows Server 2008 R2 64-bit (but 32-bit postgres)
- Hardware upgrades arent an option at this point due to budget and time constraints
Maintenance Setup:
Autovacuum is disabled for these tables since the data is never updated. The tables that we are testing with at the moment will not grow any larger and have been both clustered and analyzed. They were clustered on the vds_detector_data_timestamp_idx index.
GUC Settings:
effective_cache_size: 2048MB
work_mem: 512MB
shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query plan and took the same amount of time to execute give or take a few seconds
Summary:
The time to get the raw data (before aggregation and sorting) is relatively similar between the daily and half month tables. It would appear that the major difference is the ordering of sort and aggregation, the daily tables aggregate first so the amount of data sorted is significantly less.
Since the daily tables are only 360MB, I would hope that the entire table could be pulled into memory with one large sequential read. Of course this assumes that the file pieces are stored contiguously, but auto defrag is enabled and shows low fragmentation so Im trusting (as much as one can) Windows to do the right thing here. My drives have a 150MB/s sustained max throughput, and considering that data is spread across 5 drives I would hope to at least be able to reach the single disk theoretical limit and read an entire table plus the index into memory about 4 to 5 seconds. Based on the analyze output, each daily table averages 6 to 7 seconds, so Im pretty close there and maybe just limited by disk speed?
In both cases, the row estimates vs actual are way off. Ive increased statistics on the reading_timestamp and road_name columns to 100 and then 1000 with no change. I ran an ANALYZE after each statistics change. Should I be upping stats on the non-indexed columns as well? Ive read documentation that says I should be able to set statistics values for an entire table as opposed to per column, but havent found how to do that. I guess I was either too lazy to update statistics on each column or just didnt think it would help much.
So, any pointers for performance improvement?
Thanks,
Craig
Open Roads Consulting, Inc.
757-546-3401
http://www.openroadsconsulting.com
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed. If you are not the intended recipient, you should immediately stop reading this message and delete it from all computers that it resides on. Any unauthorized reading, distribution, copying or other use of this communication (or its attachments) is strictly prohibited. If you have received this communication in error, please notify us immediately.
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed.
P - Think before you print.
I have a handful of queries that are performing very slowly. I realize that I will be hitting hardware limits at some point, but want to make sure Ive squeezed out every bit of performance I can before calling it quits.
Our database is collecting traffic data at the rate of about 3 million rows a day. The most important columns in the table being queried are timestamp, volume, occupancy, and speed. I have also denormalized the table by adding road name, direction, mile marker, and lane type values to eliminate joins with other tables that contain information about the devices that collect this information. A typical query will involve segments of roadway (i.e. road names, directions, and mile marker bounds) over a certain period of time (e.g. morning rush hour), and will have filters to exclude questionable data such (e.g. speed > 100 MPH). Unfortunately, there are also a few cases in which a user will query data for many full days on all roadways, essentially querying everything for a large period of time. One other thing to note is that we only ever query readings with lane_type = through_lanes, although we are collecting ramp and reversible lane data to facilitate future reporting needs.
Table Metadata:
- Volume typically ranges anywhere from 0 to 20, averages around 4 5. A small percentage of the rows have null volume.
- Occupancy ranges from 1 to 10, averages around 1 or 2
- Speed is about what you would expect, ranging from 30 70 with an average somewhere near the middle
- There are 17 roads
- There are 2 directions per road
- Mile marker ranges vary by roadway, typical ranges are something like 0 to 40 or 257 to 290
- Most (80 to 90% +) of the readings have lane_type = through_lanes
- Size of a daily table is about 360MB, a half month table is 5 to 6 GB
Full Table and Index Schema:
Ive experimented with partitioning using a table per day and 2 tables per month (1st through 15th, 16th to end of month). 2 tables/month was the original approach to keep the number of tables from growing too rapidly, and shows about 3x slower performance. Using daily tables incurs extra planning overhead as expected, but isnt all that bad. Im OK with taking a 1 second planning hit if my overall query time decreases significantly. Furthermore, we will only be storing raw data for about a year and can aggregate old data. This means that I can do daily tables for raw data and larger/fewer tables for older data. The table and index structure is below, which is identical between daily and ½ month tables with a couple of exceptions:
- Daily tables have a fill factor of 100, ½ month tables are default
- Only the 4 column indexes were created for the daily tables since the others never get used
CREATE TABLE vds_detector_data
(
reading_timestamp timestamp without time zone,
vds_id integer,
detector_id integer,
status smallint,
speed numeric(12,9),
volume numeric(12,9),
confidence smallint,
occupancy numeric(12,9),
loadid bigint,
road_name character varying(150),
road_dir character varying(2),
mile_marker numeric(7,2),
lane_number integer,
lane_type character varying(32),
CONSTRAINT vds_detector_vdsid_fkey FOREIGN KEY (vds_id, detector_id)
REFERENCES ref_vds_detector_properties (device_id, detector_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE RESTRICT
)
WITH (
OIDS=FALSE
);
CREATE INDEX vds_detector_data_dir_idx
ON vds_detector_data
USING btree
(road_dir);
CREATE INDEX vds_detector_data_lane_idx
ON vds_detector_data
USING btree
(lane_number);
CREATE INDEX vds_detector_data_mm_idx
ON vds_detector_data
USING btree
(mile_marker);
CREATE INDEX vds_detector_data_occupancy_idx
ON vds_detector_data
USING btree
(occupancy);
CREATE INDEX vds_detector_data_road_idx
ON vds_detector_data
USING btree
(road_name);
CREATE INDEX vds_detector_data_road_ts_mm_dir_idx
ON vds_detector_data
USING btree
(road_name, reading_timestamp, mile_marker, road_dir);
CREATE INDEX vds_detector_data_speed_idx
ON vds_detector_data
USING btree
(speed);
CREATE INDEX vds_detector_data_timestamp_idx
ON vds_detector_data
USING btree
(reading_timestamp);
CREATE INDEX vds_detector_data_ts_road_mm_dir_idx
ON vds_detector_data
USING btree
(reading_timestamp, road_name, mile_marker, road_dir);
CREATE INDEX vds_detector_data_volume_idx
ON vds_detector_data
USING btree
(volume);
EXPLAIN ANALYZE:
Query:
select cast(reading_timestamp as Date) as date, floor(extract(hour from reading_timestamp) / 1.0) * 1.0 as hour, floor(extract(minute from reading_timestamp) / 60) * 60 as min,
count(*), sum(vdd.volume) as totalVolume, sum(vdd.occupancy*vdd.volume)/sum(vdd.volume) as avgOcc, sum(vdd.speed*vdd.volume)/sum(vdd.volume) as avgSpeed,
avg(vdd.confidence) as avgConfidence, min(vdd.detector_id) as detectorId, vdd.vds_id as vdsId
from vds_detector_data vdd
where (vdd.reading_timestamp between '2011-4-01 00:00:00.000' and '2011-04-30 23:59:59.999')
and vdd.volume!=0
and ((road_name='44' and mile_marker between 257.65 and 289.5 and (road_dir='E' or road_dir='W'))
or (road_name='64' and mile_marker between 0.7 and 40.4 and (road_dir='E' or road_dir='W'))
or (road_name='55' and mile_marker between 184.8 and 208.1 and (road_dir='N' or road_dir='S'))
or (road_name='270' and mile_marker between 0.8 and 34.5 and (road_dir='N' or road_dir='S')))
and not(vdd.speed<0.0 or vdd.speed>90.0 or vdd.volume=0.0) and vdd.lane_type in ('through_lanes')
group by date, hour, min, vdd.vds_id, mile_marker
having sum(vdd.volume)!=0
order by vdd.vds_id, mile_marker;
Daily table explain analyze: http://explain.depesz.com/s/iLY
Half month table explain analyze: http://explain.depesz.com/s/Unt
Postgres version:
PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
History:
None, this is a new database and application
Hardware:
- 2 Intel Xeon 2.13GHz processors with 8 cores each
- 8GB RAM
- Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID Controller 512MB Cache
- 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of about 2TB
- Windows Server 2008 R2 64-bit (but 32-bit postgres)
- Hardware upgrades arent an option at this point due to budget and time constraints
Maintenance Setup:
Autovacuum is disabled for these tables since the data is never updated. The tables that we are testing with at the moment will not grow any larger and have been both clustered and analyzed. They were clustered on the vds_detector_data_timestamp_idx index.
GUC Settings:
effective_cache_size: 2048MB
work_mem: 512MB
shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query plan and took the same amount of time to execute give or take a few seconds
Summary:
The time to get the raw data (before aggregation and sorting) is relatively similar between the daily and half month tables. It would appear that the major difference is the ordering of sort and aggregation, the daily tables aggregate first so the amount of data sorted is significantly less.
Since the daily tables are only 360MB, I would hope that the entire table could be pulled into memory with one large sequential read. Of course this assumes that the file pieces are stored contiguously, but auto defrag is enabled and shows low fragmentation so Im trusting (as much as one can) Windows to do the right thing here. My drives have a 150MB/s sustained max throughput, and considering that data is spread across 5 drives I would hope to at least be able to reach the single disk theoretical limit and read an entire table plus the index into memory about 4 to 5 seconds. Based on the analyze output, each daily table averages 6 to 7 seconds, so Im pretty close there and maybe just limited by disk speed?
In both cases, the row estimates vs actual are way off. Ive increased statistics on the reading_timestamp and road_name columns to 100 and then 1000 with no change. I ran an ANALYZE after each statistics change. Should I be upping stats on the non-indexed columns as well? Ive read documentation that says I should be able to set statistics values for an entire table as opposed to per column, but havent found how to do that. I guess I was either too lazy to update statistics on each column or just didnt think it would help much.
So, any pointers for performance improvement?
Thanks,
Craig
Open Roads Consulting, Inc.
757-546-3401
http://www.openroadsconsulting.com
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed. If you are not the intended recipient, you should immediately stop reading this message and delete it from all computers that it resides on. Any unauthorized reading, distribution, copying or other use of this communication (or its attachments) is strictly prohibited. If you have received this communication in error, please notify us immediately.
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed.
P - Think before you print.
Dne 28.6.2011 23:28, Craig McIlwee napsal(a): > Daily table explain analyze: http://explain.depesz.com/s/iLY > Half month table explain analyze: http://explain.depesz.com/s/Unt Are you sure those two queries are exactly the same? Because the daily case output says the width is 50B, while the half-month case says it's 75B. This might be why the sort/aggregate steps are switched, and that increases the amount of data so that it has to be sorted on disk (which is why the half-month is so much slower). Haven't you added some columns to the half-month query? > Postgres version: > PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit > > History: > None, this is a new database and application > > Hardware: > - 2 Intel Xeon 2.13GHz processors with 8 cores each > - 8GB RAM > - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID > Controller 512MB Cache > - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of > about 2TB > - Windows Server 2008 R2 64-bit (but 32-bit postgres) > - Hardware upgrades arent an option at this point due to budget and time > constraints Not much experience with PostgreSQL on Windows, but this looks good to me. Not sure if RAID5 is a good choice, especially because of write performance - this is probably one of the reasons why the disk sort is so slow (in the half-month case). And it's nice you have 8 cores, but don't forget each query executes on a single background process, i.e. it may use single core. So the above query can't use 8 cores - that's why the in-memory sort takes so long I guess. > Maintenance Setup: > Autovacuum is disabled for these tables since the data is never > updated. The tables that we are testing with at the moment will not > grow any larger and have been both clustered and analyzed. They were > clustered on the vds_detector_data_timestamp_idx index. > > GUC Settings: > effective_cache_size: 2048MB > work_mem: 512MB > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query > plan and took the same amount of time to execute give or take a few seconds > > Summary: > > The time to get the raw data (before aggregation and sorting) is > relatively similar between the daily and half month tables. It would > appear that the major difference is the ordering of sort and > aggregation, the daily tables aggregate first so the amount of data > sorted is significantly less. Yes, the ordering is the problem. The amount of data to sort is so huge (3GB) it does not fit into work_mem and has to be sorted on disk. Not sure why this happens, the only difference I've noticed is the 'width' (50B vs. 75B). Are those two queries exactly the same? > Since the daily tables are only 360MB, I would hope that the entire > table could be pulled into memory with one large sequential read. Of > course this assumes that the file pieces are stored contiguously, but > auto defrag is enabled and shows low fragmentation so Im trusting (as > much as one can) Windows to do the right thing here. My drives have a > 150MB/s sustained max throughput, and considering that data is spread > across 5 drives I would hope to at least be able to reach the single > disk theoretical limit and read an entire table plus the index into > memory about 4 to 5 seconds. Based on the analyze output, each daily > table averages 6 to 7 seconds, so Im pretty close there and maybe just > limited by disk speed? Well, you have 30 partitions and 7 seconds for each means 210 secons in total. Which is about the time you get (before the aggregate/sort). You have to check where the bottleneck is - is it the I/O or CPU? I'd guess the CPU, but I may be wrong. On unix I'd use something like iostat/vmstat/top to see what's going on - not sure what to use on Windows. I guess there is a some console or maybe Process Explorer from sysinternals. > In both cases, the row estimates vs actual are way off. Ive increased > statistics on the reading_timestamp and road_name columns to 100 and > then 1000 with no change. I ran an ANALYZE after each statistics > change. Should I be upping stats on the non-indexed columns as well? > Ive read documentation that says I should be able to set statistics > values for an entire table as opposed to per column, but havent found > how to do that. I guess I was either too lazy to update statistics on > each column or just didnt think it would help much. The estimates seem pretty good to me - 10x difference is not that much. Could be better, but I don't think you can get a better plan, is seems very reasonable to me. > So, any pointers for performance improvement? Three ideas what might help 1) partial indexes How much do the parameters in the query change? If there are parameters that are always the same, you may try to create partial indexes. For example if the 'vdd.volume' always has to be '0', then you can create the index like this CREATE INDEX vds_detector_data_dir_idx ON vds_detector_data USING btree (road_dir) WHERE (vdd.volume!=0); That way only the rows with 'vdd.volume!=0' will be included in the index, the index will be smaller and the bitmap will be created faster. Similarly for the other conditions. The smaller the index will be, the faster the bitmap creation. If all the conditions may change, or if the index size does not change much, you can't do this. 2) prebuilt results Another option is precomputation of the 'per partition results' - if you know what the conditions are going to be, you can precompute them and then query just those (much smaller) tables. But this is very application specific. Again, if the all the conditions change, you can't do this. 3) distribute the queries As I've mentioned, PostgreSQL does not distribute the queries on multiple CPU cores, but you can do that on your own at the application level. For example I see the GROUP BY clause contains 'date, hour, min' so you can compute the results for each partition separately (in a different thread, using a separate connection) and then 'append' them. Yes, you'll need to keep some metadata to do this efficiently (e.g. list of partitions along with from/to timestamps), but you should do this anyway I guess (at least I do that when partitioning tables, it makes the management much easier). Not sure if you can do this with the other queries :-( regards Tomas
On 06/28/2011 05:28 PM, Craig McIlwee wrote: > Autovacuum is disabled for these tables since the data is never > updated. The tables that we are testing with at the moment will not > grow any larger and have been both clustered and analyzed. Note that any such prep to keep from ever needing to maintain these tables in the future should include the FREEZE option, possibly with some parameters tweaked first to make it more aggressive. Autovacuum will eventually revisit them in order to prevent transaction ID wrap-around, even if it's disabled. If you're going to the trouble of prepping them so they are never touched again, you should do a freeze with the right parameters to keep this from happening again. > work_mem: 512MB > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query > plan and took the same amount of time to execute give or take a few > seconds shared_buffers doesn't normally impact the query plan; it impacts how much churn there is between the database and the operating system cache, mainly important for making write-heavy work efficient. On Windows, you'll probably be safe to set this to 512MB and forget about it. It doesn't benefit from large values anyway. This is a very large work_mem setting however, so be careful that you won't have many users connecting at once if you're going to use it. Each connection can use a multiple of work_mem, making it quite possible you could run out of memory with this configuration. If that low user count is true, you may want to make sure you're enforcing it by lowering max_connections, as a safety measure to prevent problems. > Since the daily tables are only 360MB, I would hope that the entire > table could be pulled into memory with one large sequential read. Of > course this assumes that the file pieces are stored contiguously, but > auto defrag is enabled and shows low fragmentation so Im trusting (as > much as one can) Windows to do the right thing here. My drives have a > 150MB/s sustained max throughput, and considering that data is spread > across 5 drives I would hope to at least be able to reach the single > disk theoretical limit and read an entire table plus the index into > memory about 4 to 5 seconds. Based on the analyze output, each daily > table averages 6 to 7 seconds, so Im pretty close there and maybe just > limited by disk speed? One thing to note is that your drive speed varies based on what part of the disk things are located at; the slower parts of the drive will be much less than 150MB/s. On Linux servers it's impossible to reach something close to the disk's raw speed without making the operating system read-ahead feature much more aggressive than it is by default. Because PostgreSQL fetches a single block at a time, to keep the drive completely busy something has to notice the pattern of access and be reading data ahead of when the database even asks for it. You may find a parameter you can tune in the properties for the drives somewhere in the Windows Control Panel. And there's a read-ahead setting on your PERC card that's better than nothing you may not have turned on (not as good as the Linux one, but it's useful). There are two useful settings there ("on" and "adaptive" if I recall correctly) that you can try, to see which works better. > Ive read documentation that says I should be able to set statistics > values for an entire table as opposed to per column, but havent found > how to do that. I guess I was either too lazy to update statistics on > each column or just didnt think it would help much. You can adjust the statistics target across the entire database using the default_statistics_target setting, or you can tweak them per column using ALTER TABLE. There is no table-level control. I find it difficult to answer questions about whether there is enough stats or not without actually looking at pg_stats to see how the database is interpreting the data, and comparing it against the real distribution. This is an area where flailing about trying things doesn't work very well; you need to be very systematic about the analysis and testing strategy if you're going to get anywhere useful. It's not easy to do. As a larger commentary on what you're trying to do, applications like this often find themselves at a point one day where you just can't allow arbitrary user queries to run against them anymore. What normally happens then is that the most common things that people really need end up being run one and stored in some summary form, using techniques such as materialized views: http://wiki.postgresql.org/wiki/Materialized_Views In your case, I would start now on trying to find the common patters to the long running reports that people generate, and see if it's possible to pre-compute some portion of them and save that summary. And you may find yourself in a continuous battle with business requests regardless. It's often key decision makers who feel they should be able to query any way they want, regardless of its impact on the database. Finding a middle ground there is usually challenging. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD Comprehensive and Customized PostgreSQL Training Classes: http://www.2ndquadrant.us/postgresql-training/
> Dne 28.6.2011 23:28, Craig McIlwee napsal(a):
> > Daily table explain analyze: http://explain.depesz.com/s/iLY
> > Half month table explain analyze: http://explain.depesz.com/s/Unt
>
> Are you sure those two queries are exactly the same? Because the daily
> case output says the width is 50B, while the half-month case says it's
> 75B. This might be why the sort/aggregate steps are switched, and that
> increases the amount of data so that it has to be sorted on disk (which
> is why the half-month is so much slower).
>
> Haven't you added some columns to the half-month query?
The daily tables were created using CREATE TABLE AS from the half month tables, structure is the same with the exception of fill factor. Queries are identical except for the name of the master table that they select from.
>
> > Postgres version:
> > PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
> >
> > History:
> > None, this is a new database and application
> >
> > Hardware:
> > - 2 Intel Xeon 2.13GHz processors with 8 cores each
> > - 8GB RAM
> > - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID
> > Controller 512MB Cache
> > - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of
> > about 2TB
> > - Windows Server 2008 R2 64-bit (but 32-bit postgres)
> > - Hardware upgrades arent an option at this point due to budget and time
> > constraints
>
> Not much experience with PostgreSQL on Windows, but this looks good to
> me. Not sure if RAID5 is a good choice, especially because of write
> performance - this is probably one of the reasons why the disk sort is
> so slow (in the half-month case).
Yes, the data import is painfully slow but I hope to make up for that with the read performance later.
>
> And it's nice you have 8 cores, but don't forget each query executes on
> a single background process, i.e. it may use single core. So the above
> query can't use 8 cores - that's why the in-memory sort takes so long I
> guess.
>
> > Maintenance Setup:
> > Autovacuum is disabled for these tables since the data is never
> > updated. The tables that we are testing with at the moment will not
> > grow any larger and have been both clustered and analyzed. They were
> > clustered on the vds_detector_data_timestamp_idx index.
> >
> > GUC Settings:
> > effective_cache_size: 2048MB
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> > plan and took the same amount of time to execute give or take a few
> seconds
> >
> > Summary:
> >
> > The time to get the raw data (before aggregation and sorting) is
> > relatively similar between the daily and half month tables. It would
> > appear that the major difference is the ordering of sort and
> > aggregation, the daily tables aggregate first so the amount of data
> > sorted is significantly less.
>
> Yes, the ordering is the problem. The amount of data to sort is so huge
> (3GB) it does not fit into work_mem and has to be sorted on disk. Not
> sure why this happens, the only difference I've noticed is the 'width'
> (50B vs. 75B). Are those two queries exactly the same?
>
> > Since the daily tables are only 360MB, I would hope that the entire
> > table could be pulled into memory with one large sequential read. Of
> > course this assumes that the file pieces are stored contiguously, but
> > auto defrag is enabled and shows low fragmentation so Im trusting (as
> > much as one can) Windows to do the right thing here. My drives have a
> > 150MB/s sustained max throughput, and considering that data is spread
> > across 5 drives I would hope to at least be able to reach the single
> > disk theoretical limit and read an entire table plus the index into
> > memory about 4 to 5 seconds. Based on the analyze output, each daily
> > table averages 6 to 7 seconds, so Im pretty close there and maybe just
> > limited by disk speed?
>
> Well, you have 30 partitions and 7 seconds for each means 210 secons in
> total. Which is about the time you get (before the aggregate/sort).
>
> You have to check where the bottleneck is - is it the I/O or CPU? I'd
> guess the CPU, but I may be wrong. On unix I'd use something like
> iostat/vmstat/top to see what's going on - not sure what to use on
> Windows. I guess there is a some console or maybe Process Explorer from
> sysinternals.
>
> > In both cases, the row estimates vs actual are way off. Ive increased
> > statistics on the reading_timestamp and road_name columns to 100 and
> > then 1000 with no change. I ran an ANALYZE after each statistics
> > change. Should I be upping stats on the non-indexed columns as well?
> > Ive read documentation that says I should be able to set statistics
> > values for an entire table as opposed to per column, but havent found
> > how to do that. I guess I was either too lazy to update statistics on
> > each column or just didnt think it would help much.
>
> The estimates seem pretty good to me - 10x difference is not that much.
> Could be better, but I don't think you can get a better plan, is seems
> very reasonable to me.
>
> > So, any pointers for performance improvement?
>
> Three ideas what might help
>
> 1) partial indexes
>
> How much do the parameters in the query change? If there are parameters
> that are always the same, you may try to create partial indexes. For
> example if the 'vdd.volume' always has to be '0', then you can create
> the index like this
>
> CREATE INDEX vds_detector_data_dir_idx
> ON vds_detector_data
> USING btree
> (road_dir)
> WHERE (vdd.volume!=0);
>
> That way only the rows with 'vdd.volume!=0' will be included in the
> index, the index will be smaller and the bitmap will be created faster.
> Similarly for the other conditions. The smaller the index will be, the
> faster the bitmap creation.
>
> If all the conditions may change, or if the index size does not change
> much, you can't do this.
The 0 volume is the only thing that will always be present, but those records do account for 10 to 15% of the data. I'll give this a shot, I'm really interested in seeing what impact this had. For some reason I was under the impression that partial indexes were used for text searches, so I completely overlooked this.
>
> 2) prebuilt results
>
> Another option is precomputation of the 'per partition results' - if you
> know what the conditions are going to be, you can precompute them and
> then query just those (much smaller) tables. But this is very
> application specific.
>
> Again, if the all the conditions change, you can't do this.
This has been one of the toughest issues. Due to the filtering capabilities, it's just not possible to precalculate anything.
>
> 3) distribute the queries
>
> As I've mentioned, PostgreSQL does not distribute the queries on
> multiple CPU cores, but you can do that on your own at the application
> level.
>
> For example I see the GROUP BY clause contains 'date, hour, min' so you
> can compute the results for each partition separately (in a different
> thread, using a separate connection) and then 'append' them.
>
> Yes, you'll need to keep some metadata to do this efficiently (e.g. list
> of partitions along with from/to timestamps), but you should do this
> anyway I guess (at least I do that when partitioning tables, it makes
> the management much easier).
I noticed this too after a little more testing, there are some serious performance gains to be had here. I started with a single day query and it took about 15 seconds. Next was 5 simultaneous queries all at about 30 seconds each and then 10 queries at 50 seconds each.
>
> Not sure if you can do this with the other queries :-(
Not all, but many. The query in question is the most beastly of them all, so I'm pretty happy to have some strategy for improvement.
>
> regards
> Tomas
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
Thanks for the help.
Craig
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed.
P - Think before you print.
> > Daily table explain analyze: http://explain.depesz.com/s/iLY
> > Half month table explain analyze: http://explain.depesz.com/s/Unt
>
> Are you sure those two queries are exactly the same? Because the daily
> case output says the width is 50B, while the half-month case says it's
> 75B. This might be why the sort/aggregate steps are switched, and that
> increases the amount of data so that it has to be sorted on disk (which
> is why the half-month is so much slower).
>
> Haven't you added some columns to the half-month query?
The daily tables were created using CREATE TABLE AS from the half month tables, structure is the same with the exception of fill factor. Queries are identical except for the name of the master table that they select from.
>
> > Postgres version:
> > PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
> >
> > History:
> > None, this is a new database and application
> >
> > Hardware:
> > - 2 Intel Xeon 2.13GHz processors with 8 cores each
> > - 8GB RAM
> > - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID
> > Controller 512MB Cache
> > - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of
> > about 2TB
> > - Windows Server 2008 R2 64-bit (but 32-bit postgres)
> > - Hardware upgrades arent an option at this point due to budget and time
> > constraints
>
> Not much experience with PostgreSQL on Windows, but this looks good to
> me. Not sure if RAID5 is a good choice, especially because of write
> performance - this is probably one of the reasons why the disk sort is
> so slow (in the half-month case).
Yes, the data import is painfully slow but I hope to make up for that with the read performance later.
>
> And it's nice you have 8 cores, but don't forget each query executes on
> a single background process, i.e. it may use single core. So the above
> query can't use 8 cores - that's why the in-memory sort takes so long I
> guess.
>
> > Maintenance Setup:
> > Autovacuum is disabled for these tables since the data is never
> > updated. The tables that we are testing with at the moment will not
> > grow any larger and have been both clustered and analyzed. They were
> > clustered on the vds_detector_data_timestamp_idx index.
> >
> > GUC Settings:
> > effective_cache_size: 2048MB
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> > plan and took the same amount of time to execute give or take a few
> seconds
> >
> > Summary:
> >
> > The time to get the raw data (before aggregation and sorting) is
> > relatively similar between the daily and half month tables. It would
> > appear that the major difference is the ordering of sort and
> > aggregation, the daily tables aggregate first so the amount of data
> > sorted is significantly less.
>
> Yes, the ordering is the problem. The amount of data to sort is so huge
> (3GB) it does not fit into work_mem and has to be sorted on disk. Not
> sure why this happens, the only difference I've noticed is the 'width'
> (50B vs. 75B). Are those two queries exactly the same?
>
> > Since the daily tables are only 360MB, I would hope that the entire
> > table could be pulled into memory with one large sequential read. Of
> > course this assumes that the file pieces are stored contiguously, but
> > auto defrag is enabled and shows low fragmentation so Im trusting (as
> > much as one can) Windows to do the right thing here. My drives have a
> > 150MB/s sustained max throughput, and considering that data is spread
> > across 5 drives I would hope to at least be able to reach the single
> > disk theoretical limit and read an entire table plus the index into
> > memory about 4 to 5 seconds. Based on the analyze output, each daily
> > table averages 6 to 7 seconds, so Im pretty close there and maybe just
> > limited by disk speed?
>
> Well, you have 30 partitions and 7 seconds for each means 210 secons in
> total. Which is about the time you get (before the aggregate/sort).
>
> You have to check where the bottleneck is - is it the I/O or CPU? I'd
> guess the CPU, but I may be wrong. On unix I'd use something like
> iostat/vmstat/top to see what's going on - not sure what to use on
> Windows. I guess there is a some console or maybe Process Explorer from
> sysinternals.
>
> > In both cases, the row estimates vs actual are way off. Ive increased
> > statistics on the reading_timestamp and road_name columns to 100 and
> > then 1000 with no change. I ran an ANALYZE after each statistics
> > change. Should I be upping stats on the non-indexed columns as well?
> > Ive read documentation that says I should be able to set statistics
> > values for an entire table as opposed to per column, but havent found
> > how to do that. I guess I was either too lazy to update statistics on
> > each column or just didnt think it would help much.
>
> The estimates seem pretty good to me - 10x difference is not that much.
> Could be better, but I don't think you can get a better plan, is seems
> very reasonable to me.
>
> > So, any pointers for performance improvement?
>
> Three ideas what might help
>
> 1) partial indexes
>
> How much do the parameters in the query change? If there are parameters
> that are always the same, you may try to create partial indexes. For
> example if the 'vdd.volume' always has to be '0', then you can create
> the index like this
>
> CREATE INDEX vds_detector_data_dir_idx
> ON vds_detector_data
> USING btree
> (road_dir)
> WHERE (vdd.volume!=0);
>
> That way only the rows with 'vdd.volume!=0' will be included in the
> index, the index will be smaller and the bitmap will be created faster.
> Similarly for the other conditions. The smaller the index will be, the
> faster the bitmap creation.
>
> If all the conditions may change, or if the index size does not change
> much, you can't do this.
The 0 volume is the only thing that will always be present, but those records do account for 10 to 15% of the data. I'll give this a shot, I'm really interested in seeing what impact this had. For some reason I was under the impression that partial indexes were used for text searches, so I completely overlooked this.
>
> 2) prebuilt results
>
> Another option is precomputation of the 'per partition results' - if you
> know what the conditions are going to be, you can precompute them and
> then query just those (much smaller) tables. But this is very
> application specific.
>
> Again, if the all the conditions change, you can't do this.
This has been one of the toughest issues. Due to the filtering capabilities, it's just not possible to precalculate anything.
>
> 3) distribute the queries
>
> As I've mentioned, PostgreSQL does not distribute the queries on
> multiple CPU cores, but you can do that on your own at the application
> level.
>
> For example I see the GROUP BY clause contains 'date, hour, min' so you
> can compute the results for each partition separately (in a different
> thread, using a separate connection) and then 'append' them.
>
> Yes, you'll need to keep some metadata to do this efficiently (e.g. list
> of partitions along with from/to timestamps), but you should do this
> anyway I guess (at least I do that when partitioning tables, it makes
> the management much easier).
I noticed this too after a little more testing, there are some serious performance gains to be had here. I started with a single day query and it took about 15 seconds. Next was 5 simultaneous queries all at about 30 seconds each and then 10 queries at 50 seconds each.
>
> Not sure if you can do this with the other queries :-(
Not all, but many. The query in question is the most beastly of them all, so I'm pretty happy to have some strategy for improvement.
>
> regards
> Tomas
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
Thanks for the help.
Craig
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed.
P - Think before you print.
> On 06/28/2011 05:28 PM, Craig McIlwee wrote:
> > Autovacuum is disabled for these tables since the data is never
> > updated. The tables that we are testing with at the moment will not
> > grow any larger and have been both clustered and analyzed.
>
> Note that any such prep to keep from ever needing to maintain these
> tables in the future should include the FREEZE option, possibly with
> some parameters tweaked first to make it more aggressive. Autovacuum
> will eventually revisit them in order to prevent transaction ID
> wrap-around, even if it's disabled. If you're going to the trouble of
> prepping them so they are never touched again, you should do a freeze
> with the right parameters to keep this from happening again.
>
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> > plan and took the same amount of time to execute give or take a few
> > seconds
>
> shared_buffers doesn't normally impact the query plan; it impacts how
> much churn there is between the database and the operating system cache,
> mainly important for making write-heavy work efficient. On Windows,
> you'll probably be safe to set this to 512MB and forget about it. It
> doesn't benefit from large values anyway.
I was thinking that shared buffers controlled the amount of data, primarily table and index pages, that the database could store in memory at once. Based on that assumption, I thought that a larger value would enable an entire table + index to be in memory together and speed up the query. Am I wrong?
>
> This is a very large work_mem setting however, so be careful that you
> won't have many users connecting at once if you're going to use it.
> Each connection can use a multiple of work_mem, making it quite possible
> you could run out of memory with this configuration. If that low user
> count is true, you may want to make sure you're enforcing it by lowering
> max_connections, as a safety measure to prevent problems.
I plan on lowering this quite a bit since I haven't seen much of a boost by increasing it.
>
> > Since the daily tables are only 360MB, I would hope that the entire
> > table could be pulled into memory with one large sequential read. Of
> > course this assumes that the file pieces are stored contiguously, but
> > auto defrag is enabled and shows low fragmentation so Im trusting (as
> > much as one can) Windows to do the right thing here. My drives have a
> > 150MB/s sustained max throughput, and considering that data is spread
> > across 5 drives I would hope to at least be able to reach the single
> > disk theoretical limit and read an entire table plus the index into
> > memory about 4 to 5 seconds. Based on the analyze output, each daily
> > table averages 6 to 7 seconds, so Im pretty close there and maybe just
> > limited by disk speed?
>
> One thing to note is that your drive speed varies based on what part of
> the disk things are located at; the slower parts of the drive will be
> much less than 150MB/s.
>
> On Linux servers it's impossible to reach something close to the disk's
> raw speed without making the operating system read-ahead feature much
> more aggressive than it is by default. Because PostgreSQL fetches a
> single block at a time, to keep the drive completely busy something has
> to notice the pattern of access and be reading data ahead of when the
> database even asks for it. You may find a parameter you can tune in the
> properties for the drives somewhere in the Windows Control Panel. And
> there's a read-ahead setting on your PERC card that's better than
> nothing you may not have turned on (not as good as the Linux one, but
> it's useful). There are two useful settings there ("on" and "adaptive"
> if I recall correctly) that you can try, to see which works better.
Looks like they are set to adaptive read-ahead now. If the database is executing many concurrent queries, is it reasonable to suspect that the IO requests will compete with each other in such a way that the controller would rarely see many sequential requests since it is serving many processes? The controller does have an 'on' option also that forces read-ahead, maybe that would solve the issue if we can rely on the data to survive in the cache until the actual read request takes place.
>
> > Ive read documentation that says I should be able to set statistics
> > values for an entire table as opposed to per column, but havent found
> > how to do that. I guess I was either too lazy to update statistics on
> > each column or just didnt think it would help much.
>
> You can adjust the statistics target across the entire database using
> the default_statistics_target setting, or you can tweak them per column
> using ALTER TABLE. There is no table-level control. I find it
> difficult to answer questions about whether there is enough stats or not
> without actually looking at pg_stats to see how the database is
> interpreting the data, and comparing it against the real distribution.
> This is an area where flailing about trying things doesn't work very
> well; you need to be very systematic about the analysis and testing
> strategy if you're going to get anywhere useful. It's not easy to do.
>
> As a larger commentary on what you're trying to do, applications like
> this often find themselves at a point one day where you just can't allow
> arbitrary user queries to run against them anymore. What normally
> happens then is that the most common things that people really need end
> up being run one and stored in some summary form, using techniques such
> as materialized views: http://wiki.postgresql.org/wiki/Materialized_Views
>
> In your case, I would start now on trying to find the common patters to
> the long running reports that people generate, and see if it's possible
> to pre-compute some portion of them and save that summary. And you may
> find yourself in a continuous battle with business requests regardless.
> It's often key decision makers who feel they should be able to query any
> way they want, regardless of its impact on the database. Finding a
> middle ground there is usually challenging.
>
> --
> Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
> Comprehensive and Customized PostgreSQL Training Classes:
> http://www.2ndquadrant.us/postgresql-training/
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
Thanks,
Craig
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed.
P - Think before you print.
> > Autovacuum is disabled for these tables since the data is never
> > updated. The tables that we are testing with at the moment will not
> > grow any larger and have been both clustered and analyzed.
>
> Note that any such prep to keep from ever needing to maintain these
> tables in the future should include the FREEZE option, possibly with
> some parameters tweaked first to make it more aggressive. Autovacuum
> will eventually revisit them in order to prevent transaction ID
> wrap-around, even if it's disabled. If you're going to the trouble of
> prepping them so they are never touched again, you should do a freeze
> with the right parameters to keep this from happening again.
>
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> > plan and took the same amount of time to execute give or take a few
> > seconds
>
> shared_buffers doesn't normally impact the query plan; it impacts how
> much churn there is between the database and the operating system cache,
> mainly important for making write-heavy work efficient. On Windows,
> you'll probably be safe to set this to 512MB and forget about it. It
> doesn't benefit from large values anyway.
I was thinking that shared buffers controlled the amount of data, primarily table and index pages, that the database could store in memory at once. Based on that assumption, I thought that a larger value would enable an entire table + index to be in memory together and speed up the query. Am I wrong?
>
> This is a very large work_mem setting however, so be careful that you
> won't have many users connecting at once if you're going to use it.
> Each connection can use a multiple of work_mem, making it quite possible
> you could run out of memory with this configuration. If that low user
> count is true, you may want to make sure you're enforcing it by lowering
> max_connections, as a safety measure to prevent problems.
I plan on lowering this quite a bit since I haven't seen much of a boost by increasing it.
>
> > Since the daily tables are only 360MB, I would hope that the entire
> > table could be pulled into memory with one large sequential read. Of
> > course this assumes that the file pieces are stored contiguously, but
> > auto defrag is enabled and shows low fragmentation so Im trusting (as
> > much as one can) Windows to do the right thing here. My drives have a
> > 150MB/s sustained max throughput, and considering that data is spread
> > across 5 drives I would hope to at least be able to reach the single
> > disk theoretical limit and read an entire table plus the index into
> > memory about 4 to 5 seconds. Based on the analyze output, each daily
> > table averages 6 to 7 seconds, so Im pretty close there and maybe just
> > limited by disk speed?
>
> One thing to note is that your drive speed varies based on what part of
> the disk things are located at; the slower parts of the drive will be
> much less than 150MB/s.
>
> On Linux servers it's impossible to reach something close to the disk's
> raw speed without making the operating system read-ahead feature much
> more aggressive than it is by default. Because PostgreSQL fetches a
> single block at a time, to keep the drive completely busy something has
> to notice the pattern of access and be reading data ahead of when the
> database even asks for it. You may find a parameter you can tune in the
> properties for the drives somewhere in the Windows Control Panel. And
> there's a read-ahead setting on your PERC card that's better than
> nothing you may not have turned on (not as good as the Linux one, but
> it's useful). There are two useful settings there ("on" and "adaptive"
> if I recall correctly) that you can try, to see which works better.
Looks like they are set to adaptive read-ahead now. If the database is executing many concurrent queries, is it reasonable to suspect that the IO requests will compete with each other in such a way that the controller would rarely see many sequential requests since it is serving many processes? The controller does have an 'on' option also that forces read-ahead, maybe that would solve the issue if we can rely on the data to survive in the cache until the actual read request takes place.
>
> > Ive read documentation that says I should be able to set statistics
> > values for an entire table as opposed to per column, but havent found
> > how to do that. I guess I was either too lazy to update statistics on
> > each column or just didnt think it would help much.
>
> You can adjust the statistics target across the entire database using
> the default_statistics_target setting, or you can tweak them per column
> using ALTER TABLE. There is no table-level control. I find it
> difficult to answer questions about whether there is enough stats or not
> without actually looking at pg_stats to see how the database is
> interpreting the data, and comparing it against the real distribution.
> This is an area where flailing about trying things doesn't work very
> well; you need to be very systematic about the analysis and testing
> strategy if you're going to get anywhere useful. It's not easy to do.
>
> As a larger commentary on what you're trying to do, applications like
> this often find themselves at a point one day where you just can't allow
> arbitrary user queries to run against them anymore. What normally
> happens then is that the most common things that people really need end
> up being run one and stored in some summary form, using techniques such
> as materialized views: http://wiki.postgresql.org/wiki/Materialized_Views
>
> In your case, I would start now on trying to find the common patters to
> the long running reports that people generate, and see if it's possible
> to pre-compute some portion of them and save that summary. And you may
> find yourself in a continuous battle with business requests regardless.
> It's often key decision makers who feel they should be able to query any
> way they want, regardless of its impact on the database. Finding a
> middle ground there is usually challenging.
>
> --
> Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
> Comprehensive and Customized PostgreSQL Training Classes:
> http://www.2ndquadrant.us/postgresql-training/
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
Thanks,
Craig
This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed.
P - Think before you print.
Dne 29.6.2011 01:26, Craig McIlwee napsal(a): >> Dne 28.6.2011 23:28, Craig McIlwee napsal(a): >> Are you sure those two queries are exactly the same? Because the daily >> case output says the width is 50B, while the half-month case says it's >> 75B. This might be why the sort/aggregate steps are switched, and that >> increases the amount of data so that it has to be sorted on disk (which >> is why the half-month is so much slower). >> >> Haven't you added some columns to the half-month query? > > The daily tables were created using CREATE TABLE AS from the half month > tables, structure is the same with the exception of fill factor. > Queries are identical except for the name of the master table that they > select from. Hm, I'm not sure where this width value comes from but I don't think it's related to fillfactor. >> Not much experience with PostgreSQL on Windows, but this looks good to >> me. Not sure if RAID5 is a good choice, especially because of write >> performance - this is probably one of the reasons why the disk sort is >> so slow (in the half-month case). > > Yes, the data import is painfully slow but I hope to make up for that > with the read performance later. Generally you're right that RAID10 is going to be slower than RAID5 when reading (and faster when writing) the data, but how big the gap is really depends on the controller. It's not that big I guess - see for example this: http://www.kendalvandyke.com/2009/02/disk-performance-hands-on-part-5-raid.html The first test shows that RAID10 is about 10% slower on reads but about 60% faster on writes. BTW have you tuned the GUC settings for write (increasing checkpoint segments may give much better write performance). > The 0 volume is the only thing that will always be present, but those > records do account for 10 to 15% of the data. I'll give this a shot, > I'm really interested in seeing what impact this had. For some reason I > was under the impression that partial indexes were used for text > searches, so I completely overlooked this. Or you might actually do two partitions for each day - one for volume=0 and the other one for volume!=0. Not sure if that is worth the effort. One more thing to try in this case - it's not that important how many rows suffice the condition, much more important is how many blocks need to be read from the disk. If those 10% rows are distributed evenly through the table (i.e. there's at least one in each 8kB block), the I/O still needs to be done. And it's very likely the case, as you've clustered the tables according to the timestamp. Try to cluster the tables according to 'volume' and check the difference. regards Tomas
Dne 29.6.2011 01:50, Craig McIlwee napsal(a): >> > work_mem: 512MB >> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query >> > plan and took the same amount of time to execute give or take a few >> > seconds >> >> shared_buffers doesn't normally impact the query plan; it impacts how >> much churn there is between the database and the operating system cache, >> mainly important for making write-heavy work efficient. On Windows, >> you'll probably be safe to set this to 512MB and forget about it. It >> doesn't benefit from large values anyway. > > I was thinking that shared buffers controlled the amount of data, > primarily table and index pages, that the database could store in memory > at once. Based on that assumption, I thought that a larger value would > enable an entire table + index to be in memory together and speed up the > query. Am I wrong? Well, you're right and wrong at the same time. The shared buffers really controls the amount of data that may be read into the database cache, that's true. But this value is not used when building the execution plan. There's another value (effective_cache_size) that is used when planning a query. >> > Ive read documentation that says I should be able to set statistics >> > values for an entire table as opposed to per column, but havent found >> > how to do that. I guess I was either too lazy to update statistics on >> > each column or just didnt think it would help much. Link to the docs? According to http://www.postgresql.org/docs/current/static/sql-altertable.html it's possible to set this only at the column level. And of course there's a GUC default_statistics_target that defines default value. Tomas
On 06/28/2011 07:26 PM, Craig McIlwee wrote: > Yes, the data import is painfully slow but I hope to make up for that > with the read performance later. You can probably improve that with something like this: shared_buffers=512MB checkpoint_segments=64 Maybe bump up maintenance_work_mem too, if the vacuum part of that is the painful one. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD Comprehensive and Customized PostgreSQL Training Classes: http://www.2ndquadrant.us/postgresql-training/
On 06/28/2011 07:50 PM, Craig McIlwee wrote: > I was thinking that shared buffers controlled the amount of data, > primarily table and index pages, that the database could store in > memory at once. Based on that assumption, I thought that a larger > value would enable an entire table + index to be in memory together > and speed up the query. Am I wrong? It does to some extent. But: a) This amount doesn't impact query planning as much if you've set a large effective_cache_size b) The operating system is going to cache things outside of PostgreSQL, too c) Data read via a sequential scan sometimes skips going into shared_buffers, to keep that cache from being swamped with any single scan d) until the data has actually made its way into memory, you may be pulling it in there by an inefficient random process at first. By the time the cache is populated, the thing you wanted a populated cache to accelerate may already have finished. It's possible to get insight into this all using pg_buffercache to actually see what's in the cache, and I've put up some talks and scripts to help with that at http://projects.2ndquadrant.com/talks you might find useful. > Looks like they are set to adaptive read-ahead now. If the database > is executing many concurrent queries, is it reasonable to suspect that > the IO requests will compete with each other in such a way that the > controller would rarely see many sequential requests since it is > serving many processes? The controller does have an 'on' option also > that forces read-ahead, maybe that would solve the issue if we can > rely on the data to survive in the cache until the actual read request > takes place. I've never been able to find good documentation on just what the difference between the adaptive and on modes of that controller really are, which is why I suggested you try both and see. Linux has a uniquely good read-ahead model that was tuned with PostgreSQL specifically in mind. And you still have to tweak it upwards from the defaults in order for the database to fetch things as fast as the drives are capable sometimes. So your idea that you will meet/exceed the drive's capabilities for bulk sequential scans is less likely than you might think. RAID5 in theory should give you 2X or more of the speed of any single disk when reading a clustered table, but the way PostgreSQL does it may make that hard to realize on Windows. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD Comprehensive and Customized PostgreSQL Training Classes: http://www.2ndquadrant.us/postgresql-training/