Thread: Slow performance when querying millions of rows


Slow performance when querying millions of rows

From:
Craig McIlwee
Date:
Hello,

I have a handful of queries that are performing very slowly.  I realize that I will be hitting hardware limits at some point, but want to make sure I've squeezed out every bit of performance I can before calling it quits.

Our database is collecting traffic data at the rate of about 3 million rows a day.  The most important columns in the table being queried are timestamp, volume, occupancy, and speed.  I have also denormalized the table by adding road name, direction, mile marker, and lane type values to eliminate joins with other tables that contain information about the devices that collect this information.  A typical query will involve segments of roadway (i.e. road names, directions, and mile marker bounds) over a certain period of time (e.g. morning rush hour), and will have filters to exclude questionable data (e.g. speed > 100 MPH).  Unfortunately, there are also a few cases in which a user will query data for many full days on all roadways, essentially querying everything for a large period of time.  One other thing to note is that we only ever query readings with lane_type = through_lanes, although we are collecting ramp and reversible lane data to facilitate future reporting needs.

Table Metadata:
- Volume typically ranges anywhere from 0 to 20, averages around 4 to 5.  A small percentage of the rows have null volume.
- Occupancy ranges from 1 to 10, averages around 1 or 2
- Speed is about what you would expect, ranging from 30 to 70 with an average somewhere near the middle
- There are 17 roads
- There are 2 directions per road
- Mile marker ranges vary by roadway, typical ranges are something like 0 to 40 or 257 to 290
- Most (80 to 90% +) of the readings have lane_type = through_lanes
- Size of a daily table is about 360MB, a half month table is 5 to 6 GB

Full Table and Index Schema:

I've experimented with partitioning using a table per day and 2 tables per month (1st through 15th, 16th to end of month).  2 tables/month was the original approach to keep the number of tables from growing too rapidly, and shows about 3x slower performance.  Using daily tables incurs extra planning overhead as expected, but isn't all that bad.  I'm OK with taking a 1 second planning hit if my overall query time decreases significantly.  Furthermore, we will only be storing raw data for about a year and can aggregate old data.  This means that I can do daily tables for raw data and larger/fewer tables for older data.  The table and index structure is below, which is identical between daily and ½ month tables with a couple of exceptions:
- Daily tables have a fill factor of 100, ½ month tables are default
- Only the 4 column indexes were created for the daily tables since the others never get used
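
For reference, the daily partitions hang off the master table (defined just below) through the usual inheritance plus CHECK constraint pattern.  A minimal sketch, with an illustrative child name and date range rather than the real objects:

-- Sketch only: one daily child table; the name and date range are
-- illustrative, not the actual objects in our database.
CREATE TABLE vds_detector_data_20110401
(
  CHECK (reading_timestamp >= TIMESTAMP '2011-04-01 00:00:00'
     AND reading_timestamp <  TIMESTAMP '2011-04-02 00:00:00')
)
INHERITS (vds_detector_data);

-- constraint_exclusion (the 8.4 default of 'partition' is enough) lets the
-- planner skip children whose CHECK constraint rules them out.
SHOW constraint_exclusion;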

CREATE TABLE vds_detector_data
(
  reading_timestamp timestamp without time zone,
  vds_id integer,
  detector_id integer,
  status smallint,
  speed numeric(12,9),
  volume numeric(12,9),
  confidence smallint,
  occupancy numeric(12,9),
  loadid bigint,
  road_name character varying(150),
  road_dir character varying(2),
  mile_marker numeric(7,2),
  lane_number integer,
  lane_type character varying(32),
  CONSTRAINT vds_detector_vdsid_fkey FOREIGN KEY (vds_id, detector_id)
      REFERENCES ref_vds_detector_properties (device_id, detector_id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE RESTRICT
)
WITH (
  OIDS=FALSE
);

CREATE INDEX vds_detector_data_dir_idx
  ON vds_detector_data
  USING btree
  (road_dir);

CREATE INDEX vds_detector_data_lane_idx
  ON vds_detector_data
  USING btree
  (lane_number);

CREATE INDEX vds_detector_data_mm_idx
  ON vds_detector_data
  USING btree
  (mile_marker);

CREATE INDEX vds_detector_data_occupancy_idx
  ON vds_detector_data
  USING btree
  (occupancy);

CREATE INDEX vds_detector_data_road_idx
  ON vds_detector_data
  USING btree
  (road_name);

CREATE INDEX vds_detector_data_road_ts_mm_dir_idx
  ON vds_detector_data
  USING btree
  (road_name, reading_timestamp, mile_marker, road_dir);

CREATE INDEX vds_detector_data_speed_idx
  ON vds_detector_data
  USING btree
  (speed);

CREATE INDEX vds_detector_data_timestamp_idx
  ON vds_detector_data
  USING btree
  (reading_timestamp);

CREATE INDEX vds_detector_data_ts_road_mm_dir_idx
  ON vds_detector_data
  USING btree
  (reading_timestamp, road_name, mile_marker, road_dir);

CREATE INDEX vds_detector_data_volume_idx
  ON vds_detector_data
  USING btree
  (volume);

EXPLAIN ANALYZE:

Query:
select cast(reading_timestamp as Date) as date, floor(extract(hour from reading_timestamp) / 1.0) * 1.0  as hour, floor(extract(minute from reading_timestamp) / 60) * 60 as min,
  count(*), sum(vdd.volume) as totalVolume, sum(vdd.occupancy*vdd.volume)/sum(vdd.volume) as avgOcc, sum(vdd.speed*vdd.volume)/sum(vdd.volume) as avgSpeed,
  avg(vdd.confidence) as avgConfidence, min(vdd.detector_id) as detectorId, vdd.vds_id as vdsId
from vds_detector_data vdd
where (vdd.reading_timestamp between '2011-4-01 00:00:00.000' and '2011-04-30  23:59:59.999')
  and vdd.volume!=0
  and ((road_name='44' and mile_marker between 257.65 and 289.5 and (road_dir='E' or road_dir='W'))
    or (road_name='64' and mile_marker between 0.7 and 40.4 and (road_dir='E' or road_dir='W'))
                or (road_name='55' and mile_marker between 184.8 and 208.1 and (road_dir='N' or road_dir='S'))
                or (road_name='270' and mile_marker between 0.8 and 34.5 and (road_dir='N' or road_dir='S')))
  and not(vdd.speed<0.0 or vdd.speed>90.0 or vdd.volume=0.0) and vdd.lane_type in ('through_lanes')
group by date, hour, min, vdd.vds_id, mile_marker
having sum(vdd.volume)!=0
order by vdd.vds_id, mile_marker;

Daily table explain analyze: http://explain.depesz.com/s/iLY
Half month table explain analyze: http://explain.depesz.com/s/Unt

Postgres version:
PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit

History:
None, this is a new database and application

Hardware:
- 2 Intel Xeon 2.13GHz processors with 8 cores each
- 8GB RAM
- Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID Controller 512MB Cache
- 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of about 2TB
- Windows Server 2008 R2 64-bit (but 32-bit postgres)
- Hardware upgrades aren't an option at this point due to budget and time constraints

Maintenance Setup:
Autovacuum is disabled for these tables since the data is never updated.  The tables that we are testing with at the moment will not grow any larger and have been both clustered and analyzed.  They were clustered on the vds_detector_data_timestamp_idx index.

GUC Settings:
effective_cache_size: 2048MB
work_mem: 512MB
shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query plan and took the same amount of time to execute give or take a few seconds

Summary:

The time to get the raw data (before aggregation and sorting) is relatively similar between the daily and half month tables.  It would appear that the major difference is the ordering of sort and aggregation: the daily tables aggregate first, so the amount of data sorted is significantly less.

Since the daily tables are only 360MB, I would hope that the entire table could be pulled into memory with one large sequential read.  Of course this assumes that the file pieces are stored contiguously, but auto defrag is enabled and shows low fragmentation so I'm trusting (as much as one can) Windows to do the right thing here.  My drives have a 150MB/s sustained max throughput, and considering that data is spread across 5 drives I would hope to at least be able to reach the single disk theoretical limit and read an entire table plus the index into memory in about 4 to 5 seconds.  Based on the analyze output, each daily table averages 6 to 7 seconds, so I'm pretty close there and maybe just limited by disk speed?

In both cases, the row estimates vs actual are way off.  I've increased statistics on the reading_timestamp and road_name columns to 100 and then 1000 with no change.  I ran an ANALYZE after each statistics change.  Should I be upping stats on the non-indexed columns as well?  I've read documentation that says I should be able to set statistics values for an entire table as opposed to per column, but haven't found how to do that.  I guess I was either too lazy to update statistics on each column or just didn't think it would help much.
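
For reference, the per-column statistics bump was done with ALTER TABLE, roughly like this (shown here against the master table with the 1000 target mentioned above):

ALTER TABLE vds_detector_data ALTER COLUMN reading_timestamp SET STATISTICS 1000;
ALTER TABLE vds_detector_data ALTER COLUMN road_name SET STATISTICS 1000;
ANALYZE vds_detector_data;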

So, any pointers for performance improvement?

Thanks,
Craig
Open Roads Consulting, Inc.
757-546-3401
http://www.openroadsconsulting.com


Re: Slow performance when querying millions of rows

From:
Tomas Vondra
Date:
On 28.6.2011 23:28, Craig McIlwee wrote:
> Daily table explain analyze: http://explain.depesz.com/s/iLY
> Half month table explain analyze: http://explain.depesz.com/s/Unt

Are you sure those two queries are exactly the same? Because the daily
case output says the width is 50B, while the half-month case says it's
75B. This might be why the sort/aggregate steps are switched, and that
increases the amount of data so that it has to be sorted on disk (which
is why the half-month is so much slower).

Haven't you added some columns to the half-month query?

> Postgres version:
> PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
>
> History:
> None, this is a new database and application
>
> Hardware:
> - 2 Intel Xeon 2.13GHz processors with 8 cores each
> - 8GB RAM
> - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID
> Controller 512MB Cache
> - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of
> about 2TB
> - Windows Server 2008 R2 64-bit (but 32-bit postgres)
> - Hardware upgrades aren't an option at this point due to budget and time
> constraints

Not much experience with PostgreSQL on Windows, but this looks good to
me. Not sure if RAID5 is a good choice, especially because of write
performance - this is probably one of the reasons why the disk sort is
so slow (in the half-month case).

And it's nice you have 8 cores, but don't forget each query executes in
a single backend process, i.e. it may use a single core. So the above
query can't use 8 cores - that's why the in-memory sort takes so long, I
guess.

> Maintenance Setup:
> Autovacuum is disabled for these tables since the data is never
> updated.  The tables that we are testing with at the moment will not
> grow any larger and have been both clustered and analyzed.  They were
> clustered on the vds_detector_data_timestamp_idx index.
>
> GUC Settings:
> effective_cache_size: 2048MB
> work_mem: 512MB
> shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> plan and took the same amount of time to execute give or take a few seconds
>
> Summary:
>
> The time to get the raw data (before aggregation and sorting) is
> relatively similar between the daily and half month tables.  It would
> appear that the major difference is the ordering of sort and
> aggregation, the daily tables aggregate first so the amount of data
> sorted is significantly less.

Yes, the ordering is the problem. The amount of data to sort is so huge
(3GB) it does not fit into work_mem and has to be sorted on disk. Not
sure why this happens, the only difference I've noticed is the 'width'
(50B vs. 75B). Are those two queries exactly the same?

> Since the daily tables are only 360MB, I would hope that the entire
> table could be pulled into memory with one large sequential read.  Of
> course this assumes that the file pieces are stored contiguously, but
> auto defrag is enabled and shows low fragmentation so I'm trusting (as
> much as one can) Windows to do the right thing here.  My drives have a
> 150MB/s sustained max throughput, and considering that data is spread
> across 5 drives I would hope to at least be able to reach the single
> disk theoretical limit and read an entire table plus the index into
> memory about 4 to 5 seconds.  Based on the analyze output, each daily
> table averages 6 to 7 seconds, so I'm pretty close there and maybe just
> limited by disk speed?

Well, you have 30 partitions and 7 seconds for each means 210 seconds in
total. Which is about the time you get (before the aggregate/sort).

You have to check where the bottleneck is - is it the I/O or CPU? I'd
guess the CPU, but I may be wrong. On unix I'd use something like
iostat/vmstat/top to see what's going on - not sure what to use on
Windows. I guess there is some console tool, or maybe Process Explorer from
Sysinternals.

> In both cases, the row estimates vs actual are way off.  I've increased
> statistics on the reading_timestamp and road_name columns to 100 and
> then 1000 with no change.  I ran an ANALYZE after each statistics
> change.  Should I be upping stats on the non-indexed columns as well?
> I've read documentation that says I should be able to set statistics
> values for an entire table as opposed to per column, but haven't found
> how to do that.  I guess I was either too lazy to update statistics on
> each column or just didn't think it would help much.

The estimates seem pretty good to me - 10x difference is not that much.
Could be better, but I don't think you can get a better plan, it seems
very reasonable to me.

> So, any pointers for performance improvement?

Three ideas that might help:

1) partial indexes

How much do the parameters in the query change? If there are parameters
that are always the same, you may try to create partial indexes. For
example if the condition 'vdd.volume != 0' is always present, then you
can create the index like this:

 CREATE INDEX vds_detector_data_dir_idx
   ON vds_detector_data
   USING btree
   (road_dir)
   WHERE (volume != 0);

That way only the rows with 'vdd.volume!=0' will be included in the
index, the index will be smaller and the bitmap will be created faster.
Similarly for the other conditions. The smaller the index will be, the
faster the bitmap creation.
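
For example, assuming the lane_type filter from your original post is
also always there, the fixed conditions could be folded into the big
composite index (the index name is just an example):

 -- sketch only: partial version of the 4-column index, covering just
 -- the rows the reports ever touch
 CREATE INDEX vds_detector_data_road_ts_part_idx
   ON vds_detector_data
   USING btree
   (road_name, reading_timestamp, mile_marker, road_dir)
   WHERE volume != 0 AND lane_type = 'through_lanes';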

If all the conditions may change, or if the index size does not change
much, you can't do this.

2) prebuilt results

Another option is precomputation of the 'per partition results' - if you
know what the conditions are going to be, you can precompute them and
then query just those (much smaller) tables. But this is very
application specific.

Again, if all the conditions change, you can't do this.

3) distribute the queries

As I've mentioned, PostgreSQL does not distribute the queries on
multiple CPU cores, but you can do that on your own at the application
level.

For example I see the GROUP BY clause contains 'date, hour, min' so you
can compute the results for each partition separately (in a different
thread, using a separate connection) and then 'append' them.

Yes, you'll need to keep some metadata to do this efficiently (e.g. list
of partitions along with from/to timestamps), but you should do this
anyway I guess (at least I do that when partitioning tables, it makes
the management much easier).
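
Just to illustrate the idea - something like this per child table, each
on its own connection, and the application appends the per-partition
results (the child table name here is made up):

 -- sketch only: the same kind of aggregation restricted to one
 -- (hypothetical) daily child table, run once per partition in parallel
 SELECT cast(reading_timestamp as date) AS date,
        extract(hour from reading_timestamp) AS hour,
        vds_id,
        sum(volume) AS total_volume,
        count(*) AS readings
 FROM vds_detector_data_20110401
 WHERE road_name = '44'
   AND mile_marker BETWEEN 257.65 AND 289.5
   AND lane_type = 'through_lanes'
   AND volume != 0
 GROUP BY 1, 2, vds_id;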

Not sure if you can do this with the other queries :-(

regards
Tomas

Re: Slow performance when querying millions of rows

From:
Greg Smith
Date:
On 06/28/2011 05:28 PM, Craig McIlwee wrote:
> Autovacuum is disabled for these tables since the data is never
> updated.  The tables that we are testing with at the moment will not
> grow any larger and have been both clustered and analyzed.

Note that any such prep to keep from ever needing to maintain these
tables in the future should include the FREEZE option, possibly with
some parameters tweaked first to make it more aggressive.  Autovacuum
will eventually revisit them in order to prevent transaction ID
wrap-around, even if it's disabled.  If you're going to the trouble of
prepping them so they are never touched again, you should do a freeze
with the right parameters to keep this from happening again.
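
For example, something along these lines once a partition is finished
being loaded (the child table name is illustrative):

-- FREEZE is equivalent to running the vacuum with vacuum_freeze_min_age = 0,
-- which leaves autovacuum no wraparound work to do on that partition later
VACUUM FREEZE ANALYZE vds_detector_data_20110401;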

> work_mem: 512MB
> shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> plan and took the same amount of time to execute give or take a few
> seconds

shared_buffers doesn't normally impact the query plan; it impacts how
much churn there is between the database and the operating system cache,
mainly important for making write-heavy work efficient.  On Windows,
you'll probably be safe to set this to 512MB and forget about it.  It
doesn't benefit from large values anyway.

This is a very large work_mem setting however, so be careful that you
won't have many users connecting at once if you're going to use it.
Each connection can use a multiple of work_mem, making it quite possible
you could run out of memory with this configuration.  If that low user
count is true, you may want to make sure you're enforcing it by lowering
max_connections, as a safety measure to prevent problems.

> Since the daily tables are only 360MB, I would hope that the entire
> table could be pulled into memory with one large sequential read.  Of
> course this assumes that the file pieces are stored contiguously, but
> auto defrag is enabled and shows low fragmentation so I'm trusting (as
> much as one can) Windows to do the right thing here.  My drives have a
> 150MB/s sustained max throughput, and considering that data is spread
> across 5 drives I would hope to at least be able to reach the single
> disk theoretical limit and read an entire table plus the index into
> memory about 4 to 5 seconds.  Based on the analyze output, each daily
> table averages 6 to 7 seconds, so I'm pretty close there and maybe just
> limited by disk speed?

One thing to note is that your drive speed varies based on what part of
the disk things are located at; the slower parts of the drive will be
much less than 150MB/s.

On Linux servers it's impossible to reach something close to the disk's
raw speed without making the operating system read-ahead feature much
more aggressive than it is by default.  Because PostgreSQL fetches a
single block at a time, to keep the drive completely busy something has
to notice the pattern of access and be reading data ahead of when the
database even asks for it.  You may find a parameter you can tune in the
properties for the drives somewhere in the Windows Control Panel.  And
there's a read-ahead setting on your PERC card that's better than
nothing you may not have turned on (not as good as the Linux one, but
it's useful).  There are two useful settings there ("on" and "adaptive"
if I recall correctly) that you can try, to see which works better.

> I've read documentation that says I should be able to set statistics
> values for an entire table as opposed to per column, but haven't found
> how to do that.  I guess I was either too lazy to update statistics on
> each column or just didn't think it would help much.

You can adjust the statistics target across the entire database using
the default_statistics_target setting, or you can tweak them per column
using ALTER TABLE.  There is no table-level control.  I find it
difficult to answer questions about whether there is enough stats or not
without actually looking at pg_stats to see how the database is
interpreting the data, and comparing it against the real distribution.
This is an area where flailing about trying things doesn't work very
well; you need to be very systematic about the analysis and testing
strategy if you're going to get anywhere useful.  It's not easy to do.
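
As a starting point, something like this shows what the planner
currently believes about the columns you're filtering on:

SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'vds_detector_data'
  AND attname IN ('reading_timestamp', 'road_name', 'mile_marker', 'volume');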

As a larger commentary on what you're trying to do, applications like
this often find themselves at a point one day where you just can't allow
arbitrary user queries to run against them anymore.  What normally
happens then is that the most common things that people really need end
up being run once and stored in some summary form, using techniques such
as materialized views:  http://wiki.postgresql.org/wiki/Materialized_Views
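
On 8.4 that means maintaining the summary yourself; a rough sketch of
the idea, where the names and the hourly granularity are only an example:

-- sketch only: an hourly rollup the common reports could hit instead of
-- the raw readings
CREATE TABLE vds_hourly_summary AS
SELECT date_trunc('hour', reading_timestamp) AS hour,
       vds_id, road_name, road_dir, mile_marker,
       count(*) AS readings,
       sum(volume) AS total_volume,
       sum(occupancy * volume) AS occupancy_volume,
       sum(speed * volume) AS speed_volume
FROM vds_detector_data
WHERE lane_type = 'through_lanes' AND volume != 0
GROUP BY 1, 2, 3, 4, 5;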

In your case, I would start now on trying to find the common patterns to
the long running reports that people generate, and see if it's possible
to pre-compute some portion of them and save that summary.  And you may
find yourself in a continuous battle with business requests regardless.
It's often key decision makers who feel they should be able to query any
way they want, regardless of its impact on the database.  Finding a
middle ground there is usually challenging.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/


Re: Slow performance when querying millions of rows

From:
"Craig McIlwee"
Date:
> On 28.6.2011 23:28, Craig McIlwee wrote:
> > Daily table explain analyze: http://explain.depesz.com/s/iLY
> > Half month table explain analyze: http://explain.depesz.com/s/Unt
>
> Are you sure those two queries are exactly the same? Because the daily
> case output says the width is 50B, while the half-month case says it's
> 75B. This might be why the sort/aggregate steps are switched, and that
> increases the amount of data so that it has to be sorted on disk (which
> is why the half-month is so much slower).
>
> Haven't you added some columns to the half-month query?

The daily tables were created using CREATE TABLE AS from the half month tables; the structure is the same with the exception of fill factor.  Queries are identical except for the name of the master table that they select from.

>
> > Postgres version:
> > PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
> >
> > History:
> > None, this is a new database and application
> >
> > Hardware:
> > - 2 Intel Xeon 2.13GHz processors with 8 cores each
> > - 8GB RAM
> > - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID
> > Controller 512MB Cache
> > - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of
> > about 2TB
> > - Windows Server 2008 R2 64-bit (but 32-bit postgres)
> > - Hardware upgrades aren't an option at this point due to budget and time
> > constraints
>
> Not much experience with PostgreSQL on Windows, but this looks good to
> me. Not sure if RAID5 is a good choice, especially because of write
> performance - this is probably one of the reasons why the disk sort is
> so slow (in the half-month case).

Yes, the data import is painfully slow but I hope to make up for that with the read performance later.

>
> And it's nice you have 8 cores, but don't forget each query executes in
> a single backend process, i.e. it may use a single core. So the above
> query can't use 8 cores - that's why the in-memory sort takes so long, I
> guess.
>
> > Maintenance Setup:
> > Autovacuum is disabled for these tables since the data is never
> > updated.  The tables that we are testing with at the moment will not
> > grow any larger and have been both clustered and analyzed.  They were
> > clustered on the vds_detector_data_timestamp_idx index.
> >
> > GUC Settings:
> > effective_cache_size: 2048MB
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> > plan and took the same amount of time to execute give or take a few
> seconds
> >
> > Summary:
> >
> > The time to get the raw data (before aggregation and sorting) is
> > relatively similar between the daily and half month tables.  It would
> > appear that the major difference is the ordering of sort and
> > aggregation, the daily tables aggregate first so the amount of data
> > sorted is significantly less.
>
> Yes, the ordering is the problem. The amount of data to sort is so huge
> (3GB) it does not fit into work_mem and has to be sorted on disk. Not
> sure why this happens, the only difference I've noticed is the 'width'
> (50B vs. 75B). Are those two queries exactly the same?
>
> > Since the daily tables are only 360MB, I would hope that the entire
> > table could be pulled into memory with one large sequential read.  Of
> > course this assumes that the file pieces are stored contiguously, but
> > auto defrag is enabled and shows low fragmentation so I'm trusting (as
> > much as one can) Windows to do the right thing here.  My drives have a
> > 150MB/s sustained max throughput, and considering that data is spread
> > across 5 drives I would hope to at least be able to reach the single
> > disk theoretical limit and read an entire table plus the index into
> > memory about 4 to 5 seconds.  Based on the analyze output, each daily
> > table averages 6 to 7 seconds, so I'm pretty close there and maybe just
> > limited by disk speed?
>
> Well, you have 30 partitions and 7 seconds for each means 210 seconds in
> total. Which is about the time you get (before the aggregate/sort).
>
> You have to check where the bottleneck is - is it the I/O or CPU? I'd
> guess the CPU, but I may be wrong. On unix I'd use something like
> iostat/vmstat/top to see what's going on - not sure what to use on
> Windows. I guess there is some console tool, or maybe Process Explorer from
> Sysinternals.
>
> > In both cases, the row estimates vs actual are way off.  I've increased
> > statistics on the reading_timestamp and road_name columns to 100 and
> > then 1000 with no change.  I ran an ANALYZE after each statistics
> > change.  Should I be upping stats on the non-indexed columns as well?
> > I've read documentation that says I should be able to set statistics
> > values for an entire table as opposed to per column, but haven't found
> > how to do that.  I guess I was either too lazy to update statistics on
> > each column or just didn't think it would help much.
>
> The estimates seem pretty good to me - 10x difference is not that much.
> Could be better, but I don't think you can get a better plan, it seems
> very reasonable to me.
>
> > So, any pointers for performance improvement?
>
> Three ideas that might help:
>
> 1) partial indexes
>
> How much do the parameters in the query change? If there are parameters
> that are always the same, you may try to create partial indexes. For
> example if the condition 'vdd.volume != 0' is always present, then you
> can create the index like this:
>
>  CREATE INDEX vds_detector_data_dir_idx
>    ON vds_detector_data
>    USING btree
>    (road_dir)
>    WHERE (volume != 0);
>
> That way only the rows with 'vdd.volume!=0' will be included in the
> index, the index will be smaller and the bitmap will be created faster.
> Similarly for the other conditions. The smaller the index will be, the
> faster the bitmap creation.
>
> If all the conditions may change, or if the index size does not change
> much, you can't do this.

The non-zero volume filter is the only thing that will always be present, but the zero-volume records do account for 10 to 15% of the data.  I'll give this a shot, I'm really interested in seeing what impact this has.  For some reason I was under the impression that partial indexes were used for text searches, so I completely overlooked this.

>
> 2) prebuilt results
>
> Another option is precomputation of the 'per partition results' - if you
> know what the conditions are going to be, you can precompute them and
> then query just those (much smaller) tables. But this is very
> application specific.
>
> Again, if all the conditions change, you can't do this.

This has been one of the toughest issues.  Due to the filtering capabilities, it's just not possible to precalculate anything.

>
> 3) distribute the queries
>
> As I've mentioned, PostgreSQL does not distribute the queries on
> multiple CPU cores, but you can do that on your own at the application
> level.
>
> For example I see the GROUP BY clause contains 'date, hour, min' so you
> can compute the results for each partition separately (in a different
> thread, using a separate connection) and then 'append' them.
>
> Yes, you'll need to keep some metadata to do this efficiently (e.g. list
> of partitions along with from/to timestamps), but you should do this
> anyway I guess (at least I do that when partitioning tables, it makes
> the management much easier).

I noticed this too after a little more testing, there are some serious performance gains to be had here.  I started with a single day query and it took about 15 seconds.  Next was 5 simultaneous queries all at about 30 seconds each and then 10 queries at 50 seconds each.

>
> Not sure if you can do this with the other queries :-(

Not all, but many.  The query in question is the most beastly of them all, so I'm pretty happy to have some strategy for improvement.

>
> regards
> Tomas
>

Thanks for the help.
Craig


Re: Slow performance when querying millions of rows

From:
"Craig McIlwee"
Date:
> On 06/28/2011 05:28 PM, Craig McIlwee wrote:
> > Autovacuum is disabled for these tables since the data is never
> > updated.  The tables that we are testing with at the moment will not
> > grow any larger and have been both clustered and analyzed.
>
> Note that any such prep to keep from ever needing to maintain these
> tables in the future should include the FREEZE option, possibly with
> some parameters tweaked first to make it more aggressive.  Autovacuum
> will eventually revisit them in order to prevent transaction ID
> wrap-around, even if it's disabled.  If you're going to the trouble of
> prepping them so they are never touched again, you should do a freeze
> with the right parameters to keep this from happening again.
>
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> > plan and took the same amount of time to execute give or take a few
> > seconds
>
> shared_buffers doesn't normally impact the query plan; it impacts how
> much churn there is between the database and the operating system cache,
> mainly important for making write-heavy work efficient.  On Windows,
> you'll probably be safe to set this to 512MB and forget about it.  It
> doesn't benefit from large values anyway.

I was thinking that shared buffers controlled the amount of data, primarily table and index pages, that the database could store in memory at once.  Based on that assumption, I thought that a larger value would enable an entire table + index to be in memory together and speed up the query.  Am I wrong?

>
> This is a very large work_mem setting however, so be careful that you
> won't have many users connecting at once if you're going to use it. 
> Each connection can use a multiple of work_mem, making it quite possible
> you could run out of memory with this configuration.  If that low user
> count is true, you may want to make sure you're enforcing it by lowering
> max_connections, as a safety measure to prevent problems.

I plan on lowering this quite a bit since I haven't seen much of a boost by increasing it.

>
> > Since the daily tables are only 360MB, I would hope that the entire
> > table could be pulled into memory with one large sequential read.  Of
> > course this assumes that the file pieces are stored contiguously, but
> > auto defrag is enabled and shows low fragmentation so I'm trusting (as
> > much as one can) Windows to do the right thing here.  My drives have a
> > 150MB/s sustained max throughput, and considering that data is spread
> > across 5 drives I would hope to at least be able to reach the single
> > disk theoretical limit and read an entire table plus the index into
> > memory about 4 to 5 seconds.  Based on the analyze output, each daily
> > table averages 6 to 7 seconds, so I'm pretty close there and maybe just
> > limited by disk speed?
>
> One thing to note is that your drive speed varies based on what part of
> the disk things are located at; the slower parts of the drive will be
> much less than 150MB/s.
>
> On Linux servers it's impossible to reach something close to the disk's
> raw speed without making the operating system read-ahead feature much
> more aggressive than it is by default.  Because PostgreSQL fetches a
> single block at a time, to keep the drive completely busy something has
> to notice the pattern of access and be reading data ahead of when the
> database even asks for it.  You may find a parameter you can tune in the
> properties for the drives somewhere in the Windows Control Panel.  And
> there's a read-ahead setting on your PERC card, which you may not have
> turned on, that's better than nothing (not as good as the Linux one, but
> it's useful).  There are two useful settings there ("on" and "adaptive"
> if I recall correctly) that you can try, to see which works better.

Looks like they are set to adaptive read-ahead now.  If the database is executing many concurrent queries, is it reasonable to suspect that the IO requests will compete with each other in such a way that the controller would rarely see many sequential requests since it is serving many processes?  The controller does have an 'on' option also that forces read-ahead, maybe that would solve the issue if we can rely on the data to survive in the cache until the actual read request takes place.

>
> > I've read documentation that says I should be able to set statistics
> > values for an entire table as opposed to per column, but haven't found
> > how to do that.  I guess I was either too lazy to update statistics on
> > each column or just didn't think it would help much.
>
> You can adjust the statistics target across the entire database using
> the default_statistics_target setting, or you can tweak them per column
> using ALTER TABLE.  There is no table-level control.  I find it
> difficult to answer questions about whether there is enough stats or not
> without actually looking at pg_stats to see how the database is
> interpreting the data, and comparing it against the real distribution. 
> This is an area where flailing about trying things doesn't work very
> well; you need to be very systematic about the analysis and testing
> strategy if you're going to get anywhere useful.  It's not easy to do.
>
> As a larger commentary on what you're trying to do, applications like
> this often find themselves at a point one day where you just can't allow
> arbitrary user queries to run against them anymore.  What normally
> happens then is that the most common things that people really need end
> up being run once and stored in some summary form, using techniques such
> as materialized views:  http://wiki.postgresql.org/wiki/Materialized_Views
>
> In your case, I would start now on trying to find the common patterns to
> the long running reports that people generate, and see if it's possible
> to pre-compute some portion of them and save that summary.  And you may
> find yourself in a continuous battle with business requests regardless. 
> It's often key decision makers who feel they should be able to query any
> way they want, regardless of its impact on the database.  Finding a
> middle ground there is usually challenging.
>
> --
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> Comprehensive and Customized PostgreSQL Training Classes:
> http://www.2ndquadrant.us/postgresql-training/
>
>

Thanks,
Craig


Re: Slow performance when querying millions of rows

From:
Tomas Vondra
Date:
On 29.6.2011 01:26, Craig McIlwee wrote:
>> On 28.6.2011 23:28, Craig McIlwee wrote:
>> Are you sure those two queries are exactly the same? Because the daily
>> case output says the width is 50B, while the half-month case says it's
>> 75B. This might be why the sort/aggregate steps are switched, and that
>> increases the amount of data so that it has to be sorted on disk (which
>> is why the half-month is so much slower).
>>
>> Haven't you added some columns to the half-month query?
>
> The daily tables were created using CREATE TABLE AS from the half month
> tables, structure is the same with the exception of fill factor.
> Queries are identical except for the name of the master table that they
> select from.

Hm, I'm not sure where this width value comes from but I don't think
it's related to fillfactor.

>> Not much experience with PostgreSQL on Windows, but this looks good to
>> me. Not sure if RAID5 is a good choice, especially because of write
>> performance - this is probably one of the reasons why the disk sort is
>> so slow (in the half-month case).
>
> Yes, the data import is painfully slow but I hope to make up for that
> with the read performance later.

Generally you're right that RAID10 is going to be slower than RAID5 when
reading (and faster when writing) the data, but how big the gap is
really depends on the controller. It's not that big I guess - see for
example this:

http://www.kendalvandyke.com/2009/02/disk-performance-hands-on-part-5-raid.html

The first test shows that RAID10 is about 10% slower on reads but about
60% faster on writes.

BTW have you tuned the GUC settings for writes (increasing
checkpoint_segments may give much better write performance)?

> The non-zero volume filter is the only thing that will always be present, but the zero-volume
> records do account for 10 to 15% of the data.  I'll give this a shot,
> I'm really interested in seeing what impact this has.  For some reason I
> was under the impression that partial indexes were used for text
> searches, so I completely overlooked this.

Or you might actually do two partitions for each day - one for volume=0
and the other one for volume!=0. Not sure if that is worth the effort.

One more thing to try in this case - it's not that important how many
rows satisfy the condition; much more important is how many blocks need
to be read from the disk. If those 10% rows are distributed evenly
through the table (i.e. there's at least one in each 8kB block), the I/O
still needs to be done.

And it's very likely the case, as you've clustered the tables according
to the timestamp. Try to cluster the tables according to 'volume' and
check the difference.
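
I.e. something like this on one partition, as an experiment (the index
has to exist first - the names are illustrative):

 -- sketch only: physically group one daily partition by volume instead
 -- of by timestamp, then re-analyze it
 CREATE INDEX vds_detector_data_20110401_volume_idx
   ON vds_detector_data_20110401 (volume);
 CLUSTER vds_detector_data_20110401
   USING vds_detector_data_20110401_volume_idx;
 ANALYZE vds_detector_data_20110401;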

regards
Tomas

Re: Slow performance when querying millions of rows

From:
Tomas Vondra
Date:
On 29.6.2011 01:50, Craig McIlwee wrote:
>> > work_mem: 512MB
>> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
>> > plan and took the same amount of time to execute give or take a few
>> > seconds
>>
>> shared_buffers doesn't normally impact the query plan; it impacts how
>> much churn there is between the database and the operating system cache,
>> mainly important for making write-heavy work efficient.  On Windows,
>> you'll probably be safe to set this to 512MB and forget about it.  It
>> doesn't benefit from large values anyway.
>
> I was thinking that shared buffers controlled the amount of data,
> primarily table and index pages, that the database could store in memory
> at once.  Based on that assumption, I thought that a larger value would
> enable an entire table + index to be in memory together and speed up the
> query.  Am I wrong?

Well, you're right and wrong at the same time. The shared buffers really
controls the amount of data that may be read into the database cache,
that's true. But this value is not used when building the execution
plan. There's another value (effective_cache_size) that is used when
planning a query.

>> > I've read documentation that says I should be able to set statistics
>> > values for an entire table as opposed to per column, but haven't found
>> > how to do that.  I guess I was either too lazy to update statistics on
>> > each column or just didn't think it would help much.

Link to the docs? According to

http://www.postgresql.org/docs/current/static/sql-altertable.html

it's possible to set this only at the column level. And of course
there's a GUC default_statistics_target that defines the default value.

Tomas

Re: Slow performance when querying millions of rows

From:
Greg Smith
Date:
On 06/28/2011 07:26 PM, Craig McIlwee wrote:
> Yes, the data import is painfully slow but I hope to make up for that
> with the read performance later.

You can probably improve that with something like this:

shared_buffers=512MB
checkpoint_segments=64

Maybe bump up maintenance_work_mem too, if the vacuum part of that is
the painful one.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/


Re: Slow performance when querying millions of rows

From:
Greg Smith
Date:
On 06/28/2011 07:50 PM, Craig McIlwee wrote:
> I was thinking that shared buffers controlled the amount of data,
> primarily table and index pages, that the database could store in
> memory at once.  Based on that assumption, I thought that a larger
> value would enable an entire table + index to be in memory together
> and speed up the query.  Am I wrong?

It does to some extent.  But:

a) This amount doesn't impact query planning as much if you've set a
large effective_cache_size

b) The operating system is going to cache things outside of PostgreSQL, too

c) Data read via a sequential scan sometimes skips going into
shared_buffers, to keep that cache from being swamped with any single scan

d) Until the data has actually made its way into memory, you may be
pulling it in there by an inefficient random process at first.  By the
time the cache is populated, the thing you wanted a populated cache to
accelerate may already have finished.

It's possible to get insight into this all using pg_buffercache to
actually see what's in the cache, and I've put up some talks and scripts
to help with that at http://projects.2ndquadrant.com/talks you might
find useful.
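
For a quick look (it needs the contrib/pg_buffercache module installed
and superuser rights), something like the example query from its
documentation works:

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY 2 DESC
LIMIT 10;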

> Looks like they are set to adaptive read-ahead now.  If the database
> is executing many concurrent queries, is it reasonable to suspect that
> the IO requests will compete with each other in such a way that the
> controller would rarely see many sequential requests since it is
> serving many processes?  The controller does have an 'on' option also
> that forces read-ahead, maybe that would solve the issue if we can
> rely on the data to survive in the cache until the actual read request
> takes place.

I've never been able to find good documentation on just what the
difference between the adaptive and on modes of that controller really
is, which is why I suggested you try both and see.  Linux has a
uniquely good read-ahead model that was tuned with PostgreSQL
specifically in mind.  And you still have to tweak it upwards from the
defaults in order for the database to fetch things as fast as the drives
are capable sometimes.  So your idea that you will meet/exceed the
drive's capabilities for bulk sequential scans is less likely than you
might think.  RAID5 in theory should give you 2X or more of the speed of
any single disk when reading a clustered table, but the way PostgreSQL
does it may make that hard to realize on Windows.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/