Thread: Inconsistent performance


Inconsistent performance

From
Joseph Bove
Date:
To whoever can assist,

I am working with a decent sized database on an extremely powerful machine.
The specs follow:

    OS:            RedHat Linux 9.0
    PG Version        7.3
    Memory        1 gig
    CPU            Quad Processor - Unsure of exact CPUs
    Hard Drive        80 gigs
    Database Size        2 gigs


As you can see, the server is built for overkill.

The problem that I see is as follows.

I do a rather simple query: select count (*) from large-table where column
= some value;

About 80% of the time, the response time is sub-second. However, about 10% of
the time, the response time is 5 - 10 seconds.

There is nothing readily apparent at the system level that comes close to
explaining the performance hits. CPU and memory usage (as measured by top)
appear to be fine.

Although there are certain tuning issues within the database itself, no
documentation I have seen seems to indicate that tuning issues would lead
to such inconsistent response time.

Any ideas?

Regards,

Joseph


Re: Inconsistent performance

From
Stephan Szabo
Date:
On Mon, 15 Sep 2003, Joseph Bove wrote:

> I am working with a decent sized database on an extremely powerful machine.
> The specs follow:
>
>     OS:            RedHat Linux 9.0
>     PG Version        7.3
>     Memory        1 gig
>     CPU            Quad Processor - Unsure of exact CPUs
>     Hard Drive        80 gigs
>     Database Size        2 gigs
>
>
> As you can see the server is built for overkill.
>
> The problem that I see is as follows.
>
> I do a rather simple query: select count (*) from large-table where column
> = some value;
>
> About 80% of the time, the response time is sub-second. However, at 10% of
> the time, the response time is 5 - 10 seconds.

Is it consistent for various values of "some value"?  If so, it's possible
that it's switching plans based on the apparent selectivity of the column
for that value.
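
One way to check (using placeholder identifiers, since the real table and
column names haven't been posted) is to compare the plans for a frequent
and an infrequent value:

```sql
-- Hypothetical names; substitute the real table/column and values.
-- If the plans differ (Seq Scan for the common value, Index Scan for
-- the rare one), the variation is selectivity-driven.
EXPLAIN SELECT count(*) FROM large_table WHERE some_column = 'common';
EXPLAIN SELECT count(*) FROM large_table WHERE some_column = 'rare';
```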

Re: Inconsistent performance

From
Bruno Wolff III
Date:
On Mon, Sep 15, 2003 at 17:34:12 -0400,
  Joseph Bove <jbove@vetstar.com> wrote:
>
> I do a rather simple query: select count (*) from large-table where column
> = some value;
>
> About 80% of the time, the response time is sub-second. However, at 10% of
> the time, the response time is 5 - 10 seconds.
>
> This is nothing readily apparent at the system level that comes close to
> explaining the performance hits. CPU and memory usage (as measured by top)
> appear to be fine.
>
> Although there are certain tuning issues within the database itself, no
> documentation I have seen seems to indicate that tuning issues would lead
> to such inconsistent response time.

Looking at the output from explain analyze for the query would be useful.
It may be that there are a lot of rows that have the value in the problem
queries.

Re: Inconsistent performance

From
Joseph Bove
Date:
Stephan,

Actually, it's inconsistent with the exact same command. I've now
replicated the problem by doing the following command:

select count (*) from table;

The table in question has 88899 rows.

The response time is anywhere from 1 second to 12 seconds. Different
response times can occur in the same minute of testing!

Regards,

Joseph



At 02:34 PM 9/15/2003 -0700, you wrote:
>On Mon, 15 Sep 2003, Joseph Bove wrote:
>
> > I am working with a decent sized database on an extremely powerful machine.
> > The specs follow:
> >
> >       OS:                     RedHat Linux 9.0
> >       PG Version              7.3
> >       Memory          1 gig
> >       CPU                     Quad Processor - Unsure of exact CPUs
> >       Hard Drive              80 gigs
> >       Database Size           2 gigs
> >
> >
> > As you can see the server is built for overkill.
> >
> > The problem that I see is as follows.
> >
> > I do a rather simple query: select count (*) from large-table where column
> > = some value;
> >
> > About 80% of the time, the response time is sub-second. However, at 10% of
> > the time, the response time is 5 - 10 seconds.
>
>Is it consistant for various values of "some value"?  If so, it's possible
>that it's switching plans based on the apparent selectivity of the column
>for that value.



Re: Inconsistent performance

From
Josh Berkus
Date:
Joseph,

Please see this web page before posting anything else:
http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

Currently, you are not posting enough data for anyone to be of meaningful
help.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Inconsistent performance

From
Stephan Szabo
Date:
On Mon, 15 Sep 2003, Joseph Bove wrote:

> Stephan,
>
> Actually, it's inconsistent with the exact same command. I've now
> replicated the problem by doing the following command:
>
> select count (*) from table;
>
> The table in question has 88899 rows.
>
> The response time is anywhere from 1 second to 12 seconds. Different
> response times can occur in the same minute of testing!

Well, that's really only got one valid plan right now (seqscan and
aggregate). It'd be mildly interesting to see what explain analyze says in
slow and fast states, although I'd be willing to bet that it's just going
to effectively show that the seqscan is taking more or less time.

I think we're going to need to see the configuration settings for the
server and possibly some info on how big the table is (say relpages for
the pg_class row associated with the table after a vacuum full).
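
Something like the following would gather that (the table name here is a
placeholder):

```sql
-- VACUUM FULL compacts the table; the pg_class row then reflects its
-- true size in 8KB pages.
VACUUM FULL large_table;
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'large_table';
```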

Re: Inconsistent performance

From
Joseph Bove
Date:
Stephan,

I've run explain analyze a number of times and have gotten results between
5.5 and 7.5 seconds.

Attached is a typical output:

  QUERY PLAN
-------------------------------------
  Aggregate  (cost=9993.92..9993.92 rows=1 width=0)
             (actual time=7575.59..7575.59 rows=1 loops=1)
    ->  Seq Scan on vetapview  (cost=0.00..9771.34 rows=89034 width=0)
                               (actual time=0.06..7472.20 rows=88910 loops=1)
  Total runtime: 7575.67 msec
(3 rows)

The only thing that changes is the actual time. The costs are constant.

The relpages from pg_class for vetapview (the table in question) is 8881.

At the end of this message is the exhaustive contents of postgresql.conf.
The only settings I have attempted tuning are as follows:

tcpip_socket = true
max_connections = 100
shared_buffers = 5000
sort_mem = 8192
fsync = false

I did have shared_buffers and sort_mem both set higher originally (15000,
32168) but decreased them in case over-utilization of memory was the problem.

The kernel setting shmmax is set to 256,000,000 (out of 1 gig)

Regards,

Joseph

postgresql.conf

#
#       Connection Parameters
#
tcpip_socket = true
#ssl = false

max_connections = 100
#superuser_reserved_connections = 2

#port = 5432
#hostname_lookup = false
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal

#virtual_host = ''

#krb_server_keyfile = ''


#
#       Shared Memory Size
#
#shared_buffers = 15000         # min max_connections*2 or 16, 8KB each
shared_buffers = 5000
#max_fsm_relations = 1000       # min 10, fsm is free space map, ~40 bytes
#max_fsm_pages = 10000          # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8                # min 4, typically 8KB each

#
#       Non-shared Memory Sizes
#
#sort_mem = 32168               # min 64, size in KB
sort_mem = 8192
#vacuum_mem = 8192              # min 1024, size in KB

#
#       Write-ahead log (WAL)
#
#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000
#
fsync = false
#wal_sync_method = fsync        # the default varies across platforms:
#                               # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0                  # range 0-16


#
#       Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

#default_statistics_target = 10 # range 1-1000

#
#       GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0      # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0             # default based on tables in statement,
                                 # range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1          # auto-compute seed


#
#       Message display
#
#server_min_messages = notice   # Values, in order of decreasing detail:
                                 #   debug5, debug4, debug3, debug2, debug1,
                                 #   info, notice, warning, error, log, fatal,
                                 #   panic
#client_min_messages = notice   # Values, in order of decreasing detail:
                                 #   debug5, debug4, debug3, debug2, debug1,
                                 #   log, info, notice, warning, error
#silent_mode = false

#log_connections = false
#log_pid = false
#log_statement = false
#log_duration = false
#log_timestamp = false

#log_min_error_statement = error # Values in order of increasing severity:
                                  #   debug5, debug4, debug3, debug2, debug1,
                                  #   info, notice, warning, error, panic(off)

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false

#explain_pretty_print = true

# requires USE_ASSERT_CHECKING
#debug_assertions = true


#
#       Syslog
#
#syslog = 0                     # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

#
#       Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false


#
#       Access statistics collection
#
#stats_start_collector = true
#stats_reset_on_server_start = true
#stats_command_string = false
#stats_row_level = false
#stats_block_level = false

#
#       Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0


#
#       Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
#datestyle = 'iso, us'
#timezone = unknown             # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii    # actually, defaults to database encoding
#authentication_timeout = 60    # 1-600, in seconds
#deadlock_timeout = 1000        # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000         # min 10
#max_files_per_process = 1000   # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0          # 0 is disabled, in milliseconds
#db_user_namespace = false



#
#       Locale settings
#
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'en_US.UTF-8'
LC_MONETARY = 'en_US.UTF-8'
LC_NUMERIC = 'en_US.UTF-8'
LC_TIME = 'en_US.UTF-8'

At 03:49 PM 9/15/2003 -0700, Stephan Szabo wrote:

>On Mon, 15 Sep 2003, Joseph Bove wrote:
>
> > Stephan,
> >
> > Actually, it's inconsistent with the exact same command. I've now
> > replicated the problem by doing the following command:
> >
> > select count (*) from table;
> >
> > The table in question has 88899 rows.
> >
> > The response time is anywhere from 1 second to 12 seconds. Different
> > response times can occur in the same minute of testing!
>
>Well, that's really only got one valid plan right now (seqscan and
>aggregate). It'd be mildly interesting to see what explain analyze says in
>slow and fast states, although I'd be willing to bet that it's just going
>to effectively show that the seqscan is taking more or less time.
>
>I think we're going to need to see the configuration settings for the
>server and possibly some info on how big the table is (say relpages for
>the pg_class row associated with the table after a vacuum full).
>



Re: Inconsistent performance

From
Christopher Browne
Date:
jbove@vetstar.com (Joseph Bove) writes:
> I do a rather simple query: select count (*) from large-table where
> column = some value;
>
> About 80% of the time, the response time is sub-second. However, at
> 10% of the time, the response time is 5 - 10 seconds.

Does it seem data-dependent?

That is, does the time vary for different values of "some value?"

If a particular value is particularly common, the system might well
revert to a sequential scan, making the assumption that it is quicker
to look at every page in the table rather than to walk through
Enormous Numbers of records.

I had a case very similar to this where a table had _incredible_
skewing of this sort where there were a small number of column values
that occurred hundreds of thousands of times, and other column values
only occurred a handful of times.

I was able to get Excellent Performance back by setting up two partial
indices:
 - One for WHERE THIS_COLUMN > VITAL_VALUE;
 - One for WHERE THIS_COLUMN < VITAL_VALUE;

The REALLY COMMON values were in the range < VITAL_VALUE.
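
A sketch of the two partial indices described above; the table name,
column name, and cutoff value are all illustrative stand-ins for
VITAL_VALUE and THIS_COLUMN:

```sql
-- One index covers the rare high values, the other the common low ones;
-- the planner picks the small index when the predicate matches it.
CREATE INDEX this_column_high ON skewed_table (this_column)
    WHERE this_column > 1000;
CREATE INDEX this_column_low  ON skewed_table (this_column)
    WHERE this_column < 1000;
```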
--
output = ("cbbrowne" "@" "libertyrms.info")
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Re: Inconsistent performance

From
"scott.marlowe"
Date:
On Mon, 15 Sep 2003, Joseph Bove wrote:

> Stephan,
>
> I've run explain analyze a number of times and have gotten results between
> 5.5 and 7.5 seconds
>
> Attached is a typical output
>
>   QUERY PLAN
> -------------------------------------
>   Aggregate  (cost=9993.92..9993.92 rows=1 width=0)
>                    (actual time=7575.59..7575.59 rows=1 loops=1)
> ->  Seq Scan on vetapview  (cost=0.00..9771.34 rows=89034 width=0)
>                                         (actual time=0.06..7472.20
> rows=88910 loops=1)
>   Total runtime: 7575.67 msec
> (3 rows)
>
> The only things changing are the actual time. The costs are constant.
>
> The relpages from pg_class for vetapview (the table in question) is 8881.
>
> At the end of this message is the exhaustive contents of postgresql.conf.
> The only settings I have attempted tuning are as follows:
>
> tcpip_socket = true
> max_connections = 100
> shared_buffers = 5000
> sort_mem = 8192
> fsync = false

A couple of things.

1:  Is there an index on the parts of the query used for the where clause?
2:  What is your effect_cache_size set to?  It needs to be set right for
your postgresql server to be able to take advantage of the kernel's cache
(i.e. use an index scan when the kernel is likely to have that data in
memory.)
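
To illustrate both points (identifiers and the value below are
placeholders, not the poster's actual schema or a recommended setting):

```sql
-- 1: an index on the filtered column lets the planner avoid a full
--    table scan for selective values.
CREATE INDEX large_table_some_column_idx ON large_table (some_column);

-- 2: effective_cache_size lives in postgresql.conf, in 8KB pages, e.g.:
--      effective_cache_size = 96000    # 96000 x 8KB = ~768MB of OS cache
```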


Re: Inconsistent performance

From
Brian Hirt
Date:
it seems like the difference is probably related to caching.  you say
you have 1gb of ram, and the database is 2gb.  Obviously the entire
database isn't cached, but maybe your query runs fast when the table is
in memory, and then it gets swapped out of cache because some other
piece of information moves into memory.  In that circumstance, it has
to load the information from disk and is therefore slow.

how busy is the system?  what other programs are running on the
machine?  how big (on disk) is the table in question?  what kind of load
does the system have?  is it a single 80gb ide drive?  Even though
you have 4 CPUs, a small amount of memory and a bad IO system will kill
the database.


On Monday, September 15, 2003, at 05:28 PM, Joseph Bove wrote:

> Stephan,
>
> I've run explain analyze a number of times and have gotten results
> between 5.5 and 7.5 seconds
>
> Attached is a typical output
>
>  QUERY PLAN
> -------------------------------------
>  Aggregate  (cost=9993.92..9993.92 rows=1 width=0)
>                   (actual time=7575.59..7575.59 rows=1 loops=1)
> ->  Seq Scan on vetapview  (cost=0.00..9771.34 rows=89034 width=0)
>                                        (actual time=0.06..7472.20
> rows=88910 loops=1)
>  Total runtime: 7575.67 msec
> (3 rows)
>
> The only things changing are the actual time. The costs are constant.
>
> The relpages from pg_class for vetapview (the table in question) is
> 8881.
>
> At the end of this message is the exhaustive contents of
> postgresql.conf. The only settings I have attempted tuning are as
> follows:
>
> tcpip_socket = true
> max_connections = 100
> shared_buffers = 5000
> sort_mem = 8192
> fsync = false
>
> I did have shared_buffers and sort_mem both set higher originally
> (15000, 32168) but decreased them in case over-utilization of memory
> was the problem.
>
> The kernel setting shmmax is set to 256,000,000 (out of 1 gig)
>
> Regards,
>
> Joseph
>
> [postgresql.conf and the rest of the quoted message snipped]


Re: Inconsistent performance

From
"scott.marlowe"
Date:
On Mon, 15 Sep 2003, scott.marlowe wrote:

> On Mon, 15 Sep 2003, Joseph Bove wrote:
>
> > Stephan,
> >
> > I've run explain analyze a number of times and have gotten results between
> > 5.5 and 7.5 seconds
> >
> > Attached is a typical output
> >
> >   QUERY PLAN
> > -------------------------------------
> >   Aggregate  (cost=9993.92..9993.92 rows=1 width=0)
> >                    (actual time=7575.59..7575.59 rows=1 loops=1)
> > ->  Seq Scan on vetapview  (cost=0.00..9771.34 rows=89034 width=0)
> >                                         (actual time=0.06..7472.20
> > rows=88910 loops=1)
> >   Total runtime: 7575.67 msec
> > (3 rows)
> >
> > The only things changing are the actual time. The costs are constant.
> >
> > The relpages from pg_class for vetapview (the table in question) is 8881.
> >
> > At the end of this message is the exhaustive contents of postgresql.conf.
> > The only settings I have attempted tuning are as follows:
> >
> > tcpip_socket = true
> > max_connections = 100
> > shared_buffers = 5000
> > sort_mem = 8192
> > fsync = false
>
> A couple of things.
>
> 1:  Is there an index on the parts of the query used for the where clause?
> 2:  What is your effect_cache_size set to?  It needs to be set right for
> your postgresql server to be able to take advantage of the kernel's cache
> (i.e. use an index scan when the kernel is likely to have that data in
> memory.)

Sorry, that should be effective_cache_size, not effect_cache_size.  It's
set in 8k blocks and is usually about how much buffer / cache you have
left over after the machine "settles" after being up and running for a
while.  For instance, on my server, I show 784992K cache and 42976K buff
under top, so that's 827968k/8k=103496 blocks.  Note that if you've
recompiled you may have set a larger block size, but installations
with PostgreSQL block sizes other than 8k are pretty uncommon, and you'd
know if you had done that, so it's probably 8k blocks.
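
The arithmetic above is easy to reproduce in psql:

```sql
-- 784992K cache + 42976K buff from top, divided by the 8K block size:
SELECT (784992 + 42976) / 8 AS effective_cache_size;   -- 103496
```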


Re: Inconsistent performance

From
Christopher Browne
Date:
The world rejoiced as jbove@vetstar.com (Joseph Bove) wrote:
> Actually, it's inconsistent with the exact same command. I've now
> replicated the problem by doing the following command:
>
> select count (*) from table;
>
> The table in question has 88899 rows.
>
> The response time is anywhere from 1 second to 12 seconds. Different
> response times can occur in the same minute of testing!

The only possible plan for THAT query will involve a seq scan of the
whole table.  If the postmaster already has the data in cache, it
makes sense for it to run in 1 second.  If it has to read it from
disk, 12 seconds makes a lot of sense.

You might want to increase the "shared_buffers" parameter in
postgresql.conf; that should lead to increased stability of times as
it should be more likely that the data in "table" will remain in
cache.
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/x.html
Signs of  a  Klingon  Programmer -  8.  "Debugging?   Klingons do  not
debug.  Our software  does not   coddle the  weak. Bugs  are  good for
building character in the user."

Re: Inconsistent performance

From
Manfred Koizar
Date:
On Mon, 15 Sep 2003 22:26:45 -0400, Christopher Browne
<cbbrowne@acm.org> wrote:
>> select count (*) from table;
>The only possible plan for THAT query will involve a seq scan of the
>whole table.  If the postmaster already has the data in cache, it
>makes sense for it to run in 1 second.  If it has to read it from
>disk, 12 seconds makes a lot of sense.

Yes.  And note that the main difference is between having the data in
memory and having to fetch it from disk.  I don't believe that this
difference can be explained by 9000 read calls hitting the operating
system's cache.

>You might want to increase the "shared_buffers" parameter in
>postgresql.conf; that should lead to increased stability of times as
>it should be more likely that the data in "table" will remain in
>cache.

Let's not jump to this conclusion before we know what's going on.

Joseph Bove <jbove@vetstar.com> wrote in another message above:
| I did have shared_buffers and sort_mem both set higher originally (15000,
| 32168)

As I read this I think he meant "... and had the same performance
problem."

Joseph, what do you get, if you run that
     EXPLAIN ANALYSE SELECT count(*) ...
several times?  What do vmstat and top show while the query is
running?  Are there other processes active during or between the runs?
What kind of processes?  Postgres backends?  Web server? ...

Servus
 Manfred

Re: Inconsistent performance

From
"Matt Clark"
Date:
...
> #effective_cache_size = 1000    # typically 8KB each

That's horribly wrong.  It's telling PG that your OS is only likely to cache
8MB of the DB in RAM.  If you've got 1GB of memory, it should be between
64000 and 96000.
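
In postgresql.conf terms that would look something like the following;
the exact value depends on how much of the 1GB is actually left over for
the OS cache on this particular box:

```
# 64000 x 8KB = ~512MB; 96000 x 8KB = ~768MB of expected OS cache
effective_cache_size = 64000
```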




Re: Inconsistent performance

From
Joseph Bove
Date:
Dear list,

First and foremost, thanks to the great number of people who have responded
with various tips and suggestions. I am now starting to fully appreciate
the various cache settings and what they can do for performance.

I just want to redefine the problem based on the knowledge of it that I now
have.

In my example, I am purposefully forcing a full table scan: select count
(*) from table. This table has only 90,000 rows; each row comprises
about 300 bytes of data.

If the table has not been cached, I was seeing response times from 5 to 18
seconds to read the table. If it had been cached, then the response time
dropped to sub-second response.

Obviously, I can tune the caching so as to make sure that as much data as
can reasonably be cached is cached. However, I don't think that a hit of
even 5 seconds to read a table of 90,000 rows is acceptable.

One thing that has been tried with some success was to dump the table and
recreate it. After this exercise, selecting all rows from the table when it
is not in cache takes about 3 seconds. (Of course, when in cache, the same
sub-second response time is seen.)

I still think that 3 seconds is not acceptable. However, I reserve the
right to be wrong. Does it sound unrealistic to expect PostgreSQL to be
able to read 90,000 rows with 300 bytes per row in under a second?
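
As a rough sanity check, runnable in psql (the 30MB/s figure is an
assumed sequential-read rate for a single IDE drive, not a measured
number): the raw data is only about 26MB, but at the relpages figure
posted earlier the table occupies about 69MB on disk, so a cold read in
the 2-3 second range is roughly what the hardware can deliver.

```sql
SELECT 90000 * 300 / (1024.0 * 1024.0)        AS raw_mb,        -- ~25.7 MB of row data
       8881 * 8192 / (1024.0 * 1024.0)        AS table_mb,      -- ~69.4 MB at relpages = 8881
       8881 * 8192 / (1024.0 * 1024.0) / 30.0 AS secs_at_30mb_s; -- ~2.3 s cold read
```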

Based on suggestions from the list, I am also thinking of making the
following tuning changes:

shared_buffers = 15000
sort_mem = 32168
effective_cache_size = 64000

This is based on one gig of memory.

Does anyone have any feedback on these values? Also, realizing that no two
databases are the same, etc., etc... does anyone have a good formula for
setting these values?

Thanks in advance,

Joseph

At 09:09 AM 9/16/2003 +0200, Manfred Koizar wrote:
>On Mon, 15 Sep 2003 22:26:45 -0400, Christopher Browne
><cbbrowne@acm.org> wrote:
> >> select count (*) from table;
> >The only possible plan for THAT query will involve a seq scan of the
> >whole table.  If the postmaster already has the data in cache, it
> >makes sense for it to run in 1 second.  If it has to read it from
> >disk, 12 seconds makes a lot of sense.
>
>Yes.  And note that the main difference is between having the data in
>memory and having to fetch it from disk.  I don't believe that this
>difference can be explained by 9000 read calls hitting the operating
>system's cache.
>
> >You might want to increase the "shared_buffers" parameter in
> >postgresql.conf; that should lead to increased stability of times as
> >it should be more likely that the data in "table" will remain in
> >cache.
>
>Let's not jump to this conclusion before we know what's going on.
>
>Joseph Bove <jbove@vetstar.com> wrote in another message above:
>| I did have shared_buffers and sort_mem both set higher originally (15000,
>| 32168)
>
>As I read this I think he meant "... and had the same performance
>problem."
>
>Joseph, what do you get, if you run that
>         EXPLAIN ANALYSE SELECT count(*) ...
>several times?  What do vmstat and top show while the query is
>running?  Are there other processes active during or between the runs?
>What kind of processes?  Postgres backends?  Web server? ...
>
>Servus
>  Manfred
>



Re: Inconsistent performance

From
Jeff
Date:
On Tue, 16 Sep 2003, Joseph Bove wrote:

> I still think that 3 seconds is not acceptable. However, I reserve the
> right to be wrong. Does it sound unrealistic to expect PostgreSQL to be
> able to read 90,000 rows with 300 bytes per row in under a second?
>
first, check to see what your max throughput on your disk is using a
benchmark such as Bonnie (making sure to use a size LARGER than physical
memory; 2x physical is very optimal).

next, run your query again with a vmstat 1 running in another term.

See how closely the vmstat "bi" numbers correspond to your max according to
Bonnie.  You could have an IO bottleneck.  (I once went running around
trying to figure out a slowdown and then discovered the issue was IO.)



--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/