Postgres on SSD

Поиск
Список
Период
Сортировка
От Ondrej Ivanič
Тема Postgres on SSD
Дата
Msg-id CAM6mieJ1QQnmr7cnzkjmMHkyCZ1d1Ka2wGfCEfJ2oQCzMHFLCQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Postgres on SSD
Re: Postgres on SSD
Список pgsql-general
Hi,

I'm about to evaluate this SSD card: FusionIO ioDrive Duo [1]. The
main reason for this experiment is to see if SSD can significantly
improve query performance. So, I have the following questions:

- Could you please share your experience with SSD? Any issues?
- What needs to be changed at Postgres/Operating system level? The
obvious one is to change random_page_cost (now: 2) and seq_page_cost
(now: 4). What else should I look at?

Background:
Database schema is pretty simple:
Database size is around ~1.4TB. Main tables occupied around 1/3
(450GB, ~220mil rows) and aggregated data occupied 2/3 (900GB). All
indexes are on separate table space (~550GB)
1) main table(s)
This big fact table has ~200 columns and average row size is 1.5kb.
This table is never updated and new data is inserted constantly using
copy in chunks about 10k rows. Table is quite sparse so it is broken
into 5 tables which are joined as necessary (regular query reads 10 -
40% of the row size).

Tables are partitioned by month but I'm thinking to use week or two
week partitions.

Primary key is composite key (datetime, organisation, transaction_id).
The transaction_id column is unique but "datetime" column is used for
partitioning and all queries contains organisation="...."
(multi-tetant database). In addition, there are ~15 single column
indexes. Old data is deleted after 6 months (drop partition)

The usual  query looks like this:
- select ... from T where organisation = ... and datetime between ...
and ... where ... order by <single col> limit 1000
User can choose any column for sorting but we created indexes for the
most popular/reasonable ones (those ~15 single column indexes).

In the reality, query is more complex because of few Postgres issues:
- partitions/limit/order issue described on Stackoverflow and fixed in
9.1 or 9.2 [2], [3].
- partitions/join issues ie left join "on" clause must contain
datetime condition in order to avoid fulltable scan on joined table

Query response time for indexed columns is between 5 to 30 sec
(sometimes 200sec). The target is to have all queries under 5 sec. If
query has order by on non-indexed column then response time is in
hundreds seconds but desired response time should be 10sec (test query
is over 1 month range and organisation has between 0.5 and 2 mil row
per month; single partition has > 30 mil rows)

2) materialised aggregate tables
About 45 tables like this: agg_attribute1_attribute2(date,
organisation, attribute1, attribute2, count) (= select datetime::date,
organisation, attribute1, attribute2, count(*) from T where
organisation = ... and datetime between ... and ... group by 1,2,3,4)
Tables are updated by cron job every 15 minutes. Thanks for 8.3 HOT
updates - almost no bloat! Monthly partitions are used (date column).
Query response time is between 5 to 30 sec (sometimes 200sec) and the
target is to have all queries under 5 sec

Usual query is:
select attribute1, count(*) from agg_attribute1_... where organisation
= ... and datetime between ... and ... group by 1 limit 10
or
select attribute1, attribute2 count(*) from agg_attribute1_attribute2
where organisation = ... and datetime between ... and ... group by 1
limit 10

Top N queries perform even worse -- the query response time is in
minutes and the target is around 15 sec

Current hardware setup:
XEN host with 16 CPU (Intel(R) Xeon(R) CPU L5520  @ 2.27GHz). CentOS 5.6
80GB RAM
Storage: some Hitachi Fibre channel SAN with two LUNs:
1st LUN has *everything* under $PG_DATA (used 850 GB)
2nd LUN has *all* indexes (index table space) (used 550GB)

Postgres settings:
             name             |
         current_setting


------------------------------+--------------------------------------------------------------------------------------------------
----------------
 version                      | PostgreSQL 8.4.5 on
x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red
Hat 4.
1.2-48), 64-bit
 archive_command              | walarchive.sh %p %f
 archive_mode                 | on
 autovacuum                   | on
 autovacuum_max_workers       | 6
 autovacuum_naptime           | 5min
 autovacuum_vacuum_cost_delay | -1
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 48
 constraint_exclusion         | on
 default_statistics_target    | 100
 effective_cache_size         | 20GB
 fsync                        | on
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 log_autovacuum_min_duration  | 0
 log_destination              | csvlog
 log_min_duration_statement   | 10s
 log_rotation_age             | 1d
 log_rotation_size            | 0
 log_truncate_on_rotation     | on
 logging_collector            | on
 maintenance_work_mem         | 256MB
 max_connections              | 100
 max_stack_depth              | 2MB
 random_page_cost             | 2
 server_encoding              | UTF8
 shared_buffers               | 9GB
 TimeZone                     | UTC
 vacuum_cost_delay            | 0
 wal_buffers                  | 50MB
 wal_sync_method              | fdatasync
 wal_writer_delay             | 1s
 work_mem                     | 256MB



[1] http://www.fusionio.com/products/iodrive-duo/
[2] http://stackoverflow.com/questions/6268633/postgres-partitioning-order-by-performance
[3] http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table

Thanks,
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

В списке pgsql-general по дате отправления:

Предыдущее
От: Postgres User
Дата:
Сообщение: Indicating DEFAULT values in INSERT statement
Следующее
От: Diego Augusto Molina
Дата:
Сообщение: Re: Indicating DEFAULT values in INSERT statement