Re: reducing random_page_cost from 4 to 2 to force index scan

From: Sok Ann Yap
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date:
Msg-id: BANLkTi=NFNTZ8n4zB0L+TLdurnSnnpx9Pw@mail.gmail.com
In reply to: Re: reducing random_page_cost from 4 to 2 to force index scan  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List: pgsql-performance
On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Sok Ann Yap  wrote:
>> Kevin Grittner  wrote:
>
>>> Please show us your overall configuration and give a description
>>> of the hardware (how many of what kind of cores, how much RAM,
>>> what sort of storage system).
>
>> Here's the configuration (this is just a low end laptop):
>
>> version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu,
>> compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0,
>> pie-0.4.5) 4.5.2, 64-bit
>> checkpoint_segments | 16
>> default_statistics_target | 10000
>
> Usually overkill.  If this didn't help, you should probably change it
> back.
>
>> effective_cache_size | 512MB
>> lc_collate | en_US.UTF-8
>> lc_ctype | en_US.UTF-8
>> listen_addresses | *
>> log_destination | syslog
>> log_min_duration_statement | 0
>> maintenance_work_mem | 256MB
>> max_connections | 100
>
> You probably don't need this many connections.
>
>> max_stack_depth | 2MB
>> port | 5432
>> random_page_cost | 4
>> server_encoding | UTF8
>> shared_buffers | 256MB
>> silent_mode | on
>> TimeZone | Asia/Kuala_Lumpur
>> wal_buffers | 1MB
>> work_mem | 32MB
>> (20 rows)
>
> It's hard to recommend other changes without knowing the RAM on the
> system.  How many of what kind of CPUs would help, too.
>
>> The thing is, the query I posted was fairly simple (I think), and
>> PostgreSQL should be able to choose the 3000+ times faster index
>> scan with the default random_page_cost of 4.
>
> It picks the plan with the lowest estimated cost.  If it's not
> picking the best plan, that's usually an indication that you need to
> adjust cost factors so that estimates better model the actual costs.
>
>> If I need to reduce it to 2 when using a 5.4k rpm slow disk, what
>> is random_page_cost = 4 good for?
>
> It's good for large databases with a lot of physical disk I/O.  In
> fact, in some of those cases, it needs to be higher.  In your test,
> the numbers indicate that everything was cached in RAM.  That makes
> the effective cost very low.
>
> Also, the odds are that you have more total cache space between the
> shared_buffers and the OS cache than the effective_cache_size
> setting, so the optimizer doesn't expect the number of cache hits
> you're getting on index usage.
>
> -Kevin
>

Thanks for the tips and explanation. I had wrongly assumed that the
random_page_cost value is independent of caching.

Now, let's go back to the original query:

SELECT
    salutations.id,
    salutations.name,
    EXISTS (
        SELECT 1
        FROM contacts
        WHERE salutations.id = contacts.salutation_id
    ) AS in_use
FROM salutations

If I split up the query, i.e. running this once:

SELECT
    salutations.id,
    salutations.name
FROM salutations

and then running this 44 times, once for each row:

SELECT
    EXISTS (
        SELECT 1
        FROM contacts
        WHERE contacts.salutation_id = ?
    ) AS in_use

I can see that PostgreSQL smartly picks the best plan for each row,
i.e. for common salutations (Madam, Ms, etc.) it will do a sequential
scan, while for salutations that are rarely used or not used at all,
it will do an index scan.
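(The per-salutation difference is easy to confirm with EXPLAIN ANALYZE;
the ids below are made up for illustration, not actual values from my data:

EXPLAIN ANALYZE
SELECT EXISTS (
    SELECT 1
    FROM contacts
    WHERE contacts.salutation_id = 1   -- a common salutation: seq scan wins
) AS in_use;

EXPLAIN ANALYZE
SELECT EXISTS (
    SELECT 1
    FROM contacts
    WHERE contacts.salutation_id = 42  -- a rare/unused salutation: index scan wins
) AS in_use;
)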

Anyway, the overhead of issuing 44 extra queries means it is still
better for me to stick with the original query and tune PostgreSQL to
choose the index scan.
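For the record, the tuning itself is just the following (2 is merely
what worked on this mostly-cached laptop setup, not a general
recommendation):

SET random_page_cost = 2;

-- or persistently, in postgresql.conf:
-- random_page_cost = 2

Per Kevin's note, raising effective_cache_size toward the actual
shared_buffers + OS cache total should also help the planner expect the
cache hits the index scan is actually getting.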
