Re: PostgreSQL performance problem -> tuning

From: Richard Huxton
Subject: Re: PostgreSQL performance problem -> tuning
Date: 7 August 2003
Msg-id: 200308072006.35087.dev@archonet.com
In response to: Re: PostgreSQL performance problem -> tuning  (Yaroslav Mazurak <yamazurak@Lviv.Bank.Gov.UA>)
Responses: Re: PostgreSQL performance problem -> tuning
List: pgsql-performance
On Thursday 07 August 2003 17:30, Yaroslav Mazurak wrote:
> Hi All!
>
>
>     First, thanks for answers!
>
> Richard Huxton wrote:
> > On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
> >>>IIRC there is a limit on filesystem cache on freeBSD. 300MB by default.
> >>>If that is the case, you might have to raise it to make
> >>>effective_cache_size really effective..
> >>
> >>    "Try various sysctls" says nothing to me. I want to use *all available
> >>RAM* (apart from what the OS itself needs) for PostgreSQL.
> >
> > PG will be using the OS' disk caching.
>
>     I think all applications use OS disk caching. ;)
>     Or do you mean that PostgreSQL is tuned to use an OS-specific cache
> implementation?
>     Is there a way to examine the real size of the OS filesystem cache? If I
> understood correctly, PostgreSQL dynamically uses all available RAM minus
> shared_buffers minus k * sort_mem minus effective_cache_size?
>     I want to configure PostgreSQL to use the _maximum_ of available RAM.

PG's memory use can be split into four areas (note - I'm not a developer so
this could be wrong).
1. Shared memory - vital so that different connections can communicate with
each other. Shouldn't be too large, otherwise PG spends too long managing its
shared memory rather than working on your queries.
2. Sort memory - If you have to sort results during a query it will use up to
the amount you define in sort_mem and then use disk if it needs any more.
This is for each sort.
3. Results memory - If you're returning 8000 rows then PG will assemble these
and send them to the client, which also needs space to store the 8000 rows.
4. Working memory - to actually run the queries - stack and heap space to keep
track of its calculations etc.
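
For reference, areas 1 and 2 map onto the shared_buffers and sort_mem settings
(effective_cache_size is different - it isn't an allocation at all, just your
estimate of how much the OS is caching). You can check what you're currently
running with from psql - IIRC the units on 7.3 are 8KB pages and kilobytes
respectively:

  SHOW shared_buffers;        -- area 1, in 8KB pages
  SHOW sort_mem;              -- area 2, in KB, per sort operation
  SHOW effective_cache_size;  -- planner hint only, in 8KB pages

Areas 3 and 4 aren't settings - they simply grow with the query.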

Your best bet is to start off with some smallish, reasonable values and step
them up gradually until you stop seeing any improvement. What is vital is that
the OS can cache enough disk data to keep all your commonly used tables and
indexes in memory - if it can't, you'll see performance drop rapidly as PG has
to keep going back to the disk.
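
Purely as an illustration of what "smallish reasonable values" might look like
(these numbers assume something like 512MB free for PostgreSQL - scale to suit
and step up from there):

  # postgresql.conf - rough starting points only, not recommendations
  shared_buffers = 4096           # 32MB in 8KB pages - needs a server restart
  sort_mem = 8192                 # 8MB per sort, per backend
  effective_cache_size = 32768    # ~256MB - roughly what the OS caches for you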

For the moment, I'd leave the settings roughly where they are while we look at
the query, then once that's out of the way we can fine-tune the settings.

[snip suggestion to break the query down]
>     Yes, you're right. I've tested a few statements and obtained interesting
> results.
>     SELECT * FROM v_file02wide WHERE... executes in about 34 seconds.
>     SELECT showcalc(...); takes from 0.7 seconds (without recursion) up
> to 6.3 seconds if recursion is used! :(
>     This means that the approximate execution time for the full SELECT
> with about 8K rows is... about 13 hours! :-O

Hmm - not good.
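(Quick sanity check on the arithmetic: 8000 rows at roughly 6 seconds per
showcalc() call is about 48,000 seconds, i.e. around 13 hours - so effectively
all of the time is going into the function, not the view.)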

>     Hence, the problem is in my function showcalc:

That's certainly the place to start, although we might be able to do something
with v_file02wide later.

> CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
> NUMERIC(16)) RETURNS NUMERIC(16)
> LANGUAGE SQL STABLE AS '
> -- Parameters: code, dd, r020, t071
>     SELECT COALESCE(
>         (SELECT sc.koef * $4
>             FROM showing AS s NATURAL JOIN showcomp AS sc
>             WHERE s.kod = $1
>                 AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
>                 AND SUBSTR(acc_mask, 1, 4) = $3
>                 AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
>         (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
> LENGTH(acc_mask) - 2), $2, $3, $4), 0))
>             FROM showing AS s NATURAL JOIN showcomp AS sc
>             WHERE s.kod = $1
>                 AND SUBSTR(acc_mask, 1, 1) = ''[''),
>         0) AS showing;
> ';
>
>     BTW, a cross join "," with a WHERE clause doesn't improve performance
> relative to NATURAL JOIN.
>     Additionally, with the user-defined function beginchar (SUBSTR(..., 1, 1)),
> used for indexing, showcalc executes in about 16 seconds. With the function
> SUBSTR the same showcalc executes in 6 seconds.

Fair enough - substr should be fairly efficient.

[snip explanation of table structures and usage]

I'm not going to claim I understood everything in your explanation, but there
are a couple of things I can suggest. However, before you go and do any of
that, can I ask you to post an EXPLAIN ANALYSE of two calls to your
showcalc() function (once for a simple account, once for one with recursion)?
You'll need to cut and paste the query as standard SQL since the explain
won't look inside the function body.
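
Something like the following, with the parameters replaced by literal values
(the ones below are just placeholders - substitute a real code/dd/r020/t071
from your data; the recursive calls inside showcalc() still won't be broken
out, but the top level will be):

  EXPLAIN ANALYZE
  SELECT COALESCE(
      (SELECT sc.koef * 100.0
          FROM showing AS s NATURAL JOIN showcomp AS sc
          WHERE s.kod = 'A001'
            AND NOT SUBSTR(acc_mask, 1, 1) = '['
            AND SUBSTR(acc_mask, 1, 4) = '1234'
            AND SUBSTR(acc_mask, 5, 1) = SUBSTR('11', 1, 1)),
      (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
                  LENGTH(acc_mask) - 2), '11', '1234', 100.0), 0))
          FROM showing AS s NATURAL JOIN showcomp AS sc
          WHERE s.kod = 'A001'
            AND SUBSTR(acc_mask, 1, 1) = '['),
      0) AS showing;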

OK - bear in mind that these suggestions are made without the benefit of the
explain analyse:

1. You could try splitting out the various tags of your mask into different
fields - that will instantly eliminate all the substr() calls and might make
a difference. If you want to keep the mask for display purposes, we could
build a trigger to keep it in sync with the separate flags.
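
As a rough sketch of (1) - and I'm guessing here that acc_mask lives in
showcomp; adjust if it's actually in showing:

  ALTER TABLE showcomp ADD COLUMN is_group  BOOLEAN;     -- mask starts with '['
  ALTER TABLE showcomp ADD COLUMN mask_r020 VARCHAR(4);  -- SUBSTR(acc_mask, 1, 4)
  ALTER TABLE showcomp ADD COLUMN mask_dd   VARCHAR(1);  -- SUBSTR(acc_mask, 5, 1)

  UPDATE showcomp SET
      is_group  = (SUBSTR(acc_mask, 1, 1) = '['),
      mask_r020 = SUBSTR(acc_mask, 1, 4),
      mask_dd   = SUBSTR(acc_mask, 5, 1);

  CREATE INDEX showcomp_mask_idx ON showcomp (mask_r020, mask_dd);

Then the WHERE clause in showcalc() becomes plain column comparisons rather
than substr() calls.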

2. Use a "calculations" table and build your results step by step. So -
calculate all the simple accounts, then calculate the ones that contain the
simple accounts.
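
A very rough sketch of (2), with a hypothetical table name and made-up
literals (I don't know your full schema, so treat this as shape only):

  CREATE TABLE calc_results (
      kod   VARCHAR(10),
      value NUMERIC(16)
  );

  -- pass 1: simple accounts (acc_mask does not start with '[')
  INSERT INTO calc_results (kod, value)
  SELECT s.kod, SUM(sc.koef * 100.0)   -- 100.0 stands in for the t071 amount
    FROM showing AS s NATURAL JOIN showcomp AS sc
   WHERE NOT SUBSTR(acc_mask, 1, 1) = '['
   GROUP BY s.kod;

  -- pass 2 (and onwards): composite accounts, summing the rows already in
  -- calc_results instead of recursing; repeat until no new rows appear.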

3. You could keep a separate "account_contains" table that might look like:
  acc_id | contains
  A001   | A001
  A002   | A002
  A003   | A003
  A003   | A001
  A004   | A004
  A004   | A003
  A004   | A001

So here A001/A002 are simple accounts but A003 contains A001 too. A004
contains A003 and A001. The table can be kept up to date automatically using
some triggers.
This should make it simple to pick up all the accounts contained within the
target account and might mean you can eliminate the recursion.
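
With that in place the recursive branch of showcalc() could, I think, collapse
into a single join - again only a sketch with placeholder literals, and note
that if your koef values multiply at every level of nesting then
account_contains would also need a cumulative koef column:

  SELECT SUM(sc.koef * 100.0) AS showing   -- 100.0 stands in for t071
    FROM account_contains AS ac,
         showing AS s NATURAL JOIN showcomp AS sc
   WHERE ac.acc_id = 'A004'                -- the account you're reporting on
     AND s.kod = ac.contains               -- every account it contains
     AND NOT SUBSTR(acc_mask, 1, 1) = '['
     AND SUBSTR(acc_mask, 1, 4) = '1234'
     AND SUBSTR(acc_mask, 5, 1) = '1';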

>     Now I'm thinking about changing the function showcalc and/or these data
> structures... :)

Post the EXPLAIN ANALYSE first - maybe someone smarter than me will have an
idea.

>     Anyway, 600MB is too low for PostgreSQL to execute my query - the DBMS
> raised an error after 11.5 hours (of an estimated 13?). :(

I think the problem is the 13 hours, not the 600MB. Once we've got the query
running in a reasonable length of time (seconds) then the memory requirements
will go down, I'm sure.

--
  Richard Huxton
  Archonet Ltd
