Re: using a lot of maintenance_work_mem

From: Stephen Frost
Subject: Re: using a lot of maintenance_work_mem
Msg-id: 20110409015405.GH4548@tamriel.snowman.net
In reply to: Re: using a lot of maintenance_work_mem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: using a lot of maintenance_work_mem  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom, all,

Having run into issues caused by small work_mem, again, I felt the need
to respond to this.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> You would break countless things.  It might be okay anyway in a trusted
> environment, ie, one without users trying to crash the system, but there
> are a lot of security-critical implications of that test.

I really don't see work_mem or maintenance_work_mem as security-related
parameters.  Amusingly, the Postgres95 1.01 release apparently attempted
to make the cap 16GB (but failed and made it 256M instead).  After a bit
of poking around, I found this commit:

commit 85c17dbff8ade0c5237e3ac1ece7cacacfdde399
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Feb 6 01:53:53 2001 +0000

    Out-of-bounds memory allocation request sizes should be treated as just
    elog(ERROR) not an Assert trap, since we've downgraded out-of-memory to
    elog(ERROR) not a fatal error.  Also, change the hard boundary from 256Mb
    to 1Gb, just so that anyone who's actually got that much memory to spare
    can play with TOAST objects approaching a gigabyte.
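
For reference, the boundary that commit left us with is, as far as I can
tell, the one still enforced in src/include/utils/memutils.h; here's a
self-contained paraphrase of the relevant macros (the names and the value
should be the real ones, the typedef scaffolding is mine):

    /* Paraphrase of the cap in src/include/utils/memutils.h; the names and
     * the value are the real ones, the typedef scaffolding is mine. */
    #include <stddef.h>

    typedef size_t Size;                        /* PostgreSQL's Size is size_t */

    #define MaxAllocSize    ((Size) 0x3fffffff) /* 1 gigabyte - 1 */

    #define AllocSizeIsValid(size)  ((Size) (size) <= MaxAllocSize)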

If we want to implement a system to limit what users can request with
regard to work_mem then we can do that, but a smart user could probably
circumvent such a system by building huge queries.  A system which
monitored actual usage and ERROR'd out would probably be better at
addressing that concern.
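
Just to be concrete about what I mean by monitoring actual usage, here's
a rough and entirely hypothetical sketch; none of these names or GUCs
exist anywhere, it's only the shape of the idea: charge each allocation
against a per-backend counter and ERROR out once a configurable limit is
exceeded.

    /* Hypothetical sketch only -- none of these names exist in PostgreSQL.
     * Idea: charge every allocation against a per-backend counter and error
     * out once a configurable limit is exceeded, rather than trying to
     * police the work_mem setting itself. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static uint64_t backend_alloc_total = 0;    /* bytes charged so far */
    static uint64_t backend_alloc_limit = 0;    /* imagined per-backend GUC; 0 = no limit */

    /* Would be called from the allocator on every request. */
    static void
    account_allocation(uint64_t nbytes)
    {
        backend_alloc_total += nbytes;
        if (backend_alloc_limit != 0 && backend_alloc_total > backend_alloc_limit)
        {
            /* stand-in for elog(ERROR, ...) */
            fprintf(stderr, "ERROR: backend memory limit (%llu bytes) exceeded\n",
                    (unsigned long long) backend_alloc_limit);
            exit(1);
        }
    }

    int
    main(void)
    {
        backend_alloc_limit = 1024 * 1024;      /* pretend limit: 1MB */
        account_allocation(512 * 1024);         /* fine */
        account_allocation(768 * 1024);         /* trips the limit */
        return 0;
    }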

> If we were actually trying to support such large allocations,
> what I'd be inclined to do is introduce a separate call along the lines
> of MemoryContextAllocLarge() that lacks the safety check.

This sounds like the right approach to me.  Basically, I'd like to have
MemoryContextAllocLarge(), on 64-bit platforms, and have it be used for
things like sorts and hash tables.  We'd need to distinguish that usage
from things which allocate varlenas and the like.
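
Just to sketch the shape of it (illustrative only: malloc() stands in for
the real memory-context machinery, and MemoryContextAllocLarge() obviously
doesn't exist yet), the checked path would keep today's 1GB sanity test
while the large path skips it on 64-bit builds:

    /* Sketch only: the shape a check-free "large" allocation entry point
     * could take on 64-bit builds.  malloc() stands in for the real
     * memory-context machinery; none of this is actual PostgreSQL code. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef size_t Size;

    #define MaxAllocSize ((Size) 0x3fffffff)    /* existing 1GB - 1 cap */

    /* Ordinary path: keep the sanity check, as the allocator does today. */
    static void *
    alloc_checked(Size size)
    {
        if (size > MaxAllocSize)
        {
            fprintf(stderr, "ERROR: invalid memory alloc request size %zu\n", size);
            exit(1);
        }
        return malloc(size);
    }

    /* Proposed path: skip the cap for internal consumers (sorts, hash
     * tables) that track their own sizes, but only where Size is 64 bits. */
    static void *
    alloc_large(Size size)
    {
    #if SIZE_MAX > 0xffffffffUL
        return malloc(size);
    #else
        return alloc_checked(size);             /* 32-bit: keep the capped path */
    #endif
    }

    int
    main(void)
    {
        /* On a 64-bit build this succeeds without tripping any 1GB check. */
        void *big = alloc_large((Size) 2 * 1024 * 1024 * 1024);
        free(big);
        return big != NULL ? 0 : 1;
    }

In the real thing, tuplesort and the hash-table code would call the large
variant explicitly, so varlenas and anything destined for a tuple would
keep going through the checked path; that's the usage distinction I mean
above.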

> But before
> expending time on that, I'd want to see some evidence that it's actually
> helpful for production situations.  I'm a bit dubious that you're going
> to gain much here.

I waited ~26hrs for a rather simple query:

explain select <bunch-of-columns>, <bunch-of-aggregates>
from really_big_table
where customer_code ~ '^CUST123'
group by <bunch-of-columns>
;
                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=37658456.68..42800117.89 rows=10546998 width=146)
   ->  Sort  (cost=37658456.68..37922131.61 rows=105469973 width=146)
         Sort Key: <bunch-of-columns>
         ->  Seq Scan on really_big_table  (cost=0.00..15672543.00 rows=105469973 width=146)
               Filter: ((customer_code)::text ~ '^CUST123'::text)
(5 rows)

This query ran for ~26 hours, where ~20 hours was spent sorting the ~30G
which resulted from the Seq-Scan+filter (the raw table is ~101G).  The
resulting table (after the GroupAgg) was only 30MB in size (~80k rows
instead of the estimated 10M above).  Another query against the same
101G table, which used a HashAgg, completed just a bit faster than the
26 hours:
                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=19627666.99..19631059.80 rows=90475 width=116) (actual time=1435604.737..1435618.293 rows=4869 loops=1)
   ->  Seq Scan on really_big_table  (cost=0.00..15672543.00 rows=105469973 width=116) (actual time=221029.805..804802.329 rows=104616597 loops=1)
         Filter: ((agency_hierarchy_code)::text ~ '^CUST123'::text)
 Total runtime: 1435625.388 ms
(4 rows)

Now, this query had fewer columns in the group by (required to convince
PG to use a HashAgg), but, seriously, it only took 23 minutes to scan
through the entire table.  It could have taken 3 hours and I would have
been happy.

Admittedly, part of the problem here is the whole cross-column
correlation stats problem, but even if the stats were right and I ended
up with a 1.5G hash table and a 10M-row result, I'm pretty sure
generating that would be a lot faster using a HashAgg than a
sort+GroupAgg.  Also, I feel like we're pretty far from having the
cross-column statistics fixed, and I'm not 100% convinced that they'd
actually come up with a decent result for this query anyway (there are
18 columns in the group by clause for the first query...).

Anyhow, I just wanted to show that there are definitely cases where the
current limit is making things difficult for real-world PG users on
production systems.
Thanks,
    Stephen
