Re: Pet Peeves?

Поиск
Список
Период
Сортировка
От Chris Mayfield
Тема Re: Pet Peeves?
Дата
Msg-id gma7uv$2gel$1@news.hub.org
обсуждение исходный текст
Ответ на Pet Peeves?  (Gregory Stark <stark@enterprisedb.com>)
Ответы Re: Pet Peeves?  (Peter Geoghegan <peter.geoghegan86@gmail.com>)
Re: Pet Peeves?  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-general
Here's a few more pet peeves.  I'm not sure if any of these are known
bugs or just me being picky.

--Chris

--------------------------------------------------

1. Having to rewrite entire tables out to disk the first time I scan
them, for example:

CREATE TABLE t1 AS ...; -- writes 100 GB to disk
CREATE INDEX i1 ON t1 ...; -- rewrites 100 GB to disk

The main issue is setting the hint bits for each tuple, which IMO should
initially be set for "CREATE TABLE AS" statements.  To work around this
for now, I modified heap_insert (in heapam.c) to mark tuples as
committed when inserting them into newly added pages without WAL:

/*
  * Optimization for CREATE TABLE AS statements: mark tuples as committed
  * to prevent rewriting them to disk upon first use. This is safe since
  * the new relation is not visible until the transaction commits anyway.
  */
if (!use_wal && !use_fsm)
{
    tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
}

--------------------------------------------------

2. Having to use a subquery and/or "OFFSET 0" to prevent multiple calls
to immutable functions returning composite types, for example:

CREATE TYPE three AS
   (i integer, j integer, k integer);

CREATE FUNCTION hello()
RETURNS three AS $$
DECLARE ret three;
BEGIN
   RAISE NOTICE 'hello';
   ret := ROW(1,2,3);
   RETURN ret;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- function called three times
SELECT (hello()).*;

-- function called only once
SELECT (h).* FROM (
   SELECT hello() AS h
) AS sub;

-- function called three times
SELECT (h).* FROM (
   SELECT hello() AS h
   FROM generate_series(1,3)
) AS sub;

-- function called only once
SELECT (h).* FROM (
   SELECT hello() AS h
   FROM generate_series(1,3)
   OFFSET 0
) AS sub;

--------------------------------------------------

3. Poor estimates for n_distinct can result in out of memory errors.

For example, I have a user-defined aggregate over a user-defined data
type (both written in C).  The aggregate doesn't take much memory, but
the data type can be relatively large (i.e. "STORAGE = extended").  My
table has five million distinct rows, but n_distinct is around 50,000
(increasing the stats target helps a little, but it's still way off).
As a result the planner choses "SeqScan + HashAgg" instead of "IndexScan
+ GroupAgg", and the query aborts when the hash table eventually runs
out of memory.

I currently work around this issue using "SET enable_hashagg TO off;"
when necessary.

В списке pgsql-general по дате отправления:

Предыдущее
От: Grzegorz Jaśkiewicz
Дата:
Сообщение: C function question
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Pet Peeves?