Обсуждение: Performance impact of NULLs and variable length fields

Поиск
Список
Период
Сортировка

Performance impact of NULLs and variable length fields

От
Martijn van Oosterhout
Дата:
Sometime ago somebody asked if it made a difference adn it was suggested
that the effect was probably marginal. I ran a profiler over postgres doing
a large query and these are the top 10 functions:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 16.04      0.51     0.51  1676772     0.00     0.00  nocachegetattr
 11.95      0.89     0.38  1427403     0.00     0.00  heapgettup
 10.06      1.21     0.32  2955372     0.00     0.00  LockBuffer
  6.92      1.43     0.22  3406475     0.00     0.00  ExecEvalExpr
  5.03      1.59     0.16  1617018     0.00     0.00  AllocSetReset
  4.40      1.73     0.14  1427403     0.00     0.00  heap_getnext
  4.09      1.86     0.13  1844339     0.00     0.00  ExecEvalVar
  4.09      1.99     0.13  1441330     0.00     0.00  ReleaseBuffer
  4.09      2.12     0.13  1427064     0.00     0.00  SeqNext
  3.46      2.23     0.11  1552338     0.00     0.00  ExecQual

Note: the wall clock time of the query was about 30 seconds. The total cpu
time was 3.18 seconds.

Now, nocachegetattr is called mostly whenever there are NULLs or variable
length strings in the tuple. Since our biggest table begins with a variable
length field, every single lookup is going to be uncached. Looks like the
effect is significant.

Is this ever going to change?

Just for interests sake i've also got the results for a plain select sum(x)
from y;

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  9.12      0.60     0.60  1423541     0.00     0.00  heapgettup
  8.51      1.16     0.56  1423329     0.00     0.00  nocachegetattr
  6.08      1.56     0.40  2850372     0.00     0.00  AllocSetAlloc
  5.47      1.92     0.36  1423308     0.00     0.00  ExecTargetList
  5.02      2.25     0.33  2846619     0.00     0.00  AllocSetReset
  4.10      2.52     0.27  1423308     0.00     0.00  heap_formtuple
  3.80      2.77     0.25  2846614     0.00     0.00  ExecEvalVar
  3.80      3.02     0.25        2   125.00   922.33  ExecAgg
  3.65      3.26     0.24  1423310     0.00     0.00  ExecProcNode
  3.04      3.46     0.20  1423307     0.00     0.00  advance_transition_function

Seems to me that the generality of the aggregate implementation doesn't seem
to really be an issue.

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: Performance impact of NULLs and variable length fields

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Sometime ago somebody asked if it made a difference adn it was suggested
> that the effect was probably marginal. I ran a profiler over postgres doing
> a large query and these are the top 10 functions:

This is pretty meaningless when you didn't give one single detail about
the test query or conditions.

            regards, tom lane

Re: Performance impact of NULLs and variable length fields

От
Martijn van Oosterhout
Дата:
Well, Tom pointed out that without the query itself this was fairly
meaningless. So here it is:

select c.id, sum(d.cost)
from c, l, d
where c.id = l.cid
and l.clid = d.clid
and c.accman = '2500'
and billid is null
group by c.id;

EXPLAIN output:
Aggregate  (cost=51479.55..51507.07 rows=550 width=48) (actual time=26192.07..26222.07 rows=156 loops=1)
  ->  Group  (cost=51479.55..51493.31 rows=5503 width=48) (actual time=26165.81..26186.14 rows=2029 loops=156)
        ->  Sort  (cost=51479.55..51479.55 rows=5503 width=48) (actual time=26164.90..26167.77 rows=2029 loops=1)
              ->  Hash Join  (cost=676.79..51137.62 rows=5503 width=48) (actual time=3703.82..26122.23 rows=2029
loops=1)
                    ->  Seq Scan on d  (cost=0.00..50150.07 rows=32262 width=12) (actual time=67.35..24921.42
rows=41045loops=1) 
                    ->  Hash  (cost=671.31..671.31 rows=2193 width=36) (actual time=594.55..594.55 rows=0 loops=1)
                          ->  Merge Join  (cost=260.50..671.31 rows=2193 width=36) (actual time=251.95..551.40
rows=1330loops=1) 
                                ->  Index Scan using l_cid on l  (cost=0.00..364.36 rows=6308 width=20) (actual
time=33.95..340.52rows=6307 loops=1) 
                                ->  Sort  (cost=260.50..260.50 rows=1305 width=16) (actual time=142.85..145.34
rows=1337loops=1) 
                                      ->  Seq Scan on c  (cost=0.00..192.94 rows=1305 width=16) (actual
time=14.99..103.40rows=1314 loops=1) 

d has over 1.4 million rows, l about 10,000 and c about 2,000.

NULL columns and variable length fields used extensivly.

On Sat, Jul 21, 2001 at 05:25:38PM +1000, Martijn van Oosterhout wrote:
> Sometime ago somebody asked if it made a difference adn it was suggested
> that the effect was probably marginal. I ran a profiler over postgres doing
> a large query and these are the top 10 functions:
>
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  16.04      0.51     0.51  1676772     0.00     0.00  nocachegetattr
>  11.95      0.89     0.38  1427403     0.00     0.00  heapgettup
>  10.06      1.21     0.32  2955372     0.00     0.00  LockBuffer
>   6.92      1.43     0.22  3406475     0.00     0.00  ExecEvalExpr
>   5.03      1.59     0.16  1617018     0.00     0.00  AllocSetReset
>   4.40      1.73     0.14  1427403     0.00     0.00  heap_getnext
>   4.09      1.86     0.13  1844339     0.00     0.00  ExecEvalVar
>   4.09      1.99     0.13  1441330     0.00     0.00  ReleaseBuffer
>   4.09      2.12     0.13  1427064     0.00     0.00  SeqNext
>   3.46      2.23     0.11  1552338     0.00     0.00  ExecQual

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: Performance impact of NULLs and variable length fields

От
Bruce Momjian
Дата:
Martijn van Oosterhout wrote:
> Sometime ago somebody asked if it made a difference adn it was suggested
> that the effect was probably marginal. I ran a profiler over postgres doing
> a large query and these are the top 10 functions:
>
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  16.04      0.51     0.51  1676772     0.00     0.00  nocachegetattr
>  11.95      0.89     0.38  1427403     0.00     0.00  heapgettup
>  10.06      1.21     0.32  2955372     0.00     0.00  LockBuffer
>   6.92      1.43     0.22  3406475     0.00     0.00  ExecEvalExpr
>   5.03      1.59     0.16  1617018     0.00     0.00  AllocSetReset
>   4.40      1.73     0.14  1427403     0.00     0.00  heap_getnext
>   4.09      1.86     0.13  1844339     0.00     0.00  ExecEvalVar
>   4.09      1.99     0.13  1441330     0.00     0.00  ReleaseBuffer
>   4.09      2.12     0.13  1427064     0.00     0.00  SeqNext
>   3.46      2.23     0.11  1552338     0.00     0.00  ExecQual

I have added this to the TODO list:

    * Improve caching of attribute offsets when NULLs exist in the row

I have some ideas about how offsets with NULL's could be handled in this
case.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026