Re: speeding up planning with partitions

From: Amit Langote
Subject: Re: speeding up planning with partitions
Date:
Msg-id: 993dedcc-6079-ad40-b087-00d139c00203@lab.ntt.co.jp
In response to: Re: speeding up planning with partitions  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
(I've closed the CF entry: https://commitfest.postgresql.org/22/1778/)

On 2019/04/01 2:04, Tom Lane wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
>> On Sun, Mar 31, 2019 at 11:45 AM Imai Yoshikazu <yoshikazu_i443@live.jp> wrote:
>>> Certainly, using bitmapset contributes to the performance when scanning
>>> one partition(few partitions) from large partitions.
> 
>> Thanks Imai-san for testing.
> 
> I tried to replicate these numbers with the code as-committed, and
> could not.

Thanks for that.

> What I get, using the same table-creation code as you
> posted and a pgbench script file like
> 
> \set param random(1, :N)
> select * from rt where a = :param;
> 
> is scaling like this:
> 
> N    tps, range    tps, hash
> 
> 2    10520.519932    10415.230400
> 8    10443.361457    10480.987665
> 32    10341.196768    10462.551167
> 128    10370.953849    10383.885128
> 512    10207.578413    10214.049394
> 1024    10042.794340    10121.683993
> 4096    8937.561825    9214.993778
> 8192    8247.614040    8486.728918
> 
> If I use "-M prepared" the numbers go up a bit for lower N, but
> drop at high N:
> 
> N    tps, range    tps, hash
> 
> 2    11449.920527    11462.253871
> 8    11530.513146    11470.812476
> 32    11372.412999    11450.213753
> 128    11289.351596    11322.698856
> 512    11095.428451    11200.683771
> 1024    10757.646108    10805.052480
> 4096    8689.165875    8930.690887
> 8192    7301.609147    7502.806455
> 
> Digging into that, it seems like the degradation with -M prepared is
> mostly in LockReleaseAll's hash_seq_search over the locallock hash table.
> What I think must be happening is that with -M prepared, at some point the
> plancache decides to try a generic plan, which causes opening/locking all
> the partitions, resulting in permanent bloat in the locallock hash table.
> We immediately go back to using custom plans, but hash_seq_search has
> more buckets to look through for the remainder of the process' lifetime.

Ah, we did find this to be a problem upthread [1] and Tsunakawa-san then
even posted a patch which is being discussed at:

https://commitfest.postgresql.org/22/1993/
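
For context, the slowdown being described comes from the sequential scan over
the local lock hash table.  Roughly, the shape of that scan in
LockReleaseAll() is (a simplified sketch, not the actual function body):

  HASH_SEQ_STATUS status;
  LOCALLOCK  *locallock;

  /*
   * hash_seq_search() walks every bucket of the dynahash table.  Once a
   * generic-plan attempt has locked all partitions and the table has been
   * enlarged, every later transaction pays for the bigger bucket array,
   * even though most of the entries are gone by then.
   */
  hash_seq_init(&status, LockMethodLocalHash);
  while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
  {
      /* ... release this lock ... */
  }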

> I do see some cycles getting spent in apply_scanjoin_target_to_paths
> that look to be due to scanning over the long part_rels array,
> which your proposal would ameliorate.  But (a) that's pretty small
> compared to other effects, and (b) IMO, apply_scanjoin_target_to_paths
> is a remarkable display of brute force inefficiency to begin with.
> I think we should see if we can't nuke that function altogether in
> favor of generating the paths with the right target the first time.

That's an option if we can make it work.

Shouldn't we look at *all* of the places that now have code that looks like
this:

  for (i = 0; i < rel->nparts; i++)
  {
      RelOptInfo *partrel = rel->part_rels[i];

      if (partrel == NULL)
          continue;
      ...
  }

Besides apply_scanjoin_target_to_paths(), there are:

create_partitionwise_grouping_paths()
make_partitionedrel_pruneinfo()
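
For illustration, here is roughly what those loops could become if the
indexes of the un-pruned partitions were tracked in a Bitmapset (the field
name live_parts below is hypothetical, only for this sketch):

  int     i = -1;

  /* visit only the partitions that survived pruning */
  while ((i = bms_next_member(rel->live_parts, i)) >= 0)
  {
      RelOptInfo *partrel = rel->part_rels[i];

      Assert(partrel != NULL);
      ...
  }

The cost would then depend on how many partitions survive pruning rather
than on rel->nparts.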


> BTW, the real elephant in the room is the O(N^2) cost of creating
> these tables in the first place.  The runtime for the table-creation
> scripts looks like
> 
> N    range        hash
> 
> 2    0m0.011s    0m0.011s
> 8    0m0.015s    0m0.014s
> 32    0m0.032s    0m0.030s
> 128    0m0.132s    0m0.099s
> 512    0m0.969s    0m0.524s
> 1024    0m3.306s    0m1.442s
> 4096    0m46.058s    0m15.522s
> 8192    3m11.995s    0m58.720s
> 
> This seems to be down to the expense of doing RelationBuildPartitionDesc
> to rebuild the parent's relcache entry for each child CREATE TABLE.
> Not sure we can avoid that, but maybe we should consider adopting a
> cheaper-to-read representation of partition descriptors.  The fact that
> range-style entries seem to be 3X more expensive to load than hash-style
> entries is strange.

I've noticed this many times too, but have never prioritized doing something
about it.  I'll try to look into it sometime.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f-dn1hDZqObwdMrYdV7-cELJwWCPRWet6EQX_WaV8JLgw%40mail.gmail.com



