Thread: Bitmap heap scan performance


Bitmap heap scan performance

From
Rob Emery
Date:
Hi Guys,

I’m at a bit of a loss where I can go with the following 2 queries
that are over the same data structure (DDL attached) under
PostgreSQL 9.5.16 on x86_64-pc-linux-gnu (Debian 9.5.16-1.pgdg90+1),
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit, and
could do with a second set of eyes if someone would oblige.

I’ve attached Query1.txt and Query2.txt, along with the DDL for the
tables and indices, and the execution plans.

On our production environment we’re running at about 2 seconds (with
the cache warm); I’m getting a comparable speed on my playbox. It
seems to me like the Bitmap Heap Scan on proposal is the issue because
the recheck is throwing away enormous amounts of data. The
has_been_anonymised flag on the proposal is effectively a soft-delete;
so I’ve tried adding something like:

CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id, reference)
WHERE has_been_anonymised = false;

I was hoping this would shrink the index significantly and encourage
an index scan rather than a bitmap scan, but it didn’t have that
effect. For reference:

has_been_anonymised = false:  1,534,790
has_been_anonymised = true:   7,072,192

Row counts and sizes over the whole tables in question:

proposal.proposal:  8,606,982 rows  2340 MB
proposal.note:      2,624,423 rows  1638 MB

Presumably I could partition proposal on has_been_anonymised, but the
row counts seem low enough that it feels a bit like overkill? We also
need referential integrity, so I'll need to wait until that's in
(I think it's coming in PG12?)

If I decrease the number of legacy_organisation_ids being used, the
query performance gets much better, but presumably that’s just
because there’s a smaller dataset.

Any thoughts or ideas?

Thanks
Rob

Attachments

Re: Bitmap heap scan performance

From
Jeff Janes
Date:
On Fri, Aug 9, 2019 at 4:42 AM Rob Emery <re-pgsql@codeweavers.net> wrote:
 

It
seems to me like the Bitmap Heap Scan on proposal is the issue because
the recheck is throwing away enormous amounts of data.

Have you tried increasing work_mem?  The probable reason for the recheck is that your bitmap overflows the allowed memory, and then switches from storing every tid to storing just the block numbers.  As indicated by the lossy part of "Heap Blocks: exact=3983 lossy=27989"
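A quick way to test this hypothesis (a session-local experiment, not from the thread itself; the work_mem value is an illustrative guess) is to raise work_mem for the current session only and re-run the plan. If the "lossy" heap-block count drops to zero, memory pressure was indeed the cause of the recheck:

```sql
-- Raise work_mem for this session only; the value is illustrative.
SET work_mem = '64MB';

-- Re-run the attached query; with enough memory the "lossy" part of
-- "Heap Blocks" should disappear and the recheck should stop
-- discarding rows.
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- Query1 from the attachment

-- Restore the configured default.
RESET work_mem;
```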

The
has_been_anonymised flag on the proposal is effectively a soft-delete;
so I’ve tried adding something like :

CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id, reference)
WHERE has_been_anonymised = false;

Which I was hoping would shrink the size of the index significantly
 
The partial index should be smaller, but compared to an index with "has_been_anonymised" as the leading column, it won't make a lot of difference.  With the leading-column index you only have to scan a smaller part of the larger index, so the sizes of the parts of the index you have to scan in each case will be roughly comparable.
 
and encourage an index scan rather than bitmap, however it didn’t have
that effect.

To encourage index scans over bitmap scans, you can increase effective_cache_size.  Or, to really force the issue, you can "set enable_bitmapscan=off", but that is something you would usually do locally for experimental purposes, not in production's config settings.
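For example, a throwaway psql session to compare the two plans might look like this (experimental only, as noted above):

```sql
-- Discourage bitmap scans for this session only, to see what plain
-- index-scan plan the planner falls back to.
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the attached query

-- Restore the default before doing anything else in this session.
SET enable_bitmapscan = on;
```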

Cheers,

Jeff

Re: Bitmap heap scan performance

From
Rob Emery
Date:
Aha!

That's a great hint; we had that set to an obscenely low value
because our max_connections setting is quite high. I've tweaked it
back up to 4MB for now and it's definitely made a marked improvement!
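Since work_mem is allocated per sort/hash operation per backend, a high max_connections does force the global value to be conservative. One alternative (a sketch, not something discussed in the thread; the role name is hypothetical) is to override it only for the role or session running the expensive queries:

```sql
-- Override work_mem for one role instead of globally, which is safer
-- when max_connections is high. "reporting_user" is a hypothetical name.
ALTER ROLE reporting_user SET work_mem = '32MB';

-- Or raise it per session, just for the expensive query:
SET work_mem = '32MB';
```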

Many Thanks,
Rob

On 09/08/2019, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Aug 9, 2019 at 4:42 AM Rob Emery <re-pgsql@codeweavers.net> wrote:
>
>
>>
>> It
>> seems to me like the Bitmap Heap Scan on proposal is the issue because
>> the recheck is throwing away enormous amounts of data.
>
>
> Have you tried increasing work_mem?  The probable reason for the recheck is
> that your bitmap overflows the allowed memory, and then switches
> from storing every tid to storing just the block numbers.  As indicated by
> the lossy part of "Heap Blocks: exact=3983 lossy=27989"
>
> The
>> has_been_anonymised flag on the proposal is effectively a soft-delete;
>> so I’ve tried adding something like :
>>
>> CREATE INDEX ON proposal.proposal (system_id, legacy_organisation_id,
>> reference)
>> WHERE has_been_anonymised = false;
>>
>> Which I was hoping would shrink the size of the index significantly
>>
>
> The partial index should be smaller, but when comparing to the index with
> "has_been_anonymised" as the leading column, it won't make a lot of
> difference.  You only have to scan a smaller part of the larger index, and
> the sizes of part of the index you have to scan in each case will be
> roughly comparable.
>
>
>> and encourage an index scan rather than bitmap, however it didn’t have
>> that effect.
>
>
> To encourage index scans over bitmap scans, you can increase
> effective_cache_size.  Or to really force the issue, you can "set
> enable_bitmapscan=off" but that is something you would usually do locally
> for experimental purposes, not do it in production's config settings.
>
> Cheers,
>
> Jeff
>


--
Robert Emery
Infrastructure Director

E: robertemery@codeweavers.net | T: 01785 711633 | W: www.codeweavers.net




Re: Bitmap heap scan performance

From
Jeremy Finzel
Date:
Presumably I could partition proposal on has_been_anonymised, however
the row counts seem low enough that it feels a bit like overkill? We
also need referential integrity so I'll need to wait until that's in
(I think it's coming in PG12?)

If I decrease the number of legacy_organisation_id’s that are being
used then the query performance gets much better, but presumably
that’s because there’s a smaller dataset.

What are the actual counts that your queries are returning?

For your first query at least, are you sure your issue isn't simply that you have no index on proposal.proposal.reference?  Because the entry_time filter is highly selective (that part of the query only took 180ms), I would expect the planner to filter on the note table first, then join back to proposal.proposal using an index scan on reference.  But there is no index there.  You might even consider an index on (reference) WHERE has_been_anonymised = false.
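Concretely, the suggested index would be something like the following (a sketch in the same spirit as the earlier partial index from the thread):

```sql
-- Partial index on the join column, covering only the rows the
-- queries actually touch (non-anonymised proposals).
CREATE INDEX ON proposal.proposal (reference)
WHERE has_been_anonymised = false;
```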

Also, one of your challenges seems to be that all of your indexed fields are low cardinality.  Rather than partitioning on has_been_anonymised, perhaps you could consider partitioning on system_id and sub-partition on legacy_organisation_id?  It depends on if your attached queries are always the standard pattern or not though.  This is something you might play around with.
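A rough sketch of that partitioning layout under PG 11+ declarative partitioning (column names are from the thread, but the types and partition values are assumptions since the DDL isn't shown here; on 9.5 you would have to use table inheritance instead):

```sql
-- Sketch only: partition proposal by system_id, then sub-partition by
-- legacy_organisation_id. Types and values are illustrative.
CREATE TABLE proposal.proposal_part (
    id                     bigint  NOT NULL,
    system_id              integer NOT NULL,
    legacy_organisation_id integer NOT NULL,
    reference              text,
    has_been_anonymised    boolean NOT NULL DEFAULT false
) PARTITION BY LIST (system_id);

-- One partition per system_id, itself partitioned by organisation.
CREATE TABLE proposal.proposal_sys_11
    PARTITION OF proposal.proposal_part
    FOR VALUES IN (11)
    PARTITION BY LIST (legacy_organisation_id);

CREATE TABLE proposal.proposal_sys_11_org_1
    PARTITION OF proposal.proposal_sys_11
    FOR VALUES IN (1);
```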

Another option is to make your partial index conditions yet more specific, and then index only your primary key.  For example:

CREATE INDEX ON proposal.proposal (id)
WHERE has_been_anonymised = false AND system_id = 11;

I'm curious if any of these ideas would make a difference.

Thanks,
Jeremy