Обсуждение: Why my query not doing index only scan

Поиск
Список
Период
Сортировка

Why my query not doing index only scan

От
Arup Rakshit
Дата:
Hello All,

I am learning at this point how index works in DBMS. So I am looking for a very broad explanation to clear my basics.

I have an index defined "inspector_tool_idx4_1" UNIQUE, btree (company_id, item_code, deleted_at). Now I am using the *company_id* column in the where clause, and the selecting just the *item_code* field for all matching rows. I expected here the planner will do a index only scans. But it is doing bitmap index scan. Any idea what it is not doing what I expected it to do.


inspection_development=# explain analyze select item_code from inspector_tool_components where company_id = '7881ff2e-0557-4734-9da8-2d33072ff1ef';
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on inspector_tool_components  (cost=45.92..1360.48 rows=1226 width=8) (actual time=0.382..1.202 rows=1232 loops=1)
   Recheck Cond: (company_id = '7881ff2e-0557-4734-9da8-2d33072ff1ef'::uuid)
   Heap Blocks: exact=81
   ->  Bitmap Index Scan on inspector_tool_idx4_1  (cost=0.00..45.61 rows=1226 width=0) (actual time=0.347..0.347 rows=1232 loops=1)
         Index Cond: (company_id = '7881ff2e-0557-4734-9da8-2d33072ff1ef'::uuid)
 Planning time: 0.207 ms
 Execution time: 1.358 ms
(7 rows)



Thanks,

Arup Rakshit



Re: Why my query not doing index only scan

От
Stephen Frost
Дата:
Greetings,

* Arup Rakshit (ar@zeit.io) wrote:
> I have an index defined "inspector_tool_idx4_1" UNIQUE, btree (company_id, item_code, deleted_at). Now I am using the
*company_id*column in the where clause, and the selecting just the *item_code* field for all matching rows. I expected
herethe planner will do a index only scans. But it is doing bitmap index scan. Any idea what it is not doing what I
expectedit to do.
 

One possibility is that the visibility map isn't current.

Indexes don't include visibility information.  The way an index-only
scan works is that we track pages which are 'all visible' (meaning that
every tuple on that page is visible to all running transactions) in a
seperate file called the 'visibility map' (aka the VM).  The VM is
updated by the VACUUM process- but we only automatically run a VACUUM
(with the autovacuum process) when thresholds have been reached for the
number of UPDATE'd or DELETE'd tuples.

What this means is that if you are playing around in development and
just COPY a bunch of rows into a table, but don't ever UPDATE or DELETE
the rows in that table, then you'll almost never get an index-only scan
because the VM won't be current (and PG knows this).

Make sure to do a VACUUM after loading data (and letting any ongoing
transactions finish) and then re-test.  That should make it sure that
the VM is current and make it more likely that PG will do an index-only
scan.  Not a guarantee still, but that's the first thing I'd try, based
on what you've shared here.

Thanks!

Stephen

Вложения

Re: Why my query not doing index only scan

От
Arup Rakshit
Дата:
Thanks Stephen. After running the vacuum on the table it worked.

inspection_development=# explain analyze select item_code from inspector_tool_components where company_id = '7881ff2e-0557-4734-9da8-2d33072ff1ef';
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using inspector_tool_idx4_1 on inspector_tool_components  (cost=0.41..57.87 rows=1226 width=8) (actual time=0.044..0.626 rows=1232 loops=1)
   Index Cond: (company_id = '7881ff2e-0557-4734-9da8-2d33072ff1ef'::uuid)
   Heap Fetches: 0
 Planning time: 0.190 ms
 Execution time: 0.778 ms
(5 rows)


Thanks,

Arup Rakshit



On 10-Sep-2018, at 4:58 PM, Stephen Frost <sfrost@snowman.net> wrote:

VACUUM

Re: Why my query not doing index only scan

От
Arup Rakshit
Дата:
Hello Stephen,

I would like to ask one more question related to this topic. When I take a dump from production, and restore it to development DB, what are the commands I generally need to run to dev deb quack close to production?

Thanks,

Arup Rakshit



On 10-Sep-2018, at 4:58 PM, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Arup Rakshit (ar@zeit.io) wrote:
I have an index defined "inspector_tool_idx4_1" UNIQUE, btree (company_id, item_code, deleted_at). Now I am using the *company_id* column in the where clause, and the selecting just the *item_code* field for all matching rows. I expected here the planner will do a index only scans. But it is doing bitmap index scan. Any idea what it is not doing what I expected it to do.

One possibility is that the visibility map isn't current.

Indexes don't include visibility information.  The way an index-only
scan works is that we track pages which are 'all visible' (meaning that
every tuple on that page is visible to all running transactions) in a
seperate file called the 'visibility map' (aka the VM).  The VM is
updated by the VACUUM process- but we only automatically run a VACUUM
(with the autovacuum process) when thresholds have been reached for the
number of UPDATE'd or DELETE'd tuples.

What this means is that if you are playing around in development and
just COPY a bunch of rows into a table, but don't ever UPDATE or DELETE
the rows in that table, then you'll almost never get an index-only scan
because the VM won't be current (and PG knows this).

Make sure to do a VACUUM after loading data (and letting any ongoing
transactions finish) and then re-test.  That should make it sure that
the VM is current and make it more likely that PG will do an index-only
scan.  Not a guarantee still, but that's the first thing I'd try, based
on what you've shared here.

Thanks!

Stephen

Re: Why my query not doing index only scan

От
Stephen Frost
Дата:
Greetings,

* Arup Rakshit (ar@zeit.io) wrote:
> I would like to ask one more question related to this topic. When I take a dump from production, and restore it to
developmentDB, what are the commands I generally need to run to dev deb quack close to production?
 

The best way to get a prod-like environment in development is to use a
file-level backup tool, like pgBackRest.  There's a few nice things that
does:

- Copies the database files at a physical level, making it much closer
  to what prod is like
- Checks your database checksums (if you have them enabled, which
  hopefully you do, see initdb -k)
- Verifies your backup/restore approach
- Provides the fastest way to perform a restore in the event of an issue
- Allows you to do point-in-time-recovery (PITR)

Using pg_dump/pg_restore takes a logical export of the database and then
imports it into a new PG cluster.  Even if you do VACUUM ANALYZE after
doing pg_dump/restore, you're going to end up with (somewhat, at least)
different stats, all the tables will be completely packed (no bloat, no
space for new tuples to go into existing pages), all of the indexes will
be completely rebuilt and pristine, etc.  In the end, it's pretty
different and while it'll behavior in a similar manner to prod in a lot
of ways, there'll be a lot of ways it doesn't too.

Thanks!

Stephen

Вложения

Re: Why my query not doing index only scan

От
Laurenz Albe
Дата:
Arup Rakshit wrote:
> I am learning at this point how index works in DBMS. So I am looking for a very broad explanation to clear my
basics.
> 
> I have an index defined "inspector_tool_idx4_1" UNIQUE, btree (company_id, item_code, deleted_at).
> Now I am using the *company_id* column in the where clause, and the selecting just the *item_code* field for all
matchingrows.
 
> I expected here the planner will do a index only scans. But it is doing bitmap index scan. Any idea what it is not
doingwhat I expected it to do.
 
> 
> 
> inspection_development=# explain analyze select item_code from inspector_tool_components where company_id =
'7881ff2e-0557-4734-9da8-2d33072ff1ef';
>                                                              QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on inspector_tool_components  (cost=45.92..1360.48 rows=1226 width=8) (actual time=0.382..1.202
rows=1232loops=1)
 
>    Recheck Cond: (company_id = '7881ff2e-0557-4734-9da8-2d33072ff1ef'::uuid)
>    Heap Blocks: exact=81
>    ->  Bitmap Index Scan on inspector_tool_idx4_1  (cost=0.00..45.61 rows=1226 width=0) (actual time=0.347..0.347
rows=1232loops=1)
 
>          Index Cond: (company_id = '7881ff2e-0557-4734-9da8-2d33072ff1ef'::uuid)
>  Planning time: 0.207 ms
>  Execution time: 1.358 ms
> (7 rows)

Try to "VACUUM (ANALYZE) inspector_tool_components", that will set the
visibility mape and get the statistics right, maybe than you get an
index only scan.

How many rows does the table contain?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com