Обсуждение: Extremely slow count (simple query, with index)

Поиск
Список
Период
Сортировка

Extremely slow count (simple query, with index)

От
Marco Colli
Дата:
Hello!

Any help would be greatly appreciated.
I need to run these simple queries on a table with millions of rows:

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
```

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;
```

The count result for both queries, for project 123, is about 5M.

I have an index in place on `project_id`, and also another index on `(project_id, trashed_at)`:

```
"index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
"index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)
```

The problem is that both queries are extremely slow and take about 17s each.

These are the results of `EXPLAIN ANALIZE`:


```
      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2068127.29..2068127.30 rows=1 width=0) (actual time=17342.420..17342.420 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
         Recheck Cond: (project_id = 123)
         Rows Removed by Index Recheck: 23746378
         Heap Blocks: exact=131205 lossy=1480411
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74 rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
               Index Cond: (project_id = 123)
 Planning time: 0.090 ms
 Execution time: 17344.182 ms
(9 rows)
```


```
      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2047881.69..2047881.70 rows=1 width=0) (actual time=17557.218..17557.218 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
         Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
         Rows Removed by Index Recheck: 23746273
         Heap Blocks: exact=131144 lossy=1480409
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55 rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
               Index Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Planning time: 0.084 ms
 Execution time: 17558.522 ms
(9 rows)
```

What is the problem? 
What can I do to improve the performance (i.e. count in a few seconds)?

I have also tried to increase work_mem from 16MB to 128MB without any improvement.
Even an approximate count would be enough.
Postgresql v9.5

Re: Extremely slow count (simple query, with index)

От
Justin Pryzby
Дата:
On Thu, Aug 22, 2019 at 02:44:15PM +0200, Marco Colli wrote:
> SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
> SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS
NULL;

>    ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 rows=4996823 width=0) (actual
time=1666.409..16855.610rows=4994254 loops=1)
 
>          Recheck Cond: (project_id = 123)
>          Rows Removed by Index Recheck: 23746378
>          Heap Blocks: exact=131205 lossy=1480411
>          ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74
rows=4996823width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
 

>    ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 rows=4428599 width=0) (actual
time=1644.966..17078.378rows=4994130 loops=1)
 
>          Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
>          Rows Removed by Index Recheck: 23746273
>          Heap Blocks: exact=131144 lossy=1480409
>          ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55
rows=4428599width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
 

You can see it used the same index in both cases, and the index scan was
reasonably fast (compared to your goal), but the heap component was slow.

I suggest to run VACUUM FREEZE on the table, to try to encourage index only
scan.  If that works, you should condider setting aggressive autovacuum
parameter, at least for the table:
ALTER TABLE subscriptions SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);
-- And possibly lower value of autovacuum_freeze_max_age

Or, running manual vacuum possibly during quiet hours (possibly setting
vacuum_freeze_table_age to encourage aggressive vacuum).

> Even an approximate count would be enough.

You can SELECT reltuples FROM pg_class WHERE oid='subscriptions'::oid, but its
accuracy depends on frequency of vacuum (and if a large delete/insert happened
since the most recent vacuum/analyze).

Justin



Re: Extremely slow count (simple query, with index)

От
Ravikumar Reddy
Дата:
Dear ,

Create the below indexes and try it !!!

create index ind_ subscriptions_ project_id   on "subscriptions"("project_id")
Where "project_id"= 1 

create index ind_ subscriptions_ trashed_at   on "subscriptions"(" trashed_at  ")
Where "trashed_at" is null 

 

On Thu, Aug 22, 2019 at 6:36 PM Marco Colli <collimarco91@gmail.com> wrote:
Hello!

Any help would be greatly appreciated.
I need to run these simple queries on a table with millions of rows:

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
```

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;
```

The count result for both queries, for project 123, is about 5M.

I have an index in place on `project_id`, and also another index on `(project_id, trashed_at)`:

```
"index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
"index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)
```

The problem is that both queries are extremely slow and take about 17s each.

These are the results of `EXPLAIN ANALIZE`:


```
      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2068127.29..2068127.30 rows=1 width=0) (actual time=17342.420..17342.420 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
         Recheck Cond: (project_id = 123)
         Rows Removed by Index Recheck: 23746378
         Heap Blocks: exact=131205 lossy=1480411
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74 rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
               Index Cond: (project_id = 123)
 Planning time: 0.090 ms
 Execution time: 17344.182 ms
(9 rows)
```


```
      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2047881.69..2047881.70 rows=1 width=0) (actual time=17557.218..17557.218 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
         Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
         Rows Removed by Index Recheck: 23746273
         Heap Blocks: exact=131144 lossy=1480409
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55 rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
               Index Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Planning time: 0.084 ms
 Execution time: 17558.522 ms
(9 rows)
```

What is the problem? 
What can I do to improve the performance (i.e. count in a few seconds)?

I have also tried to increase work_mem from 16MB to 128MB without any improvement.
Even an approximate count would be enough.
Postgresql v9.5



--
Regards,
Ravikumar S,
Ph: 8106741263

Re: Extremely slow count (simple query, with index)

От
MichaelDBA
Дата:
Hi Marco,

Since you said approximates would be good enough, there are two ways to do that.  Query pg_class.reltuples or pg_stat_user_tables.n_live_tup.  Personally, I prefer the pg_stat_user tables since it is more current than pg_class table, unless you run ANALYZE on your target table before querying pg_class table.  Then of course you get results in a few milliseconds since you do not incur the tablescan cost of selecting directly from the target table.

Regards,
Michael Vitale


Marco Colli wrote on 8/22/2019 8:44 AM:
Hello!

Any help would be greatly appreciated.
I need to run these simple queries on a table with millions of rows:

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
```

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;
```

The count result for both queries, for project 123, is about 5M.

I have an index in place on `project_id`, and also another index on `(project_id, trashed_at)`:

```
"index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
"index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)
```

The problem is that both queries are extremely slow and take about 17s each.

These are the results of `EXPLAIN ANALIZE`:


```
      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2068127.29..2068127.30 rows=1 width=0) (actual time=17342.420..17342.420 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
         Recheck Cond: (project_id = 123)
         Rows Removed by Index Recheck: 23746378
         Heap Blocks: exact=131205 lossy=1480411
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74 rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
               Index Cond: (project_id = 123)
 Planning time: 0.090 ms
 Execution time: 17344.182 ms
(9 rows)
```


```
      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2047881.69..2047881.70 rows=1 width=0) (actual time=17557.218..17557.218 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
         Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
         Rows Removed by Index Recheck: 23746273
         Heap Blocks: exact=131144 lossy=1480409
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55 rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
               Index Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Planning time: 0.084 ms
 Execution time: 17558.522 ms
(9 rows)
```

What is the problem? 
What can I do to improve the performance (i.e. count in a few seconds)?

I have also tried to increase work_mem from 16MB to 128MB without any improvement.
Even an approximate count would be enough.
Postgresql v9.5


Re: Extremely slow count (simple query, with index)

От
Michael Lewis
Дата:
You can SELECT reltuples FROM pg_class WHERE oid='subscriptions'::oid, but its
accuracy depends on frequency of vacuum (and if a large delete/insert happened
since the most recent vacuum/analyze).

This only seems helpful to find approx. count for the entire table, without considering the WHERE condition.

Marco,
As Justin pointed out, you have most of your time in the bitmap heap scan. Are you running SSDs? I wonder about tuning effective_io_concurrency to make more use of them.

"Currently, this setting only affects bitmap heap scans."

Also, how many million rows is this table in total? Have you considered partitioning?

Re: Extremely slow count (simple query, with index)

От
Marco Colli
Дата:
I have completely solved (from 17s to 1s) by running this command:
vacuum analyze subscriptions;

Now I run the autovacuum more frequently using these settings in postgresql.conf:
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.01

Thanks to everyone - and in particular to Justin Pryzby for pointing me in the right direction.

On Thu, Aug 22, 2019 at 7:37 PM Michael Lewis <mlewis@entrata.com> wrote:
You can SELECT reltuples FROM pg_class WHERE oid='subscriptions'::oid, but its
accuracy depends on frequency of vacuum (and if a large delete/insert happened
since the most recent vacuum/analyze).

This only seems helpful to find approx. count for the entire table, without considering the WHERE condition.

Marco,
As Justin pointed out, you have most of your time in the bitmap heap scan. Are you running SSDs? I wonder about tuning effective_io_concurrency to make more use of them.

"Currently, this setting only affects bitmap heap scans."

Also, how many million rows is this table in total? Have you considered partitioning?

Re: Extremely slow count (simple query, with index)

От
Justin Pryzby
Дата:
On Thu, Aug 22, 2019 at 07:54:57PM +0200, Marco Colli wrote:
> I have completely solved (from 17s to 1s) by running this command:
> vacuum analyze subscriptions;

Thanks for following though.

On Thu, Aug 22, 2019 at 08:19:10AM -0500, Justin Pryzby wrote:
> You can see it used the same index in both cases, and the index scan was
> reasonably fast (compared to your goal), but the heap component was slow.
> 
> I suggest to run VACUUM FREEZE on the table, to try to encourage index only
> scan.  If that works, you should condider setting aggressive autovacuum

I should've used a better word, since aggressive means something specific.
Perhaps just: "parameter to encourage more frequent autovacuums".

> parameter, at least for the table:
> ALTER TABLE subscriptions SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);
> -- And possibly lower value of autovacuum_freeze_max_age
> 
> Or, running manual vacuum possibly during quiet hours (possibly setting
> vacuum_freeze_table_age to encourage aggressive vacuum).

I think my reference to autovacuum_freeze_max_age and vacuum_freeze_table_age
were incorrect; what's important is "relallvisible" and not "relfrozenxid".
And xid wraparound isn't at issue here.

> > Even an approximate count would be enough.
> 
> You can SELECT reltuples FROM pg_class WHERE oid='subscriptions'::oid, but its

Should be: oid='subscriptions'::regclass

> accuracy depends on frequency of vacuum (and if a large delete/insert happened
> since the most recent vacuum/analyze).

Justin