Обсуждение: [GENERAL] Why so long?

Поиск
Список
Период
Сортировка

[GENERAL] Why so long?

От
Steve Clark
Дата:
Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?


pmacct=# explain select max(id) from netflow;
                                                        QUERY PLAN                                                       
--------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.13..1.14 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.71..1.13 rows=1 width=8)
           ->  Index Only Scan Backward using netflow_pkey on netflow  (cost=0.71..3799108784.10 rows=9123246080 width=8)
                 Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# explain select min(id) from netflow;
                                                   QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------------
 Result  (cost=1.13..1.14 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.71..1.13 rows=1 width=8)
           ->  Index Only Scan using netflow_pkey on netflow  (cost=0.71..3799108784.10 rows=9123246080 width=8)
                 Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# \timing
Timing is on.
pmacct=# select max(id) from netflow;
     max    
-------------
 17547256873
(1 row)

Time: 0.626 ms



pmacct=# select min(id) from netflow;
^CCancel request sent
ERROR:  canceling statement due to user request
Time: 339114.334 ms


                                               Table "public.netflow"
     Column     |            Type             |                              Modifiers                             
----------------+-----------------------------+---------------------------------------------------------------------
 id             | bigint                      | not null default nextval('netflow_id_seq'::regclass)
 agent_id       | bigint                      |
 bytes          | bigint                      |
 stamp_inserted | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
 stamp_updated  | timestamp without time zone |
 packets        | integer                     | default 0
 port_src       | integer                     | default 0
 port_dst       | integer                     | default 0
 ip_proto       | smallint                    | default 0
 tos            | smallint                    | default 0
 ip_src         | inet                        | not null default '0.0.0.0'::inet
 ip_dst         | inet                        | not null default '0.0.0.0'::inet
Indexes:
    "netflow_pkey" PRIMARY KEY, btree (id)
    "netflow_ts_key" btree (stamp_inserted)
    "netflow_tsu_idx" btree (stamp_updated)
Triggers:
    netflow_import AFTER INSERT OR UPDATE ON netflow FOR EACH STATEMENT EXECUTE PROCEDURE netflow_update()



--

Re: [GENERAL] Why so long?

От
Steve Clark
Дата:
Should add this is version 9.4.10 of postgresql

On 04/19/2017 11:24 AM, Steve Clark wrote:
Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?


pmacct=# explain select max(id) from netflow;
                                                        QUERY PLAN                                                       
--------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.13..1.14 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.71..1.13 rows=1 width=8)
           ->  Index Only Scan Backward using netflow_pkey on netflow  (cost=0.71..3799108784.10 rows=9123246080 width=8)
                 Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# explain select min(id) from netflow;
                                                   QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------------
 Result  (cost=1.13..1.14 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.71..1.13 rows=1 width=8)
           ->  Index Only Scan using netflow_pkey on netflow  (cost=0.71..3799108784.10 rows=9123246080 width=8)
                 Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# \timing
Timing is on.
pmacct=# select max(id) from netflow;
     max    
-------------
 17547256873
(1 row)

Time: 0.626 ms



pmacct=# select min(id) from netflow;
^CCancel request sent
ERROR:  canceling statement due to user request
Time: 339114.334 ms


                                               Table "public.netflow"
     Column     |            Type             |                              Modifiers                             
----------------+-----------------------------+---------------------------------------------------------------------
 id             | bigint                      | not null default nextval('netflow_id_seq'::regclass)
 agent_id       | bigint                      |
 bytes          | bigint                      |
 stamp_inserted | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
 stamp_updated  | timestamp without time zone |
 packets        | integer                     | default 0
 port_src       | integer                     | default 0
 port_dst       | integer                     | default 0
 ip_proto       | smallint                    | default 0
 tos            | smallint                    | default 0
 ip_src         | inet                        | not null default '0.0.0.0'::inet
 ip_dst         | inet                        | not null default '0.0.0.0'::inet
Indexes:
    "netflow_pkey" PRIMARY KEY, btree (id)
    "netflow_ts_key" btree (stamp_inserted)
    "netflow_tsu_idx" btree (stamp_updated)
Triggers:
    netflow_import AFTER INSERT OR UPDATE ON netflow FOR EACH STATEMENT EXECUTE PROCEDURE netflow_update()



--



--
Stephen Clark
NetWolves Managed Services, LLC.
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com

Re: [GENERAL] Why so long?

От
Jeff Janes
Дата:
On Wed, Apr 19, 2017 at 8:24 AM, Steve Clark <steve.clark@netwolves.com> wrote:
Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?

Not if the low end of the index is stuffed full of obsolete entries, which haven't been cleaned up because it is not being vacuumed often enough.

Do you have autovacuum on?  Have you manually vacuumed the table recently?

Cheers,

Jeff

Re: [GENERAL] Why so long?

От
Steve Clark
Дата:
On 04/19/2017 11:57 AM, Jeff Janes wrote:
On Wed, Apr 19, 2017 at 8:24 AM, Steve Clark <steve.clark@netwolves.com> wrote:
Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?

Not if the low end of the index is stuffed full of obsolete entries, which haven't been cleaned up because it is not being vacuumed often enough.

Do you have autovacuum on?  Have you manually vacuumed the table recently?

Cheers,

Jeff
Hi Jeff,

Autovacuum is turned on.

 schemaname |        relname        | last_vacuum |        last_autovacuum        | vacuum_count | autovacuum_count
------------+-----------------------+-------------+-------------------------------+--------------+------------------
 public     | netflow               |             | 2017-04-11 01:18:53.261221-04 |            0 |                1


It is a large table.
select pg_size_pretty(pg_relation_size('netflow'));
 pg_size_pretty
----------------
 1267 GB

select pg_size_pretty(pg_total_relation_size('netflow_pkey'));
 pg_size_pretty
----------------
 287 GB

Regards,
Steve

Re: [GENERAL] Why so long?

От
David Rowley
Дата:
On 20 April 2017 at 03:24, Steve Clark <steve.clark@netwolves.com> wrote:
> pmacct=# explain select min(id) from netflow;
>                                                    QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>  Result  (cost=1.13..1.14 rows=1 width=0)
>    InitPlan 1 (returns $0)
>      ->  Limit  (cost=0.71..1.13 rows=1 width=8)
>            ->  Index Only Scan using netflow_pkey on netflow
> (cost=0.71..3799108784.10 rows=9123246080 width=8)
>                  Index Cond: (id IS NOT NULL)
> (5 rows)
>
> pmacct=# \timing
> Timing is on.
> pmacct=# select max(id) from netflow;
>      max
> -------------
>  17547256873
> (1 row)
>
> Time: 0.626 ms
>
>
>
> pmacct=# select min(id) from netflow;
> ^CCancel request sent
> ERROR:  canceling statement due to user request
> Time: 339114.334 ms

It may help the diagnosis if you run an EXPLAIN (ANALYZE, BUFFERS)
version of this query to completion.

The index pages could simply be cold and coming from disk on a very
much I/O starved system.

More recently added pages are more likely to be cached.

You may also want to consider running the EXPLAIN (ANALYZE, BUFFERS)
after having SET track_io_timing = on;

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] Why so long?

От
Rafia Sabih
Дата:


On Wed, Apr 19, 2017 at 8:54 PM, Steve Clark <steve.clark@netwolves.com> wrote:
Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?


pmacct=# explain select max(id) from netflow;
                                                        QUERY PLAN                                                       
--------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.13..1.14 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.71..1.13 rows=1 width=8)
           ->  Index Only Scan Backward using netflow_pkey on netflow  (cost=0.71..3799108784.10 rows=9123246080 width=8)
                 Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# explain select min(id) from netflow;
                                                   QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------------
 Result  (cost=1.13..1.14 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.71..1.13 rows=1 width=8)
           ->  Index Only Scan using netflow_pkey on netflow  (cost=0.71..3799108784.10 rows=9123246080 width=8)
                 Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# \timing
Timing is on.
pmacct=# select max(id) from netflow;
     max    
-------------
 17547256873
(1 row)

Time: 0.626 ms



pmacct=# select min(id) from netflow;
^CCancel request sent
ERROR:  canceling statement due to user request
Time: 339114.334 ms


                                               Table "public.netflow"
     Column     |            Type             |                              Modifiers                             
----------------+-----------------------------+---------------------------------------------------------------------
 id             | bigint                      | not null default nextval('netflow_id_seq'::regclass)
 agent_id       | bigint                      |
 bytes          | bigint                      |
 stamp_inserted | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
 stamp_updated  | timestamp without time zone |
 packets        | integer                     | default 0
 port_src       | integer                     | default 0
 port_dst       | integer                     | default 0
 ip_proto       | smallint                    | default 0
 tos            | smallint                    | default 0
 ip_src         | inet                        | not null default '0.0.0.0'::inet
 ip_dst         | inet                        | not null default '0.0.0.0'::inet
Indexes:
    "netflow_pkey" PRIMARY KEY, btree (id)
    "netflow_ts_key" btree (stamp_inserted)
    "netflow_tsu_idx" btree (stamp_updated)
Triggers:
    netflow_import AFTER INSERT OR UPDATE ON netflow FOR EACH STATEMENT EXECUTE PROCEDURE netflow_update()


Is this some one-time event or you could reproduce it consistently? 


--
Regards,
Rafia Sabih