Обсуждение: Inherits and "get highest id".

Поиск
Список
Период
Сортировка

Inherits and "get highest id".

От
Dawid Kuroczko
Дата:
Hello.

I've tried data partitioning using INHERITS mechanism (pgsql 8.0.3).

The schema looks like this:

CREATE TABLE log (
        logid integer NOT NULL PRIMARY KEY,
        logdate timestamp(0) without time zone NOT NULL,
        typeid integer NOT NULL,
        ip      inet,
        [.....]
);

CREATE TABLE log_data.log200501 () INHERITS log;
CREATE TABLE log_data.log200502 () INHERITS log;
...
CREATE TABLE log_data.log200512 () INHERITS log;

Each month-table has over 2 milion rows.
Each table has a primary index on logid column.

And now, if I do a simple query like this:

explain analyze SELECT logid FROM log_200501 ORDER BY logid LIMIT 1;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.113..0.115
rows=1 loops=1)
   ->  Index Scan using log_200501_pkey on log_200501
(cost=0.00..168766.23 rows=4149475 width=4) (actual time=0.108..0.108
rows=1 loops=1)
 Total runtime: 0.196 ms

But when I'm trying to get "global" highest logid I get:

qnex=# explain SELECT logid FROM log ORDER BY logid LIMIT 1;
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=4140892.91..4140892.91 rows=1 width=4)
   ->  Sort  (cost=4140892.91..4201607.87 rows=24285986 width=4)
         Sort Key: logs.log.logid
         ->  Result  (cost=0.00..887109.86 rows=24285986 width=4)
               ->  Append  (cost=0.00..887109.86 rows=24285986 width=4)
                     ->  Seq Scan on log  (cost=0.00..10.70 rows=70 width=4)
                     ->  Seq Scan on log_200501 log
(cost=0.00..155529.75 rows=4149475 width=4)
                     ->  Seq Scan on log_200502 log
(cost=0.00..145904.29 rows=3857729 width=4)
                     ->  Seq Scan on log_200503 log
(cost=0.00..165485.78 rows=4369278 width=4)
                     ->  Seq Scan on log_200504 log
(cost=0.00..420093.74 rows=11908874 width=4)
                     ->  Seq Scan on log_200505 log  (cost=0.00..10.70
rows=70 width=4)
                     ->  Seq Scan on log_200506 log  (cost=0.00..10.70
rows=70 width=4)
                     ->  Seq Scan on log_200507 log  (cost=0.00..10.70
rows=70 width=4)
                     ->  Seq Scan on log_200508 log  (cost=0.00..10.70
rows=70 width=4)
                     ->  Seq Scan on log_200509 log  (cost=0.00..10.70
rows=70 width=4)
                     ->  Seq Scan on log_200510 log  (cost=0.00..10.70
rows=70 width=4)
                     ->  Seq Scan on log_200511 log  (cost=0.00..10.70
rows=70 width=4)
                     ->  Seq Scan on log_200512 log  (cost=0.00..10.70
rows=70 width=4)
(18 rows)

Time: 3,142 ms



In other words -- doing such a one row select means scanning the
whole partitioned data.  the primaryindex on logid is not used
anywhere.  Creating view (SELECT * UNION ALL SELECT * UNION ALL....)
does not help either.

I wonder -- is there any better solution?  selecting highest from each
table, unioning it and then selecting highest of the highest works
well but it strikes me as 'not exactly the niciest implementation'
(one of the reasons is that I have to remembed about it each time
new table will be added to schema...).

Any comments, hints?

Regards,
    Dawid

Re: Inherits and "get highest id".

От
Bruno Wolff III
Дата:
On Fri, Jun 10, 2005 at 12:27:32 +0200,
  Dawid Kuroczko <qnex42@gmail.com> wrote:
>
> In other words -- doing such a one row select means scanning the
> whole partitioned data.  the primaryindex on logid is not used
> anywhere.  Creating view (SELECT * UNION ALL SELECT * UNION ALL....)
> does not help either.
>
> I wonder -- is there any better solution?  selecting highest from each
> table, unioning it and then selecting highest of the highest works
> well but it strikes me as 'not exactly the niciest implementation'
> (one of the reasons is that I have to remembed about it each time
> new table will be added to schema...).
>
> Any comments, hints?

I seem to remember people getting performance by doing union alls of selects
that explicity list the partition condition; as that will let the planner
speed up checks of tables that cannot result in matches.