Обсуждение: partitioned table and ORDER BY indexed_field DESC LIMIT 1

Поиск

Список

Период

Сортировка

partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Anton

Дата:

24 августа 2007 г., 08:53:12

Hi.

I just created partitioned table, n_traf, sliced by month
(n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
indexed by 'date_time' column.
Then I populate it (last value have date 2007-08-...) and do VACUUM
ANALYZE ON n_traf_y2007... all of it.

Now I try to select latest value (ORDER BY date_time LIMIT 1), but
Postgres produced the ugly plan:

=# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Limit  (cost=824637.69..824637.69 rows=1 width=32)
   ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
         Sort Key: public.n_traf.date_time
         ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
               ->  Append  (cost=0.00..100877.99 rows=5643499 width=32)
                     ->  Seq Scan on n_traf  (cost=0.00..22.30
rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m01 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m02 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m03 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m04 n_traf
(cost=0.00..1.01 rows=1 width=32)
                     ->  Seq Scan on n_traf_y2007m05 n_traf
(cost=0.00..9110.89 rows=509689 width=32)
                     ->  Seq Scan on n_traf_y2007m06 n_traf
(cost=0.00..32003.89 rows=1790489 width=32)
                     ->  Seq Scan on n_traf_y2007m07 n_traf
(cost=0.00..33881.10 rows=1895510 width=32)
                     ->  Seq Scan on n_traf_y2007m08 n_traf
(cost=0.00..25702.70 rows=1437970 width=32)
                     ->  Seq Scan on n_traf_y2007m09 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m10 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m11 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m12 n_traf
(cost=0.00..22.30 rows=1230 width=32)
(18 rows)


Why it no uses indexes at all?
-------------------------------------------

The simplier query goes fast, use index.
=# explain analyze SELECT * FROM n_traf_y2007m08 ORDER BY date_time
DESC LIMIT 1;

           QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=32) (actual time=0.156..0.158
rows=1 loops=1)
   ->  Index Scan Backward using n_traf_y2007m08_date_time_login_id on
n_traf_y2007m08  (cost=0.00..39489.48 rows=1437970 width=32) (actual
time=0.150..0.150 rows=1 loops=1)
 Total runtime: 0.241 ms
(3 rows)

Table n_traf looks like this:
=# \d n_traf
                     Table "public.n_traf"
   Column    |            Type             |     Modifiers
-------------+-----------------------------+--------------------
 login_id    | integer                     | not null
 traftype_id | integer                     | not null
 date_time   | timestamp without time zone | not null
 bytes_in    | bigint                      | not null default 0
 bytes_out   | bigint                      | not null default 0
Indexes:
    "n_traf_login_id_key" UNIQUE, btree (login_id, traftype_id, date_time)
    "n_traf_date_time_login_id" btree (date_time, login_id)
Foreign-key constraints:
    "n_traf_login_id_fkey" FOREIGN KEY (login_id) REFERENCES
n_logins(login_id) ON UPDATE CASCADE ON DELETE CASCADE
    "n_traf_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES
n_traftypes(traftype_id) ON UPDATE CASCADE
Rules:
    n_traf_insert_y2007m01 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-01-01'::date AND new.date_time <
'2007-02-01 00:00:00'::timestamp without time zone DO INSTEAD
  INSERT INTO n_traf_y2007m01 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m02 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-02-01'::date AND new.date_time <
'2007-03-01 00:00:00'::timestamp without time zone DO INSTEAD
  INSERT INTO n_traf_y2007m02 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m03 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-03-01'::date AND new.date_time <
'2007-04-01 00:00:00'::timestamp without time zone DO INSTEAD
  INSERT INTO n_traf_y2007m03 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m04 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-04-01'::date AND new.date_time <
'2007-05-01 00:00:00'::timestamp without time zone DO INSTEAD
  INSERT INTO n_traf_y2007m04 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m05 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-05-01'::date AND new.date_time <
'2007-06-01 00:00:00'::timestamp without time zone DO INSTEAD
  INSERT INTO n_traf_y2007m05 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m06 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-06-01'::date AND new.date_time <
'2007-07-01 00:00:00'::timestamp without time zone DO INSTEAD
  INSERT INTO n_traf_y2007m06 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m07 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-07-01'::date AND new.date_time <
'2007-08-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
INTO n_traf_y2007m07 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m08 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-08-01'::date AND new.date_time <
'2007-09-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
INTO n_traf_y2007m08 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m09 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-09-01'::date AND new.date_time <
'2007-10-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
INTO n_traf_y2007m09 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m10 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-10-01'::date AND new.date_time <
'2007-11-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
INTO n_traf_y2007m10 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m11 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-11-01'::date AND new.date_time <
'2007-12-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
INTO n_traf_y2007m11 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
    n_traf_insert_y2007m12 AS
    ON INSERT TO n_traf
   WHERE new.date_time >= '2007-12-01'::date AND new.date_time <
'2008-01-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
INTO n_traf_y2007m12 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
  VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)


Tables n_traf_y2007m... looks like these

                 Table "public.n_traf_y2007m01"
   Column    |            Type             |     Modifiers
-------------+-----------------------------+--------------------
 login_id    | integer                     | not null
 traftype_id | integer                     | not null
 date_time   | timestamp without time zone | not null
 bytes_in    | bigint                      | not null default 0
 bytes_out   | bigint                      | not null default 0
Indexes:
    "n_traf_y2007m01_date_time_login_id" btree (date_time, login_id)
Check constraints:
    "n_traf_y2007m01_date_time_check" CHECK (date_time >=
'2007-01-01'::date AND date_time < '2007-02-01 00:00:00'::timestamp
without time zone)
Inherits: n_traf

Index "public.n_traf_y2007m01_date_time_login_id"
  Column   |            Type
-----------+-----------------------------
 date_time | timestamp without time zone
 login_id  | integer
btree, for table "public.n_traf_y2007m01"

                 Table "public.n_traf_y2007m02"
   Column    |            Type             |     Modifiers
-------------+-----------------------------+--------------------
 login_id    | integer                     | not null
 traftype_id | integer                     | not null
 date_time   | timestamp without time zone | not null
 bytes_in    | bigint                      | not null default 0
 bytes_out   | bigint                      | not null default 0
Indexes:
    "n_traf_y2007m02_date_time_login_id" btree (date_time, login_id)
Check constraints:
    "n_traf_y2007m02_date_time_check" CHECK (date_time >=
'2007-02-01'::date AND date_time < '2007-03-01 00:00:00'::timestamp
without time zone)
Inherits: n_traf
...

--
engineer

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

"Mikko Partio"

Дата:

24 августа 2007 г., 10:24:38

On 8/24/07, Anton <anton200@gmail.com> wrote:

Hi.

I just created partitioned table, n_traf, sliced by month
(n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
indexed by 'date_time' column.
Then I populate it (last value have date 2007-08-...) and do VACUUM
ANALYZE ON n_traf_y2007... all of it.

Now I try to select latest value (ORDER BY date_time LIMIT 1), but
Postgres produced the ugly plan:

=# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
Limit  (cost=824637.69..824637.69 rows=1 width=32)
   ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
         Sort Key: public.n_traf.date_time
         ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
               ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
                     ->  Seq Scan on n_traf  (cost=0.00..22.30
rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m01 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m02 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m03 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m04 n_traf
(cost=0.00..1.01 rows=1 width=32)
                     ->  Seq Scan on n_traf_y2007m05 n_traf
(cost=0.00..9110.89 rows=509689 width=32)
                     ->  Seq Scan on n_traf_y2007m06 n_traf
(cost= 0.00..32003.89 rows=1790489 width=32)
                     ->  Seq Scan on n_traf_y2007m07 n_traf
(cost=0.00..33881.10 rows=1895510 width=32)
                     ->  Seq Scan on n_traf_y2007m08 n_traf
(cost= 0.00..25702.70 rows=1437970 width=32)
                     ->  Seq Scan on n_traf_y2007m09 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m10 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m11 n_traf
(cost=0.00..22.30 rows=1230 width=32)
                     ->  Seq Scan on n_traf_y2007m12 n_traf
(cost=0.00..22.30 rows=1230 width=32)
(18 rows)

Why it no uses indexes at all?
-------------------------------------------

I'm no expert but I'd guess that the the planner doesn't know which partition holds the latest time so it has to read them all.

Regards

MP

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Anton

Дата:

24 августа 2007 г., 10:32:48

> > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> >                                                QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
> > Limit  (cost=824637.69..824637.69 rows=1 width=32)
> >    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
> >          Sort Key: public.n_traf.date_time
> >          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
> >                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
> >                      ->  Seq Scan on n_traf  (cost=0.00..22.30
> > rows=1230 width=32)
> >                      ->  Seq Scan on n_traf_y2007m01 n_traf
> > (cost=0.00..22.30 rows=1230 width=32)
...
> >                      ->  Seq Scan on n_traf_y2007m12 n_traf
> > (cost=0.00..22.30 rows=1230 width=32)
> > (18 rows)
> >
> > Why it no uses indexes at all?
> > -------------------------------------------
> I'm no expert but I'd guess that the the planner doesn't know which
> partition holds the latest time so it has to read them all.

Agree. But why it not uses indexes when it reading them?

--
engineer

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

"Heikki Linnakangas"

Дата:

24 августа 2007 г., 10:39:07

Anton wrote:
>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>                                                QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------
>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
>>>          Sort Key: public.n_traf.date_time
>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
>>> rows=1230 width=32)
>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
>>> (cost=0.00..22.30 rows=1230 width=32)
> ...
>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
>>> (cost=0.00..22.30 rows=1230 width=32)
>>> (18 rows)
>>>
>>> Why it no uses indexes at all?
>>> -------------------------------------------
>> I'm no expert but I'd guess that the the planner doesn't know which
>> partition holds the latest time so it has to read them all.
>
> Agree. But why it not uses indexes when it reading them?

The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
below the append node. Therefore it needs to fetch all rows from all the
tables, and the fastest way to do that is a seq scan.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

"Luke Lonergan"

Дата:

24 августа 2007 г., 16:20:41

We just fixed this - I'll post a patch, but I don't have time to verify
against HEAD.

- Luke


On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

> Anton wrote:
>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>>                                                QUERY PLAN
>>> ----------------------------------------------------------------------------
>>> -----------------------------
>>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
>>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
>>>>          Sort Key: public.n_traf.date_time
>>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
>>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
>>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
>>>> rows=1230 width=32)
>>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>> ...
>>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>> (18 rows)
>>>>
>>>> Why it no uses indexes at all?
>>>> -------------------------------------------
>>> I'm no expert but I'd guess that the the planner doesn't know which
>>> partition holds the latest time so it has to read them all.
>>
>> Agree. But why it not uses indexes when it reading them?
>
> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> below the append node. Therefore it needs to fetch all rows from all the
> tables, and the fastest way to do that is a seq scan.

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

"Luke Lonergan"

Дата:

24 августа 2007 г., 16:26:10

Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
it?

This enables the use of index scan of a child table by recognizing sort
order of the append node.  Kurt Harriman did the work.

- Luke

Index: cdb-pg/src/backend/optimizer/path/indxpath.c
===================================================================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/indxpath.c,v
diff -u -N -r1.22 -r1.22.2.1
--- cdb-pg/src/backend/optimizer/path/indxpath.c    25 Apr 2007 22:07:21
-0000    1.22
+++ cdb-pg/src/backend/optimizer/path/indxpath.c    10 Aug 2007 03:41:15
-0000    1.22.2.1
@@ -379,8 +379,51 @@
             index_pathkeys = build_index_pathkeys(root, index,
                                                   ForwardScanDirection,
                                                   true);
-            useful_pathkeys = truncate_useless_pathkeys(root, rel,
-                                                        index_pathkeys);
+            /*
+             * CDB: For appendrel child, pathkeys contain Var nodes in
terms
+             * of the child's baserel.  Transform the pathkey list to refer
to
+             * columns of the appendrel.
+             */
+            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+            {
+                AppendRelInfo  *appinfo = NULL;
+                RelOptInfo     *appendrel = NULL;
+                ListCell       *appcell;
+                CdbPathLocus    notalocus;
+
+                /* Find the appendrel of which this baserel is a child. */
+                foreach(appcell, root->append_rel_list)
+                {
+                    appinfo = (AppendRelInfo *)lfirst(appcell);
+                    if (appinfo->child_relid == rel->relid)
+                        break;
+                }
+                Assert(appinfo);
+                appendrel = find_base_rel(root, appinfo->parent_relid);
+
+                /*
+                 * The pathkey list happens to have the same format as the
+                 * partitioning key of a Hashed locus, so by disguising it
+                 * we can use cdbpathlocus_pull_above_projection() to do
the
+                 * transformation.
+                 */
+                CdbPathLocus_MakeHashed(¬alocus, index_pathkeys);
+                notalocus =
+                    cdbpathlocus_pull_above_projection(root,
+                                                       notalocus,
+                                                       rel->relids,
+                                                       rel->reltargetlist,
+
appendrel->reltargetlist,
+                                                       appendrel->relid);
+                if (CdbPathLocus_IsHashed(notalocus))
+                    index_pathkeys = truncate_useless_pathkeys(root,
appendrel,
+
notalocus.partkey);
+                else
+                    index_pathkeys = NULL;
+            }
+
+            useful_pathkeys = truncate_useless_pathkeys(root, rel,
+                                                        index_pathkeys);
         }
         else
             useful_pathkeys = NIL;
Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
===================================================================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/pathkeys.c,v
diff -u -N -r1.18 -r1.18.2.1
--- cdb-pg/src/backend/optimizer/path/pathkeys.c    30 Apr 2007 05:44:07
-0000    1.18
+++ cdb-pg/src/backend/optimizer/path/pathkeys.c    10 Aug 2007 03:41:15
-0000    1.18.2.1
@@ -1403,55 +1403,53 @@
 {
     PathKeyItem    *item;
     Expr           *newexpr;
+    AttrNumber  targetindex;

     Assert(pathkey);

-    /* Use constant expr if available.  Will be at head of list. */
-    if (CdbPathkeyEqualsConstant(pathkey))
+    /* Find an expr that we can rewrite to use the projected columns. */
+    item = cdbpullup_findPathKeyItemInTargetList(pathkey,
+                                                 relids,
+                                                 targetlist,
+                                                 &targetindex); // OUT
+
+    /* If not found, see if the equiv class contains a constant expr. */
+    if (!item &&
+        CdbPathkeyEqualsConstant(pathkey))
     {
         item = (PathKeyItem *)linitial(pathkey);
         newexpr = (Expr *)copyObject(item->key);
     }

-    /* New vars for old! */
-    else
-    {
-        AttrNumber  targetindex;
+    /* Fail if no usable expr. */
+    else if (!item)
+        return NULL;

-        /* Find an expr that we can rewrite to use the projected columns.
*/
-        item = cdbpullup_findPathKeyItemInTargetList(pathkey,
-                                                     relids,
-                                                     targetlist,
-                                                     &targetindex); // OUT
-        if (!item)
-            return NULL;
+    /* If found matching targetlist item, make a Var that references it. */
+    else if (targetindex > 0)
+        newexpr = (Expr *)cdbpullup_makeVar(newrelid,
+                                            targetindex,
+                                            newvarlist,
+                                            (Expr *)item->key);

-        /* If found matching targetlist item, make a Var that references
it. */
-        if (targetindex > 0)
-            newexpr = (Expr *)cdbpullup_makeVar(newrelid,
-                                                targetindex,
-                                                newvarlist,
-                                                (Expr *)item->key);
+    /* Replace expr's Var nodes with new ones referencing the targetlist.
*/
+    else
+        newexpr = cdbpullup_expr((Expr *)item->key,
+                                 targetlist,
+                                 newvarlist,
+                                 newrelid);

-        /* Replace expr's Var nodes with new ones referencing the
targetlist. */
-        else
-            newexpr = cdbpullup_expr((Expr *)item->key,
-                                     targetlist,
-                                     newvarlist,
-                                     newrelid);
+    /* Pull up RelabelType node too, unless tlist expr has right type. */
+    if (IsA(item->key, RelabelType))
+    {
+        RelabelType    *oldrelabel = (RelabelType *)item->key;

-        /* Pull up RelabelType node too, unless tlist expr has right type.
*/
-        if (IsA(item->key, RelabelType))
-        {
-            RelabelType    *oldrelabel = (RelabelType *)item->key;
-
-            if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
-                oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
-                newexpr = (Expr *)makeRelabelType(newexpr,
-                                                  oldrelabel->resulttype,
-                                                  oldrelabel->resulttypmod,
-
oldrelabel->relabelformat);
-        }
+        if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
+            oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
+            newexpr = (Expr *)makeRelabelType(newexpr,
+                                              oldrelabel->resulttype,
+                                              oldrelabel->resulttypmod,
+                                              oldrelabel->relabelformat);
     }
     Insist(newexpr);

Index: cdb-pg/src/backend/optimizer/util/pathnode.c
===================================================================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/util/pathnode.c,v
diff -u -N -r1.52.2.4 -r1.52.2.5
--- cdb-pg/src/backend/optimizer/util/pathnode.c    5 Aug 2007 23:06:44
-0000    1.52.2.4
+++ cdb-pg/src/backend/optimizer/util/pathnode.c    10 Aug 2007 03:41:15
-0000    1.52.2.5
@@ -1563,7 +1563,15 @@
             pathnode->path.rescannable = false;
     }

-    return pathnode;
+    /*
+     * CDB: If there is exactly one subpath, its ordering is preserved.
+     * Child rel's pathkey exprs are already expressed in terms of the
+     * columns of the parent appendrel.  See find_usable_indexes().
+     */
+    if (list_length(subpaths) == 1)
+        pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
+
+    return pathnode;
 }

 /*


On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

> Anton wrote:
>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>>                                                QUERY PLAN
>>> ----------------------------------------------------------------------------
>>> -----------------------------
>>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
>>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
>>>>          Sort Key: public.n_traf.date_time
>>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
>>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
>>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
>>>> rows=1230 width=32)
>>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>> ...
>>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>> (18 rows)
>>>>
>>>> Why it no uses indexes at all?
>>>> -------------------------------------------
>>> I'm no expert but I'd guess that the the planner doesn't know which
>>> partition holds the latest time so it has to read them all.
>>
>> Agree. But why it not uses indexes when it reading them?
>
> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> below the append node. Therefore it needs to fetch all rows from all the
> tables, and the fastest way to do that is a seq scan.

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Tomas Tamosaitis

Дата:

27 августа 2007 г., 13:14:56

Pn, 2007 08 24 14:53 +0600, Anton rašė:
> Hi.
>
> I just created partitioned table, n_traf, sliced by month
> (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
> indexed by 'date_time' column.
> Then I populate it (last value have date 2007-08-...) and do VACUUM
> ANALYZE ON n_traf_y2007... all of it.
>
> Now I try to select latest value (ORDER BY date_time LIMIT 1), but
> Postgres produced the ugly plan:
>
> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>                                                QUERY PL

can you test performance and send explain results of select like this :
select * from n_traf where date_time = (select max(date_time) from
n_traf);

i have similar problem with ~70M rows table (then using ordering), but
my table not partitioned.
I`m interesting how this select will works on partitioned table.


--
Pagarbiai,
Tomas Tamošaitis
Projektų Vadovas
Connecty
Skype://mazgis1009?add
Mob: +370 652 86127
e-pastas: tomas.tamosaitis@connecty.lt
web: www.connecty.lt

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

"Heikki Linnakangas"

Дата:

28 августа 2007 г., 10:21:31

Bruce, would you please add this to the 8.4 patch queue so we remember
to look at this later?

It didn't occur to me that we can do that in the degenerate case when
there's just a single node below the Append. A more general solution
would be to check if the pathkeys of all the child nodes match, and do a
"merge append" similar to a merge join.

Luke Lonergan wrote:
> Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> it?
>
> This enables the use of index scan of a child table by recognizing sort
> order of the append node.  Kurt Harriman did the work.
>
> - Luke
>
> Index: cdb-pg/src/backend/optimizer/path/indxpath.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/indxpath.c,v
> diff -u -N -r1.22 -r1.22.2.1
> --- cdb-pg/src/backend/optimizer/path/indxpath.c    25 Apr 2007 22:07:21
> -0000    1.22
> +++ cdb-pg/src/backend/optimizer/path/indxpath.c    10 Aug 2007 03:41:15
> -0000    1.22.2.1
> @@ -379,8 +379,51 @@
>              index_pathkeys = build_index_pathkeys(root, index,
>                                                    ForwardScanDirection,
>                                                    true);
> -            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> -                                                        index_pathkeys);
> +            /*
> +             * CDB: For appendrel child, pathkeys contain Var nodes in
> terms
> +             * of the child's baserel.  Transform the pathkey list to refer
> to
> +             * columns of the appendrel.
> +             */
> +            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> +            {
> +                AppendRelInfo  *appinfo = NULL;
> +                RelOptInfo     *appendrel = NULL;
> +                ListCell       *appcell;
> +                CdbPathLocus    notalocus;
> +
> +                /* Find the appendrel of which this baserel is a child. */
> +                foreach(appcell, root->append_rel_list)
> +                {
> +                    appinfo = (AppendRelInfo *)lfirst(appcell);
> +                    if (appinfo->child_relid == rel->relid)
> +                        break;
> +                }
> +                Assert(appinfo);
> +                appendrel = find_base_rel(root, appinfo->parent_relid);
> +
> +                /*
> +                 * The pathkey list happens to have the same format as the
> +                 * partitioning key of a Hashed locus, so by disguising it
> +                 * we can use cdbpathlocus_pull_above_projection() to do
> the
> +                 * transformation.
> +                 */
> +                CdbPathLocus_MakeHashed(¬alocus, index_pathkeys);
> +                notalocus =
> +                    cdbpathlocus_pull_above_projection(root,
> +                                                       notalocus,
> +                                                       rel->relids,
> +                                                       rel->reltargetlist,
> +
> appendrel->reltargetlist,
> +                                                       appendrel->relid);
> +                if (CdbPathLocus_IsHashed(notalocus))
> +                    index_pathkeys = truncate_useless_pathkeys(root,
> appendrel,
> +
> notalocus.partkey);
> +                else
> +                    index_pathkeys = NULL;
> +            }
> +
> +            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> +                                                        index_pathkeys);
>          }
>          else
>              useful_pathkeys = NIL;
> Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/pathkeys.c,v
> diff -u -N -r1.18 -r1.18.2.1
> --- cdb-pg/src/backend/optimizer/path/pathkeys.c    30 Apr 2007 05:44:07
> -0000    1.18
> +++ cdb-pg/src/backend/optimizer/path/pathkeys.c    10 Aug 2007 03:41:15
> -0000    1.18.2.1
> @@ -1403,55 +1403,53 @@
>  {
>      PathKeyItem    *item;
>      Expr           *newexpr;
> +    AttrNumber  targetindex;
>
>      Assert(pathkey);
>
> -    /* Use constant expr if available.  Will be at head of list. */
> -    if (CdbPathkeyEqualsConstant(pathkey))
> +    /* Find an expr that we can rewrite to use the projected columns. */
> +    item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> +                                                 relids,
> +                                                 targetlist,
> +                                                 &targetindex); // OUT
> +
> +    /* If not found, see if the equiv class contains a constant expr. */
> +    if (!item &&
> +        CdbPathkeyEqualsConstant(pathkey))
>      {
>          item = (PathKeyItem *)linitial(pathkey);
>          newexpr = (Expr *)copyObject(item->key);
>      }
>
> -    /* New vars for old! */
> -    else
> -    {
> -        AttrNumber  targetindex;
> +    /* Fail if no usable expr. */
> +    else if (!item)
> +        return NULL;
>
> -        /* Find an expr that we can rewrite to use the projected columns.
> */
> -        item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> -                                                     relids,
> -                                                     targetlist,
> -                                                     &targetindex); // OUT
> -        if (!item)
> -            return NULL;
> +    /* If found matching targetlist item, make a Var that references it. */
> +    else if (targetindex > 0)
> +        newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> +                                            targetindex,
> +                                            newvarlist,
> +                                            (Expr *)item->key);
>
> -        /* If found matching targetlist item, make a Var that references
> it. */
> -        if (targetindex > 0)
> -            newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> -                                                targetindex,
> -                                                newvarlist,
> -                                                (Expr *)item->key);
> +    /* Replace expr's Var nodes with new ones referencing the targetlist.
> */
> +    else
> +        newexpr = cdbpullup_expr((Expr *)item->key,
> +                                 targetlist,
> +                                 newvarlist,
> +                                 newrelid);
>
> -        /* Replace expr's Var nodes with new ones referencing the
> targetlist. */
> -        else
> -            newexpr = cdbpullup_expr((Expr *)item->key,
> -                                     targetlist,
> -                                     newvarlist,
> -                                     newrelid);
> +    /* Pull up RelabelType node too, unless tlist expr has right type. */
> +    if (IsA(item->key, RelabelType))
> +    {
> +        RelabelType    *oldrelabel = (RelabelType *)item->key;
>
> -        /* Pull up RelabelType node too, unless tlist expr has right type.
> */
> -        if (IsA(item->key, RelabelType))
> -        {
> -            RelabelType    *oldrelabel = (RelabelType *)item->key;
> -
> -            if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> -                oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> -                newexpr = (Expr *)makeRelabelType(newexpr,
> -                                                  oldrelabel->resulttype,
> -                                                  oldrelabel->resulttypmod,
> -
> oldrelabel->relabelformat);
> -        }
> +        if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> +            oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> +            newexpr = (Expr *)makeRelabelType(newexpr,
> +                                              oldrelabel->resulttype,
> +                                              oldrelabel->resulttypmod,
> +                                              oldrelabel->relabelformat);
>      }
>      Insist(newexpr);
>
> Index: cdb-pg/src/backend/optimizer/util/pathnode.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/util/pathnode.c,v
> diff -u -N -r1.52.2.4 -r1.52.2.5
> --- cdb-pg/src/backend/optimizer/util/pathnode.c    5 Aug 2007 23:06:44
> -0000    1.52.2.4
> +++ cdb-pg/src/backend/optimizer/util/pathnode.c    10 Aug 2007 03:41:15
> -0000    1.52.2.5
> @@ -1563,7 +1563,15 @@
>              pathnode->path.rescannable = false;
>      }
>
> -    return pathnode;
> +    /*
> +     * CDB: If there is exactly one subpath, its ordering is preserved.
> +     * Child rel's pathkey exprs are already expressed in terms of the
> +     * columns of the parent appendrel.  See find_usable_indexes().
> +     */
> +    if (list_length(subpaths) == 1)
> +        pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
> +
> +    return pathnode;
>  }
>
>  /*
>
>
> On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>
>> Anton wrote:
>>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>>>                                                QUERY PLAN
>>>> ----------------------------------------------------------------------------
>>>> -----------------------------
>>>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
>>>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
>>>>>          Sort Key: public.n_traf.date_time
>>>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
>>>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
>>>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
>>>>> rows=1230 width=32)
>>>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
>>>>> (cost=0.00..22.30 rows=1230 width=32)
>>> ...
>>>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
>>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>>> (18 rows)
>>>>>
>>>>> Why it no uses indexes at all?
>>>>> -------------------------------------------
>>>> I'm no expert but I'd guess that the the planner doesn't know which
>>>> partition holds the latest time so it has to read them all.
>>> Agree. But why it not uses indexes when it reading them?
>> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
>> below the append node. Therefore it needs to fetch all rows from all the
>> tables, and the fastest way to do that is a seq scan.
>
>


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Anton

Дата:

27 октября 2007 г., 04:26:31

I want ask about problem with partioned tables (it was discussed some
time ago, see below). Is it fixed somehow in 8.2.5 ?

2007/8/24, Luke Lonergan <llonergan@greenplum.com>:
> Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> it?
>
> This enables the use of index scan of a child table by recognizing sort
> order of the append node.  Kurt Harriman did the work.
...
>
> On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>
> > Anton wrote:
> >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> >>>>                                                QUERY PLAN
> >>> ----------------------------------------------------------------------------
> >>> -----------------------------
> >>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
> >>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
> >>>>          Sort Key: public.n_traf.date_time
> >>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
> >>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
> >>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
> >>>> rows=1230 width=32)
> >>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >> ...
> >>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >>>> (18 rows)
> >>>>
> >>>> Why it no uses indexes at all?
> >>>> -------------------------------------------
> >>> I'm no expert but I'd guess that the the planner doesn't know which
> >>> partition holds the latest time so it has to read them all.
> >>
> >> Agree. But why it not uses indexes when it reading them?
> >
> > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> > below the append node. Therefore it needs to fetch all rows from all the
> > tables, and the fastest way to do that is a seq scan.

--
engineer

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Tom Lane

Дата:

27 октября 2007 г., 05:37:39

Anton <anton200@gmail.com> writes:
> I want ask about problem with partioned tables (it was discussed some
> time ago, see below). Is it fixed somehow in 8.2.5 ?

No.  The patch you mention never was considered at all, since it
consisted of a selective quote from Greenplum source code.  It would
not even compile in community Postgres, because it adds calls to half a
dozen Greenplum routines that we've never seen.  Not to mention that
the base of the diff is Greenplum proprietary code, so the patch itself
wouldn't even apply successfully.

As to whether it would work if we had the full story ... well, not
having the full story, I don't want to opine.

            regards, tom lane

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Anton

Дата:

27 октября 2007 г., 08:53:46

2007/10/27, Tom Lane <tgl@sss.pgh.pa.us>:
> Anton <anton200@gmail.com> writes:
> > I want ask about problem with partioned tables (it was discussed some
> > time ago, see below). Is it fixed somehow in 8.2.5 ?
>
> No.  The patch you mention never was considered at all, since it
> consisted of a selective quote from Greenplum source code.  It would
...
> As to whether it would work if we had the full story ... well, not
> having the full story, I don't want to opine.


Sorry, my english is not good enough to understand your last sentence.

I repost here my original question "Why it no uses indexes?" (on
partitioned table and ORDER BY indexed_field DESC LIMIT 1), if you
mean that you miss this discussion.

> I just created partitioned table, n_traf, sliced by month
> (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
> indexed by 'date_time' column.
> Then I populate it (last value have date 2007-08-...) and do VACUUM
> ANALYZE ON n_traf_y2007... all of it.
>
> Now I try to select latest value (ORDER BY date_time LIMIT 1), but
> Postgres produced the ugly plan:
>
> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>                                                QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
>  Limit  (cost=824637.69..824637.69 rows=1 width=32)
>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
>          Sort Key: public.n_traf.date_time
>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
>                ->  Append  (cost=0.00..100877.99 rows=5643499 width=32)
>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
> rows=1230 width=32)
>                      ->  Seq Scan on n_traf_y2007m01 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
>                      ->  Seq Scan on n_traf_y2007m02 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
>                      ->  Seq Scan on n_traf_y2007m03 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
>                      ->  Seq Scan on n_traf_y2007m04 n_traf
> (cost=0.00..1.01 rows=1 width=32)
>                      ->  Seq Scan on n_traf_y2007m05 n_traf
> (cost=0.00..9110.89 rows=509689 width=32)
>                      ->  Seq Scan on n_traf_y2007m06 n_traf
> (cost=0.00..32003.89 rows=1790489 width=32)
>                      ->  Seq Scan on n_traf_y2007m07 n_traf
> (cost=0.00..33881.10 rows=1895510 width=32)
>                      ->  Seq Scan on n_traf_y2007m08 n_traf
> (cost=0.00..25702.70 rows=1437970 width=32)
>                      ->  Seq Scan on n_traf_y2007m09 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
>                      ->  Seq Scan on n_traf_y2007m10 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
>                      ->  Seq Scan on n_traf_y2007m11 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
>                      ->  Seq Scan on n_traf_y2007m12 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> (18 rows)
>
>
> Why it no uses indexes at all?
> -------------------------------------------
>
> The simplier query goes fast, use index.
> =# explain analyze SELECT * FROM n_traf_y2007m08 ORDER BY date_time
> DESC LIMIT 1;
>
>            QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.03 rows=1 width=32) (actual time=0.156..0.158
> rows=1 loops=1)
>    ->  Index Scan Backward using n_traf_y2007m08_date_time_login_id on
> n_traf_y2007m08  (cost=0.00..39489.48 rows=1437970 width=32) (actual
> time=0.150..0.150 rows=1 loops=1)
>  Total runtime: 0.241 ms
> (3 rows)
>
> Table n_traf looks like this:
> =# \d n_traf
>                      Table "public.n_traf"
>    Column    |            Type             |     Modifiers
> -------------+-----------------------------+--------------------
>  login_id    | integer                     | not null
>  traftype_id | integer                     | not null
>  date_time   | timestamp without time zone | not null
>  bytes_in    | bigint                      | not null default 0
>  bytes_out   | bigint                      | not null default 0
> Indexes:
>     "n_traf_login_id_key" UNIQUE, btree (login_id, traftype_id, date_time)
>     "n_traf_date_time_login_id" btree (date_time, login_id)
> Foreign-key constraints:
>     "n_traf_login_id_fkey" FOREIGN KEY (login_id) REFERENCES
> n_logins(login_id) ON UPDATE CASCADE ON DELETE CASCADE
>     "n_traf_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES
> n_traftypes(traftype_id) ON UPDATE CASCADE
> Rules:
>     n_traf_insert_y2007m01 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-01-01'::date AND new.date_time <
> '2007-02-01 00:00:00'::timestamp without time zone DO INSTEAD
>   INSERT INTO n_traf_y2007m01 (login_id, traftype_id, date_time,
> bytes_in, bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m02 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-02-01'::date AND new.date_time <
> '2007-03-01 00:00:00'::timestamp without time zone DO INSTEAD
>   INSERT INTO n_traf_y2007m02 (login_id, traftype_id, date_time,
> bytes_in, bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m03 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-03-01'::date AND new.date_time <
> '2007-04-01 00:00:00'::timestamp without time zone DO INSTEAD
>   INSERT INTO n_traf_y2007m03 (login_id, traftype_id, date_time,
> bytes_in, bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m04 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-04-01'::date AND new.date_time <
> '2007-05-01 00:00:00'::timestamp without time zone DO INSTEAD
>   INSERT INTO n_traf_y2007m04 (login_id, traftype_id, date_time,
> bytes_in, bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m05 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-05-01'::date AND new.date_time <
> '2007-06-01 00:00:00'::timestamp without time zone DO INSTEAD
>   INSERT INTO n_traf_y2007m05 (login_id, traftype_id, date_time,
> bytes_in, bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m06 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-06-01'::date AND new.date_time <
> '2007-07-01 00:00:00'::timestamp without time zone DO INSTEAD
>   INSERT INTO n_traf_y2007m06 (login_id, traftype_id, date_time,
> bytes_in, bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m07 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-07-01'::date AND new.date_time <
> '2007-08-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
> INTO n_traf_y2007m07 (login_id, traftype_id, date_time, bytes_in,
> bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m08 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-08-01'::date AND new.date_time <
> '2007-09-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
> INTO n_traf_y2007m08 (login_id, traftype_id, date_time, bytes_in,
> bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m09 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-09-01'::date AND new.date_time <
> '2007-10-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
> INTO n_traf_y2007m09 (login_id, traftype_id, date_time, bytes_in,
> bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m10 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-10-01'::date AND new.date_time <
> '2007-11-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
> INTO n_traf_y2007m10 (login_id, traftype_id, date_time, bytes_in,
> bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m11 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-11-01'::date AND new.date_time <
> '2007-12-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
> INTO n_traf_y2007m11 (login_id, traftype_id, date_time, bytes_in,
> bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>     n_traf_insert_y2007m12 AS
>     ON INSERT TO n_traf
>    WHERE new.date_time >= '2007-12-01'::date AND new.date_time <
> '2008-01-01 00:00:00'::timestamp without time zone DO INSTEAD  INSERT
> INTO n_traf_y2007m12 (login_id, traftype_id, date_time, bytes_in,
> bytes_out)
>   VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
> new.bytes_out)
>
>
> Tables n_traf_y2007m... looks like these
>
>                  Table "public.n_traf_y2007m01"
>    Column    |            Type             |     Modifiers
> -------------+-----------------------------+--------------------
>  login_id    | integer                     | not null
>  traftype_id | integer                     | not null
>  date_time   | timestamp without time zone | not null
>  bytes_in    | bigint                      | not null default 0
>  bytes_out   | bigint                      | not null default 0
> Indexes:
>     "n_traf_y2007m01_date_time_login_id" btree (date_time, login_id)
> Check constraints:
>     "n_traf_y2007m01_date_time_check" CHECK (date_time >=
> '2007-01-01'::date AND date_time < '2007-02-01 00:00:00'::timestamp
> without time zone)
> Inherits: n_traf
>
> Index "public.n_traf_y2007m01_date_time_login_id"
>   Column   |            Type
> -----------+-----------------------------
>  date_time | timestamp without time zone
>  login_id  | integer
> btree, for table "public.n_traf_y2007m01"
>
>                  Table "public.n_traf_y2007m02"
>    Column    |            Type             |     Modifiers
> -------------+-----------------------------+--------------------
>  login_id    | integer                     | not null
>  traftype_id | integer                     | not null
>  date_time   | timestamp without time zone | not null
>  bytes_in    | bigint                      | not null default 0
>  bytes_out   | bigint                      | not null default 0
> Indexes:
>     "n_traf_y2007m02_date_time_login_id" btree (date_time, login_id)
> Check constraints:
>     "n_traf_y2007m02_date_time_check" CHECK (date_time >=
> '2007-02-01'::date AND date_time < '2007-03-01 00:00:00'::timestamp
> without time zone)
> Inherits: n_traf
> ...

--
engineer

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Heikki Linnakangas

Дата:

27 октября 2007 г., 09:17:42

Anton wrote:
> I repost here my original question "Why it no uses indexes?" (on
> partitioned table and ORDER BY indexed_field DESC LIMIT 1), if you
> mean that you miss this discussion.

As I said back then:

The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
below the append node.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

От

Bruce Momjian

Дата:

12 марта 2008 г., 19:16:54

Heikki Linnakangas wrote:
> Bruce, would you please add this to the 8.4 patch queue so we remember
> to look at this later?

OK, added to queue, but Tom's patch queue comment is:

This is useless since it does not represent a complete patch; the
provided code calls a lot of Greenplum-private routines that weren't
provided. It's not even reviewable let alone a candidate to apply.

---------------------------------------------------------------------------


>
> It didn't occur to me that we can do that in the degenerate case when
> there's just a single node below the Append. A more general solution
> would be to check if the pathkeys of all the child nodes match, and do a
> "merge append" similar to a merge join.
>
> Luke Lonergan wrote:
> > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> > it?
> >
> > This enables the use of index scan of a child table by recognizing sort
> > order of the append node.  Kurt Harriman did the work.
> >
> > - Luke
> >
> > Index: cdb-pg/src/backend/optimizer/path/indxpath.c
> > ===================================================================
> > RCS file:
> > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> > er/path/indxpath.c,v
> > diff -u -N -r1.22 -r1.22.2.1
> > --- cdb-pg/src/backend/optimizer/path/indxpath.c    25 Apr 2007 22:07:21
> > -0000    1.22
> > +++ cdb-pg/src/backend/optimizer/path/indxpath.c    10 Aug 2007 03:41:15
> > -0000    1.22.2.1
> > @@ -379,8 +379,51 @@
> >              index_pathkeys = build_index_pathkeys(root, index,
> >                                                    ForwardScanDirection,
> >                                                    true);
> > -            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> > -                                                        index_pathkeys);
> > +            /*
> > +             * CDB: For appendrel child, pathkeys contain Var nodes in
> > terms
> > +             * of the child's baserel.  Transform the pathkey list to refer
> > to
> > +             * columns of the appendrel.
> > +             */
> > +            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> > +            {
> > +                AppendRelInfo  *appinfo = NULL;
> > +                RelOptInfo     *appendrel = NULL;
> > +                ListCell       *appcell;
> > +                CdbPathLocus    notalocus;
> > +
> > +                /* Find the appendrel of which this baserel is a child. */
> > +                foreach(appcell, root->append_rel_list)
> > +                {
> > +                    appinfo = (AppendRelInfo *)lfirst(appcell);
> > +                    if (appinfo->child_relid == rel->relid)
> > +                        break;
> > +                }
> > +                Assert(appinfo);
> > +                appendrel = find_base_rel(root, appinfo->parent_relid);
> > +
> > +                /*
> > +                 * The pathkey list happens to have the same format as the
> > +                 * partitioning key of a Hashed locus, so by disguising it
> > +                 * we can use cdbpathlocus_pull_above_projection() to do
> > the
> > +                 * transformation.
> > +                 */
> > +                CdbPathLocus_MakeHashed(¬alocus, index_pathkeys);
> > +                notalocus =
> > +                    cdbpathlocus_pull_above_projection(root,
> > +                                                       notalocus,
> > +                                                       rel->relids,
> > +                                                       rel->reltargetlist,
> > +
> > appendrel->reltargetlist,
> > +                                                       appendrel->relid);
> > +                if (CdbPathLocus_IsHashed(notalocus))
> > +                    index_pathkeys = truncate_useless_pathkeys(root,
> > appendrel,
> > +
> > notalocus.partkey);
> > +                else
> > +                    index_pathkeys = NULL;
> > +            }
> > +
> > +            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> > +                                                        index_pathkeys);
> >          }
> >          else
> >              useful_pathkeys = NIL;
> > Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
> > ===================================================================
> > RCS file:
> > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> > er/path/pathkeys.c,v
> > diff -u -N -r1.18 -r1.18.2.1
> > --- cdb-pg/src/backend/optimizer/path/pathkeys.c    30 Apr 2007 05:44:07
> > -0000    1.18
> > +++ cdb-pg/src/backend/optimizer/path/pathkeys.c    10 Aug 2007 03:41:15
> > -0000    1.18.2.1
> > @@ -1403,55 +1403,53 @@
> >  {
> >      PathKeyItem    *item;
> >      Expr           *newexpr;
> > +    AttrNumber  targetindex;
> >
> >      Assert(pathkey);
> >
> > -    /* Use constant expr if available.  Will be at head of list. */
> > -    if (CdbPathkeyEqualsConstant(pathkey))
> > +    /* Find an expr that we can rewrite to use the projected columns. */
> > +    item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> > +                                                 relids,
> > +                                                 targetlist,
> > +                                                 &targetindex); // OUT
> > +
> > +    /* If not found, see if the equiv class contains a constant expr. */
> > +    if (!item &&
> > +        CdbPathkeyEqualsConstant(pathkey))
> >      {
> >          item = (PathKeyItem *)linitial(pathkey);
> >          newexpr = (Expr *)copyObject(item->key);
> >      }
> >
> > -    /* New vars for old! */
> > -    else
> > -    {
> > -        AttrNumber  targetindex;
> > +    /* Fail if no usable expr. */
> > +    else if (!item)
> > +        return NULL;
> >
> > -        /* Find an expr that we can rewrite to use the projected columns.
> > */
> > -        item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> > -                                                     relids,
> > -                                                     targetlist,
> > -                                                     &targetindex); // OUT
> > -        if (!item)
> > -            return NULL;
> > +    /* If found matching targetlist item, make a Var that references it. */
> > +    else if (targetindex > 0)
> > +        newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> > +                                            targetindex,
> > +                                            newvarlist,
> > +                                            (Expr *)item->key);
> >
> > -        /* If found matching targetlist item, make a Var that references
> > it. */
> > -        if (targetindex > 0)
> > -            newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> > -                                                targetindex,
> > -                                                newvarlist,
> > -                                                (Expr *)item->key);
> > +    /* Replace expr's Var nodes with new ones referencing the targetlist.
> > */
> > +    else
> > +        newexpr = cdbpullup_expr((Expr *)item->key,
> > +                                 targetlist,
> > +                                 newvarlist,
> > +                                 newrelid);
> >
> > -        /* Replace expr's Var nodes with new ones referencing the
> > targetlist. */
> > -        else
> > -            newexpr = cdbpullup_expr((Expr *)item->key,
> > -                                     targetlist,
> > -                                     newvarlist,
> > -                                     newrelid);
> > +    /* Pull up RelabelType node too, unless tlist expr has right type. */
> > +    if (IsA(item->key, RelabelType))
> > +    {
> > +        RelabelType    *oldrelabel = (RelabelType *)item->key;
> >
> > -        /* Pull up RelabelType node too, unless tlist expr has right type.
> > */
> > -        if (IsA(item->key, RelabelType))
> > -        {
> > -            RelabelType    *oldrelabel = (RelabelType *)item->key;
> > -
> > -            if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> > -                oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> > -                newexpr = (Expr *)makeRelabelType(newexpr,
> > -                                                  oldrelabel->resulttype,
> > -                                                  oldrelabel->resulttypmod,
> > -
> > oldrelabel->relabelformat);
> > -        }
> > +        if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> > +            oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> > +            newexpr = (Expr *)makeRelabelType(newexpr,
> > +                                              oldrelabel->resulttype,
> > +                                              oldrelabel->resulttypmod,
> > +                                              oldrelabel->relabelformat);
> >      }
> >      Insist(newexpr);
> >
> > Index: cdb-pg/src/backend/optimizer/util/pathnode.c
> > ===================================================================
> > RCS file:
> > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> > er/util/pathnode.c,v
> > diff -u -N -r1.52.2.4 -r1.52.2.5
> > --- cdb-pg/src/backend/optimizer/util/pathnode.c    5 Aug 2007 23:06:44
> > -0000    1.52.2.4
> > +++ cdb-pg/src/backend/optimizer/util/pathnode.c    10 Aug 2007 03:41:15
> > -0000    1.52.2.5
> > @@ -1563,7 +1563,15 @@
> >              pathnode->path.rescannable = false;
> >      }
> >
> > -    return pathnode;
> > +    /*
> > +     * CDB: If there is exactly one subpath, its ordering is preserved.
> > +     * Child rel's pathkey exprs are already expressed in terms of the
> > +     * columns of the parent appendrel.  See find_usable_indexes().
> > +     */
> > +    if (list_length(subpaths) == 1)
> > +        pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
> > +
> > +    return pathnode;
> >  }
> >
> >  /*
> >
> >
> > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
> >
> >> Anton wrote:
> >>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> >>>>>                                                QUERY PLAN
> >>>> ----------------------------------------------------------------------------
> >>>> -----------------------------
> >>>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
> >>>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
> >>>>>          Sort Key: public.n_traf.date_time
> >>>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
> >>>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
> >>>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
> >>>>> rows=1230 width=32)
> >>>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
> >>>>> (cost=0.00..22.30 rows=1230 width=32)
> >>> ...
> >>>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
> >>>>> (cost=0.00..22.30 rows=1230 width=32)
> >>>>> (18 rows)
> >>>>>
> >>>>> Why it no uses indexes at all?
> >>>>> -------------------------------------------
> >>>> I'm no expert but I'd guess that the the planner doesn't know which
> >>>> partition holds the latest time so it has to read them all.
> >>> Agree. But why it not uses indexes when it reading them?
> >> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> >> below the append node. Therefore it needs to fetch all rows from all the
> >> tables, and the fastest way to do that is a seq scan.
> >
> >
>
>
> --
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: partitioned table and ORDER BY indexed_field DESC LIMIT 1