Обсуждение: Indexes with descending date columns
Hi
I have a performance problem when traversing a table in index order with
multiple columns including a date column in date reverse order. Below
follows a simplified description of the table, the index and the
associated query
\d prcdedit
prcdedit_prcd | character(20) |
prcdedit_date | timestamp without time zone |
Indexes:
"prcdedit_idx" btree (prcdedit_prcd, prcdedit_date)
When invoking a query such as
select oid, prcdedit_prcd, prcdedit_date, 'dd/mm/yyyy hh24:mi:ss') as
mydate where prcdedit_prcd > 'somevalue' order by prcdedit_prcd,
prcdedit_date desc;
the peformance is dismal.
However removing the 'desc' qualifier as follows the query flys
select oid, prcdedit_prcd, prcdedit_date, 'dd/mm/yyyy hh24:mi:ss') as
mydate where prcdedit_prcd > 'somevalue' order by prcdedit_prcd,
prcdedit_date;
PostgreSQL Version = 8.1.2
Row count on the table is > 300000
Explain is as follows for desc
Limit (cost=81486.35..81486.41 rows=25 width=230) (actual
time=116619.652..116619.861 rows=25 loops=1)
-> Sort (cost=81486.35..82411.34 rows=369997 width=230) (actual
time=116619.646..116619.729 rows=25 loops=1)
Sort Key: prcdedit_prcd, prcdedit_date, oid
-> Bitmap Heap Scan on prcdedit (cost=4645.99..23454.94
rows=369997 width=230) (actual time=376.952..11798.834 rows=369630
loops=1)
Recheck Cond: (prcdedit_prcd > '063266
'::bpchar)
-> Bitmap Index Scan on prcdedit_idx
(cost=0.00..4645.99 rows=369997 width=0) (actual time=366.048..366.048
rows=369630 loops=1)
Index Cond: (prcdedit_prcd > '063266
'::bpchar)
Total runtime: 116950.175 ms
and as follows when I remove the 'desc'
Limit (cost=0.00..2.34 rows=25 width=230) (actual time=0.082..0.535
rows=25 loops=1)
-> Index Scan using prcdedit_idx on prcdedit (cost=0.00..34664.63
rows=369997 width=230) (actual time=0.075..0.405 rows=25 loops=1)
Index Cond: (prcdedit_prcd > '063266 '::bpchar)
Total runtime: 0.664 ms
Any assistance/advice much appreciated.
--
Regards
Theo
> I have a performance problem when traversing a table in index order with > multiple columns including a date column in date reverse order. Below > follows a simplified description of the table, the index and the > associated query > > \d prcdedit > prcdedit_prcd | character(20) | > prcdedit_date | timestamp without time zone | > > Indexes: > "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date) Depending on how you use the table, there are three possible solutions. First, if it makes sense in the domain, using an ORDER BY where _both_ columns are used descending will make PG search theindex in reverse and will be just as fast as when both as searched by the default ascending. Second possibility: Create a dummy column whose value depends on the negative of prcdedit_date, e.g., -extract(epoch fromprcdedit_date), keep the dummy column in sync with the original column using triggers, and rewrite your queries to useORDER BY prcdedit_prod, dummy_column. Third: Create an index on a function which sorts in the order you want, and then always sort using the function index (youcould use the -extract(epoch...) gimmick for that, among other possibilities.) HTH.
On Fri, 2006-03-17 at 08:25, andrew@pillette.com wrote: > > I have a performance problem when traversing a table in index order with > > multiple columns including a date column in date reverse order. Below > > follows a simplified description of the table, the index and the > > associated query > > > > \d prcdedit > > prcdedit_prcd | character(20) | > > prcdedit_date | timestamp without time zone | > > > > Indexes: > > "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date) > > Depending on how you use the table, there are three possible solutions. > > First, if it makes sense in the domain, using an ORDER BY where _both_ columns are used descending will make PG searchthe index in reverse and will be just as fast as when both as searched by the default ascending. > > Second possibility: Create a dummy column whose value depends on the negative of prcdedit_date, e.g., -extract(epoch fromprcdedit_date), keep the dummy column in sync with the original column using triggers, and rewrite your queries to useORDER BY prcdedit_prod, dummy_column. > > Third: Create an index on a function which sorts in the order you want, and then always sort using the function index (youcould use the -extract(epoch...) gimmick for that, among other possibilities.) > > HTH. All good input - thanks, however, before I start messing with my stuff which I know will be complex - some questions to any of the developers on the list. i Is it feasible to extend index creation to support descending columns? ... this is supported on other commercial and non commercial databases, but I do not know if this is a SQL standard. ii If no to i, is it feasible to extend PostgreSQL to allow traversing an index in column descending and column ascending order - assuming an order by on more than one column with column order not in the same direction and indexes existing? ... if that makes sense. -- Regards Theo
Theo Kramer wrote: > All good input - thanks, however, before I start messing with my stuff > which I know will be complex - some questions to any of the developers > on the list. > > i Is it feasible to extend index creation to support descending > columns? ... this is supported on other commercial and non > commercial databases, but I do not know if this is a SQL standard. This can be done. You need to create an operator class which specifies the reverse sort order (i.e. reverse the operators), and then use it in the new index. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, 2006-03-23 at 16:16, Alvaro Herrera wrote:
> Theo Kramer wrote:
>
> > All good input - thanks, however, before I start messing with my stuff
> > which I know will be complex - some questions to any of the developers
> > on the list.
> >
> > i Is it feasible to extend index creation to support descending
> > columns? ... this is supported on other commercial and non
> > commercial databases, but I do not know if this is a SQL standard.
>
> This can be done. You need to create an operator class which specifies
> the reverse sort order (i.e. reverse the operators), and then use it in
> the new index.
Hmmm, would that then result in the following syntax being valid?
create index my_idx on my_table (c1, c2 desc, c3, c4 desc) ;
where my_table is defined as
create table my_table (
c1 text,
c2 timestamp,
c3 integer,
c4 integer
);
If so, I would appreciate any pointers on where to start on this -
already fumbling my way through Interfacing Extensions To Indexes in the
manual...
Regards
Theo
--
Regards
Theo
Theo Kramer <theo@flame.co.za> writes:
> If so, I would appreciate any pointers on where to start on this -
> already fumbling my way through Interfacing Extensions To Indexes in the
> manual...
Search the PG list archives for discussions of reverse-sort opclasses.
It's really pretty trivial, once you've created a negated btree
comparison function for the datatype.
This is the sort of thing that we are almost but not quite ready to put
into the standard distribution. The issues that are bugging me have to
do with whether NULLs sort low or high --- right now, if you make a
reverse-sort opclass, it will effectively sort NULLs low instead of
high, and that has some unpleasant consequences because the rest of the
system isn't prepared for variance on the point (in particular I'm
afraid this could break mergejoins). I'd like to see us make "NULLs
low" vs "NULLs high" be a defined property of opclasses, and deal with
the fallout from that, and then we could put reverse-sort opclasses for
all the standard datatypes into the regular distribution.
regards, tom lane
On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> ii If no to i, is it feasible to extend PostgreSQL to allow traversing
> an index in column descending and column ascending order - assuming
> an order by on more than one column with column order not
> in the same direction and indexes existing? ... if that makes sense.
Yes.
stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..31.76 rows=10 width=24)
-> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656 width=24)
(2 rows)
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Hi, I have a select like SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total FROM table WHERE (array[20]+array[21]+ ... +array[50]+array[51])<5000 AND array[20]<>0 AND array[21]<>0 ... AND array[50]<>0 AND array[51])<>0 Any ideas to make this query faster?
On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote: > Hi, > > I have a select like > > SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total > FROM table > WHERE > (array[20]+array[21]+ ... +array[50]+array[51])<5000 http://www.varlena.com/GeneralBits/109.php might provide some useful insights. I also recall seeing something about sum operators for arrays, but I can't recall where. > AND array[20]<>0 > AND array[21]<>0 > ... > AND array[50]<>0 > AND array[51])<>0 Uhm... please don't tell me that you're using 0 in place of NULL... You might be able to greatly simplify that by use of ANY; you'd need to ditch elements 1-19 though: ... WHERE NOT ANY(array) = 0 See http://www.postgresql.org/docs/8.1/interactive/arrays.html -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: >On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote: > > >>Hi, >> >>I have a select like >> >>SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total >>FROM table >>WHERE >>(array[20]+array[21]+ ... +array[50]+array[51])<5000 >> >> > >http://www.varlena.com/GeneralBits/109.php might provide some useful >insights. I also recall seeing something about sum operators for arrays, >but I can't recall where. > > I ll check it out, seems to be very useful Is faster create a function to sum the array? > > >>AND array[20]<>0 >>AND array[21]<>0 >>... >>AND array[50]<>0 >>AND array[51])<>0 >> >> > >Uhm... please don't tell me that you're using 0 in place of NULL... > > mmm ... i have read in postgres documentation that null values on arrays are not supported ... >You might be able to greatly simplify that by use of ANY; you'd need to >ditch elements 1-19 though: > >... WHERE NOT ANY(array) = 0 > > Yep this is much better. >See http://www.postgresql.org/docs/8.1/interactive/arrays.html > >
On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote: > >http://www.varlena.com/GeneralBits/109.php might provide some useful > >insights. I also recall seeing something about sum operators for arrays, > >but I can't recall where. > > > > > I ll check it out, seems to be very useful > Is faster create a function to sum the array? There's been talk of having one, but I don't think any such thing currently exists. > >>AND array[20]<>0 > >>AND array[21]<>0 > >>... > >>AND array[50]<>0 > >>AND array[51])<>0 > >> > >> > > > >Uhm... please don't tell me that you're using 0 in place of NULL... > > > > > mmm ... i have read in postgres documentation that null values on arrays > are not supported ... Damn, you're right. Another reason I tend to stay away from them... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, Mar 24, 2006 at 07:06:19AM -0600, Jim C. Nasby wrote: > On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote: > > mmm ... i have read in postgres documentation that null values on arrays > > are not supported ... > > Damn, you're right. Another reason I tend to stay away from them... 8.2 will support NULL array elements. http://archives.postgresql.org/pgsql-committers/2005-11/msg00385.php http://developer.postgresql.org/docs/postgres/arrays.html test=> SELECT '{1,2,NULL,3,4}'::integer[]; int4 ---------------- {1,2,NULL,3,4} (1 row) -- Michael Fuhr
With 8.1.3, I get an error when trying to do this on a Text[] column : .. WHERE ANY(array) LIKE 'xx%' Indeed, I get rejected even with: .. WHERE ANY(array) = 'xx' In both cases, the error is: ERROR: syntax error at or near "any" ... It would only work as documented in the manual (8.10.5): SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);It appears that this restriction is still in place in 8.2:
http://developer.postgresql.org/docs/postgres/arrays.html
Is that the case?
Thanks in advance,
KC.
Ruben Rubio Rey <ruben@rentalia.com> writes:
> SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
> FROM table
> WHERE
> (array[20]+array[21]+ ... +array[50]+array[51])<5000
> AND array[20]<>0
> AND array[21]<>0
> ...
> AND array[50]<>0
> AND array[51])<>0
> Any ideas to make this query faster?
What's the array datatype? Integer or float would probably go a lot
faster than NUMERIC, if that's what you're using now.
regards, tom lane
K C Lau <kclau60@netvigator.com> writes:
> Indeed, I get rejected even with:
> .. WHERE ANY(array) = 'xx'
> It would only work as documented in the manual (8.10.5):
> SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
That's not changing any time soon; the SQL spec defines only the second
syntax for ANY, and I believe there would be syntactic ambiguity if we
tried to allow the other.
> With 8.1.3, I get an error when trying to do this on a Text[] column :
> .. WHERE ANY(array) LIKE 'xx%'
If you're really intent on doing that, make an operator for "reverse
LIKE" and use it with the ANY on the right-hand side.
regression=# create function rlike(text,text) returns bool as
regression-# 'select $2 like $1' language sql strict immutable;
CREATE FUNCTION
regression=# create operator ~~~ (procedure = rlike, leftarg = text,
regression(# rightarg = text, commutator = ~~);
CREATE OPERATOR
regression=# select 'xx%' ~~~ any(array['aaa','bbb']);
?column?
----------
f
(1 row)
regression=# select 'xx%' ~~~ any(array['aaa','xxb']);
?column?
----------
t
(1 row)
regression=#
regards, tom lane
Thank you very much, Tom. We'll try it and report if there is any significant impact performance-wise. Best regards, KC. At 00:25 06/03/25, Tom Lane wrote: >K C Lau <kclau60@netvigator.com> writes: > > Indeed, I get rejected even with: > > .. WHERE ANY(array) = 'xx' > > > It would only work as documented in the manual (8.10.5): > > SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter); > >That's not changing any time soon; the SQL spec defines only the second >syntax for ANY, and I believe there would be syntactic ambiguity if we >tried to allow the other. > > > With 8.1.3, I get an error when trying to do this on a Text[] column : > > .. WHERE ANY(array) LIKE 'xx%' > >If you're really intent on doing that, make an operator for "reverse >LIKE" and use it with the ANY on the right-hand side. > >regression=# create function rlike(text,text) returns bool as >regression-# 'select $2 like $1' language sql strict immutable; >CREATE FUNCTION >regression=# create operator ~~~ (procedure = rlike, leftarg = text, >regression(# rightarg = text, commutator = ~~); >CREATE OPERATOR >regression=# select 'xx%' ~~~ any(array['aaa','bbb']); > ?column? >---------- > f >(1 row) > >regression=# select 'xx%' ~~~ any(array['aaa','xxb']); > ?column? >---------- > t >(1 row) > >regression=# > > regards, tom lane
Tom Lane wrote: >Ruben Rubio Rey <ruben@rentalia.com> writes: > > >>SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total >>FROM table >>WHERE >>(array[20]+array[21]+ ... +array[50]+array[51])<5000 >>AND array[20]<>0 >>AND array[21]<>0 >> ... >>AND array[50]<>0 >>AND array[51])<>0 >> >Any ideas to make this query faster? > > > >What's the array datatype? Integer or float would probably go a lot >faster than NUMERIC, if that's what you're using now. > > Already its integer[]
On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote: > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote: > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing > > an index in column descending and column ascending order - assuming > > an order by on more than one column with column order not > > in the same direction and indexes existing? ... if that makes sense. > > Yes. > > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------ > Limit (cost=0.00..31.76 rows=10 width=24) > -> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656 width=24) > (2 rows) Not quite what I mean - redo the above as follows and then see what explain returns explain select * from email_contrib order by project_id, id, date desc limit 10; -- Regards Theo
On Wed, Mar 29, 2006 at 12:52:31PM +0200, Theo Kramer wrote:
> On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote:
> > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing
> > > an index in column descending and column ascending order - assuming
> > > an order by on more than one column with column order not
> > > in the same direction and indexes existing? ... if that makes sense.
> >
> > Yes.
> >
> > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10;
> > QUERY PLAN
> >
------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=0.00..31.76 rows=10 width=24)
> > -> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656
width=24)
> > (2 rows)
>
> Not quite what I mean - redo the above as follows and then see what
> explain returns
>
> explain select * from email_contrib order by project_id, id, date desc
> limit 10;
Ahh. There's a hack to do that by defining a new opclass that reverses <
and >, and then doing ORDER BY project_id, id, date USING new_opclass.
I think there's a TODO about this, but I'm not sure...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote:
> On Wed, Mar 29, 2006 at 12:52:31PM +0200, Theo Kramer wrote:
> > On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote:
> > > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> > > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing
> > > > an index in column descending and column ascending order - assuming
> > > > an order by on more than one column with column order not
> > > > in the same direction and indexes existing? ... if that makes sense.
> > >
> > > Yes.
> > >
> > > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10;
> > > QUERY PLAN
> > >
------------------------------------------------------------------------------------------------------------------------
> > > Limit (cost=0.00..31.76 rows=10 width=24)
> > > -> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656
width=24)
> > > (2 rows)
> >
> > Not quite what I mean - redo the above as follows and then see what
> > explain returns
> >
> > explain select * from email_contrib order by project_id, id, date desc
> > limit 10;
>
> Ahh. There's a hack to do that by defining a new opclass that reverses <
> and >, and then doing ORDER BY project_id, id, date USING new_opclass.
>
> I think there's a TODO about this, but I'm not sure...
Yes, and updated:
* Allow the creation of indexes with mixed ascending/descending
specifiers
This is possible now by creating an operator class with reversed sort
operators. One complexity is that NULLs would then appear at the start
of the result set, and this might affect certain sort types, like
merge join.
--
Bruce Momjian http://candle.pha.pa.us
+ If your life is a hard drive, Christ can be your backup. +
Hi, Bruce, Bruce Momjian wrote: >>Ahh. There's a hack to do that by defining a new opclass that reverses < >>and >, and then doing ORDER BY project_id, id, date USING new_opclass. >> >>I think there's a TODO about this, but I'm not sure... > > Yes, and updated: > > * Allow the creation of indexes with mixed ascending/descending > specifiers > > This is possible now by creating an operator class with reversed sort > operators. One complexity is that NULLs would then appear at the start > of the result set, and this might affect certain sort types, like > merge join. I think it would be better to allow "index zig-zag scans" for multi-column index.[1] So it traverses in a given order on the higher order column, and the sub trees for each specific high order value is traversed in reversed order. From my knowledge at least of BTrees, and given correct commutator definitions, this should be not so complicated to implement.[2] This would allow the query planner to use the same index for arbitrary ASC/DESC combinations of the given columns. Just a thought, Markus [1] It may make sense to implement the mixed specifiers on indices as well, to allow CLUSTERing on mixed search order. [2] But I admit that I currently don't have enough knowledge in PostgreSQL index scan internals to know whether it really is easy to implement. -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org