Обсуждение: Re: Analysis Function

Поиск

Список

Период

Сортировка

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 00:48:35

Hi,

I found a slow part of the query:

SELECT
date(extract(YEAR FROM m.taken)||'-1-1') d1,
date(extract(YEAR FROM m.taken)||'-1-31') d2
FROM
climate.city c,
climate.station s,
climate.station_category sc,
climate.measurement m
WHERE
c.id = 5148 AND ...

Date extraction is 3.2 seconds, but without is 1.5 seconds. The PL/pgSQL code that actually runs (where p_month1, p_day1, and p_month2, p_day2 are integers):

date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1,
date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2

What is a better way to create those dates (without string concatenation, I presume)?

Dave

Re: Analysis Function

От

Andy Colson

Дата:

11 июня 2010 г., 02:11:23

On 06/10/2010 07:41 PM, David Jarvis wrote:
> Hi,
>
> I found a slow part of the query:
>
> SELECT
> *  date(extract(YEAR FROM m.taken)||'-1-1') d1,*
> *  date(extract(YEAR FROM m.taken)||'-1-31') d2*
> FROM
>    climate.city c,
>    climate.station s,
>    climate.station_category sc,
>    climate.measurement m
> WHERE
> c.id <http://c.id> = 5148 AND ...
>
> Date extraction is 3.2 seconds, but without is 1.5 seconds. The PL/pgSQL
> code that actually runs (where p_month1, p_day1, and p_month2, p_day2
> are integers):
>
> *        date(extract(YEAR FROM
> m.taken)||''-'||p_month1||'-'||p_day1||''') d1,
>          date(extract(YEAR FROM
> m.taken)||''-'||p_month2||'-'||p_day2||''') d2
> *
> What is a better way to create those dates (without string
> concatenation, I presume)?
>
> Dave
>

I assume you are doing this in a loop?  Many Many Many times?  cuz:

andy=# select  date(extract(year from current_date) || '-1-1');
     date
------------
  2010-01-01
(1 row)

Time: 0.528 ms

Its pretty quick.  You say "without" its 1.5 seconds?  Thats all you change?  Can we see the sql and 'explain analyze'
forboth? 

-Andy

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 02:56:45

Hi, Andy.

I assume you are doing this in a loop? Many Many Many times? cuz:

Yes. Here are the variations I have benchmarked (times are best of three):

Variation #0

-no date field-
Explain: http://explain.depesz.com/s/Y9R
Time: 2.2s

Variation #1

date('1960-1-1')
Explain: http://explain.depesz.com/s/DW2
Time: 2.6s

Variation #2

date('1960'||'-1-1')
Explain: http://explain.depesz.com/s/YuX
Time: 3.1s

Variation #3

date(extract(YEAR FROM m.taken)||'-1-1')
Explain: http://explain.depesz.com/s/1I
Time: 4.3s

Variation #4

to_date( date_part('YEAR', m.taken)::text, 'YYYY' ) + interval '0 months' + interval '0 days'
Explain: http://explain.depesz.com/s/fIT
Time: 4.4s

What I would like is along Variation #5:

PGTYPESdate_mdyjul(taken_year, p_month1, p_day1)
Time: 2.3s

I find it interesting that variation #2 is half a second slower than variation #1.

The other question I have is: why does PG seem to discard the results? In pgAdmin3, I can keep pressing F5 and (before 8.4.4?) the results came back in 4s for the first response then 1s in subsequent responses.

Dave

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 08:26:02

Hi,

To avoid string concatenation using dates, I figured I could write a C function:

#include "postgres.h"
#include "fmgr.h"
#include "utils/date.h"
#include "utils/nabstime.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

Datum dateserial (PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1 (dateserial);

Datum dateserial (PG_FUNCTION_ARGS) {
int32 p_year = PG_GETARG_INT32(0);
int32 p_month = PG_GETARG_INT32(1);
int32 p_day = PG_GETARG_INT32(2);

DateADT d = date2j (p_year, p_month, p_day) - POSTGRES_EPOCH_JDATE;
PG_RETURN_DATEADT(d);
}

Compiles without errors or warnings. The function is integrated as follows:

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
RETURNS text AS
'ymd.so', 'dateserial'
LANGUAGE 'c' IMMUTABLE STRICT
COST 1;

However, when I try to use it, the database segfaults:

select dateserial( 2007, 1, 3 )

Any ideas why?

Thank you!

Dave

P.S.
I have successfully written a function that creates a YYYYmmDD formatted string (using sprintf) when given three integers. It returns as expected; I ran it as follows:

dateserial( extract(YEAR FROM m.taken)::int, 1, 1 )::date

This had a best-of-three time of 3.7s compared with 4.3s using string concatenation. If I can eliminate all the typecasts, and pass in m.taken directly (rather than calling extract), I think the speed will be closer to 2.5s.

Any hints would be greatly appreciated.

Re: Analysis Function

От

Tim Landscheidt

Дата:

11 июня 2010 г., 08:58:10

David Jarvis <thangalin@gmail.com> wrote:

> [...]
> Yes. Here are the variations I have benchmarked (times are best of three):

> Variation #0
> -no date field-
> Explain: http://explain.depesz.com/s/Y9R
> Time: 2.2s

> Variation #1
> date('1960-1-1')
> Explain: http://explain.depesz.com/s/DW2
> Time: 2.6s

> Variation #2
> date('1960'||'-1-1')
> Explain: http://explain.depesz.com/s/YuX
> Time: 3.1s

> Variation #3
> date(extract(YEAR FROM m.taken)||'-1-1')
> Explain: http://explain.depesz.com/s/1I
> Time: 4.3s

> Variation #4
> to_date( date_part('YEAR', m.taken)::text, 'YYYY' ) + interval '0 months' +
> interval '0 days'
> Explain: http://explain.depesz.com/s/fIT
> Time: 4.4s

> What I would like is along Variation #5:

> *PGTYPESdate_mdyjul(taken_year, p_month1, p_day1)*
> Time: 2.3s

> I find it interesting that variation #2 is half a second slower than
> variation #1.
> [...]

Have you tested DATE_TRUNC()?

Tim

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 09:18:17

Hi, Tim.

Have you tested DATE_TRUNC()?

Not really; it returns a full timestamp and I would still have to concatenate strings. My goal is to speed up the following code (where p_ parameters are user inputs):

date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1,
date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2

Using DATE_TRUNC() won't help here, as far as I can tell. Removing the concatenation will halve the query's time. Such as:

dateserial( m.taken, p_month1, p_day1 ) d1,
dateserial( m.taken, p_month2, p_day2 ) d2

My testing so far has shown a modest improvement by using a C function (to avoid concatenation).

Dave

Re: Analysis Function

От

Heikki Linnakangas

Дата:

11 июня 2010 г., 09:42:49

On 11/06/10 11:25, David Jarvis wrote:
> Datum dateserial (PG_FUNCTION_ARGS) {
>    int32 p_year = PG_GETARG_INT32(0);
>    int32 p_month = PG_GETARG_INT32(1);
>    int32 p_day = PG_GETARG_INT32(2);
>
>    DateADT d = date2j (p_year, p_month, p_day) - POSTGRES_EPOCH_JDATE;
>    PG_RETURN_DATEADT(d);
> }
>
> Compiles without errors or warnings. The function is integrated as follows:
>
> CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
>    RETURNS text AS
> 'ymd.so', 'dateserial'
>    LANGUAGE 'c' IMMUTABLE STRICT
>    COST 1;
>
> However, when I try to use it, the database segfaults:
>
> select dateserial( 2007, 1, 3 )
>
> Any ideas why?

The C function returns a DateADT, which is a typedef for int32, but the
CREATE FUNCTION statement claims that it returns 'text'.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Query about index usage

От

Jayadevan M

Дата:

11 июня 2010 г., 09:56:47

Hello all,

One query about PostgreSQL's index usage. If I select just one column on
which there is an index (or select only columns on which there is an
index), and the index is used by PostgreSQL, does PostgreSQL avoid table
access if possible?  I am trying to understand the differences between
Oracle's data access patterns and PostgreSQL's.
Here is how it works in Oracle.

Case 1 - SELECT column which is not there in the index

SQL> select name from myt where id = 13890;

NAME
---------------------------------------------------------------------------------------------------
AAAA


Execution Plan
----------------------------------------------------------
Plan hash value: 2609414407

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)|
Time    |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    65 |     2   (0)|
00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYT   |     1 |    65 |     2   (0)|
00:00:01 |
|*  2 |   INDEX RANGE SCAN          | MYIDX |     1 |       |     1   (0)|
00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=13890)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        409  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed



Case 1 - SELECT column which is there in the index

SQL> select id from myt where id = 13890;

        ID
----------
     13890


Execution Plan
----------------------------------------------------------
Plan hash value: 2555454399

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    13 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| MYIDX |     1 |    13 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=13890)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        407  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

In the second query where id was selected, the table was not used at all.
In PosgreSQL, explain gives me similar output in both cases.
Table structure -

postgres=# \d myt
             Table "public.myt"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               |
 name   | character varying(20) |
Indexes:
    "myidx" btree (id)


Regards,
Jayadevan





DISCLAIMER:

"The information in this e-mail and any attachment is intended only for
the person to whom it is addressed and may contain confidential and/or
privileged material. If you have received this e-mail in error, kindly
contact the sender and destroy all copies of the original communication.
IBS makes no warranty, express or implied, nor guarantees the accuracy,
adequacy or completeness of the information contained in this email or any
attachment and is not liable for any errors, defects, omissions, viruses
or for resultant loss or damage, if any, direct or indirect."

Re: Analysis Function

От

Tim Landscheidt

Дата:

11 июня 2010 г., 11:19:00

David Jarvis <thangalin@gmail.com> wrote:

>> Have you tested DATE_TRUNC()?

> Not really; it returns a full timestamp and I would still have to
> concatenate strings. My goal is to speed up the following code (where
> *p_*parameters are user inputs):

> *        date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''')
> d1,
>         date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''')
> d2*

> Using DATE_TRUNC() won't help here, as far as I can tell. Removing the
> concatenation will halve the query's time. Such as:

> dateserial( m.taken, p_month1, p_day1 ) d1,
> dateserial( m.taken, p_month2, p_day2 ) d2

> My testing so far has shown a modest improvement by using a C function (to
> avoid concatenation).

You could use:

| (DATE_TRUNC('year', m.taken) + p_month1 * '1 month'::INTERVAL + p_day1 * '1 day'::INTERVAL)::DATE

but whether that is faster or slower I don't know. But I
don't see why this query needs to be fast in the first
place. It seems to be interactive, and therefore I wouldn't
invest too much time to have the user wait not 4.4, but
2.2 seconds. You could also do the concatenation in the ap-
plication if that is faster than PostgreSQL's date arithme-
tics.

Tim

Re: Query about index usage

От

"Kevin Grittner"

Дата:

11 июня 2010 г., 15:25:31

Jayadevan M <Jayadevan.Maymala@ibsplc.com> wrote:

> One query about PostgreSQL's index usage. If I select just one
> column on which there is an index (or select only columns on which
> there is an index), and the index is used by PostgreSQL, does
> PostgreSQL avoid table access if possible?

PostgreSQL can't currently avoid reading the table, because that's
where the tuple visibility information is stored.  We've been making
progress toward having some way to avoid reading the table for all
except very recently written tuples, but we're not there yet (in any
production version or in the 9.0 version to be released this
summer).

-Kevin

Re: Query about index usage

От

Greg Smith

Дата:

11 июня 2010 г., 15:32:27

Jayadevan M wrote:
> One query about PostgreSQL's index usage. If I select just one column on
> which there is an index (or select only columns on which there is an
> index), and the index is used by PostgreSQL, does PostgreSQL avoid table
> access if possible?

PostgreSQL keeps information about what rows are visible or not in with
the row data.  It's therefore impossible at this time for it to answer
queries just based on what's in an index.  Once candidate rows are found
using one, the database must then also retrieve the row(s) and do a
second check as to whether it's visible to the running transaction or
not before returning them to the client.

Improving this situation is high up on the list of things to improve in
PostgreSQL and the value of it recognized, it just hasn't been built yet.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 17:10:02

Hi,

The C function returns a DateADT, which is a typedef for int32, but the CREATE FUNCTION statement claims that it returns 'text'.

That'll do it. Thank you!

but whether that is faster or slower I don't know. But I
don't see why this query needs to be fast in the first
place. It seems to be interactive, and therefore I wouldn't

When users click the button, I want the result returned in in less under 4 seconds. Right now it is closer to 10. Consequently, they click twice. Shaving 2 seconds here and there will make a huge difference. It will also allow the computer to handle a higher volume of requests.

invest too much time to have the user wait not 4.4, but
2.2 seconds. You could also do the concatenation in the ap-
plication if that is faster than PostgreSQL's date arithme-
tics.

No, I cannot. The concatenation uses the year that the measurement was made--from the database--and the month/day combination from the user. See also:

http://stackoverflow.com/questions/2947105/calculate-year-for-end-date-postgresql

Dave

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 18:17:28

Hi,

Here is code to convert dates from integers without string concatenation:

Edit dateserial.c:

#include "postgres.h"
#include "utils/date.h"

#include "utils/nabstime.h"

#ifdef PG_MODULE_MAGIC

PG_MODULE_MAGIC;
#endif

Datum dateserial(PG_FUNCTION_ARGS);


PG_FUNCTION_INFO_V1 (dateserial);

Datum
dateserial(PG_FUNCTION_ARGS) {

  int32 p_year = (int32)PG_GETARG_FLOAT8(0);

  int32 p_month = PG_GETARG_INT32(1);
  int32 p_day = PG_GETARG_INT32(2);


  PG_RETURN_DATEADT( date2j( p_year, p_month, p_day ) - POSTGRES_EPOCH_JDATE );

}

Edit Makefile:

MODULES = dateserial
PGXS := $(shell pg_config --pgxs)

include $(PGXS)

Edit inst.sh (optional):

#!/bin/bash

make clean && make && strip *.so && make install && /etc/init.d/postgresql-8.4 restart

Run bash inst.sh.

Create a SQL function dateserial:

CREATE OR REPLACE FUNCTION dateserial(double precision, integer, integer)

  RETURNS date AS
'$libdir/dateserial', 'dateserial'
  LANGUAGE 'c' IMMUTABLE STRICT

  COST 1;
ALTER FUNCTION dateserial(double precision, integer, integer) OWNER TO postgres;

Test the function:

SELECT dateserial( 2007, 5, 5 )

Using this function, performance increases from 4.4s to 2.8s..

Dave

Re: Query about index usage

От

Bob Lunney

Дата:

11 июня 2010 г., 18:50:21

Jayadevan,

PostgreSQL must go to the table to determine if the row you are requesting is visible to your transaction.  This is an
artifactof the MVCC implementation.  Oracle can fetch the data from the index, since it doesn't keep multiple
representationsof the rows, but it may need to check the undo logs to determine the state that applies to your
transaction. Its just two different ways to accomplish the same thing. 

Bob Lunney

--- On Fri, 6/11/10, Jayadevan M <Jayadevan.Maymala@ibsplc.com> wrote:

> From: Jayadevan M <Jayadevan.Maymala@ibsplc.com>
> Subject: [PERFORM] Query about index usage
> To: pgsql-performance@postgresql.org
> Date: Friday, June 11, 2010, 5:56 AM
> Hello all,
>
> One query about PostgreSQL's index usage. If I select just
> one column on
> which there is an index (or select only columns on which
> there is an
> index), and the index is used by PostgreSQL, does
> PostgreSQL avoid table
> access if possible?  I am trying to understand the
> differences between
> Oracle's data access patterns and PostgreSQL's.
> Here is how it works in Oracle.
>
> Case 1 - SELECT column which is not there in the index
>
> SQL> select name from myt where id = 13890;
>
> NAME
> ---------------------------------------------------------------------------------------------------
> AAAA
>
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 2609414407
>
> -------------------------------------------------------------------------------------
> | Id  | Operation         
>          | Name  |
> Rows  | Bytes | Cost (%CPU)|
> Time    |
> -------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT     
>       |   
>    |     1 |   
> 65 |     2   (0)|
> 00:00:01 |
> |   1 |  TABLE ACCESS BY INDEX ROWID|
> MYT   |     1 | 
>   65 |     2   (0)|
>
> 00:00:01 |
> |*  2 |   INDEX RANGE SCAN   
>       | MYIDX |     1
> |       | 
>    1   (0)|
> 00:00:01 |
> -------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    2 - access("ID"=13890)
>
> Note
> -----
>    - dynamic sampling used for this
> statement
>
>
> Statistics
> ----------------------------------------------------------
>           0  recursive calls
>           0  db block gets
>           4  consistent gets
>           0  physical reads
>           0  redo size
>         409  bytes sent via
> SQL*Net to client
>         384  bytes received via
> SQL*Net from client
>           2  SQL*Net
> roundtrips to/from client
>           0  sorts (memory)
>           0  sorts (disk)
>           1  rows processed
>
>
>
> Case 1 - SELECT column which is there in the index
>
> SQL> select id from myt where id = 13890;
>
>         ID
> ----------
>      13890
>
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 2555454399
>
> --------------------------------------------------------------------------
> | Id  | Operation        |
> Name  | Rows  | Bytes | Cost (%CPU)| Time 
>    |
> --------------------------------------------------------------------------
> |   0 | SELECT STATEMENT |   
>    |     1 |   
> 13 |     1   (0)|
> 00:00:01 |
> |*  1 |  INDEX RANGE SCAN| MYIDX | 
>    1 |    13 | 
>    1   (0)| 00:00:01 |
> --------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    1 - access("ID"=13890)
>
> Note
> -----
>    - dynamic sampling used for this
> statement
>
>
> Statistics
> ----------------------------------------------------------
>           0  recursive calls
>           0  db block gets
>           3  consistent gets
>           0  physical reads
>           0  redo size
>         407  bytes sent via
> SQL*Net to client
>         384  bytes received via
> SQL*Net from client
>           2  SQL*Net
> roundtrips to/from client
>           0  sorts (memory)
>           0  sorts (disk)
>           1  rows processed
>
> In the second query where id was selected, the table was
> not used at all.
> In PosgreSQL, explain gives me similar output in both
> cases.
> Table structure -
>
> postgres=# \d myt
>              Table
> "public.myt"
>  Column |         Type 
>         | Modifiers
> --------+-----------------------+-----------
>  id     | integer     
>          |
>  name   | character varying(20) |
> Indexes:
>     "myidx" btree (id)
>
>
> Regards,
> Jayadevan
>
>
>
>
>
> DISCLAIMER:
>
> "The information in this e-mail and any attachment is
> intended only for
> the person to whom it is addressed and may contain
> confidential and/or
> privileged material. If you have received this e-mail in
> error, kindly
> contact the sender and destroy all copies of the original
> communication.
> IBS makes no warranty, express or implied, nor guarantees
> the accuracy,
> adequacy or completeness of the information contained in
> this email or any
> attachment and is not liable for any errors, defects,
> omissions, viruses
> or for resultant loss or damage, if any, direct or
> indirect."
>
>
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: Analysis Function

От

Tim Landscheidt

Дата:

11 июня 2010 г., 18:54:03

David Jarvis <thangalin@gmail.com> wrote:

> [...]
>> invest too much time to have the user wait not 4.4, but
>> 2.2 seconds. You could also do the concatenation in the ap-
>> plication if that is faster than PostgreSQL's date arithme-
>> tics.

> No, I cannot. The concatenation uses the year that the measurement was
> made--from the database--and the month/day combination from the user. See
> also:

> http://stackoverflow.com/questions/2947105/calculate-year-for-end-date-postgresql

That page doesn't deal with "select year from database and
month/day from user and present the results", but *much*
different problems.

Tim

Re: Analysis Function

От

Tom Lane

Дата:

11 июня 2010 г., 20:21:07

David Jarvis <thangalin@gmail.com> writes:
> dateserial(PG_FUNCTION_ARGS) {
>   int32 p_year = (int32)PG_GETARG_FLOAT8(0);
>   int32 p_month = PG_GETARG_INT32(1);
>   int32 p_day = PG_GETARG_INT32(2);

Er ... why float?  Integer is plenty for the range of years supported by
the PG datetime infrastructure.  The above coding is pretty lousy in terms
of its roundoff and overflow behavior, too.

            regards, tom lane

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 20:31:08

Hi, Tom.

extract(YEAR FROM m.taken)

I thought that returned a double precision?

Dave

Re: Analysis Function

От

David Jarvis

Дата:

11 июня 2010 г., 20:38:17

Hi,

I added an explicit cast in the SQL:

dateserial(extract(YEAR FROM m.taken)::int,'||p_month1||','||p_day1||') d1,
dateserial(extract(YEAR FROM m.taken)::int,'||p_month2||','||p_day2||') d2

The function now takes three integer parameters; there was no performance loss.

Thank you.

Dave

Re: Analysis Function

От

Heikki Linnakangas

Дата:

13 июня 2010 г., 06:03:41

On 11/06/10 23:38, David Jarvis wrote:
> I added an explicit cast in the SQL:
>
>          dateserial(extract(YEAR FROM
> m.taken)::int,'||p_month1||','||p_day1||') d1,
>          dateserial(extract(YEAR FROM
> m.taken)::int,'||p_month2||','||p_day2||') d2
>
> The function now takes three integer parameters; there was no performance
> loss.

We had a little chat about this with Magnus. It's pretty surprising that
there's no built-in function to do this, we should consider adding one.

We could have a function like:

construct_timestamp(year int4, month int4, date int4, hour int4, minute
int4, second int4, milliseconds int4, timezone text)

Now that we have named parameter notation, callers can use it to
conveniently fill in only the fields needed:

SELECT construct_timestamp(year := 1999, month := 10, date := 22);

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Analysis Function

От

David Jarvis

Дата:

13 июня 2010 г., 07:38:57

Hi,

We had a little chat about this with Magnus. It's pretty surprising that there's no built-in function to do this, we should consider adding one.

I agree; you should be able to create a timestamp or a date from integer values. Others, apparently, have written code. The implementation I did was pretty rudimentary, but I was going for speed.

If you could overload to_date and to_timestamp, that would be great. For example:

to_date( year ) = year-01-01
to_date( year, month ) = year-month-01
to_date( year, month, day ) = year-month-day

to_timestamp( year, month, day, hour ) = year-month-day hour:00:00.0000 GMT
etc.

construct_timestamp(year int4, month int4, date int4, hour int4, minute int4, second int4, milliseconds int4, timezone text)

Also, "date int4" should be "day int4", to avoid confusion with the date type.

Does it makes sense to use named parameter notation for the first value (the year)? This could be potentially confusing:

to_date() - What would this return? now()? Jan 1st, 1970? 2000?

Similarly, to_timestamp() ...? Seems meaningless without at least a full date and an hour.

Dave

Re: Analysis Function

От

Magnus Hagander

Дата:

13 июня 2010 г., 10:08:09

On Sun, Jun 13, 2010 at 09:38, David Jarvis <thangalin@gmail.com> wrote:
> Hi,
>
>> We had a little chat about this with Magnus. It's pretty surprising that
>> there's no built-in function to do this, we should consider adding one.
>
> I agree; you should be able to create a timestamp or a date from integer
> values. Others, apparently, have written code. The implementation I did was
> pretty rudimentary, but I was going for speed.
>
> If you could overload to_date and to_timestamp, that would be great. For
> example:
>
> to_date( year ) = year-01-01
> to_date( year, month ) = year-month-01
> to_date( year, month, day ) = year-month-day
>
> to_timestamp( year, month, day, hour ) = year-month-day hour:00:00.0000 GMT
> etc.

Not that it would make a huge difference over having to specify 1's
and 0's there, but I agree that could be useful.


>> construct_timestamp(year int4, month int4, date int4, hour int4, minute
>> int4, second int4, milliseconds int4, timezone text)
>
> Also, "date int4" should be "day int4", to avoid confusion with the date
> type.

Yes, absolutely.


> Does it makes sense to use named parameter notation for the first value (the
> year)? This could be potentially confusing:

How so? If it does named parameters, why not all?


> to_date() - What would this return? now()? Jan 1st, 1970? 2000?

ERROR, IMHO. We have a function for now() already, and the others are
so arbitrary there is no way to explain such a choice.


> Similarly, to_timestamp() ...? Seems meaningless without at least a full
> date and an hour.

Agreed.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Analysis Function

От

Tom Lane

Дата:

13 июня 2010 г., 15:34:32

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> We could have a function like:

> construct_timestamp(year int4, month int4, date int4, hour int4, minute
> int4, second int4, milliseconds int4, timezone text)

This fails to allow specification to the microsecond level (and note
that with float timestamps even smaller fractions have potential use).
I would suggest dropping the milliseconds argument and instead letting
the seconds arg be float8.  That seems a closer match to the way people
think about the textual representation.

> Now that we have named parameter notation, callers can use it to
> conveniently fill in only the fields needed:

It's not immediately obvious what the default value of "timezone"
will be?

            regards, tom lane

Re: Analysis Function

От

Tom Lane

Дата:

13 июня 2010 г., 15:42:31

Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Jun 13, 2010 at 09:38, David Jarvis <thangalin@gmail.com> wrote:
>> Does it makes sense to use named parameter notation for the first value (the
>> year)? This could be potentially confusing:

> How so? If it does named parameters, why not all?

There's no reason not to allow the year parameter to be named.  What
I think it shouldn't have is a default.  OTOH I see no good reason
not to allow the other ones to have defaults.  (We presumably want
timezone to default to the system timezone setting, but I wonder how
we should make that work --- should an empty string be treated as
meaning that?)

>> Similarly, to_timestamp() ...? Seems meaningless without at least a full
>> date and an hour.

> Agreed.

No, I think it's perfectly sane to allow month/day to default to 1
and h/m/s to zeroes.

I do think it might be a good idea to have two functions,
construct_timestamp yielding timestamptz and construct_date
yielding date (and needing only 3 args).  When you only want
a date, having to use construct_timestamp and cast will be
awkward and much more expensive than is needed (timezone
rotations aren't real cheap).

            regards, tom lane

Re: Analysis Function

От

Magnus Hagander

Дата:

13 июня 2010 г., 15:49:59

On Sun, Jun 13, 2010 at 17:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Sun, Jun 13, 2010 at 09:38, David Jarvis <thangalin@gmail.com> wrote:
>>> Does it makes sense to use named parameter notation for the first value (the
>>> year)? This could be potentially confusing:
>
>> How so? If it does named parameters, why not all?
>
> There's no reason not to allow the year parameter to be named.  What
> I think it shouldn't have is a default.  OTOH I see no good reason
> not to allow the other ones to have defaults.  (We presumably want
> timezone to default to the system timezone setting, but I wonder how
> we should make that work --- should an empty string be treated as
> meaning that?)

Umm. NULL could be made to mean that, or we could provicde two
different versions - one that takes TZ and one that doesn't.


>>> Similarly, to_timestamp() ...? Seems meaningless without at least a full
>>> date and an hour.
>
>> Agreed.
>
> No, I think it's perfectly sane to allow month/day to default to 1
> and h/m/s to zeroes.
>
> I do think it might be a good idea to have two functions,
> construct_timestamp yielding timestamptz and construct_date
> yielding date (and needing only 3 args).  When you only want
> a date, having to use construct_timestamp and cast will be
> awkward and much more expensive than is needed (timezone
> rotations aren't real cheap).

And a third, construct_time(), no?


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Analysis Function

От

Tom Lane

Дата:

13 июня 2010 г., 15:58:36

Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Jun 13, 2010 at 17:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... (We presumably want
>> timezone to default to the system timezone setting, but I wonder how
>> we should make that work --- should an empty string be treated as
>> meaning that?)

> Umm. NULL could be made to mean that, or we could provicde two
> different versions - one that takes TZ and one that doesn't.

Using NULL like that seems a bit awkward: for one thing it'd mean the
function couldn't be STRICT, and also it'd be bizarre that only this
one argument could be null without leading to a null result.

And two separate functions isn't good either.  Basically, I think it's
important that there be a way to specify an explicit parameter value
that behaves identically to the default.

> And a third, construct_time(), no?

Yeah, maybe ... do you think there's any demand for it?

            regards, tom lane

Re: Analysis Function

От

David Jarvis

Дата:

13 июня 2010 г., 19:19:17

Hi,

It's not immediately obvious what the default value of "timezone"
will be?

The system's locale, like now(); documentation can clarify.

By named parameter, I meant default value. You could construct a timestamp variable using:

construct_timestamp( year := 1900, hour := 1 )

When I read that code, the first thing I think it should return is:

1900-01-01 01:00:00.0000-07

I agree construct_timestamp( hour := 1 ) and construct_date() are errors: year is required.

Dave

P.S.
I prefer to_timestamp and to_date over the more verbose construct_timestamp.

Re: Query about index usage

От

Jayadevan M

Дата:

14 июня 2010 г., 03:41:23

> PostgreSQL can't currently avoid reading the table, because that's
> where the tuple visibility information is stored.  We've been making
> progress toward having some way to avoid reading the table for all
> except very recently written tuples, but we're not there yet (in any
> production version or in the 9.0 version to be released this
> summer).
Thank you for all the replies. I am learning PostgreSQL and figuring out
which of the standard techniques for tuning queries in Oracle works in
PostgreSQL as well. Thank you.
Regards,
Jayadevan





DISCLAIMER:

"The information in this e-mail and any attachment is intended only for
the person to whom it is addressed and may contain confidential and/or
privileged material. If you have received this e-mail in error, kindly
contact the sender and destroy all copies of the original communication.
IBS makes no warranty, express or implied, nor guarantees the accuracy,
adequacy or completeness of the information contained in this email or any
attachment and is not liable for any errors, defects, omissions, viruses
or for resultant loss or damage, if any, direct or indirect."

Re: Analysis Function

От

Magnus Hagander

Дата:

14 июня 2010 г., 09:01:57

On Sun, Jun 13, 2010 at 17:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Sun, Jun 13, 2010 at 17:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... (We presumably want
>>> timezone to default to the system timezone setting, but I wonder how
>>> we should make that work --- should an empty string be treated as
>>> meaning that?)
>
>> Umm. NULL could be made to mean that, or we could provicde two
>> different versions - one that takes TZ and one that doesn't.
>
> Using NULL like that seems a bit awkward: for one thing it'd mean the
> function couldn't be STRICT, and also it'd be bizarre that only this
> one argument could be null without leading to a null result.

Hmm, yeah.


> And two separate functions isn't good either.  Basically, I think it's
> important that there be a way to specify an explicit parameter value
> that behaves identically to the default.

In that case, empty string seems fairly reasonable - if you look at
the text based parsing, that's what we do if the timezone is an "empty
string" (meaning not specified).



>> And a third, construct_time(), no?
>
> Yeah, maybe ... do you think there's any demand for it?

Yes, I think there is. Plus, it's for completeness :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Analysis Function

От

Magnus Hagander

Дата:

14 июня 2010 г., 09:03:46

On Sun, Jun 13, 2010 at 21:19, David Jarvis <thangalin@gmail.com> wrote:
> Hi,
>
>> It's not immediately obvious what the default value of "timezone"
>> will be?
>
> The system's locale, like now(); documentation can clarify.
>
> By named parameter, I meant default value. You could construct a timestamp
> variable using:
>
>   construct_timestamp( year := 1900, hour := 1 )
>
> When I read that code, the first thing I think it should return is:
>
>   1900-01-01 01:00:00.0000-07
>
> I agree construct_timestamp( hour := 1 ) and construct_date() are errors:
> year is required.

Does it make sense to allow minutes when hours isn't specified? Or
should we simply say that for each of the date and the time part, to
specify at <level n> you need to have everything from the top up to
<level n-1> specified? E.g. month requires year to be specified, day
requires both year and month etc?

> I prefer to_timestamp and to_date over the more verbose construct_timestamp.

Yeah, I agree with that.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Analysis Function

От

David Jarvis

Дата:

14 июня 2010 г., 11:10:51

Does it make sense to allow minutes when hours isn't specified? Or

For time, 00 seems a reasonable default for all values; clearly document the defaults. Also, having a default makes the code simpler than <level n> plus <level n-1>. (Not to mention explaining it.) ;-)

SELECT to_timestamp( minutes := 19 ) -- error (year not specified)

SELECT to_timestamp( year := 2000, minutes := 19 ) -- 2000-01-01 00:19:00.0000-07

Dave

Re: Analysis Function

От

Tom Lane

Дата:

14 июня 2010 г., 13:59:47

Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Jun 13, 2010 at 21:19, David Jarvis <thangalin@gmail.com> wrote:
>> I prefer to_timestamp and to_date over the more verbose construct_timestamp.

> Yeah, I agree with that.

Those names are already taken.  It will cause confusion (of both people
and machines) if you try to overload them with this.

            regards, tom lane

Re: Analysis Function

От

Magnus Hagander

Дата:

16 июня 2010 г., 06:36:36

On Mon, Jun 14, 2010 at 15:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Sun, Jun 13, 2010 at 21:19, David Jarvis <thangalin@gmail.com> wrote:
>>> I prefer to_timestamp and to_date over the more verbose construct_timestamp.
>
>> Yeah, I agree with that.
>
> Those names are already taken.  It will cause confusion (of both people
> and machines) if you try to overload them with this.

Fair enough. How about something like make_timestamp? It's at least
shorter and easier than construct :-)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Analysis Function

От

David Jarvis

Дата:

16 июня 2010 г., 08:48:36

Fair enough. How about something like make_timestamp? It's at least
shorter and easier than construct :-)

Agreed.

Dave

Re: Analysis Function

От

Tom Lane

Дата:

16 июня 2010 г., 20:46:47

David Jarvis <thangalin@gmail.com> writes:
>> Fair enough. How about something like make_timestamp? It's at least
>> shorter and easier than construct :-)

> Agreed.

No objection here either.

            regards, tom lane

Re: Query about index usage

От

Jayadevan M

Дата:

23 июня 2010 г., 04:17:08

Hello,
> PostgreSQL can't currently avoid reading the table, because that's
> where the tuple visibility information is stored.  We've been making
> progress toward having some way to avoid reading the table for all
> except very recently written tuples, but we're not there yet (in any
> production version or in the 9.0 version to be released this
> summer).
More doubts on how indexes are used by PostgreSQL. It is mentioned that
table data blocks have data about tuple visibility and hence table scans
are always necessary. So how does PostgreSQL reduce the number of blocks
to be read by using indexes? Does this mean that indexes will have
references to all the 'possible' blocks which may contain the data one is
searching for, and then scans all those blocks and eliminates records
which should not be 'visible' to the query being executed? Do index data
get updated as and when data is committed and made 'visible'  or is it
that index data get updated as soon as data is changed, before commit is
issued and rollback of transaction results in a rollback of the index data
changes too?
Regards,
Jayadevan





DISCLAIMER:

"The information in this e-mail and any attachment is intended only for
the person to whom it is addressed and may contain confidential and/or
privileged material. If you have received this e-mail in error, kindly
contact the sender and destroy all copies of the original communication.
IBS makes no warranty, express or implied, nor guarantees the accuracy,
adequacy or completeness of the information contained in this email or any
attachment and is not liable for any errors, defects, omissions, viruses
or for resultant loss or damage, if any, direct or indirect."

Re: Query about index usage

От

Greg Smith

Дата:

23 июня 2010 г., 06:06:03

Jayadevan M wrote:
> It is mentioned that table data blocks have data about tuple visibility and hence table scans
> are always necessary. So how does PostgreSQL reduce the number of blocks
> to be read by using indexes?

To be useful, a query utilizing an index must be selective:  it must
only return a fraction of the possible rows in the table.  Scanning the
index will produce a list of blocks that contain the potentially visible
data, then only those data blocks will be retrieved and tested for
visibility.

Let's say you have a table that's 100 pages (pages are 8KB) and an index
that's 50 pages against it.  You run a query that only selects 5% of the
rows in the table, from a continuous section.  Very rough estimate, it
will look at 5% * 50 = 3 index pages.  Those will point to a matching
set of 5% * 100 = 5 data pages.  Now you've just found the right subset
of the data by only retrieving 8 random pages of data instead of 100.
With random_page_cost=4.0, that would give this plan a cost of around
32, while the sequential scan one would cost 100 * 1.0 (sequential
accesses) for a cost of around 100 (Both of them would also have some
smaller row processing cost added in there too).

It's actually a bit more complicated than that--the way indexes are
built means you can't just linearly estimate their usage, and scans of
non-contiguous sections are harder to model simply--but that should give
you an idea.  Only when using the index significantly narrows the number
of data pages expected will it be an improvement over ignoring the index
and just scanning the whole table.

If the expected use of the index was only 20% selective for another
query, you'd be getting 20% * 50 = 10 index pages, 20% * 100 = 20 data
pages, for a potential total of 30 random page lookups.  That could end
up costing 30 * 4.0 = 120, higher than the sequential scan.    Usually
the breakpoint for how much of a table has to be scanned before just
scanning the whole thing sequentially is considered cheaper happens near
20% of it, and you can shift it around by adjusting random_page_cost.
Make it lower, and you can end up preferring index scans even for 30 or
40% of a table.

> Do index data get updated as and when data is committed and made 'visible' or is it
> that index data get updated as soon as data is changed, before commit is
> issued and rollback of transaction results in a rollback of the index data

Index changes happen when the data goes into the table, including
situations where it might not be committed.  The index change doesn't
ever get deferred to commit time, like you can things like foreign key
checks.  When a transaction is rolled back, the aborted row eventually
gets marked as dead by vacuum, at which point any index records pointing
to it can also be cleaned up.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Query about index usage

От

Jayadevan M

Дата:

23 июня 2010 г., 06:16:05

Thank you for the detailed explanation.
Regards,
Jayadevan





DISCLAIMER:

"The information in this e-mail and any attachment is intended only for
the person to whom it is addressed and may contain confidential and/or
privileged material. If you have received this e-mail in error, kindly
contact the sender and destroy all copies of the original communication.
IBS makes no warranty, express or implied, nor guarantees the accuracy,
adequacy or completeness of the information contained in this email or any
attachment and is not liable for any errors, defects, omissions, viruses
or for resultant loss or damage, if any, direct or indirect."

Re: Analysis Function

От

Bruce Momjian

Дата:

01 июля 2010 г., 00:43:07

Tom Lane wrote:
> David Jarvis <thangalin@gmail.com> writes:
> >> Fair enough. How about something like make_timestamp? It's at least
> >> shorter and easier than construct :-)
>
> > Agreed.
>
> No objection here either.

Added to TODO:

    Add function to allow the creation of timestamps using parameters

    * http://archives.postgresql.org/pgsql-performance/2010-06/msg00232.php


--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: Analysis Function