Обсуждение: CROSSTAB( .. only one column has values... )

Поиск

Список

Период

Сортировка

CROSSTAB( .. only one column has values... )

От

Adam Tauno Williams

Дата:

05 января 2021 г., 15:46:08

I'm using the crosstab feature and do not understand why I am only
getting values in the first column.

The query:
SELECT 
  date_trunc('month', t2.value_date) AS invoice_date, 
  t1.value_string AS invoice_type
  COUNT(*)
FROM document d
  LEFT OUTER JOIN obj_property t1
    ON (t1.obj_id = d.document_id
        AND t1.namespace_prefix = 'http://www.example.com/ctabs'
        AND t1.value_key = 'invoiceType')
  LEFT OUTER JOIN obj_property t2
    ON (t2.obj_id = d.document_id
        AND t2.namespace_prefix = 'http://www.example.com/ctabs'
        AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2

 - has results like -

invoice_date invoice_type count 
------------ ------------ ----- 
2013-02      service      3454  
2013-03      service      3512  
2013-03      parts        5366  
2013-04      parts        5657  
2013-04      service      4612  
2013-05      service      4946  
2013-05      parts        5508  
...

So I put this in as a crosstab:

SELECT * FROM crosstab(
    $$
SELECT 
  SUBSTR(t2.value_string, 1, 7) AS invoice_date,
  t1.value_string AS invoice_type,
  COUNT(*)
FROM document d
  LEFT OUTER JOIN obj_property t1
    ON (t1.obj_id = d.document_id
        AND t1.namespace_prefix = 'http://www.example.com/ctabs'
        AND t1.value_key = 'invoiceType')
  LEFT OUTER JOIN obj_property t2
    ON (t2.obj_id = d.document_id
        AND t2.namespace_prefix = 'http://www.example.com/ctabs'
        AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2
 $$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental"
BIGINT, "sales" BIGINT, "service" BIGINT);

 - and I get the results of -

invoice_date parts rental sales  service 
------------ ----- ------ ------ ------- 
2001-09      1     (null) (null) (null)  
2007-07      1     (null) (null) (null)  
2013-02      5353  (null) (null) (null)  
2013-02      3454  (null) (null) (null)  
2013-03      3512  (null) (null) (null)  
2013-03      5366  (null) (null) (null)  
...

Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.

-- 
Adam Tauno Williams <mailto:awilliam@whitemice.org> GPG D95ED383
Systems Administrator, Python Developer, LPI / NCLA

Re: CROSSTAB( .. only one column has values... )

От

"David G. Johnston"

Дата:

05 января 2021 г., 16:12:47

On Tue, Jan 5, 2021 at 8:46 AM Adam Tauno Williams <awilliam@whitemice.org> wrote:

Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.

I think you are assigning the function more intelligence/effort than it puts out.

From the documentation:

"""

The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. It fills the output value columns, left to right, with the value fields from these rows. If there are fewer rows in a group than there are output value columns, the extra output columns are filled with nulls; if there are more rows, the extra input rows are skipped.

In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page.

"""

The fact you don't have an order by, and that there are not an equal number of records per date, suggests to me that you are expecting the function to fill in the blanks when the documentation says it doesn't do that.

David J.

Re: CROSSTAB( .. only one column has values... )

От

Tim Clarke

Дата:

05 января 2021 г., 17:11:37

On 05/01/2021 16:12, David G. Johnston wrote:

On Tue, Jan 5, 2021 at 8:46 AM Adam Tauno Williams <awilliam@whitemice.org> wrote:
Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.

I think you are assigning the function more intelligence/effort than it puts out.

From the documentation:
"""
The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. It fills the output value columns, left to right, with the value fields from these rows. If there are fewer rows in a group than there are output value columns, the extra output columns are filled with nulls; if there are more rows, the extra input rows are skipped.

In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page.
"""

The fact you don't have an order by, and that there are not an equal number of records per date, suggests to me that you are expecting the function to fill in the blanks when the documentation says it doesn't do that.

David J.

imho the crosstab() function isn't a good implementation. The biggest failure it has is that you must know exactly how many output columns you will have in the result prior to running it

Tim Clarke MBCS
IT Director
Direct: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420

Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 5803 1687 | Toronto: +1 647 503 2848
Web: https://www.manifest.co.uk/

Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to admin@minerva.info
Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/ for further information.

Re: CROSSTAB( .. only one column has values... )

От

Thomas Kellerer

Дата:

05 января 2021 г., 17:15:54

Adam Tauno Williams schrieb am 05.01.2021 um 16:46:
> I'm using the crosstab feature and do not understand why I am only
> getting values in the first column.
>
> The query:
> SELECT
>    date_trunc('month', t2.value_date) AS invoice_date,
>    t1.value_string AS invoice_type
>    COUNT(*)
> FROM document d
>    LEFT OUTER JOIN obj_property t1
>      ON (t1.obj_id = d.document_id
>          AND t1.namespace_prefix = 'http://www.example.com/ctabs'
>          AND t1.value_key = 'invoiceType')
>    LEFT OUTER JOIN obj_property t2
>      ON (t2.obj_id = d.document_id
>          AND t2.namespace_prefix = 'http://www.example.com/ctabs'
>          AND t2.value_key = 'invoiceDate')
> WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
> GROUP BY 1,2
>
>   - has results like -
>
> invoice_date invoice_type count
> ------------ ------------ -----
> 2013-02      service      3454
> 2013-03      service      3512
> 2013-03      parts        5366
> 2013-04      parts        5657
> 2013-04      service      4612
> 2013-05      service      4946
> 2013-05      parts        5508
> ...
>
> So I put this in as a crosstab:
>

I find using filtered aggregation to be way easier and more flexible than using crosstab():

     SELECT date_trunc('month', t2.value_date) AS invoice_date,
            count(*) filter (where value_string = 'rental') as rental,
            count(*) filter (where value_string = 'sales') as sales,
            count(*) filter (where value_string = 'service') as service
     FROM document d
        LEFT OUTER JOIN obj_property t1
          ON (t1.obj_id = d.document_id
              AND t1.namespace_prefix = 'http://www.example.com/ctabs'
              AND t1.value_key = 'invoiceType')
        LEFT OUTER JOIN obj_property t2
          ON (t2.obj_id = d.document_id
              AND t2.namespace_prefix = 'http://www.example.com/ctabs'
              AND t2.value_key = 'invoiceDate')
     WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
     GROUP BY 1


Thomas

Re: CROSSTAB( .. only one column has values... )

От

Joe Conway

Дата:

05 января 2021 г., 18:45:44

On 1/5/21 12:11 PM, Tim Clarke wrote:
> imho the crosstab() function isn't a good implementation. The biggest failure it
> has is that you must know exactly how many output columns you will have in the
> result /_prior to running it

*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would be
welcomed!

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Вложения

signature.asc

Re: CROSSTAB( .. only one column has values... )

От

Joe Conway

Дата:

05 января 2021 г., 18:53:56

On 1/5/21 10:46 AM, Adam Tauno Williams wrote:
> I'm using the crosstab feature and do not understand why I am only
> getting values in the first column.

<snip>

> So I put this in as a crosstab:
>
> SELECT * FROM crosstab(
>     $$
 <snip>

>  $$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental"
> BIGINT, "sales" BIGINT, "service" BIGINT);
>
>  - and I get the results of -
>
> invoice_date parts rental sales  service
> ------------ ----- ------ ------ -------
> 2001-09      1     (null) (null) (null)
> 2007-07      1     (null) (null) (null)
> 2013-02      5353  (null) (null) (null)
> 2013-02      3454  (null) (null) (null)
> 2013-03      3512  (null) (null) (null)
> 2013-03      5366  (null) (null) (null)
> ...
>
> Only the first column has values, all the rest are NULL.
> I assume I am missing something stupid.

Seems you are using the wrong form of the crosstab() function. See

  https://www.postgresql.org/docs/current/tablefunc.html#id-1.11.7.47.5.7.2

"The main limitation of the single-parameter form of crosstab is that it treats
all values in a group alike, inserting each value into the first available
column. If you want the value columns to correspond to specific categories of
data, and some groups might not have data for some of the categories, that
doesn't work well. The two-parameter form of crosstab handles this case by
providing an explicit list of the categories corresponding to the output columns."

HTH,

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Вложения

signature.asc

Re: CROSSTAB( .. only one column has values... )

От

Pavel Stehule

Дата:

05 января 2021 г., 19:12:58

út 5. 1. 2021 v 19:45 odesílatel Joe Conway <mail@joeconway.com> napsal:

On 1/5/21 12:11 PM, Tim Clarke wrote:
> imho the crosstab() function isn't a good implementation. The biggest failure it
> has is that you must know exactly how many output columns you will have in the
> result /_prior to running it

*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would be
welcomed!

https://www.postgresql.org/message-id/flat/CAFj8pRC%2BhNzpH%2B0bPRCnqNncUCGjEvpwX%2B0nbhb1F7gwjYZZNg%40mail.gmail.com#9b3fbdd968c271668049a103bfc93880

Regards

Pavel

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: CROSSTAB( .. only one column has values... )

От

"Daniel Verite"

Дата:

06 января 2021 г., 20:55:55

(resent to the list, previous post was rejected)

       Pavel Stehule wrote:

> > *That* is a function of how Postgres set returning functions work, and not
> > specific to crosstab(). It is not easily fixed. Patches to fix that would
> > be
> > welcomed!
> >
>
>
https://www.postgresql.org/message-id/flat/CAFj8pRC%2BhNzpH%2B0bPRCnqNncUCGjEvpwX%2B0nbhb1F7gwjYZZNg%40mail.gmail.com#9b3fbdd968c271668049a103bfc93880

Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.

Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.

When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.

I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.

On the other hand, dynamic pivots can be done by generating the SQL
dynamically and  getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Re: CROSSTAB( .. only one column has values... )

От

Pavel Stehule

Дата:

06 января 2021 г., 21:06:06

st 6. 1. 2021 v 21:47 odesílatel Daniel Verite <daniel@manitou-mail.org> napsal:

Pavel Stehule wrote:

> > *That* is a function of how Postgres set returning functions work, and not
> > specific to crosstab(). It is not easily fixed. Patches to fix that would
> > be
> > welcomed!
> >
>
> https://www.postgresql.org/message-id/flat/CAFj8pRC%2BhNzpH%2B0bPRCnqNncUCGjEvpwX%2B0nbhb1F7gwjYZZNg%40mail.gmail.com#9b3fbdd968c271668049a103bfc93880

Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.

Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.

When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.

yes

polymorphic functions need two steps. First step returns structure, second data.

The prepared statements can be supported, but there should be a recheck if the result has expected structure. And maybe in future, the prepared statements can be more dynamic, and can be able to do replaning when it will be necessary.

I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.

On the other hand, dynamic pivots can be done by generating the SQL
dynamically and getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.

Yes, it is the Sybase way and it can be useful. But you cannot work with the returned result more.

For users it isn't too important if they have to use polymorphic functions or dynamic recordset or a PIVOT clause. Important is performance and ergometry. Polymorphic functions can be used for more tasks than pivoting (reading some external sources, ...). Dynamic recordsets or multi recordsets can be nice features too. I like multi recordsets for reporting. And Oracle's PIVOT clause is just handy and doesn't require programming.

Regards

Pavel

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: CROSSTAB( .. only one column has values... )

Вложения

Вложения