Обсуждение: CROSSTAB( .. only one column has values... )
I'm using the crosstab feature and do not understand why I am only getting values in the first column. The query: SELECT date_trunc('month', t2.value_date) AS invoice_date, t1.value_string AS invoice_type COUNT(*) FROM document d LEFT OUTER JOIN obj_property t1 ON (t1.obj_id = d.document_id AND t1.namespace_prefix = 'http://www.example.com/ctabs' AND t1.value_key = 'invoiceType') LEFT OUTER JOIN obj_property t2 ON (t2.obj_id = d.document_id AND t2.namespace_prefix = 'http://www.example.com/ctabs' AND t2.value_key = 'invoiceDate') WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0)) GROUP BY 1,2 - has results like - invoice_date invoice_type count ------------ ------------ ----- 2013-02 service 3454 2013-03 service 3512 2013-03 parts 5366 2013-04 parts 5657 2013-04 service 4612 2013-05 service 4946 2013-05 parts 5508 ... So I put this in as a crosstab: SELECT * FROM crosstab( $$ SELECT SUBSTR(t2.value_string, 1, 7) AS invoice_date, t1.value_string AS invoice_type, COUNT(*) FROM document d LEFT OUTER JOIN obj_property t1 ON (t1.obj_id = d.document_id AND t1.namespace_prefix = 'http://www.example.com/ctabs' AND t1.value_key = 'invoiceType') LEFT OUTER JOIN obj_property t2 ON (t2.obj_id = d.document_id AND t2.namespace_prefix = 'http://www.example.com/ctabs' AND t2.value_key = 'invoiceDate') WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0)) GROUP BY 1,2 $$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental" BIGINT, "sales" BIGINT, "service" BIGINT); - and I get the results of - invoice_date parts rental sales service ------------ ----- ------ ------ ------- 2001-09 1 (null) (null) (null) 2007-07 1 (null) (null) (null) 2013-02 5353 (null) (null) (null) 2013-02 3454 (null) (null) (null) 2013-03 3512 (null) (null) (null) 2013-03 5366 (null) (null) (null) ... Only the first column has values, all the rest are NULL. I assume I am missing something stupid. -- Adam Tauno Williams <mailto:awilliam@whitemice.org> GPG D95ED383 Systems Administrator, Python Developer, LPI / NCLA
Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page.
On Tue, Jan 5, 2021 at 8:46 AM Adam Tauno Williams <awilliam@whitemice.org> wrote:Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.I think you are assigning the function more intelligence/effort than it puts out.From the documentation:"""The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. It fills the output value columns, left to right, with the value fields from these rows. If there are fewer rows in a group than there are output value columns, the extra output columns are filled with nulls; if there are more rows, the extra input rows are skipped.
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page."""The fact you don't have an order by, and that there are not an equal number of records per date, suggests to me that you are expecting the function to fill in the blanks when the documentation says it doesn't do that.David J.
+1
imho the crosstab() function isn't a good implementation. The biggest failure it has is that you must know exactly how many output columns you will have in the result prior to running it
IT Director
Direct: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420
Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 5803 1687 | Toronto: +1 647 503 2848
Web: https://www.manifest.co.uk/
Minerva Analytics Ltd - A Solactive Company
9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | United Kingdom
Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to admin@minerva.info
Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/ for further information.
Adam Tauno Williams schrieb am 05.01.2021 um 16:46: > I'm using the crosstab feature and do not understand why I am only > getting values in the first column. > > The query: > SELECT > date_trunc('month', t2.value_date) AS invoice_date, > t1.value_string AS invoice_type > COUNT(*) > FROM document d > LEFT OUTER JOIN obj_property t1 > ON (t1.obj_id = d.document_id > AND t1.namespace_prefix = 'http://www.example.com/ctabs' > AND t1.value_key = 'invoiceType') > LEFT OUTER JOIN obj_property t2 > ON (t2.obj_id = d.document_id > AND t2.namespace_prefix = 'http://www.example.com/ctabs' > AND t2.value_key = 'invoiceDate') > WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0)) > GROUP BY 1,2 > > - has results like - > > invoice_date invoice_type count > ------------ ------------ ----- > 2013-02 service 3454 > 2013-03 service 3512 > 2013-03 parts 5366 > 2013-04 parts 5657 > 2013-04 service 4612 > 2013-05 service 4946 > 2013-05 parts 5508 > ... > > So I put this in as a crosstab: > I find using filtered aggregation to be way easier and more flexible than using crosstab(): SELECT date_trunc('month', t2.value_date) AS invoice_date, count(*) filter (where value_string = 'rental') as rental, count(*) filter (where value_string = 'sales') as sales, count(*) filter (where value_string = 'service') as service FROM document d LEFT OUTER JOIN obj_property t1 ON (t1.obj_id = d.document_id AND t1.namespace_prefix = 'http://www.example.com/ctabs' AND t1.value_key = 'invoiceType') LEFT OUTER JOIN obj_property t2 ON (t2.obj_id = d.document_id AND t2.namespace_prefix = 'http://www.example.com/ctabs' AND t2.value_key = 'invoiceDate') WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0)) GROUP BY 1 Thomas
On 1/5/21 12:11 PM, Tim Clarke wrote: > imho the crosstab() function isn't a good implementation. The biggest failure it > has is that you must know exactly how many output columns you will have in the > result /_prior to running it *That* is a function of how Postgres set returning functions work, and not specific to crosstab(). It is not easily fixed. Patches to fix that would be welcomed! Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Вложения
On 1/5/21 10:46 AM, Adam Tauno Williams wrote: > I'm using the crosstab feature and do not understand why I am only > getting values in the first column. <snip> > So I put this in as a crosstab: > > SELECT * FROM crosstab( > $$ <snip> > $$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental" > BIGINT, "sales" BIGINT, "service" BIGINT); > > - and I get the results of - > > invoice_date parts rental sales service > ------------ ----- ------ ------ ------- > 2001-09 1 (null) (null) (null) > 2007-07 1 (null) (null) (null) > 2013-02 5353 (null) (null) (null) > 2013-02 3454 (null) (null) (null) > 2013-03 3512 (null) (null) (null) > 2013-03 5366 (null) (null) (null) > ... > > Only the first column has values, all the rest are NULL. > I assume I am missing something stupid. Seems you are using the wrong form of the crosstab() function. See https://www.postgresql.org/docs/current/tablefunc.html#id-1.11.7.47.5.7.2 "The main limitation of the single-parameter form of crosstab is that it treats all values in a group alike, inserting each value into the first available column. If you want the value columns to correspond to specific categories of data, and some groups might not have data for some of the categories, that doesn't work well. The two-parameter form of crosstab handles this case by providing an explicit list of the categories corresponding to the output columns." HTH, Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Вложения
On 1/5/21 12:11 PM, Tim Clarke wrote:
> imho the crosstab() function isn't a good implementation. The biggest failure it
> has is that you must know exactly how many output columns you will have in the
> result /_prior to running it
*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would be
welcomed!
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
(resent to the list, previous post was rejected) Pavel Stehule wrote: > > *That* is a function of how Postgres set returning functions work, and not > > specific to crosstab(). It is not easily fixed. Patches to fix that would > > be > > welcomed! > > > > https://www.postgresql.org/message-id/flat/CAFj8pRC%2BhNzpH%2B0bPRCnqNncUCGjEvpwX%2B0nbhb1F7gwjYZZNg%40mail.gmail.com#9b3fbdd968c271668049a103bfc93880 Oracle has Polymorphic Table Function but still it doesn't work seamlessly for dynamic pivots. This blog post illustrates why: https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/ The "related posts" at the end are also interesting about this feature. Even before considering more dynamic SRFs in postgres, having a variable number of columns for a DML statement is problematic in general. When an SQL statement is parsed/prepared, it's not supposed to change its output structure, unless a DDL statement intervenes. The dynamic pivot must break this promise, since a simple INSERT or UPDATE or DELETE in the pivoting rows may cause the number of output columns to change across invocations of the same statement. That would mean that PREPARE would be basically unusable or unreliable for such statements. I think the query parser is also not supposed to read data outside of the catalogs to determine the output structure of a query. This promise would also need to be broken by dynamic pivots implemented as a single-pass DML query. On the other hand, dynamic pivots can be done by generating the SQL dynamically and getting at the results in a second step, or returning a resultset embedded in a scalar (json). Speaking of features that don't exist but might someday, ISTM that CALL crosstab_dynamic(...) was more plausible than SELECT * FROM crosstab_dynamic(...), since CALL doesn't have the same contingencies as SELECT. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Pavel Stehule wrote:
> > *That* is a function of how Postgres set returning functions work, and not
> > specific to crosstab(). It is not easily fixed. Patches to fix that would
> > be
> > welcomed!
> >
>
> https://www.postgresql.org/message-id/flat/CAFj8pRC%2BhNzpH%2B0bPRCnqNncUCGjEvpwX%2B0nbhb1F7gwjYZZNg%40mail.gmail.com#9b3fbdd968c271668049a103bfc93880
Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.
Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.
When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.
I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.
On the other hand, dynamic pivots can be done by generating the SQL
dynamically and getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite