Обсуждение: Crosstab Problems
Hi there,
successfully installed the tablefunc package.
Now, I would like to transform this kind of result based on a normal SQL:
c_name | year | value
---------------------------------------
Germany | 2001 | 123
Germany | 2002 | 125
Germany | 2003 | 128
Germany | 2004 | 132
Germany | 2005 | 135
Italy | 2001 | 412
Italy | 2002 | 429
Italy | 2003 | 456
Italy | 2004 | 465
Italy | 2005 | 477
to this one:
c_name | 2001 | 2002 | 2003 | 2004 | 2005
------------------------------------------------------------------------
Germany | 123 | 125 .....
Italy | 412 | .....
I use this SQL statement:
SELECT
*
FROM
crosstab(
'SELECT
c.name AS name,
year_start AS year,
value
FROM
agri_area AS d
LEFT JOIN
countries AS c ON c.id = id_country
WHERE
year_start = 2003 OR
year_start = 2002 OR
year_start = 2001
ORDER BY
name ASC,
year_start ASC;'
, 3)
AS ct(name varchar, y_2003 numeric, y_2002 numeric, y_2001 numeric)
I had a couple of problems getting there. But now that I have the feeling that this is OK, it tells me this:
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Can anyone tell me why? And how to get it right? Thanks for any advice!
Stef
Stefan Schwarzer <stefan.schwarzer@grid.unep.ch> writes: > I had a couple of problems getting there. But now that I have the > feeling that this is OK, it tells me this: > server closed the connection unexpectedly Could you provide a self-contained test case for this? There's not really enough information here for someone else to duplicate the problem. Also, which PG version are you using? regards, tom lane
> Could you provide a self-contained test case for this? There's not > really enough information here for someone else to duplicate the > problem. Also, which PG version are you using? Wasn't sure what you ment with "a self containted test case". Is it the raw data? Here is a SQL dump for the table. One can just neglect the JOIN with the countries table (which just replaces the country id with the country name): http://geodata.grid.unep.ch/download/sql_agri_area.sql.zip But when re-doing the query now without the JOIN, it works (almost): SELECT * FROM crosstab( 'SELECT id_country AS id, year_start AS year, value FROM agri_area AS d WHERE year_start = 2003 OR year_start = 2002 OR year_start = 2001 ORDER BY year_start ASC, id_country ASC;' , 3) AS ct(id int2, y_2003 numeric, y_2002 numeric, y_2001 numeric) Now, the problem is that it lists three times the IDs, and only the first year column is filled with values. The other two year columns stay empty. Thanks for any advice! Stef
Stefan Schwarzer <stefan.schwarzer@grid.unep.ch> writes: > Here is a SQL dump for the table. One can just neglect the JOIN with > the countries table (which just replaces the country id with the > country name): > http://geodata.grid.unep.ch/download/sql_agri_area.sql.zip > But when re-doing the query now without the JOIN, it works (almost): OK, after poking at it, it seems that crosstab() isn't prepared for null rowids. I can reproduce the crash without any data: contrib_regression=# select * from crosstab( 'SELECT null::text as name, 10 as year, 42 as value', 3) as ct(name text, year int, value int); server closed the connection unexpectedly Backtrace looks like #0 0xc008774c in ?? () from /usr/lib/libc.1 #1 0x3eb0bc in MemoryContextStrdup (context=0x40167048, string=0x0) at mcxt.c:662 #2 0xc0a5f2e4 in crosstab (fcinfo=0x7b03b858) at tablefunc.c:539 #3 0x239e24 in ExecMakeTableFunctionResult (funcexpr=0x401615e8, econtext=0x401611f0, expectedDesc=0x401613a0, returnDesc=0x7b03b7d8) at execQual.c:1566 #4 0x24d264 in FunctionNext (node=0x40161160) at nodeFunctionscan.c:68 #5 0x23ed8c in ExecScan (node=0x40161160, accessMtd=0x400170b2 <DINFINITY+3218>) at execScan.c:68 #6 0x24d2c4 in ExecFunctionScan (node=0x40167048) at nodeFunctionscan.c:109 so it's trying to pstrdup a null result from SPI_getvalue. Obviously it shouldn't crash, but I'm not sure what it *should* do in this case. Joe? In the meantime, it appears that you want to not use a LEFT JOIN here, or else maybe COALESCE(c.name, '') so that a null isn't returned to crosstab. regards, tom lane
On 10/18/07, Stefan Schwarzer <stefan.schwarzer@grid.unep.ch> wrote: > > Could you provide a self-contained test case for this? There's not > > really enough information here for someone else to duplicate the > > problem. Also, which PG version are you using? > > Wasn't sure what you ment with "a self containted test case". Is it > the raw data? > > Here is a SQL dump for the table. One can just neglect the JOIN with > the countries table (which just replaces the country id with the > country name): > > http://geodata.grid.unep.ch/download/sql_agri_area.sql.zip > > But when re-doing the query now without the JOIN, it works (almost): > > SELECT > * > FROM > crosstab( > 'SELECT > id_country AS id, > year_start AS year, > value > FROM > agri_area AS d > WHERE > year_start = 2003 OR year_start = 2002 OR year_start = > 2001 ORDER BY year_start ASC, id_country ASC;' > , 3) > AS ct(id int2, y_2003 numeric, y_2002 numeric, y_2001 numeric) > > Now, the problem is that it lists three times the IDs, and only the > first year column is filled with values. The other two year columns > stay empty. I use crosstab for a rather large weekly report in our db and it works fine, however, you can't feel it nulls. It needs all the holes filled in, so to speak. In mine I had to use generate_series to make sure all the rows were there, then coalesce to make sure there were no nulls. You might need to do something like that in yours. I'm trying it out now.
On 10/18/07, Stefan Schwarzer <stefan.schwarzer@grid.unep.ch> wrote: > But when re-doing the query now without the JOIN, it works (almost): > > SELECT > * > FROM > crosstab( > 'SELECT > id_country AS id, > year_start AS year, > value > FROM > agri_area AS d > WHERE > year_start = 2003 OR year_start = 2002 OR year_start = > 2001 ORDER BY year_start ASC, id_country ASC;' > , 3) > AS ct(id int2, y_2003 numeric, y_2002 numeric, y_2001 numeric) > > Now, the problem is that it lists three times the IDs, and only the > first year column is filled with values. The other two year columns > stay empty. You missed this point in the docs: Notes 1. The sql result must be ordered by 1,2. Change your order by to that and it works fine.
Tom Lane wrote: > so it's trying to pstrdup a null result from SPI_getvalue. > > Obviously it shouldn't crash, but I'm not sure what it *should* do in > this case. Joe? The row is pretty useless without a rowid in this context -- it seems like the best thing to do would be to skip those rows entirely. Of course you could argue I suppose that it ought to throw an ERROR and bail out entirely. Maybe a good compromise would be to skip the row but throw a NOTICE? Joe
Em Thursday 18 October 2007 16:37:59 Joe Conway escreveu: > Tom Lane wrote: > > so it's trying to pstrdup a null result from SPI_getvalue. > > > > Obviously it shouldn't crash, but I'm not sure what it *should* do in > > this case. Joe? > > The row is pretty useless without a rowid in this context -- it seems > like the best thing to do would be to skip those rows entirely. Of > course you could argue I suppose that it ought to throw an ERROR and > bail out entirely. Maybe a good compromise would be to skip the row but > throw a NOTICE? If I were using it and having this problem I'd rather have an ERROR. It isn't uncommon for people not look at their logs and it isn't uncommon for them just run command from some language using a database adapter that might not return the NOTICE output. The ERROR wouldn't pass unnoticed. -- Jorge Godoy <jgodoy@gmail.com>
Jorge Godoy <jgodoy@gmail.com> writes: > Em Thursday 18 October 2007 16:37:59 Joe Conway escreveu: >> The row is pretty useless without a rowid in this context -- it seems >> like the best thing to do would be to skip those rows entirely. Of >> course you could argue I suppose that it ought to throw an ERROR and >> bail out entirely. Maybe a good compromise would be to skip the row but >> throw a NOTICE? > If I were using it and having this problem I'd rather have an ERROR. I can think of four reasonably credible alternatives: 1. Treat NULL rowid as a category in its own right. This would conform with the behavior of GROUP BY and DISTINCT, for instance. 2. Throw an ERROR if NULL rowid is seen. 3. Throw a NOTICE or WARNING (hopefully only one message not repeated ones) if NULL rowid is seen, then ignore the row. 4. Silently ignore rows with NULL rowid. Not being a heavy user of crosstab(), I'm not sure which of these is the most appropriate, but #1 seems the most defensible from a theoretical perspective. Since the bug has gone undiscovered this long, it seems obvious that not too many people actually try to feed null rowids to crosstab; so expending a lot of effort to fix it is probably not reasonable. If you don't like #1 I'd vote for #2 second. regards, tom lane
Tom Lane wrote: > Jorge Godoy <jgodoy@gmail.com> writes: >> Em Thursday 18 October 2007 16:37:59 Joe Conway escreveu: >>> The row is pretty useless without a rowid in this context -- it seems >>> like the best thing to do would be to skip those rows entirely. Of >>> course you could argue I suppose that it ought to throw an ERROR and >>> bail out entirely. Maybe a good compromise would be to skip the row but >>> throw a NOTICE? > >> If I were using it and having this problem I'd rather have an ERROR. > > I can think of four reasonably credible alternatives: > > 1. Treat NULL rowid as a category in its own right. This would conform > with the behavior of GROUP BY and DISTINCT, for instance. > > 2. Throw an ERROR if NULL rowid is seen. > Not being a heavy user of crosstab(), I'm not sure which of these is the > most appropriate, but #1 seems the most defensible from a theoretical > perspective. > > Since the bug has gone undiscovered this long, it seems obvious that > not too many people actually try to feed null rowids to crosstab; so > expending a lot of effort to fix it is probably not reasonable. > If you don't like #1 I'd vote for #2 second. Hadn't really thought about #1, but now that you mention it, it does make sense. #1 gets my vote too. I'll pick this up next week if that's OK. Joe
>> But when re-doing the query now without the JOIN, it works (almost): >> >> SELECT >> * >> FROM >> crosstab( >> 'SELECT >> id_country AS id, >> year_start AS year, >> value >> FROM >> agri_area AS d >> WHERE >> year_start = 2003 OR year_start = 2002 OR year_start = >> 2001 ORDER BY year_start ASC, id_country ASC;' >> , 3) >> AS ct(id int2, y_2003 numeric, y_2002 numeric, y_2001 numeric) >> >> Now, the problem is that it lists three times the IDs, and only the >> first year column is filled with values. The other two year columns >> stay empty. > > You missed this point in the docs: > > Notes > > 1. The sql result must be ordered by 1,2. > Change your order by to that and it works fine. Oh, great. No, haven't seen it. Now it works. Thanks a lot! Just for the completeness, I attach the SQL. SELECT * FROM crosstab( 'SELECT COALESCE(c.name, ''''), year_start AS year, value FROM agri_area AS d LEFT JOIN countries AS c ON c.id = id_country WHERE year_start = 2003 OR year_start = 2002 OR year_start = 2001 GROUP BY name, id_country, year_start, value ORDER BY 1,2;' , 3) AS ct(name varchar, y_2003 numeric, y_2002 numeric, y_2001 numeric)
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > 3. Throw a NOTICE or WARNING (hopefully only one message not repeated > ones) if NULL rowid is seen, then ignore the row. From my experience with OLTP I don't like this one. A warning for DML is effectively the same as an error if you're running thousands of queries per minute. The logs fill up and even if you filter the logs it imposes extra run-time overhead. You end up having to avoid the warning just as if it had been an error. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Jorge Godoy <jgodoy@gmail.com> writes: >> Em Thursday 18 October 2007 16:37:59 Joe Conway escreveu: >>> The row is pretty useless without a rowid in this context -- it seems >>> like the best thing to do would be to skip those rows entirely. Of >>> course you could argue I suppose that it ought to throw an ERROR and >>> bail out entirely. Maybe a good compromise would be to skip the row but >>> throw a NOTICE? > >> If I were using it and having this problem I'd rather have an ERROR. > > I can think of four reasonably credible alternatives: > > 1. Treat NULL rowid as a category in its own right. This would conform > with the behavior of GROUP BY and DISTINCT, for instance. > 4. Silently ignore rows with NULL rowid. After looking closer I realized that #4 was my original intention, and there was even code attempting to implement it, but just not very well ;-(. In any case, the attached changes the behavior to #1 for both flavors of crosstab (the original crosstab(text, int) and the usually more useful crosstab(text, text)). It is appropriate for 8.3 but not back-patching as it changes behavior in a non-backward compatible way and is probably too invasive anyway. I'll do something much simpler just to prevent crashing for 8.2 and earlier. If there are no objections I'll apply Thursday. Joe Index: tablefunc.c =================================================================== RCS file: /opt/src/cvs/pgsql/contrib/tablefunc/tablefunc.c,v retrieving revision 1.47 diff -c -r1.47 tablefunc.c *** tablefunc.c 3 Mar 2007 19:32:54 -0000 1.47 --- tablefunc.c 25 Oct 2007 02:11:06 -0000 *************** *** 355,360 **** --- 355,361 ---- crosstab_fctx *fctx; int i; int num_categories; + bool firstpass = false; MemoryContext oldcontext; /* stuff done only on the first call of the function */ *************** *** 469,474 **** --- 470,476 ---- funcctx->max_calls = proc; MemoryContextSwitchTo(oldcontext); + firstpass = true; } /* stuff done on every call of the function */ *************** *** 500,506 **** HeapTuple tuple; Datum result; char **values; ! bool allnulls = true; while (true) { --- 502,508 ---- HeapTuple tuple; Datum result; char **values; ! bool skip_tuple = false; while (true) { *************** *** 530,555 **** rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1); /* ! * If this is the first pass through the values for this rowid ! * set it, otherwise make sure it hasn't changed on us. Also ! * check to see if the rowid is the same as that of the last ! * tuple sent -- if so, skip this tuple entirely */ if (i == 0) - values[0] = pstrdup(rowid); - - if ((rowid != NULL) && (strcmp(rowid, values[0]) == 0)) { ! if ((lastrowid != NULL) && (strcmp(rowid, lastrowid) == 0)) break; ! else if (allnulls == true) ! allnulls = false; /* ! * Get the next category item value, which is alway * attribute number three. * ! * Be careful to sssign the value to the array index based * on which category we are presently processing. */ values[1 + i] = SPI_getvalue(spi_tuple, spi_tupdesc, 3); --- 532,574 ---- rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1); /* ! * If this is the first pass through the values for this ! * rowid, set the first column to rowid */ if (i == 0) { ! if (rowid) ! values[0] = pstrdup(rowid); ! else ! values[0] = NULL; ! ! /* ! * Check to see if the rowid is the same as that of the last ! * tuple sent -- if so, skip this tuple entirely ! */ ! if (!firstpass && ! (((lastrowid == NULL) && (rowid == NULL)) || ! ((lastrowid != NULL) && ! (rowid != NULL) && ! (strcmp(rowid, lastrowid) == 0)))) ! { ! skip_tuple = true; break; ! } ! } + /* + * If rowid hasn't changed on us, continue building the + * ouput tuple. + */ + if ((rowid && values[0] && (strcmp(rowid, values[0]) == 0)) || + ((rowid == NULL) && (values[0] == NULL))) + { /* ! * Get the next category item value, which is always * attribute number three. * ! * Be careful to assign the value to the array index based * on which category we are presently processing. */ values[1 + i] = SPI_getvalue(spi_tuple, spi_tupdesc, 3); *************** *** 572,584 **** call_cntr = --funcctx->call_cntr; break; } ! ! if (rowid != NULL) ! xpfree(rowid); } xpfree(fctx->lastrowid); - if (values[0] != NULL) { /* --- 591,600 ---- call_cntr = --funcctx->call_cntr; break; } ! xpfree(rowid); } xpfree(fctx->lastrowid); if (values[0] != NULL) { /* *************** *** 586,597 **** * calls */ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); - lastrowid = fctx->lastrowid = pstrdup(values[0]); MemoryContextSwitchTo(oldcontext); } ! if (!allnulls) { /* build the tuple */ tuple = BuildTupleFromCStrings(attinmeta, values); --- 602,614 ---- * calls */ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); lastrowid = fctx->lastrowid = pstrdup(values[0]); MemoryContextSwitchTo(oldcontext); } + else + lastrowid = fctx->lastrowid = NULL; ! if (!skip_tuple) { /* build the tuple */ tuple = BuildTupleFromCStrings(attinmeta, values); *************** *** 625,630 **** --- 642,650 ---- SPI_finish(); SRF_RETURN_DONE(funcctx); } + + /* need to reset this before the next tuple is started */ + skip_tuple = false; } } } *************** *** 856,861 **** --- 876,882 ---- int ncols = spi_tupdesc->natts; char *rowid; char *lastrowid = NULL; + bool firstpass = true; int i, j; int result_ncols; *************** *** 918,938 **** /* get the rowid from the current sql result tuple */ rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1); - /* if rowid is null, skip this tuple entirely */ - if (rowid == NULL) - continue; - /* * if we're on a new output row, grab the column values up to * column N-2 now */ ! if ((lastrowid == NULL) || (strcmp(rowid, lastrowid) != 0)) { /* * a new row means we need to flush the old one first, unless * we're on the very first row */ ! if (lastrowid != NULL) { /* rowid changed, flush the previous output row */ tuple = BuildTupleFromCStrings(attinmeta, values); --- 939,958 ---- /* get the rowid from the current sql result tuple */ rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1); /* * if we're on a new output row, grab the column values up to * column N-2 now */ ! if (firstpass || ! (lastrowid == NULL && rowid != NULL) || ! (lastrowid != NULL && rowid == NULL) || ! (lastrowid != NULL && rowid != NULL && (strcmp(rowid, lastrowid) != 0))) { /* * a new row means we need to flush the old one first, unless * we're on the very first row */ ! if (!firstpass) { /* rowid changed, flush the previous output row */ tuple = BuildTupleFromCStrings(attinmeta, values); *************** *** 949,954 **** --- 969,977 ---- values[0] = rowid; for (j = 1; j < ncols - 2; j++) values[j] = SPI_getvalue(spi_tuple, spi_tupdesc, j + 1); + + /* we're no longer on the first pass */ + firstpass = false; } /* look up the category and fill in the appropriate column */ *************** *** 964,970 **** } xpfree(lastrowid); ! lastrowid = pstrdup(rowid); } /* flush the last output row */ --- 987,994 ---- } xpfree(lastrowid); ! if (rowid) ! lastrowid = pstrdup(rowid); } /* flush the last output row */
Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> 1. Treat NULL rowid as a category in its own right. This would conform >> with the behavior of GROUP BY and DISTINCT, for instance. > In any case, the attached changes the behavior to #1 for both flavors of > crosstab (the original crosstab(text, int) and the usually more useful > crosstab(text, text)). > It is appropriate for 8.3 but not back-patching as it changes behavior > in a non-backward compatible way and is probably too invasive anyway. Um, if the previous code crashed in this case, why would you worry about being backward-compatible with it? You're effectively changing the behavior anyway, so you might as well make it do what you've decided is the right thing. regards, tom lane
On 10/24/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Joe Conway <mail@joeconway.com> writes: > > Tom Lane wrote: > >> 1. Treat NULL rowid as a category in its own right. This would conform > >> with the behavior of GROUP BY and DISTINCT, for instance. > > > In any case, the attached changes the behavior to #1 for both flavors of > > crosstab (the original crosstab(text, int) and the usually more useful > > crosstab(text, text)). > > > It is appropriate for 8.3 but not back-patching as it changes behavior > > in a non-backward compatible way and is probably too invasive anyway. > > Um, if the previous code crashed in this case, why would you worry about > being backward-compatible with it? You're effectively changing the > behavior anyway, so you might as well make it do what you've decided is > the right thing. As a crosstab user, I agree with Tom.
Il Thursday 25 October 2007 16:29:33 Scott Marlowe ha scritto: > On 10/24/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Joe Conway <mail@joeconway.com> writes: > > > Tom Lane wrote: > > >> 1. Treat NULL rowid as a category in its own right. This would > > >> conform with the behavior of GROUP BY and DISTINCT, for instance. > > > > > > In any case, the attached changes the behavior to #1 for both flavors > > > of crosstab (the original crosstab(text, int) and the usually more > > > useful crosstab(text, text)). > > > > > > It is appropriate for 8.3 but not back-patching as it changes behavior > > > in a non-backward compatible way and is probably too invasive anyway. > > > > Um, if the previous code crashed in this case, why would you worry about > > being backward-compatible with it? You're effectively changing the > > behavior anyway, so you might as well make it do what you've decided is > > the right thing. > > As a crosstab user, I agree with Tom. If I can throw in my EUR 0.01 contrib, I would agree with Joe (thanks for your wonderful crosstab). If crosstab in 8.3 will have a different behaviour *and* it's not part of the core features, then I'd prefer to correct it. In any case developers will have to cope with discrepancies when going to 8.3 and you can bet they won't remain with 8.2 when 8.3 will be rolled out. And, by the way, why not including the crosstab as a standard feature? I think it deserves it!