Re: [patch] Proposal for \crosstabview in psql
От | Daniel Verite |
---|---|
Тема | Re: [patch] Proposal for \crosstabview in psql |
Дата | |
Msg-id | d4186a6f-4e25-4c62-bd65-4cf3929c21c6@mm обсуждение исходный текст |
Ответ на | Re: [patch] Proposal for \rotate in psql ("Daniel Verite" <daniel@manitou-mail.org>) |
Ответы |
Re: [patch] Proposal for \crosstabview in psql
Re: [patch] Proposal for \crosstabview in psql |
Список | pgsql-hackers |
Hi, Here's an updated patch that replaces sorted arrays by AVL binary trees when gathering distinct values for the columns involved in the pivot. The change is essential for large resultsets. For instance, it allows to process a query like this (10 million rows x 10 columns): select x,(random()*10)::int, (random()*1000)::int from generate_series(1,10000000) as x \crosstabview which takes about 30 seconds to run and display on my machine with the attached patch. That puts it seemingly in the same ballpark than the equivalent test with the server-side crosstab(). With the previous iterations of the patch, this test would never end, even with much smaller sets, as the execution time of the 1st step grew exponentially with the number of distinct keys. The exponential effect starts to be felt at about 10k values on my low-end CPU, and from there quickly becomes problematic. As a client-side display feature, processing millions of rows like in the query above does not necessarily make sense, it's pushing the envelope, but stalling way below 100k rows felt lame, so I'm happy to get rid of that limitation. However, there is another one. The above example does not need or request an additional sort step, but if it did, sorting more than 65535 entries in the vertical header would error out, because values are shipped as parameters to PQexecParams(), which only accepts that much. To avoid the problem, when the rows in the output "grid" exceed 2^16 and they need to be sorted, the user must let the sort being driven by ORDER BY beforehand in the query, knowing that the pivot will keep the original ordering intact in the vertical header. I'm still thinking about extending this based on Pavel's diff for the "label" column, so that \crosstabview [+|-]colV[:colSortH] [+|-]colH[:colSortH] would mean to use colV/H as grid headers but sort them according to colSortV/H. I prefer that syntax over adding more parameters, and also I'd like to have it work in both V and H directions. Aside from the AVL trees, there are a few other minor changes in that patch: - move non-exportable structs from the .h to the .c - move code in common.c to respect alphabetical ordering - if vertical sort is requested, add explicit check against more than 65535 params instead of letting the sort query fail - report all failure cases of the sort query - rename sortColumns to serverSort and use less the term "columns" in comments and variables. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Вложения
В списке pgsql-hackers по дате отправления: