Re: ToDo: show size of partitioned table

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: ToDo: show size of partitioned table
Дата
Msg-id CAFj8pRBJetR9tk5cNGt5mLewxwB4gXK0dm7L41fxUqBJ58h2wg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: ToDo: show size of partitioned table  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: ToDo: show size of partitioned table  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers


čt 22. 11. 2018 v 1:51 odesílatel Michael Paquier <michael@paquier.xyz> napsal:
On Wed, Nov 21, 2018 at 05:37:33PM +0100, Pavel Stehule wrote:
> st 21. 11. 2018 v 17:21 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com>
> napsal:
>> Hmm, these tests are not going to work, because they have "pavel" in the
>> expected output.
>
> I was blind, thank you for check

+create table testtable_apple(logdate date);
+create table testtable_orange(logdate date);
+create index testtable_apple_index on testtable_apple(logdate);
+create index testtable_orange_index on testtable_orange(logdate);
There are already a bunch of partition relations with multiple levels
created as part of the regression tests, so instead of creating more of
those, I would suggest to test \dP and \dPt in create_table.sql, and
\dPi in indexing.sql (please make sure to add tests for \dP with
partitioned indexes as well).

I think that you should really add the direct parent of a partition in
at least the verbose output, now for multiple partition levels things
are confusing in my opinion.  For example with such a schema:
CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
CREATE INDEX parent_index ON parent_tab (id);
CREATE TABLE child_0_10 PARTITION OF parent_tab
  FOR VALUES FROM (0) TO (10);
CREATE TABLE child_10_20 PARTITION OF parent_tab
  FOR VALUES FROM (10) TO (20);
CREATE TABLE child_20_30 PARTITION OF parent_tab
  FOR VALUES FROM (20) TO (30);
INSERT INTO parent_tab VALUES (generate_series(0,29));
CREATE TABLE child_30_40 PARTITION OF parent_tab
FOR VALUES FROM (30) TO (40)
  PARTITION BY RANGE(id);
CREATE TABLE child_30_35 PARTITION OF child_30_40
  FOR VALUES FROM (30) TO (35);
CREATE TABLE child_35_40 PARTITION OF child_30_40
   FOR VALUES FROM (35) TO (40);
INSERT INTO parent_tab VALUES (generate_series(30,39));

Then with \dP+ I got that:
=# \dP+
            List of partitioned relations
 Schema |    Name     | Owner  |  Size  | Description
--------+-------------+--------+--------+-------------
 public | child_30_40 | ioltas | 48 kB  |
 public | parent_tab  | ioltas | 120 kB |
(2 rows)
Showing the parent partition looks like a pretty important to me as I
would expect multi-level partitions to be a frequent case (perhaps it
should show up as well in the non-verbose output?).  The field should be
NULL if the relation is the top of the tree.


it looks like bug for me much more.

your example - on my comp

                               List of relations
+--------+-------------+-------------------+-------+------------+-------------+
| Schema |    Name     |       Type        | Owner |    Size    | Description |
+--------+-------------+-------------------+-------+------------+-------------+
| public | child_0_10  | table             | pavel | 8192 bytes |             |
| public | child_10_20 | table             | pavel | 8192 bytes |             |
| public | child_20_30 | table             | pavel | 8192 bytes |             |
| public | child_30_35 | table             | pavel | 8192 bytes |             |
| public | child_30_40 | partitioned table | pavel | 0 bytes    |             |
| public | child_35_40 | table             | pavel | 8192 bytes |             |
| public | parent_tab  | partitioned table | pavel | 0 bytes    |             |
+--------+-------------+-------------------+-------+------------+-------------+
(7 rows)


there is about 5x 8KB data .. 40KB

But in views I got

              List of partitioned tables
+--------+-------------+-------+-------+-------------+
| Schema |    Name     | Owner | Size  | Description |
+--------+-------------+-------+-------+-------------+
| public | child_30_40 | pavel | 16 kB |             |
| public | parent_tab  | pavel | 40 kB |             |
+--------+-------------+-------+-------+-------------+
(2 rows)


there is 16KB more, what is really messy.

I think so most correct is removing child_30_40 from the report.

test=# SELECT n.nspname as "Schema",
  c.relname as "Name",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  (SELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid)))
     FROM pg_catalog.pg_partition_tree(c.oid)) AS "Size",
  pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('p') and not c.relispartition 
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
+--------+------------+-------+-------+-------------+
| Schema |    Name    | Owner | Size  | Description |
+--------+------------+-------+-------+-------------+
| public | parent_tab | pavel | 40 kB |             |
+--------+------------+-------+-------+-------------+
(1 row)


I afraid of unreadable result if we allow overlap in report. I think so can be strange if some disk space will be reported 2x or more times in one report. Unfortunately It means so some information will be hidden. In this moment I prefer readability and simple meaning.

I am not strong in this topics. Another possibility is show parent (this should be displayed every time, without it it is messy).

This query is much more complex, but the result is more informative

SELECT n.nspname as "Schema",
  c.relname as "Name",
  n2.nspname as "Parent schema",
  c2.relname as "Parent name",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  s.max as "Hiearchy deep",
  s.size as "Size",
  pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     LEFT JOIN pg_catalog.pg_inherits i ON c.oid = i.inhrelid
     LEFT JOIN pg_catalog.pg_class c2 ON c2.oid = i.inhparent
     LEFT JOIN pg_catalog.pg_namespace n2 ON n2.oid = c2.relnamespace,
     LATERAL (SELECT max(level), pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid))) as size
     FROM pg_catalog.pg_partition_tree(c.oid)) s
WHERE c.relkind IN ('p')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;

+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
| Schema |    Name     | Parent schema | Parent name | Owner | Hiearchy deep | Size  | Description |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
| public | child_30_40 | public        | parent_tab  | pavel |             1 | 16 kB |             |
| public | parent_tab  |               |             | pavel |             2 | 40 kB |             |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
(2 rows)


Still I prefer to not show nested partitioned tables for simplicity, readability reasons. Displaying nested objects in one table doesn't look like good idea for me. But I am ready to accept different common opinion.

Still do you think so variant with parent should be preferred?

 
Again, with the previous schema:
=# \dPi *idx
            List of partitioned indexes
 Schema |        Name        | Owner  |    Table
--------+--------------------+--------+-------------
 public | child_30_40_id_idx | ioltas | child_30_40
(1 row)
=# \dP *idx
Did not find any partitioned relations named "*idx"
I would have expected in the second case to have the partitioned
*relations* showing up in the output, and a relation can be an index as
well if the pattern matches.

I think so it is correct - I don't would to see the index here, because index size is calculated by total_relation_size already.

Here my position is strong. \dP for me doesn't mean "tables or indexes" - it means "partition tables with total relation size". I don't see any sense to show tables and indexes in one report.

Regards

Pavel


Could you please address those problems first?  The basic shape of the
patch with the three new sub-commands is fine I think, so we can go
ahead with that, but the two problems reported are blockers in my
opinion.
--
Michael

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Haozhou Wang
Дата:
Сообщение: Re: Control your disk usage in PG: Introduction to Disk Quota Extension
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: New function pg_stat_statements_reset_query() to reset statisticsof a specific query