Proposal: Clean up RangeTblEntry nodes after query preparation

Поиск
Список
Период
Сортировка
От Daniel Migowski
Тема Proposal: Clean up RangeTblEntry nodes after query preparation
Дата
Msg-id 523a68d3-88f0-a503-f34b-3dd21bd6cf19@ikoffice.de
обсуждение исходный текст
Список pgsql-hackers

Hello,

I currently have the problem that a simple prepared statement for a query like

    select * from vw_report_invoice where id = $1

results in 33MB memory consumption on my side. It is a query that does about >20 joins over partially wide tables, but only a very small subset of columns is really needed. I already argued if PreparedStatements contain all the metadata about all used tables and it turns out it is even worse (because that metadata is even copied into mem multiple times).

The reason for the absurd memory consumption are RangeTableEntrys which are created for every touched table and for every join set done. I printed the query lists created after preparation and found multiple RangeTableEntrys containing >4000 columns, including the names of the columns as well as a list of all columns types, even when only a very small subset is really required.

The minimum memory consumption is 46 bytes + the name of the column, guessing it will be 64+32? bytes when palloced, resulting in 400k memory for just one of the many RTEs in the large query where maybe 50 columns are really used.

One can test this easily. When I create two simple tables:

CREATE TABLE invoice (
    id serial PRIMARY KEY,
    number varchar(10),
    amount float8,
    customer_id int4
);

CREATE TABLE invoiceitem (
    id serial PRIMARY KEY,
    invoice_id int4,
    position int4,
    quantity float8,
    priceperunit float8,
    amount float8,
    description text
);

ALTER TABLE invoiceitem ADD CONSTRAINT fk_invoiceitem_invoice_id
    FOREIGN KEY (invoice_id) REFERENCES invoice(id);

And now I preparey a simple join over these tables:

PREPARE invoicequerytest AS
SELECT inv.id, inv.number, item.id, item.description
  FROM invoice inv
  LEFT JOIN invoiceitem item ON item.invoice_id = inv.id
 WHERE inv.id = $1;

The pprint-ed RTE for the join alone is this:

      {RTE
      :alias <>
      :eref
         {ALIAS
         :aliasname unnamed_join
         :colnames ("id" "number" "amount" "customer_id" "id" "invoice_id" "po
         sition" "quantity" "priceperunit" "amount" "description")
         }
      :rtekind 2
      :jointype 1
      :joinaliasvars (
         {VAR
         :varno 1
         :varattno 1
         :vartype 23
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 1
         :varoattno 1
         :location -1
         }
         {VAR
         :varno 1
         :varattno 2
         :vartype 1043
         :vartypmod 14
         :varcollid 100
         :varlevelsup 0
         :varnoold 1
         :varoattno 2
         :location -1
         }
         {VAR
         :varno 1
         :varattno 3
         :vartype 701
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 1
         :varoattno 3
         :location -1
         }
         {VAR
         :varno 1
         :varattno 4
         :vartype 23
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 1
         :varoattno 4
         :location -1
         }
         {VAR
         :varno 2
         :varattno 1
         :vartype 23
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 2
         :varoattno 1
         :location -1
         }
         {VAR
         :varno 2
         :varattno 2
         :vartype 23
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 2
         :varoattno 2
         :location -1
         }
         {VAR
         :varno 2
         :varattno 3
         :vartype 23
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 2
         :varoattno 3
         :location -1
         }
         {VAR
         :varno 2
         :varattno 4
         :vartype 701
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 2
         :varoattno 4
         :location -1
         }
         {VAR
         :varno 2
         :varattno 5
         :vartype 701
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 2
         :varoattno 5
         :location -1
         }
         {VAR
         :varno 2
         :varattno 6
         :vartype 701
         :vartypmod -1
         :varcollid 0
         :varlevelsup 0
         :varnoold 2
         :varoattno 6
         :location -1
         }
         {VAR
         :varno 2
         :varattno 7
         :vartype 25
         :vartypmod -1
         :varcollid 100
         :varlevelsup 0
         :varnoold 2
         :varoattno 7
         :location -1
         }
      )
      :lateral false
      :inh false
      :inFromCl true
      :requiredPerms 0
      :checkAsUser 0
      :selectedCols (b)
      :insertedCols (b)
      :updatedCols (b)
      :securityQuals <>
      }

It contains every col from both tables.

Useful cols: "id" "number" "id" "invoice_id" "description"

Useless because complety unreferenced cols: "amount" "customer_id" "position" "quantity" "priceperunit" "amount"

I believe one could easily drop the unrefernced names from the RTE as well as the Var nodes which would cut mem usage drastically.

Final questions: Is there a reason we don't just null the unused values from the RTEs? I would love to implement such a cleanup step. Or if null is not possible, just replace stuff with a simpler NOVAR node and replace names with empty strings?

I believe this would reduce mem usage for PreparedStatements by >90% at least here.

Regards,
Daniel Migowski

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Adam Lee
Дата:
Сообщение: Re: Memory-Bounded Hash Aggregation
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Hash join explain is broken