Re: Inheritance

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Inheritance
Дата
Msg-id CAMsr+YFGkM1dxDmqunBxufgRsWUyyxkyg=+mzF1988u_2HO=-w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Inheritance  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Ответы Re: Inheritance  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-hackers
On 24 May 2016 at 22:45, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
There is one aspect of inheritance support which was not mentioned: polymorphic queries.
Actually polymorphism is the fundamental feature of OOP, without it there is no behavioral inheritance and inheritance can be considered just as "syntax sugar" for sharing some common subset of attributes between tables.

The main problem with supporting polymorphic queries is that SQL query returns set of tuples, not set of objects.
So there is no nice way to return both instances of based and derived tables. There are several alternatives
(for example return joined set of attributes in all derived tables, leaving missed as NULLs) but none of them is good.

Exactly. We have a sort-of-object-ish storage option, but none of the surrounding stuff to make it useful for actual OO / object-relational work.

The "joined set of attributes" approach is exactly what ORMs already do, and many direct implementations of the same idea will use too. So we'd offer no advantage over what they already do in a way that works with multiple DBMSes, except we might be able to do it faster. Maybe.

The lack of polymorphism is critical. It's not really usefully OO but it costs you important relational features if you use it. We have some very limited polymorphism in the sense that you can query the parent table and see rows in child tables, but you only get the subset of cols that exists at that level of the heirarchy.

One thing I'd like to explore one day is a nice, user-friendly way to express "SELECT this row and the corresponding sets of rows from [these tables and their children in turn] as a structured object". Right now users have to write series of LEFT JOINs and de-duplicate the left-hand sides. Or do multiple queries (n+1 selects), possibly expensively with repeated join work involved. Or they have to write pretty baroque queries to construct a jsonb object with jsonb_agg with multiple levels of group-by in subqueries-in-from. We should be able to do this for them, so they can say

SELECTOBJECT customer
  CHILD JOIN invoice ON (customer.customer_id = invoice.customer_id AND invoice_date > $1)
  CHILD JOIN invoiceline USING (invoice_id)
  CHILD JOIN address USING (customer_id)
WHERE customer.in_debt_collections;

instead of the current nested mess of aggregation and subqueries needed, like:

SELECT
    to_jsonb(customer) || (
      SELECT jsonb_build_object('invoices', jsonb_agg(invoice_obj))
      FROM (
        SELECT to_jsonb(invoice) || jsonb_build_object('invoice_lines', jsonb_agg(invoice_line))
        FROM invoice
        LEFT OUTER JOIN invoice_line ON (invoice.invoice_id = invoice_line.invoice_id)
        WHERE invoice.customer_id = customer.customer_id AND invoice_date >= current_date
        GROUP BY invoice.invoice_id
      ) invoice_obj
    ) || (
      SELECT jsonb_build_object('addresses', jsonb_agg(address))
      FROM address
      WHERE address.customer_id = customer.customer_id
    )
FROM customer
WHERE customer.in_debt_collections

which is just pure joy to read, and gets even more contorted as the layers of parent/child relationships get deeper. The resulting query plan clearly expresses the desired result, but writing it in SQL is horrible:

                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on customer  (cost=0.00..130226.27 rows=1270 width=64)
   SubPlan 1
     ->  Aggregate  (cost=77.47..77.48 rows=1 width=24)
           ->  Subquery Scan on invoice_obj  (cost=77.37..77.46 rows=3 width=24)
                 ->  HashAggregate  (cost=77.37..77.43 rows=3 width=104)
                       Group Key: invoice.invoice_id
                       ->  Hash Right Join  (cost=50.84..77.36 rows=3 width=104)
                             Hash Cond: (invoice_line.invoice_id = invoice.invoice_id)
                             ->  Seq Scan on invoice_line  (cost=0.00..22.00 rows=1200 width=68)
                             ->  Hash  (cost=50.80..50.80 rows=3 width=40)
                                   ->  Seq Scan on invoice  (cost=0.00..50.80 rows=3 width=40)
                                         Filter: ((customer_id = customer.customer_id) AND (invoice_date >= ('now'::cstring)::date))
   SubPlan 2
     ->  Aggregate  (cost=25.02..25.03 rows=1 width=32)
           ->  Seq Scan on address  (cost=0.00..25.00 rows=6 width=32)
                 Filter: (customer_id = customer.customer_id)
(16 rows)


Maybe grouping sets can help avoid the nested joins, but I couldn't figure out how without wrapping the grouping set output query in another query to aggregate the produced objects into a top level one.

Inheritance does nothing to help with this.
 
 
But I do not think that presence of ORM excludes necessity to have internal support of OO in DBMS.


I'd agree if it was a finished and general OO feature. I just don't think what we have now is.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Inheritance
Следующее
От: Kouhei Kaigai
Дата:
Сообщение: Does people favor to have matrix data type?