Обсуждение: ORM integration?

Поиск
Список
Период
Сортировка

ORM integration?

От
Craig Ringer
Дата:
Hi folks

I know many people here loathe ORM systems. I'm one of them, but I still
use them when they appear to be appropriate, despite their problems.

In the process I've come to realize that ORMs in general have a couple
of issues that could be avoided with some help from the database.
Specifically:

- They need to fetch graphs of records that keep records associated
  with peers in relationships. Essentially they want nested sets.
  Ideally they need to be able to do this with a WHERE or
  LIMIT/OFFSET on the "root" relation, so they can do batch
  fetching of blocks of records in contexts where holding a
  transaction open (permitting cursor use) isn't appropriate.

  Currently ORMs do this by doing multiple LEFT OUTER JOINs and
  post-processing the results to eliminate duplication of data.
  Needless to say this is incredibly inefficient. It also makes
  using LIMIT/OFFSET nigh impossible, so they often fetch the whole
  data set into local memory (!!) even if the app only asks for a
  small fragment of it.

  A native way to fetch a query's results as a nested set, as
  (say) XML or JSON, would potentially be a huge bonus if ORM
  systems could be convinced to use it. It's potentially possible
  already with use of nested subqueries and array_agg.

  I've even been idly playing with the idea of using
  PL/Java to build a Java object graph in memory and send that
  to the client!

  I'm wondering  if anyone's been down  the path of building a
  more ORM-friendly relation graph representation database-side
  and sending it to the client.


- ORMs tend to lack cache coherence. They generally maintain a cache
  of records fetched from the database - partly because their fetching
  is so expensive (as noted above) and partly just as a general
  performance optimisation. The problem is that the database has no
  way to invalidate the ORM's cache of a particular record when changes
  are made to it in the database, so the ORM's cache tends to get out
  of sync with the database.

  In my code I can work around that by turning the blasted thing off.
  I don't need the kind of performance that the cache is intended for.
  Others do need it, and I'm curious about whether anyone's looked into
  approaches to help integrate the caches in ORMs like Hibernate with
  the backend database to keep the cache consistent.

  I'm currently thinking that the upgraded LISTEN/NOTIFY mechanism
  in 9.0 might be a good channel for sending cache invalidation
  messages with. Thoughts? Ideas?


--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: ORM integration?

От
Adrian von Bidder
Дата:
Heyho!

On Friday 13 August 2010 08.52:30 Craig Ringer wrote:
[ ... ORMs ... ]

I wonder if it were worthwhile to collect information on various ORMs on the
postgres wiki.  Not to duplicate the ORM's documentation, but to show the
most important highlights and pitfalls when using this or that ORM against
PostgreSQL (and perhaps: how well is the support if some non-standard
postgres extensions should be used...)

cheers
-- vbi

--
How to overclock the board to the attachment from Windows?

You should telnet from the floppy disk and from the tools menu inside
Netscape you either never have to log from the POP3 miditower, or can't
debug a clock of a OpenGL file of a software of a BIOS in order to
explore the editor.

Вложения

Re: ORM integration?

От
Peter Hunsberger
Дата:
On Fri, Aug 13, 2010 at 1:52 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> Hi folks
>
> I know many people here loathe ORM systems. I'm one of them, but I still
> use them when they appear to be appropriate, despite their problems.
>
> In the process I've come to realize that ORMs in general have a couple
> of issues that could be avoided with some help from the database.
> Specifically:
>
> - They need to fetch graphs of records that keep records associated
>  with peers in relationships. Essentially they want nested sets.
>  Ideally they need to be able to do this with a WHERE or
>  LIMIT/OFFSET on the "root" relation, so they can do batch
>  fetching of blocks of records in contexts where holding a
>  transaction open (permitting cursor use) isn't appropriate.
>
>  Currently ORMs do this by doing multiple LEFT OUTER JOINs and
>  post-processing the results to eliminate duplication of data.
>  Needless to say this is incredibly inefficient. It also makes
>  using LIMIT/OFFSET nigh impossible, so they often fetch the whole
>  data set into local memory (!!) even if the app only asks for a
>  small fragment of it.
>
>  A native way to fetch a query's results as a nested set, as
>  (say) XML or JSON, would potentially be a huge bonus if ORM
>  systems could be convinced to use it. It's potentially possible
>  already with use of nested subqueries and array_agg.
>
>  I've even been idly playing with the idea of using
>  PL/Java to build a Java object graph in memory and send that
>  to the client!
>
>  I'm wondering  if anyone's been down  the path of building a
>  more ORM-friendly relation graph representation database-side
>  and sending it to the client.
>
>
> - ORMs tend to lack cache coherence. They generally maintain a cache
>  of records fetched from the database - partly because their fetching
>  is so expensive (as noted above) and partly just as a general
>  performance optimisation. The problem is that the database has no
>  way to invalidate the ORM's cache of a particular record when changes
>  are made to it in the database, so the ORM's cache tends to get out
>  of sync with the database.
>
<snip/>

We do not use an ORM, but we do do very generalized graphs of metadata
in the database, with the end results very similar to what you
describe.  I've written about this a bit on this list and won't repeat
myself other than to say once more, that we use the set / subset model
popularized by Joe Celko.  We decompose the graphs into trees and use
in-order tree traversal with individual tree nodes sharing a common FK
to identify them in the forest.  A single tree is a single flat query.
Our largest individual trees run to 60,000 nodes or so.  Graphs are
built and managed in memory. We do cache because, at the moment, the
construction of the metadata that is pointed to by these trees is
expensive (long story, unrelated to the issues you're talking about).
However, there is a task to optimize that and I suspect that we won't
need caching once we're done (though we may keep it since it is in
place and working).  All tree management is through the same code that
creates the in memory graphs in the first place so cache invalidation
is not an issue.

In the end we have a sort of ORM, however, we can add new tables to
our schema (with some fairly strict limitations) and define the
metadata to manage the CRUD operations on it all at run time.  Is this
applicable in general?  Probably but the code is not trivial...
Could we benefit from hooks into the database for this?  Perhaps,
datatypes that explicitly knew they had the set / subset relationship
to each other could allow moving big chunks (if not all) of the tree
management out of the Java code and into some form of operation within
the database.  Thinking out loud here, I guess that could be generally
useful, it seems you'd end up with a sort of ltree but I think to move
the generalization this far into the database means that the tree
nodes are managing (possibly poorly defined) FK relationships.  That's
probably ok for this kind of stuff, there are ways to get the
equivalent of  strong typing back either on the DB side or at run
time.  You're essentially end up hacking a relational database to
support network database type operations, so to the extent that people
need that you've got something useful...

--
Peter Hunsberger

Re: ORM integration?

От
Peter Hunsberger
Дата:
On Fri, Aug 13, 2010 at 1:52 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
>  I'm currently thinking that the upgraded LISTEN/NOTIFY mechanism
>  in 9.0 might be a good channel for sending cache invalidation
>  messages with. Thoughts? Ideas?
>

Forgot to comment on this.  Since we've already got our cache
management in place we don't need this, but yeah, this could have
simplified things in many ways.  But, what's the mechanism / transport
for the notification? MQ type stuff <eek>?

--
Peter Hunsberger