Thread: Re: [PATCHES] Adding fulldisjunctions to the contrib

Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Bruce Momjian
Date:
I am still waiting for someone to tell us that they would use this
capability for a real-world problem.

---------------------------------------------------------------------------

Tzahi Fadida wrote:
> On Friday 11 August 2006 07:18, Bruce Momjian wrote:
> > I have looked over this addition, and I think I finally understand it.
> > Given three tables, A, B, C, which join as A->B, B->C, C->A, you can
> > really join them as A->B->C, and A->C->B.  What full disjunction does is
> > to perform both of those joins, and return one row for each join. Here
> 
> What it does is to return all the possible natural joins, i.e.:
> A
> B
> C
> A,B
> A,C
> ...
> A,B,C
> 
> It also removes redundant information: if one tuple already contains
> all of another tuple's information, the contained tuple is discarded.
> Also, note that the full disjunction algorithm I implemented
> is commonly used in cases where the scheme graph is cyclic
> and thus you cannot use a natural full outer join
> to compute the FD.
> 
> Finally, you can FD(A,B,C,D,...) any number of relations (limited to 32 in
> the implementation) with no regard to the order between them.
> 
> A case study and comparison can be found here:
> http://www.technion.ac.il/~tzahi/soc.html
> 
> > is an example from the README:
> >
> >     Example of the input and output of a full disjunction:
> >     INPUT:
> >
> >         --A---|---B---|---C--
> >         X---Y-|-Y---Z-|-X---Z
> >         a-|-b-|-b-|-c-|-a-|-d
> >
> >     A,B and C are relations. X,Y and Z are attributes. a,b,c and d are
> > values.
> >
> >     Note that A,B and C are connected in a cycle. That is:
> >     A is connected to B on attribute Y,
> >     B is connected to C on attribute Z,
> >     C is connected to A on attribute X.
> >
> >     The output of the full disjunction FD(A,B,C):
> >
> >            FD
> >         X---Y---Z
> >         a-|-b-|-c
> >         a-|-b-|-d
> >
> > This code is pretty complex, so I can see why it should be in /contrib.
> > Are there reasonable use cases for this capability?
> >
> > ---------------------------------------------------------------------------
> >
> > Tzahi Fadida wrote:
> > > Hi,
> > > I wish to add the fulldisjunctions function to the contrib.
> > > With the help of Jonah, we (or rather he :) created a patch with
> > > regression tests. The function is finished programmatically but
> > > still a little more code documentation touches and improved error
> > > messages are needed. All the rest was extensively tested.
> > >
> > > Attached is the patch.
> > >
> > > Works great. Just compiled from a fresh cvs which i patched with the
> > > attached diff. ran the fulldijsjunction.sql in the
> > > share/contrib/fulldisjunction and let it run and it works great.
> > > 10x.
> > >
> > > --
> > > Regards,
> > >         Tzahi.
> > > --
> > > Tzahi Fadida
> > > Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
> > > WARNING TO SPAMMERS:  see at
> > > http://members.lycos.co.uk/my2nis/spamwarning.html
> >
> > [ Attachment, skipping... ]
> >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 3: Have you checked our extensive FAQ?
> > >
> > >                http://www.postgresql.org/docs/faq
> 
> -- 
> Regards,
>         Tzahi.
> --
> Tzahi Fadida
> Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
> WARNING TO SPAMMERS:  see at
> http://members.lycos.co.uk/my2nis/spamwarning.html

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +
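
To make the README example quoted above concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the patch, and none of these names come from the contrib module: `RELATIONS`, `merge`, `connected` and `full_disjunction` are hypothetical helpers. It assumes input values are never null, tries every set of tuples with at most one tuple per relation, keeps the sets that are connected and agree on shared attributes, and then drops any padded row that another row subsumes.

```python
from itertools import combinations

# The three one-row relations from the README example (attribute -> value).
RELATIONS = {
    "A": [{"X": "a", "Y": "b"}],
    "B": [{"Y": "b", "Z": "c"}],
    "C": [{"X": "a", "Z": "d"}],
}


def merge(rows):
    """Merge rows into one dict; return None if they disagree on a shared attribute."""
    merged = {}
    for row in rows:
        for attr, value in row.items():
            if merged.setdefault(attr, value) != value:
                return None
    return merged


def connected(rows):
    """True if the rows form a single component when linked by shared attributes."""
    seen, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j, other in enumerate(rows):
            if j not in seen and rows[i].keys() & other.keys():
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(rows)


def full_disjunction(relations):
    """Brute force: try every tuple set, keep merged rows that nothing else subsumes."""
    attrs = sorted({a for rows in relations.values() for row in rows for a in row})
    tagged = [(name, row) for name, rows in relations.items() for row in rows]
    candidates = set()
    for size in range(1, len(relations) + 1):
        for combo in combinations(tagged, size):
            if len({name for name, _ in combo}) < size:
                continue  # at most one tuple per relation
            rows = [row for _, row in combo]
            merged = merge(rows) if connected(rows) else None
            if merged is not None:
                # Pad attributes that no chosen tuple supplies with None.
                candidates.add(tuple(merged.get(a) for a in attrs))

    def subsumed(t):
        # t is redundant if another candidate agrees with it on every non-null value.
        return any(o != t and all(v is None or v == w for v, w in zip(t, o))
                   for o in candidates)

    kept = [t for t in candidates if not subsumed(t)]
    return attrs, sorted(kept, key=lambda t: [str(v) for v in t])


if __name__ == "__main__":
    attrs, rows = full_disjunction(RELATIONS)
    print(attrs)
    for row in rows:
        print(row)
```

On the README input this yields the attributes X, Y, Z and the two rows (a, b, c) and (a, b, d). The B and C tuples disagree on Z, so no three-way join exists, and the single-relation rows are dropped because a longer row already contains them. The cost is exponential in the number of relations, which is consistent with the caution about expense raised later in the thread.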


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Tzahi Fadida
Date:
On Saturday 12 August 2006 07:22, Bruce Momjian wrote:
> I am still waiting for someone to tell us that they would use this
> capability for a real-world problem.

I suggest looking into web applications.
The example here
http://www.technion.ac.il/~tzahi/soc.html

shows three possible separate web resources, i.e. heterogeneous
sources. Naturally, since the sources did not know about each other in
advance, they were not designed so that their relations would avoid
forming a cycle in the scheme graph.
XML sources are usually like this, although you have to turn them into
relations first.
In addition, I have recently added a feature that lets you give aliases to
column names, so if you have a "country" column and a "state" column that
really means country, you can write
"country=public.relation_with_state.state,..." dictionary style. This is
commonly needed in web applications.

Here is another example (improvising :) ):
site1: user_name,email,favorite_book_isbn
site2: user_name,email,favorite_chat_room
site3: user_name,credit_card

So, let's say I wanted to advertise discounts on certain books for users
of a certain credit card; I would do FD(site1,site2,site3).
A natural join gives you data only on people who read some books, visit
certain chat rooms, and have a credit card on record.
FD also gives you the partial matches:
- Some people did not buy books but have a credit card and a chat room,
  so you want to advertise anyway.
- Some people did buy books and use a certain credit card, but you don't
  know where they chat; however, you know you want to advertise some
  best seller that most people buy anyway.
- Certain people did buy books and visit chat rooms, but you can't offer
  a specific discount, so you will advertise all credit cards...
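
The difference between the two results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names follow the site1/site2/site3 layout above, but every row value is invented, the helper names (`merge`, `connected`, `join_candidates`, `natural_join`, `full_disjunction`) are hypothetical, and the brute-force search is not the algorithm used in the module.

```python
from itertools import product

# Hypothetical sample data: column names follow the post, values are invented.
SITE1 = [
    {"user_name": "alice", "email": "alice@example.org", "favorite_book_isbn": "isbn-1"},
    {"user_name": "bob", "email": "bob@example.org", "favorite_book_isbn": "isbn-2"},
    {"user_name": "dave", "email": "dave@example.org", "favorite_book_isbn": "isbn-3"},
]
SITE2 = [
    {"user_name": "alice", "email": "alice@example.org", "favorite_chat_room": "scifi"},
    {"user_name": "carol", "email": "carol@example.org", "favorite_chat_room": "cooking"},
    {"user_name": "dave", "email": "dave@example.org", "favorite_chat_room": "travel"},
]
SITE3 = [
    {"user_name": "alice", "credit_card": "visa"},
    {"user_name": "bob", "credit_card": "mastercard"},
    {"user_name": "carol", "credit_card": "amex"},
]
COLUMNS = ["user_name", "email", "favorite_book_isbn",
           "favorite_chat_room", "credit_card"]


def merge(rows):
    """Merge rows into one dict; return None if they disagree on a shared column."""
    merged = {}
    for row in rows:
        for col, val in row.items():
            if merged.setdefault(col, val) != val:
                return None
    return merged


def connected(rows):
    """True if the rows form a single component when linked by shared columns."""
    seen, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j, other in enumerate(rows):
            if j not in seen and rows[i].keys() & other.keys():
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(rows)


def join_candidates(relations, allow_missing):
    """Yield padded rows for each connected, consistent pick of rows.

    With allow_missing=False exactly one row per relation is picked;
    with allow_missing=True any relation may be left out of a pick.
    """
    choices = [(rel + [None]) if allow_missing else rel for rel in relations]
    for pick in product(*choices):
        rows = [r for r in pick if r is not None]
        if rows and connected(rows):
            merged = merge(rows)
            if merged is not None:
                yield tuple(merged.get(c) for c in COLUMNS)


def natural_join(relations):
    """Rows that have a matching tuple in every relation."""
    return sorted(set(join_candidates(relations, allow_missing=False)))


def full_disjunction(relations):
    """All maximal partial joins, padded with None where a site has no match."""
    cands = set(join_candidates(relations, allow_missing=True))

    def subsumed(t):
        return any(o != t and all(v is None or v == w for v, w in zip(t, o))
                   for o in cands)

    return sorted((t for t in cands if not subsumed(t)), key=lambda t: t[0])


if __name__ == "__main__":
    print(natural_join([SITE1, SITE2, SITE3]))
    for row in full_disjunction([SITE1, SITE2, SITE3]):
        print(row)
```

With this invented data the natural join returns only alice, the one user present on all three sites. The FD returns four rows: alice in full, bob without a chat room, carol without a book, and dave without a credit card, which correspond to the three partial cases listed above.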

However, a word of caution: FD is a very, very expensive operation even with
the new algorithms, so it is best to compute the FD separately, put the
results into a table, and use that table. The exception is when, as is common
in web applications, the relations are quite small (a few thousand rows) and
they don't connect strongly. In these cases, on my p1.6 it takes about
2-3 secs.
However, I can run the same experiment with strong connectivity
between the relations and it can take hours to compute.
On the other hand, I have seen experiments with 100 thousand records
that finished in a matter of minutes, so it all depends on how many join
combinations there are in the data.

--
Regards,
        Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at
http://members.lycos.co.uk/my2nis/spamwarning.html


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
AgentM
Date:
On Aug 12, 2006, at 6:01 , Tzahi Fadida wrote:

> On Saturday 12 August 2006 07:22, Bruce Momjian wrote:
>> I am still waiting for someone to tell us that they would use this
>> capability for a real-world problem.

Notice that if you google "full disjunction" that the first link is  
this project.

You won't find anyone to vouch for it because this is the first
implementation of full disjunctions in any database. That doesn't
mean it isn't useful; it means no one is using it because it hasn't
existed until now.

This is the point where one needs to decide whether PostgreSQL is a  
copier of features from other databases or whether it can lead with a  
few unique features of its own.


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Bruce Momjian
Date:
AgentM wrote:
> 
> On Aug 12, 2006, at 6:01 , Tzahi Fadida wrote:
> 
> > On Saturday 12 August 2006 07:22, Bruce Momjian wrote:
> >> I am still waiting for someone to tell us that they would use this
> >> capability for a real-world problem.
> 
> Notice that if you google "full disjunction" that the first link is  
> this project.
> 
> You won't find anyone to vouch for it because this is the first  
> implementation of full disjunctions in any database. That doesn't  
> mean it isn't useful- it means no one is using it because it hasn't  
> existed until now.
> 
> This is the point where one needs to decide whether PostgreSQL is a  
> copier of features from other databases or whether it can lead with a  
> few unique features of its own.

OK, that is helpful.  Now, does any current user think they will use
full disjunctions?  Is that a fair question?

The point is not whether it should work with PostgreSQL, but whether we
ship it in /contrib, or it is on pgfoundry.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Tom Lane
Date:
AgentM <agentm@themactionfaction.com> writes:
> You won't find anyone to vouch for it because this is the first  
> implementation of full disjunctions in any database. That doesn't  
> mean it isn't useful- it means no one is using it because it hasn't  
> existed until now.

> This is the point where one needs to decide whether PostgreSQL is a  
> copier of features from other databases or whether it can lead with a  
> few unique features of its own.

Somewhere along here we need to remember that "most new ideas are bad".

More seriously: the current state of affairs is that the
full-disjunction code exists as a pgfoundry project.  If it's indeed the
second greatest thing since sliced bread, then I think we could assume
that people will find it and use it from pgfoundry.  The question that's
on the table is whether it needs to be in contrib right now.  I have not
seen either a technical argument or popularity argument why it ought to
move into contrib.
        regards, tom lane


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
"Jonah H. Harris"
Date:
On 8/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> More seriously: the current state of affairs is that the
> full-disjunction code exists as a pgfoundry project.  If it's indeed the
> second greatest thing since sliced bread, then I think we could assume
> that people will find it and use it from pgfoundry.

That goes back to assuming people not only know about pgfoundry, but
are similarly willing to search it.

> The question that's on the table is whether it needs to be in contrib right now.
> I have not seen either a technical argument or popularity argument why it
> ought to move into contrib.

In addition to knowing that Tzahi has put a *very* significant amount
of work into his research as well as this code over the past few
months, I have to agree with several items stated by "Agent M".

This is the *first* implementation of this concept in any database
system, so there's not going to be anyone jumping up and down singing
its praises just yet.  However, when people do get a chance to play
with it, I believe we'll have a number of them saying how useful it
is.  There are several contrib modules still included in the system
that aren't that heavily used... I don't see the harm in including
this one for at least this release.  If no one uses it, take it out
for 8.3.

IMHO, this is just a really cool piece of technology that provides
functionality which can't be done any other way; why not give it a
chance?

-- 
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Bruce Momjian
Date:
Jonah H. Harris wrote:
> On 8/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > More seriously: the current state of affairs is that the
> > full-disjunction code exists as a pgfoundry project.  If it's indeed the
> > second greatest thing since sliced bread, then I think we could assume
> > that people will find it and use it from pgfoundry.
> 
> That goes back to assuming people not only know about pgfoundry, but
> are similarly willing to search it.
> 
> > The question that's on the table is whether it needs to be in contrib right now.
> > I have not seen either a technical argument or popularity argument why it
> > ought to move into contrib.
> 
> In addition to knowing that Tzahi has put a *very* significant amount
> of work into his research as well as this code over the past few
> months, I have to agree with several items stated by "Agent M".
> 
> This is the *first* implementation of this concept in any database
> system, so there's not going to be anyone jumping up and down singing
> its praises just yet.  However, when people do get a chance to play
> with it, I believe we'll have a number of them saying how useful it
> is.  There are several contrib modules still included in the system
> that aren't that heavily used... I don't see the harm in including
> this one for at least this release.  If no one uses it, take it out
> for 8.3.
> 
> IMHO, this is just a really cool piece of technology that provides
> functionality which can't be done any other way; why not give it a
> chance?

Our distribution is not a place to experiment with things.  That's what
separate pgfoundry projects are for.  The fact we have some unusual
things in /contrib is not a reason to add more.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
David Fetter
Date:
On Sun, Aug 13, 2006 at 10:07:06AM -0400, Bruce Momjian wrote:
> Jonah H. Harris wrote:
> > On 8/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > More seriously: the current state of affairs is that the
> > > full-disjunction code exists as a pgfoundry project.  If it's
> > > indeed the second greatest thing since sliced bread, then I
> > > think we could assume that people will find it and use it from
> > > pgfoundry.
> > 
> > That goes back to assuming people not only know about pgfoundry,
> > but are similarly willing to search it.
> > 
> > > The question that's on the table is whether it needs to be in
> > > contrib right now.  I have not seen either a technical argument
> > > or popularity argument why it ought to move into contrib.
> > 
> > In addition to knowing that Tzahi has put a *very* significant
> > amount of work into his research as well as this code over the
> > past few months, I have to agree with several items stated by
> > "Agent M".
> > 
> > This is the *first* implementation of this concept in any database
> > system, so there's not going to be anyone jumping up and down
> > singing its praises just yet.  However, when people do get a
> > chance to play with it, I believe we'll have a number of them
> > saying how useful it is.  There are several contrib modules still
> > included in the system that aren't that heavily used... I don't
> > see the harm in including this one for at least this release.  If
> > no one uses it, take it out for 8.3.
> > 
> > IMHO, this is just a really cool piece of technology that provides
> > functionality which can't be done any other way; why not give it a
> > chance?
> 
> Our distribution is not a place to experiment with things.  That's
> what separate pgfoundry projects are for.  The fact we have some
> unusual things in /contrib is not a reason to add more.

If it's on track to become part of PostgreSQL, as other innovative
features have in the past, it very much does belong there.  Why
marginalize the very thing that PostgreSQL is really good
at--innovative new features--by putting it somewhere where few people
will ever even see it?

If there were some very, very clear language every place a person
could download, check references, or install PostgreSQL that new
experimental features are at pgFoundry, that might be different.  As
it is, you have to be truly dedicated even to discover that pgFoundry
exists.

Let's get full disjunctions in contrib with a good README and have
people figure out what to do with them from there.  If no one demands
full inclusion in a couple of versions, let's take it out.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Bruce Momjian
Date:
David Fetter wrote:
> > Our distribution is not a place to experiment with things.  That's
> > what separate pgfoundry projects are for.  The fact we have some
> > unusual things in /contrib is not a reason to add more.
> 
> If it's on track to become part of PostgreSQL, as other innovative
> features have in the past, it very much does belong there.  Why
> marginalize the very thing that PostgreSQL is really good
> at--innovative new features--by putting it somewhere where few people
> will ever even see it?
> 
> If there were some very, very clear language every place a person
> could download, check references, or install PostgreSQL that new
> experimental features are at pgFoundry, that might be different.  As
> it is, you have to be truly dedicated even to discover that pgFoundry
> exists.
> 
> Let's get full disjunctions in contrib with a good README and have
> people figure out what to do with them from there.  If no one demands
> full inclusion in a couple of versions, let's take it out.

Where does it stop, then?  Do we put everything in /contrib?  I don't
see how this scales.  When we took the code from Berkeley, it had
everyone's doctoral thesis in there, and we had to remove a lot of it
because it was just too messy.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
David Fetter
Date:
On Sun, Aug 13, 2006 at 11:45:43AM -0400, Bruce Momjian wrote:
> David Fetter wrote:
> > > Our distribution is not a place to experiment with things.
> > > That's what separate pgfoundry projects are for.  The fact we
> > > have some unusual things in /contrib is not a reason to add
> > > more.
> > 
> > If it's on track to become part of PostgreSQL, as other innovative
> > features have in the past, it very much does belong there.  Why
> > marginalize the very thing that PostgreSQL is really good
> > at--innovative new features--by putting it somewhere where few
> > people will ever even see it?
> > 
> > If there were some very, very clear language every place a person
> > could download, check references, or install PostgreSQL that new
> > experimental features are at pgFoundry, that might be different.
> > As it is, you have to be truly dedicated even to discover that
> > pgFoundry exists.
> > 
> > Let's get full disjunctions in contrib with a good README and have
> > people figure out what to do with them from there.  If no one
> > demands full inclusion in a couple of versions, let's take it out.
> 
> Where does it stop, then?  Do we put everything in /contrib?  I
> don't see how this scales.  When we took the code from Berkeley, it
> had everyone's doctoral thesis in there, and we had to remove a lot
> of it because it was just too messy.

If it were just me laying out the boundary, I'd say that anything that
changes the grammar of SQL--for example, adding FULL
DISJUNCTION--can't really be a viable trial outside the main
distribution channels and deserves a couple of versions' stay in one
of those channels if it passes the scrutiny of -hackers.  I'd love to
see a "main distribution channel" that's not contrib, but that's for
the future and full disjunctions are now. :)

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Tom Lane
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> I don't see the harm in including this one for at least this release.
> If no one uses it, take it out for 8.3.

Once stuff is in contrib, it tends to stay there.  The above argument
is completely disingenuous --- we'd have to have the same argument
again at the end of the 8.3 cycle, only then we'd already have expended
a development cycle's worth of maintenance work on a quite-large module.

There is quite a lot of stuff in contrib that the core committee
wouldn't accept nowadays ... it got in when there wasn't any alternative
such as pgfoundry.  These days I think something has to be pretty
clearly useful to a wide variety of people before we'll accept it into
contrib, and I'm not seeing that that case has been made for
fulldisjunctions.  FD may be cool, but coolness isn't a (sufficient)
criterion.
        regards, tom lane


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Andrew Dunstan
Date:

David Fetter wrote:
> If it were just me laying out the boundary, I'd say that anything that
> changes the grammar of SQL--for example, adding FULL
> DISJUNCTION--can't really be a viable trial outside the main
> distribution channels and deserves a couple of versions' stay in one
> of those channels if it passes the scrutiny of -hackers.  I'd love to
> see a "main distribution channel" that's not contrib, but that's for
> the future and full disjunctions are now. :)
>
>
>   

As I understand it, FD as implemented does not require a grammar change. 
Arguably, a more complete implementation would have support at the SQL 
level, and then it would have to go in core, without question.

cheers

andrew


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> If it were just me laying out the boundary, I'd say that anything that
> changes the grammar of SQL--for example, adding FULL
> DISJUNCTION--can't really be a viable trial outside the main
> distribution channels and deserves a couple of versions' stay in one
> of those channels if it passes the scrutiny of -hackers.

Well, one of the things I don't especially like about this patch is
exactly that it doesn't change the grammar (it can't really, as a
contrib module :-().  There's no way that it would get into core without
a different API and probably a complete code rewrite.  So what we've got
here, at bottom, is a toy prototype that might serve for people to
experiment with the feature and find out whether it's useful or not ---
but it's not code that could be mainstream with just a bit more
maturity.

Perhaps contrib/dblink is a useful comparison point; that code will
never get into core in its current form either.  The reason it's in
contrib is that the use-case is so compelling that we're willing to
accept it even though we all know it's pretty klugy.

The case for FD seems to be basically "if you build it they will come",
and I'm sorry but I'm not sold.  If it gets some traction as a pgfoundry
project then we could look at doing a second-generation implementation
in a form that could actually get into core... but until then I'm
inclined to see it as an academic curiosity.
        regards, tom lane


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Josh Berkus
Date:
Bruce Momjian wrote:
> I am still waiting for someone to tell us that they would use this
> capability for a real-world problem.

It's extremely useful for data mining and data consolidation where 
you're given messy or sparse data to "clean up" and present intelligently.

For example, if it had existed 4 years ago, I would have used it for 
importing ELBS data from the UDF table (with lots of null rows) into 
PostgreSQL.

--Josh


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Josh Berkus
Date:
Tom,

> The case for FD seems to be basically "if you build it they will come",
> and I'm sorry but I'm not sold.  If it gets some traction as a pgfoundry
> project then we could look at doing a second-generation implementation
> in a form that could actually get into core... but until then I'm
> inclined to see it as an academic curiosity.

I've given my reason for wanting it in another post (data mining and 
conversion).  Let me say this additionally:  full disjunctions is an 
example of the kind of thing we need to have in order to defend our 
title as "the most advanced open source database".   Stuff like 
partitioning and PITR don't cut it anymore ... MySQL and Ingres have 
those.  We need to keep adding innovative features or we lose a great 
deal of our reason for existence as "yet another DBMS", and our ability
to attract new, smart, original developers.

The reason why it makes sense for FD to be in /contrib is that if it 
works out it will be a new join type, which is definitely core-code stuff.

If QBE were ready, I'd be pushing for that too.  Now, if the statement 
is that FD is too buggy to include in contrib at this time, I'm happy to 
accept that, but I've not seen that argument.

--Josh Berkus


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
"Jonah H. Harris"
Date:
On 8/13/06, Josh Berkus <josh@agliodbs.com> wrote:

My sentiments exactly.

-- 
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Bruce Momjian wrote:
>> I am still waiting for someone to tell us that they would use this
>> capability for a real-world problem.

> It's extremely useful for data mining and data consolidation where 
> you're given messy or sparse data to "clean up" and present intelligently.

Could we see a concrete, real-world example?  So far I've seen a lot of
arm-waving but nothing very specific.
        regards, tom lane


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> The reason why it makes sense for FD to be in /contrib is that if it 
> works out it will be a new join type, which is definitely core-code stuff.

You seem to have missed my point, which is that implementation as a new
join type would probably have nothing in common with the externally-coded
version.  The one and only reason for it to be in contrib instead of
on pgfoundry is that you think it will get more attention that way.
Which might be true, but it's a fairly weak argument for asking the
core developers to take on maintenance and bug-fixing for what is
ultimately going to be a dead-end code base.  This code will either die
for lack of interest, or be largely rewritten so it can go into core.
I don't pretend to know which will happen, but I see no technical
advantage to it being in contrib meanwhile.

> Let me say this additionally:  full disjunctions is an 
> example of the kind of thing we need to have in order to defend our 
> title as "the most advanced open source database".

[ shrug... ]  As an argument for having it in contrib instead of
pgfoundry, that impresses me not at all.  You could argue that the
ability to do this (even poorly) *without* any core code changes is
a far greater demonstration of Postgres' basic strength --- namely
extensibility --- than it would be in core.
        regards, tom lane


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Josh Berkus
Date:
Tom,

> Could we see a concrete, real-world example?  So far I've seen a lot of
> arm-waving but nothing very specific.

Sure.  Imagine that you work for an arts nonprofit and you have 3 (or more) 
separate box office lists from last season, each of which has different 
amounts of contact information.  You want to get "best of" the information 
-- that is, address if it's available, zip if it's there, phone if it's 
there, etc.    FD will reduce that process from several procedural loops 
to a single query for the first pre-deduplication run.

Or, imagine that you have 5 weblogs in different formats from 5 different 
servers.   Due to the different logging/site design, you have and lack 
different information from each log.  You want to munge all of the data 
together to extract the maximum amount of data about each visitor you can 
get, without having multiple records per visit.

This supposition holds true for court records, customer records, etc ... 
anywhere you may have relational data with a high degree of incompleteness 
from different sources ... something I encountered on about 30% of all the 
DB development projects I worked on.

> You seem to have missed my point, which is that implementation as a new
> join type would probably have nothing in common with the
> externally-coded version.  The one and only reason for it to be in
> contrib instead of on pgfoundry is that you think it will get more
> attention that way. Which might be true, but it's a fairly weak argument
> for asking the core developers to take on maintenance and bug-fixing for
> what is ultimately going to be a dead-end code base.  

OK, point taken.  I'll admit that I had hopes for it for PR reasons, which 
is not usually why we make decisions.  It would be cool to be the first 
database system to ship with any implementation of Full Disjunctions, and 
I can't announce that if it's on pgFoundry.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Andrew Dunstan
Date:

Josh Berkus wrote:
> I'll admit that I had hopes for it for PR reasons, which 
> is not usually why we make decisions.  It would be cool to be the first 
> database system to ship with any implementation of Full Disjunctions, and 
> I can't announce that if it's on pgFoundry.
>
>   

I don't see that having it on pgfoundry makes it less announceable. But 
if/when we get support at the SQL level, then we'll *really* have 
something worth announcing.

cheers

andrew


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Josh Berkus wrote:
>> I'll admit that I had hopes for it for PR reasons, which 
>> is not usually why we make decisions.  It would be cool to be the first 
>> database system to ship with any implementation of Full Disjunctions, and 
>> I can't announce that if it's on pgFoundry.

> I don't see that having it on pgfoundry makes it less announceable. But 
> if/when we get support at the SQL level, then we'll *really* have 
> something worth announcing.

I think Andrew's first point here is spot-on.  We *must* take pgfoundry
seriously as part of the available technology for Postgres.  Otherwise
all the effort we've put into building up pgfoundry (and gborg before it)
was a waste of time.  Are Perl modules taken less seriously because
they're on CPAN rather than part of the minimal Perl distribution?
No, they're not.  That is the model that we've got to strive for,
because the core developers simply haven't got enough cycles to deal
with core Postgres development and the entire kitchen sink as well.

This is not just a matter of core-developer laziness, either.  The
concept of a cloud of useful code around a small core is something
I think is absolutely critical to PG's long-term success.  We have
a built-in advantage here because of PG's historical commitment to
extensibility.  Can you see Oracle, DB2, or MySQL operating that way?
No, you can't, because their core code is not open, or they have a
business need to control everything going on, or both.

We need to *exploit* our ability to support important outside-the-core
projects.  Not assume that anything outside core can't be important.
        regards, tom lane


Re: [PATCHES] Adding fulldisjunctions to the contrib

From
"Joshua D. Drake"
Date:
> OK, point taken.  I'll admit that I had hopes for it for PR reasons, which 
> is not usually why we make decisions.  It would be cool to be the first 
> database system to ship with any implementation of Full Disjunctions, and 
> I can't announce that if it's on pgFoundry.
>
>   
You could announce it as an available module however.

Joshua D. Drake