Thread: [HACKERS] GSoC 2017

[HACKERS] GSoC 2017

From
Alexander Korotkov
Date:
Hi all!

In 2016 the PostgreSQL project was not accepted into the GSoC program.  As I understand it, the reasons were the following.

1. We submitted our application to GSoC at the last minute.
2. In 2016 the GSoC application form for mentoring organizations was changed.  In particular, it required more detailed information about possible projects.

As a result, we didn't manage to put together a good enough application in time, and it was declined. See [1] and [2] for details.

I think the right way to handle this in 2017 is to start collecting the required information in advance.  According to the GSoC 2017 timeline [3], mentoring organizations can submit their applications from January 19 to February 9.  So now is a good time to start collecting project ideas and put out a call for mentors.  We also need to decide who will be our admin this year.

In sum, we have the following questions:
1. What project ideas we have?
2. Who are going to be mentors this year?
3. Who is going to be project admin this year?

BTW, I'm ready to be a mentor this year.  I'm also open to being an admin if needed.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] GSoC 2017

From
Atri Sharma
Date:
Count me in as a mentor

On 10-Jan-2017 3:24 PM, "Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
> 2. Who are going to be mentors this year?

Re: [HACKERS] GSoC 2017

From
Andrew Borodin
Date:
2017-01-10 14:53 GMT+05:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> 1. What project ideas we have?

Hi!
I'd like to propose a project on sorting algorithm research. I'm ready
to mentor this project.

===Topic===
Sorting algorithms benchmark and implementation.

===Idea===
PostgreSQL currently uses a Quicksort implementation based on the
1993 work of Bentley and McIlroy [1], while more recent algorithms
[2], [3], and [4] are actively used in highly optimized runtimes such
as Java and .NET. A better-optimized sorting algorithm could plausibly
improve overall system performance. Non-comparison-based algorithms
also deserve attention and benchmarking [5].

===Project details===
The project has four essential parts:
1. Implementation of a benchmark for sorting, making sure that
operations using sorting are represented in proportion to some
“average” use cases.
2. Selection of algorithms to benchmark. The selection can be based,
for example, on scientific papers or community opinions.
3. Benchmark implementation of the selected algorithms, analysis of
the results, and picking a winner.
4. Production-quality implementation for pg_qsort(), qsort_arg() and
gen_qsort_tuple.pl. The resulting patch is submitted to a commitfest,
and the student reviews another patch.
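
The first three parts could start from a small harness like the one below. It is only a sketch under stated assumptions: it is written in Python rather than C, the quicksort() here is a toy stand-in and not pg_qsort(), and the four workloads (random, sorted, reversed, few distinct keys) are hypothetical placeholders for the "average" use cases the project would have to define. It counts comparator calls instead of wall-clock time, since comparisons are usually the expensive part of sorting tuples.

```python
import random
from functools import cmp_to_key


class CountingComparator:
    """Three-way comparator that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return (a > b) - (a < b)


def quicksort(items, cmp):
    """Toy median-of-three quicksort with three-way partitioning.

    Hypothetical candidate algorithm; it is NOT pg_qsort().
    """
    if len(items) <= 1:
        return list(items)
    candidates = [items[0], items[len(items) // 2], items[-1]]
    pivot = sorted(candidates, key=cmp_to_key(cmp))[1]
    less, equal, greater = [], [], []
    for x in items:
        c = cmp(x, pivot)
        if c < 0:
            less.append(x)
        elif c > 0:
            greater.append(x)
        else:
            equal.append(x)
    return quicksort(less, cmp) + equal + quicksort(greater, cmp)


def timsort(items, cmp):
    """Python's built-in sort (Timsort), driven by the same comparator."""
    return sorted(items, key=cmp_to_key(cmp))


def workloads(n, seed=0):
    """Assumed input distributions; a real benchmark would derive these from use cases."""
    rng = random.Random(seed)
    return {
        "random": [rng.randrange(n) for _ in range(n)],
        "sorted": list(range(n)),
        "reversed": list(range(n, 0, -1)),
        "few_unique": [rng.randrange(8) for _ in range(n)],
    }


def run_benchmark(n=2000):
    """Return {(workload, algorithm): number of comparator calls}."""
    results = {}
    for workload, data in workloads(n).items():
        for name, algo in (("quicksort", quicksort), ("timsort", timsort)):
            counter = CountingComparator()
            output = algo(data, counter)
            assert output == sorted(data), (workload, name)
            results[(workload, name)] = counter.calls
    return results


if __name__ == "__main__":
    for (workload, name), calls in sorted(run_benchmark().items()):
        print(f"{workload:>10} {name:>9} {calls:>8} comparisons")
```

A real benchmark would of course also measure time and run against the C implementations; the point here is only the shape of the harness: fixed workloads, a shared comparator, and a correctness check on every run.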

[1] Bentley, Jon L., and M. Douglas McIlroy. "Engineering a sort
function." Software: Practice and Experience 23.11 (1993): 1249-1265.
[2] Musser, David R. "Introspective sorting and selection algorithms."
Software: Practice and Experience 27.8 (1997): 983-993.
[3] Auger, Nicolas, Cyril Nicaud, and Carine Pivoteau. "Merge
Strategies: from Merge Sort to TimSort." (2015).
[4] Beniwal, Sonal, and Deepti Grover. "Comparison of various sorting
algorithms: A review." International Journal of Emerging Research in
Management & Technology 2 (2013).
[5] McIlroy, Peter M., Keith Bostic, and M. Douglas McIlroy.
"Engineering radix sort." Computing Systems 6.1 (1993): 5-27.

Best regards, Andrey Borodin.



Re: [HACKERS] GSoC 2017

From
Andrew Borodin
Date:
2017-01-10 14:53 GMT+05:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> 1. What project ideas we have?

I have one more project of interest which I can mentor.

===Topic===
GiST API advancement

===Idea===
The GiST API was designed in the early 1990s to reduce boilerplate
code around data access methods over a balanced tree. Now, more than
two decades later, there are some ideas on improving this API.

===Project details===
An opclass developer must specify four core operations to make a type GiST-indexable:
1. Split: a function to split a set of datatype instances into two parts.
2. Penalty calculation: a function to measure the penalty for unifying
two keys.
3. Collision check: a function which determines whether two keys may
overlap or do not intersect.
4. Unification: a function to combine two keys into one so that the
combined key collides with both input keys.

Functions 2 and 3 can be improved.
For example, the insertion algorithm of the Revised R*-tree [1] cannot
be expressed in terms of penalty-based algorithms. There were some
attempts to bring in parts of RR*-tree insertion, but they came down to
ugly hacks [2]. The current GiST API, due to its penalty-based
insertion algorithm, does not allow implementing an important feature
of the RR*-tree: overlap optimization. As Norbert Beckmann, author of
the RR*-tree, put it in discussion: “Overlap optimization is one of the
main elements, if not the main query performance tuning element of the
RR*-tree. You would fall back to old R-Tree times if that would be left
off.”

The collision check currently returns a binary result:
1. The query may collide with the subtree MBR.
2. The query does not collide with the subtree.
This result may be augmented with a third state: the subtree is
entirely within the query. In this case the GiST scan can descend the
subtree without key checks.
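
The three-state check can be sketched as follows. This is an illustration in Python, not the GiST C API: Rect, Node, Match, classify() and scan() are all hypothetical names, keys are 2-D rectangles, and leaf entries are plain points. The scan counts how many key checks it performs so the saving from the "contained" state is visible.

```python
from dataclasses import dataclass, field
from enum import Enum


class Match(Enum):
    DISJOINT = 0    # query cannot match anything in the subtree
    OVERLAPS = 1    # query may match; keys below must still be checked
    CONTAINED = 2   # subtree lies entirely within the query


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def classify(query, mbr):
    """Three-state collision check between a query rectangle and a subtree MBR."""
    if (mbr.xmax < query.xmin or mbr.xmin > query.xmax
            or mbr.ymax < query.ymin or mbr.ymin > query.ymax):
        return Match.DISJOINT
    if (query.xmin <= mbr.xmin and mbr.xmax <= query.xmax
            and query.ymin <= mbr.ymin and mbr.ymax <= query.ymax):
        return Match.CONTAINED
    return Match.OVERLAPS


@dataclass
class Node:
    mbr: Rect
    children: list = field(default_factory=list)  # inner node: child Nodes
    points: list = field(default_factory=list)    # leaf node: (x, y) tuples


def scan(node, query, stats, trusted=False):
    """Collect matching points; skip all key checks below a CONTAINED subtree."""
    if not trusted:
        stats["checks"] += 1
        state = classify(query, node.mbr)
        if state is Match.DISJOINT:
            return []
        trusted = state is Match.CONTAINED
    if node.children:
        hits = []
        for child in node.children:
            hits.extend(scan(child, query, stats, trusted))
        return hits
    if trusted:
        return list(node.points)
    hits = []
    for x, y in node.points:
        stats["checks"] += 1
        if query.xmin <= x <= query.xmax and query.ymin <= y <= query.ymax:
            hits.append((x, y))
    return hits
```

With a binary check, every point in a leaf fully covered by the query would still be tested one by one; here such a leaf is returned wholesale.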

The potential effect of these improvements must be benchmarked.
Implementing these two will probably spawn more ideas on GiST
performance improvements.

Finally, GiST does not provide an API for bulk loading. Alexander
Korotkov implemented buffered GiST build during GSoC 2011. This index
construction is faster, but yields an index tree with virtually the
same query performance. There are various algorithms that aim to build
a better index tree by using some knowledge of the data, e.g. [3].


[1] Beckmann, Norbert, and Bernhard Seeger. "A revised R*-tree in
comparison with related index structures." Proceedings of the 2009 ACM
SIGMOD International Conference on Management of Data. ACM, 2009.
[2] https://www.postgresql.org/message-id/flat/CAJEAwVFMo-FXaJ6Lkj8Wtb1br0MtBY48EGMVEJBOodROEGykKg%40mail.gmail.com#CAJEAwVFMo-FXaJ6Lkj8Wtb1br0MtBY48EGMVEJBOodROEGykKg@mail.gmail.com
[3] Achakeev, Daniar, Bernhard Seeger, and Peter Widmayer. "Sort-based
query-adaptive loading of R-trees." Proceedings of the 21st ACM
International Conference on Information and Knowledge Management. ACM,
2012.

Best regards, Andrey Borodin.



Re: [HACKERS] GSoC 2017

From
Jim Nasby
Date:
On 1/10/17 1:53 AM, Alexander Korotkov wrote:
> 1. What project ideas we have?

Perhaps allowing SQL-only extensions without requiring filesystem files 
would be a good project.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] GSoC 2017

From
Pavel Stehule
Date:


2017-01-12 21:21 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
>> 1. What project ideas we have?
>
> Perhaps allowing SQL-only extensions without requiring filesystem files would be a good project.

Implementing safe evaluation of untrusted PL functions: evaluating them as a different user in a separate process.

Regards

Pavel

 

Re: [HACKERS] GSoC 2017

From
Peter van Hardenberg
Date:
A new data type, and/or a new index type could both be nicely scoped bits of work.

On Thu, Jan 12, 2017 at 12:27 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2017-01-12 21:21 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
>>> 1. What project ideas we have?
>>
>> Perhaps allowing SQL-only extensions without requiring filesystem files would be a good project.
>
> Implementation safe evaluation untrusted PL functions - evaluation under different user under different process.
>
> Regards
>
> Pavel

 
--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut

Re: [HACKERS] GSoC 2017

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
> > 1. What project ideas we have?
> 
> Perhaps allowing SQL-only extensions without requiring filesystem files
> would be a good project.

Don't we already have that in patch form?  Dimitri submitted it as I
recall.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GSoC 2017

From
Jim Nasby
Date:
On 1/13/17 4:08 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
>>> 1. What project ideas we have?
>>
>> Perhaps allowing SQL-only extensions without requiring filesystem files
>> would be a good project.
>
> Don't we already have that in patch form?  Dimitri submitted it as I
> recall.

My recollection is that he tried to boil the ocean and also support 
handing compiled C libraries to the database, which was enough to sink 
the patch. It might be nice to support that if we could, and maybe it 
could be a follow-on project.

I do think complete lack of support for non-FS extensions is *seriously* 
hurting use of the feature thanks to environments like RDS and Heroku. 
As Pavel mentioned, untrusted languages are in a similar boat. So maybe 
the best way to address these things is to advertise them as "increase 
usability in cloud environments" since cloud excites people.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] GSoC 2017

From
Jim Nasby
Date:
On 1/13/17 3:09 PM, Peter van Hardenberg wrote:
> A new data type, and/or a new index type could both be nicely scoped
> bits of work.

Did you have any particular data/index types in mind?

Personally I'd love something that worked like a python dictionary, but 
I'm not sure how that'd work without essentially supporting a variant 
data type. I've got code for a variant type[1], and I don't think 
there are any holes in it, but the casting semantics are rather ugly. IIRC 
that problem appeared to be solvable if there was a hook in the current 
casting code right before Postgres threw in the towel and said a cast 
was impossible.

1: https://github.com/BlueTreble/variant/
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] GSoC 2017

From
Anastasia Lubennikova
Date:

I'm ready to be a mentor.

10.01.2017 12:53, Alexander Korotkov:
> 2. Who are going to be mentors this year?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] GSoC 2017

From
Stephen Frost
Date:
All,

* Alexander Korotkov (a.korotkov@postgrespro.ru) wrote:
> Also, we need to decide who would
> be our admin this year.

I don't see anyone champing at the bit to be the admin (it's not exactly
a fun and exciting job, after all), so, unless someone really wants it
(or someone wishes to object), I volunteer as tribute to be the admin
this year.

As such, we need to get this whole thing moving, and pretty quickly, as
Alexander noted.

The first thing we need is an "Ideas" page which includes:

- Brief descriptions of projects that can be completed in about 12 weeks.
- For each project, a list of prerequisites, description of programming skills needed and estimation of difficulty level.
- A list of potential mentors.

The GSoC 2016 page was a start on this.  I copied that page and updated
it to be a somewhat clearer format, but it could probably use more work.

Here's what Google says about the ideas page:

----------
The best pages include links to more detailed descriptions and related
materials for each project. They might even include actual use cases!

Keep in mind that this page is often the first view of your organization
by Google and potential student applicants. A link to your bug tracker
does not an Ideas Page make. Put your best foot forward. In addition to
a basic list, you might also consider providing links to relevant
resources for mentors and students, particular FAQ entries, the
timeline, etc. You might include a section on communication, giving
specific advice on which mailing lists, channels and emails to use and
how to use them. If your organization puts together an application
template for students, you should include that on your page as well.
Think of your Ideas Page as the GSoC portal to your organization.
----------

Would be great for folks to review what's there, maybe provide actual
use-cases for the existing project suggestions, verify that the projects
listed are still valid and appropriate at this point, and, please:

ADD YOUR PROJECTS.

https://wiki.postgresql.org/wiki/GSoC_2017

More information about what the project definition should look like is
included here:

http://write.flossmanuals.net/gsoc-mentoring/defining-a-project/

Before submitting it to Google, I'm going to either expand or nuke
everything under the 'core' section, so if there's something that
you are really interested in, expand it out so we can have it properly
included in our application to Google.

Also, Google has said that they actually *like* "Umbrella" projects.  As
such, I believe we should encourage projects which are closely related
to PostgreSQL to submit projects for consideration.  I don't think "just
uses PostgreSQL" would be reasonable, but I do think something like "Add
feature XYZ to the pgconf.eu code base to help PostgreSQL-based
organizations and community conferences" would be.

Let's make this year's PostgreSQL GSoC awesome!

Thanks!

Stephen

Re: [HACKERS] GSoC 2017

From
Peter van Hardenberg
Date:
A new currency type would be nice, and if kept small in scope, might be manageable. Bringing Christoph Berg's PostgreSQL-units into core and extending it could be interesting. Peter E's URL and email types might be good candidates. What else? Informix Datablades had a media type way back in the day... That's still a gap in community Postgres.

On Mon, Jan 16, 2017 at 6:43 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/13/17 3:09 PM, Peter van Hardenberg wrote:
>> A new data type, and/or a new index type could both be nicely scoped
>> bits of work.
>
> Did you have any particular data/index types in mind?
>
> Personally I'd love something that worked like a python dictionary, but I'm not sure how that'd work without essentially supporting a variant data type. I've got code for a variant type[1], and I don't think there's any holes in it, but the casting semantics are rather ugly. IIRC that problem appeared to be solvable if there was a hook in the current casting code right before Postgres threw in the towel and said a cast was impossible.
>
> 1: https://github.com/BlueTreble/variant/

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut

Re: [HACKERS] GSoC 2017

From
Jim Nasby
Date:
On 1/23/17 3:45 PM, Peter van Hardenberg wrote:
> A new currency type would be nice, and if kept small in scope, might be
> manageable.

I'd be rather nervous about this. My impression of community consensus 
on this is that a currency type that doesn't somehow support conversion 
between different currencies is pretty useless, and supporting 
conversions opens a 55-gallon drum of worms. I could certainly be 
mistaken in my impression, but I think there'd need to be some kind of 
consensus on what a currency type should do before putting that up for GSoC.

But, speaking of types, I wish we had a timestamp type that stored what 
the original timezone was, as well as the relevant TZDATA entry that was 
in place for that timestamp when it was created. Since it'd be 
completely impractical to store TZDATA as part of the datum, there 
would need to be an immutable catalog table that stored the contents of 
TZDATA any time it changed, as well as a fast way to find the surrogate 
key for the current TZDATA.
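
A rough sketch of that catalog idea, in Python rather than as a catalog table: TzdataCatalog, ZonedTimestamp and make_timestamp() are hypothetical names, and the "contents of TZDATA" are treated as an opaque byte string. Snapshots are append-only, a content hash gives the fast lookup of an existing surrogate key, and each timestamp carries only the small key.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


class TzdataCatalog:
    """Append-only store of tzdata snapshots keyed by a surrogate integer."""

    def __init__(self):
        self._snapshots = []    # list index doubles as the surrogate key
        self._by_digest = {}    # content hash -> surrogate key
        self.current_key = None

    def register(self, contents: bytes) -> int:
        """Record a tzdata snapshot (if new) and make it the current one."""
        digest = hashlib.sha256(contents).hexdigest()
        key = self._by_digest.get(digest)
        if key is None:
            key = len(self._snapshots)
            self._snapshots.append(contents)
            self._by_digest[digest] = key
        self.current_key = key
        return key

    def contents(self, key: int) -> bytes:
        return self._snapshots[key]


@dataclass(frozen=True)
class ZonedTimestamp:
    utc_instant: datetime   # the instant, normalized to UTC
    zone_name: str          # the original timezone as entered
    tzdata_key: int         # which tzdata snapshot was in force at creation


def make_timestamp(catalog, instant, zone_name):
    """Build a timestamp that remembers its zone and the current tzdata key."""
    if catalog.current_key is None:
        raise ValueError("no tzdata snapshot registered yet")
    return ZonedTimestamp(instant.astimezone(timezone.utc), zone_name,
                          catalog.current_key)
```

The datum then stays small (an instant, a zone name, and an integer key), while the bulky rule data lives once per version in the catalog.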
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] GSoC 2017

From
Peter van Hardenberg
Date:
On Mon, Jan 23, 2017 at 4:12 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/23/17 3:45 PM, Peter van Hardenberg wrote:
>> A new currency type would be nice, and if kept small in scope, might be
>> manageable.
>
> I'd be rather nervous about this. My impression of community consensus on this is a currency type that doesn't somehow support conversion between different currencies is pretty useless, and supporting conversions opens a 55 gallon drum of worms. I could certainly be mistaken in my impression, but I think there'd need to be some kind of consensus on what a currency type should do before putting that up for GSoC.

There's a relatively simple solution to the currency conversion problem which avoids running afoul of the various mistakes some previous implementations have made. Track currencies separately and always ask for a conversion chart at operation time.

Let the user specify the values they want at conversion time. That looks like this:

=> select '1 CAD'::currency + '1 USD'::currency + '1 CHF'::currency
'1.00CAD 1.00USD 1.00CHF'

=> select convert('10.00CAD'::new_currency, ('USD', '1.25', 'CHF', '1.50')::array, 'USD')
12.50USD

The basic concept is that the value of a currency type is that it would allow you to operate in multiple currencies without accidentally adding them. You'd flatten them to a single currency if, when, and how you wanted for any given operation, but could otherwise work without fear of losing information.
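
To make the behaviour concrete, here is a minimal sketch in Python rather than SQL. The Wallet class and its convert() method are hypothetical, and the shape of the rate chart is an assumption: a mapping from source currency code to the number of target units per source unit, supplied by the caller at conversion time.

```python
from decimal import Decimal


class Wallet:
    """Multi-currency amount: one running total per currency code."""

    def __init__(self, amounts=None):
        self.amounts = {}
        for code, value in (amounts or {}).items():
            self.amounts[code] = Decimal(str(value))

    def __add__(self, other):
        # Adding never converts: totals are kept separately per currency.
        if not isinstance(other, Wallet):
            return NotImplemented
        merged = dict(self.amounts)
        for code, value in other.amounts.items():
            merged[code] = merged.get(code, Decimal(0)) + value
        return Wallet(merged)

    def convert(self, rates, target):
        """Flatten to one currency using caller-supplied rates (assumed format)."""
        total = Decimal(0)
        for code, value in self.amounts.items():
            if code == target:
                total += value
            elif code in rates:
                total += value * Decimal(str(rates[code]))
            else:
                raise KeyError(f"no conversion rate supplied for {code}")
        return Wallet({target: total})

    def __repr__(self):
        return " ".join(f"{value:.2f}{code}"
                        for code, value in sorted(self.amounts.items()))
```

Here `Wallet({"CAD": "1"}) + Wallet({"USD": "1"})` keeps both amounts side by side, and conversion fails loudly if the chart lacks a rate instead of guessing one.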

I have no opinion about the most pleasing notation for the currency conversion chart, but I imagine it would be reasonable to let users provide a default set of conversion values somewhere.

There are interesting and worthwhile conversations to have about non-decimal currencies, but I think it would be totally reasonable not to support them at all in a first release. As for currency precision, I would probably consider leaning on numeric under the hood for the actual currency values themselves but IANAA (though I have done quite a lot of work on billing systems).

If it would be helpful, I could provide a detailed proposal on the wiki for others to critique?

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut

Re: [HACKERS] GSoC 2017

From
Greg Stark
Date:
On 24 January 2017 at 03:42, Peter van Hardenberg <pvh@pvh.ca> wrote:
> The basic concept is that the value of a currency type is that it would
> allow you to operate in multiple currencies without accidentally adding
> them. You'd flatten them to a single type if when and how you wanted for any
> given operation but could work without fear of losing information.

I don't think this even needs to be tied to currencies. I've often
thought this would be generally useful for any value with units. This
would prevent you from accidentally adding miles to kilometers or
hours to parsecs which is just as valid as preventing you from adding
CAD to USD.

Then you could imagine having a few entirely optional helper functions
that could automatically provide conversion factors using units.dat or
currency exchange rates. But even if you don't use these helper
functions they would still be useful.

-- 
greg



Re: [HACKERS] GSoC 2017

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> On 24 January 2017 at 03:42, Peter van Hardenberg <pvh@pvh.ca> wrote:
>> The basic concept is that the value of a currency type is that it would
>> allow you to operate in multiple currencies without accidentally adding
>> them. You'd flatten them to a single type if when and how you wanted for any
>> given operation but could work without fear of losing information.

> I don't think this even needs to be tied to currencies. I've often
> thought this would be generally useful for any value with units.

There already is an extension somewhere for attaching units to numeric
values, which would be a place to start from for this purpose.  The
things I think are unique to the currency situation are:

* Time-varying conversion ratios.

* Conventional number of decimal places for any given currency.

* Idiosyncratic I/O formats (symbol to left or right of number,
odd rules for negatives, etc).  I think the space here is covered
by the POSIX currency locale rules.

        regards, tom lane



Re: [HACKERS] GSoC 2017

From
Brad DeJong
Date:
On January 27, 2017 07:08, Tom Lane wrote:
> ... The things I think are unique to the currency situation are: ...

Add the potential for regulatory requirements to change at any time - sort of like timezone information. So no hard coded behavior.
* rounding method/accuracy
* storage precision different than display precision
* conversion method (multiply, divide, triangulate, other)
* use of spot rates (multiple rate sources) rather than/in addition to time-varying rates

Responding to the overall idea of a currency type:

Numeric values with units so that you get a warning/error when you mix different units in calculations? Ability to specify rounding methods and intermediate precisions for calculations?
+1 Good ideas with lots of potential applications.

Built-in currency type?
-1 I suspect this is one of those things that seems like a good idea but really isn't.



Re: [HACKERS] GSoC 2017

From
Thomas Kellerer
Date:
Greg Stark wrote
> I don't think this even needs to be tied to currencies. I've often
> thought this would be generally useful for any value with units. This
> would prevent you from accidentally adding miles to kilometers or
> hours to parsecs which is just as valid as preventing you from adding
> CAD to USD.

There is already such a concept - not tied to currencies or units in
general. The SQL standard calls it DISTINCT types. And it can prevent
comparing apples to oranges. 

I don't have the exact syntax at hand, but it's something like this:

create distinct type customer_id_type as integer;
create distinct type order_id_type as integer;

create table customers (id customer_id_type primary key);
create table orders (id order_id_type primary key, customer_id customer_id_type not null);

And because those columns are defined with different types, the database
will refuse to compare customers.id with orders.id (just like it would
refuse to compare an integer with a date). 

So an accidental join like this:
 select * from orders o   join customers c using (id);

would throw an error because the data types of the IDs cannot be compared.
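
The same idea can be imitated outside SQL. The sketch below is an analogy in Python, not the SQL standard's mechanism: DistinctInt, CustomerId and OrderId are hypothetical wrapper classes around an integer that refuse to be compared with any other type, so mixing up the two kinds of ID fails immediately.

```python
class DistinctInt:
    """Integer wrapper that may only be compared with its own exact type."""

    def __init__(self, value):
        self.value = int(value)

    def __eq__(self, other):
        if type(other) is not type(self):
            raise TypeError(
                f"cannot compare {type(self).__name__} "
                f"with {type(other).__name__}"
            )
        return self.value == other.value

    def __hash__(self):
        return hash((type(self), self.value))


class CustomerId(DistinctInt):
    pass


class OrderId(DistinctInt):
    pass
```

Comparing `CustomerId(1)` with `CustomerId(1)` works as usual, while `OrderId(1) == CustomerId(1)` raises TypeError even though both wrap the same integer.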












Re: [HACKERS] GSoC 2017

From
Jim Nasby
Date:
On 1/27/17 8:17 AM, Brad DeJong wrote:
> Add the potential for regulatory requirements to change at any time - sort of like timezone information. So no hard
> coded behavior.

Well, I wish we had support for storing those changing requirements as 
well. If we had that it would greatly simplify having a timestamp type 
that stores the original timezone.

BTW, time itself fits in the multi-unit pattern, since months don't have 
a fixed conversion to days (and technically seconds don't have a fixed 
conversion to anything thanks to leap seconds).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] GSoC 2017

From
Peter van Hardenberg
Date:
On Fri, Jan 27, 2017 at 2:48 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/27/17 8:17 AM, Brad DeJong wrote:
>> Add the potential for regulatory requirements to change at any time - sort of like timezone information. So no hard coded behavior.
>
> Well, I wish we had support for storing those changing requirements as well. If we had that it would greatly simplify having a timestamp type that stores the original timezone.
>
> BTW, time itself fits in the multi-unit pattern, since months don't have a fixed conversion to days (and technically seconds don't have a fixed conversion to anything thanks to leap seconds).

I agree with Jim here.

I think we don't need to solve all the possible currency problems to have a useful type. I'll reiterate what I think is the key point here:

A currency type should work like a wallet. If I have 20USD in my wallet and I put 20EUR in the wallet, I have 20USD and 20EUR in the wallet, not 42USD (or whatever the conversion rate is these days). If I want to convert those to a single currency, I need to perform an operation.

If we had this as a basic building block, support for some of the major currency formats, and a function that a user could call (think of the way we justify_interval sums of intervals to account for the ambiguities in day lengths and so on), I think we'd have a pretty useful type.

As to Tom's point, conversion rates do not vary only with time; they vary with time, space, vendor, whether you're buying or selling, in what quantity, and so on. We can give people the tools to more easily and accurately execute this math without actually building a whole financial tool suite in the first release.

I'll also note that in the absence of progress here, users continue to get bad advice about using the existing MONEY type such as here: http://stackoverflow.com/questions/15726535/postgresql-which-datatype-should-be-used-for-currency

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut

Re: [HACKERS] GSoC 2017

From
Greg Stark
Date:
On 27 January 2017 at 14:52, Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> I don't have the exact syntax at hand, but it's something like this:
>
> create distinct type customer_id_type as integer;
> create distinct type order_id_type as integer;
>
> create table customers (id customer_id_type primary key);
> create table orders (id order_id_type primary key, customer_id
> customer_id_type not null);

That seems like a useful thing but it's not exactly the same use case.

Measurements with units and currency amounts both have the property
that you are likely to want to have a single column that uses
different units for different rows. You can aggregate across them
without converting as long as you have an appropriate where clause or
group by clause -- GROUP BY units_of(debit_amount) for example.


-- 
greg



Re: [HACKERS] GSoC 2017

From
Ruben Buchatskiy
Date:
2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> 1. What project ideas we have?


Hi!

We would like to propose a project on rewriting the PostgreSQL executor from the traditional Volcano-style model [1] to a so-called push-based architecture, as implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the direction of data flow control: instead of pulling tuples up one by one with ExecProcNode(), we suggest pushing them from the bottom to the top until a blocking operator (e.g. Aggregation) is encountered. There’s a good example and a more detailed explanation of this approach in [2].
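
The difference in control flow can be illustrated with a toy scan, filter and sum pipeline. This is a sketch in Python with hypothetical class and function names, not PostgreSQL executor code: in the pull version every operator exposes next() and must save and restore its position on each call, while in the push version the scan runs a single loop and hands each tuple upward until it reaches the blocking aggregate.

```python
# Pull (Volcano-style): the parent asks each child for one tuple at a time.
class PullScan:
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0  # scan state that is reloaded and stored on every call

    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row


class PullFilter:
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


def pull_sum(root, column):
    total = 0
    while (row := root.next()) is not None:
        total += row[column]
    return total


# Push: the scan drives one loop and pushes tuples to its consumer.
def push_scan(rows, consume):
    for row in rows:  # scan state lives in this loop for the whole scan
        consume(row)


def push_filter(predicate, consume):
    def on_row(row):
        if predicate(row):
            consume(row)
    return on_row


class SumAggregate:
    """Blocking operator: accumulates input, result is read at the end."""

    def __init__(self, column):
        self.column = column
        self.total = 0

    def consume(self, row):
        self.total += row[self.column]


def push_sum(rows, predicate, column):
    agg = SumAggregate(column)
    push_scan(rows, push_filter(predicate, agg.consume))
    return agg.total
```

Both variants compute the same result; what changes is who owns the loop, which is what lets the push version keep scan state in local variables (and, in a compiled setting, in registers).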

The advantages of this approach:

 * It completely avoids the need to load/store the internal state of the bottommost (scanning) nodes, which will significantly reduce overhead. With the current pull-based model, we call functions like heapgettup_pagemode() (and many others) number-of-tuples-to-retrieve times, while in the push-based model we will call them only once. Currently, we have implemented a prototype for the SeqScan node and achieved a 2x speedup on the query “select * from lineitem”;

 * The number of memory accesses is minimized; code and data locality are generally better, and the cache is used more effectively;

 * Switching to the push model also makes a good base for building an effective JIT compiler. Currently we have a working LLVM-based JIT compiler for expressions [5], as well as a whole-query JIT compiler [6], which speeds up TPC-H queries up to 4-5 times, but the latter required manually re-implementing the executor logic with the LLVM API using the push model to get this speedup. JIT-compiling the original Postgres C code didn't give a significant improvement because of the inherent inefficiency of the Volcano-style model. After switching to the push model we expect to achieve a speedup comparable to the stand-alone JIT, but using the same code for both the JIT and the interpreter.

Also, while working on this project, we are likely to reveal and fix
other weak spots in the current query executor. The Volcano-style model
is known to have inadequate performance characteristics [7][8], e.g.
function call overhead, and we should deal with it anyway. We also plan
to make relatively small patches that optimize away the redundant
reloading of the internal state in the current pull model.

Many DB systems that support full query compilation (e.g. LegoBase [9],
Hekaton [10]) implement it in a push-based manner.

We have also seen on the mailing list that Kumar Rajeev had been
investigating this idea too, and he reported that the results were
impressive (unfortunately, without giving more details):

https://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904%40szxeml521-mbs.china.huawei.com

References

[1] Graefe G. Volcano - an extensible and parallel query evaluation system.
    IEEE Trans. Knowl. Data Eng., 6(1): 120–135, 1994.

[2] Efficiently Compiling Efficient Query Plans for Modern Hardware,
    http://www.vldb.org/pvldb/vol4/p539-neumann.pdf

[3] Compiling Database Queries into Machine Code,
    http://sites.computer.org/debull/A14mar/p3.pdf

[4] https://docs.google.com/presentation/d/1R0po7_Wa9fym5U9Y5qHXGlUi77nSda2LlZXPuAxtd-M/pub?slide=id.g9b338944f_4_131

[5] PostgreSQL with JIT compiler for expressions,
    https://github.com/ispras/postgres

[6] LLVM Cauldron, slides,
    http://llvm.org/devmtg/2016-09/slides/Melnik-PostgreSQLLLVM.pdf

[7] MonetDB/X100: Hyper-Pipelining Query Execution,
    http://cidrdb.org/cidr2005/papers/P19.pdf

[8] Vectorization vs. Compilation in Query Execution,
    https://pdfs.semanticscholar.org/dcee/b1e11d3b078b0157325872a581b51402ff66.pdf

[9] http://www.vldb.org/pvldb/vol7/p853-klonatos.pdf

[10] https://www.microsoft.com/en-us/research/wp-content/uploads/2013/06/Hekaton-Sigmod2013-final.pdf


--
Best Regards,
Ruben. <ruben@ispras.ru>
ISP RAS.

Re: [HACKERS] GSoC 2017

From
Amit Langote
Date:
On 2017/02/06 20:51, Ruben Buchatskiy wrote:
> Also we have seen in the mailing list that Kumar Rajeev had been
> investigating this idea too, and he reported that the results were
> impressive (unfortunately, without specifying more details):
> 
> https://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904%40szxeml521-mbs.china.huawei.com

You might also want to take a look at some of the ongoing work in this area:

WIP: Faster Expression Processing and Tuple Deforming (including JIT)
https://www.postgresql.org/message-id/flat/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de

Thanks,
Amit





Re: [HACKERS] GSoC 2017

From
Stephen Frost
Date:
Greetings,

* Amit Langote (Langote_Amit_f8@lab.ntt.co.jp) wrote:
> On 2017/02/06 20:51, Ruben Buchatskiy wrote:
> > Also we have seen in the mailing list that Kumar Rajeev had been
> > investigating this idea too, and he reported that the results were
> > impressive (unfortunately, without specifying more details):
> >
> > https://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904%40szxeml521-mbs.china.huawei.com
>
> You might also want to take a look at some of the ongoing work in this area:
>
> WIP: Faster Expression Processing and Tuple Deforming (including JIT)
> https://www.postgresql.org/message-id/flat/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de

Yes, exactly that.  Please review what's been currently done and,
ideally, have someone like Andres comment on your plan.

Perhaps you could arrange something with him as the mentor, since it
looked like you didn't have any specific mentors listed in a quick look.
That's definitely something that will be needed to include this project.

Thanks!

Stephen

Re: [HACKERS] GSoC 2017

From
Stephen Frost
Date:
* Stephen Frost (sfrost@snowman.net) wrote:
> * Amit Langote (Langote_Amit_f8@lab.ntt.co.jp) wrote:
> > On 2017/02/06 20:51, Ruben Buchatskiy wrote:
> > > Also we have seen in the mailing list that Kumar Rajeev had been
> > > investigating this idea too, and he reported that the results were
> > > impressive (unfortunately, without specifying more details):
> > >
> > > https://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904%40szxeml521-mbs.china.huawei.com
> >
> > You might also want to take a look at some of the ongoing work in this area:
> >
> > WIP: Faster Expression Processing and Tuple Deforming (including JIT)
> > https://www.postgresql.org/message-id/flat/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de
>
> Yes, exactly that.  Please review what's been currently done and,
> ideally, have someone like Andres comment on your plan.
>
> Perhaps you could arrange something with him as the mentor, since it
> looked like you didn't have any specific mentors listed in a quick look.
> That's definitely something that will be needed to include this project.

Apologies, looks like you do have a couple of mentors listed on the
wiki, so that looks good.

Thanks!

Stephen

Re: [HACKERS] GSoC 2017

From
Stephen Frost
Date:
Ruben,

* Ruben Buchatskiy (ruben@ispras.ru) wrote:
> Difficulty Level
> Moderate-level; however, microoptimizations might be hard.
> Probably it will also be hard to keep the whole architecture as clean as it is
> now.

The above difficulty level looks fine, but doesn't match what's on the
wiki.  What's on the wiki looks like a copy/paste from one of the
SSI-related items.

Please fix.

Thanks!

Stephen

Re: [HACKERS] GSoC 2017

From
Robert Haas
Date:
On Mon, Feb 6, 2017 at 6:51 AM, Ruben Buchatskiy <ruben@ispras.ru> wrote:
> 2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
>> 1. What project ideas we have?
>
> We would like to propose a project on rewriting PostgreSQL executor from
> traditional Volcano-style [1] to so-called push-based architecture as
> implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the
> direction of data flow control: instead of pulling up tuples one-by-one
> with ExecProcNode(), we suggest pushing them from below to top until
> blocking operator (e.g. Aggregation) is encountered. There's a good
> example and more detailed explanation for this approach in [2].

I think this is very possibly a good idea but extremely unlikely to be
something that a college student or graduate student can complete in
one summer.  More like an existing expert developer and a year of
doing not much else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] GSoC 2017

From
Pavel Stehule
Date:


2017-02-08 17:06 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> On Mon, Feb 6, 2017 at 6:51 AM, Ruben Buchatskiy <ruben@ispras.ru> wrote:
> > 2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> >> 1. What project ideas we have?
> >
> > We would like to propose a project on rewriting PostgreSQL executor from
> > traditional Volcano-style [1] to so-called push-based architecture as
> > implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the
> > direction of data flow control: instead of pulling up tuples one-by-one
> > with ExecProcNode(), we suggest pushing them from below to top until
> > blocking operator (e.g. Aggregation) is encountered. There's a good
> > example and more detailed explanation for this approach in [2].
>
> I think this is very possibly a good idea but extremely unlikely to be
> something that a college student or graduate student can complete in
> one summer.  More like an existing expert developer and a year of
> doing not much else.

+1

Pavel
 

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017

From
Dmitry Melnik
Date:
The expected result of this work is a push-based executor that works for many types of queries (currently we aim at TPC-H), but it's unlikely to be a production-ready patch to commit into mainline at that stage. This work is the actual topic of our student's thesis, so he has already started and has working prototypes for very simple plans. Also, he won't be working on this alone, but will rather make use of the support and experience of our team (as well as the mentor's help).
So this is not about replacing the current pull executor right away, but rather about developing a working prototype to find out the benefits of switching from the pull to the push model (for both the interpreter and LLVM JIT).

On Wed, Feb 8, 2017 at 7:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 6:51 AM, Ruben Buchatskiy <ruben@ispras.ru> wrote:
> > 2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> >> 1. What project ideas we have?
> >
> > We would like to propose a project on rewriting PostgreSQL executor from
> > traditional Volcano-style [1] to so-called push-based architecture as
> > implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the
> > direction of data flow control: instead of pulling up tuples one-by-one
> > with ExecProcNode(), we suggest pushing them from below to top until
> > blocking operator (e.g. Aggregation) is encountered. There's a good
> > example and more detailed explanation for this approach in [2].
>
> I think this is very possibly a good idea but extremely unlikely to be
> something that a college student or graduate student can complete in
> one summer.  More like an existing expert developer and a year of
> doing not much else.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




--
Best regards,
  Dmitry

Re: [HACKERS] GSoC 2017

From
Alexander Korotkov
Date:
Hi all!

It seems that PostgreSQL has been accepted as a GSoC mentoring organization
this year!
https://summerofcode.withgoogle.com/organizations/4558465230962688/
Congratulations!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] GSoC 2017

From
Thomas Munro
Date:
On Tue, Feb 28, 2017 at 11:42 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Hi all!
>
> It seems that PostgreSQL has passed to GSoC mentoring organizations this
> year!
> https://summerofcode.withgoogle.com/organizations/4558465230962688/
> Congratulations!

Very cool!

By the way, that page claims that PostgreSQL runs on Irix and Tru64,
which hasn't been true for a few years.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] GSoC 2017

From
Jim Nasby
Date:
On 2/27/17 4:52 PM, Thomas Munro wrote:
> By the way, that page claims that PostgreSQL runs on Irix and Tru64,
> which hasn't been true for a few years.

There could be a GSoC project to add support for those back in... ;P
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] GSoC 2017

From
Robert Haas
Date:
On Thu, Mar 2, 2017 at 3:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/27/17 4:52 PM, Thomas Munro wrote:
>> By the way, that page claims that PostgreSQL runs on Irix and Tru64,
>> which hasn't been true for a few years.
>
> There could be a GSoC project to add support for those back in... ;P

Greg Stark and Tom Lane did some work to fix problems in our VAX
support a few years ago (try git log --grep=VAX), but I don't think
Greg ever got it fully working.  There could be some point to putting
more effort into making PostgreSQL scale to very small systems.  We
seem to run pretty well even on very low-end hardware like a Raspberry
Pi, but there's always something lower-end, and having compile or
runtime options that lower our memory footprint would probably be
useful as the natural opposite of the scalability and parallel query
work we've been doing over the last few years.  Whether it's also
useful to try to support running the system on unobtainable operating
systems is less clear to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] GSoC 2017

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 2, 2017 at 3:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 2/27/17 4:52 PM, Thomas Munro wrote:
>>> By the way, that page claims that PostgreSQL runs on Irix and Tru64,
>>> which hasn't been true for a few years.

>> There could be a GSoC project to add support for those back in... ;P

> ...  Whether it's also
> useful to try to support running the system on unobtainable operating
> systems is less clear to me.

I seriously doubt that we'd take patches to run on non-mainstream OSes
without a concomitant promise to support buildfarm animals running such
OSes for the foreseeable future.  Without that we don't know if the
patches still work even a week after they're committed.  We killed the
above-mentioned OSes mainly for lack of any such animals, IIRC.
        regards, tom lane