Discussion: about google summer of code 2016

about google summer of code 2016

From
Shubham Barai
Date:
Hello everyone,

I am currently pursuing my bachelor of engineering in computer science at Maharashtra Institute of Technology, Pune, India. I am very excited about contributing to Postgres through the Google Summer of Code program.

Is Postgres applying for GSoC 2016 as a mentoring organization?


Thanks,
Shubham Barai

Re: about google summer of code 2016

From
Amit Langote
Date:
Hi Shubham,

On 2016/02/17 16:27, Shubham Barai wrote:
> Hello everyone,
> 
> I am currently pursuing my bachelor of engineering in computer science
> at Maharashtra
> Institute of Technology, Pune ,India. I am very excited about contributing
> to postgres through google summer of code program.
> 
> Is postgres   applying for gsoc 2016 as mentoring organization ?

I think it does.  Track this page for updates:
http://www.postgresql.org/developer/summerofcode/

You can contact one of the people listed on that page for the latest.

I didn't find one for 2016, but here is the PostgreSQL wiki page for
last year's GSoC: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas

Thanks,
Amit





Re: about google summer of code 2016

From
Alexander Korotkov
Date:
On Wed, Feb 17, 2016 at 10:40 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/02/17 16:27, Shubham Barai wrote:
>> Hello everyone,
>>
>> I am currently pursuing my bachelor of engineering in computer science
>> at Maharashtra
>> Institute of Technology, Pune ,India. I am very excited about contributing
>> to postgres through google summer of code program.
>>
>> Is postgres   applying for gsoc 2016 as mentoring organization ?
>
> I think it does.  Track this page for updates:
> http://www.postgresql.org/developer/summerofcode/
>
> You can contact one of the people listed on that page for the latest.
>
> I didn't find for 2016 but here is the PostgreSQL wiki page for the last
> year's GSoC page: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas

I've created a wiki page for GSoC 2016. It contains the unimplemented ideas from the 2015 page.
GSoC is now accepting proposals from organizations. Typically, we have a call for mentors on the hackers mailing list in this period.
Thom, do we apply this year?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: about google summer of code 2016

From
Amit Langote
Date:
On 2016/02/18 22:44, Alexander Korotkov wrote:
> On Wed, Feb 17, 2016 at 10:40 AM, Amit Langote wrote:
>> I didn't find for 2016 but here is the PostgreSQL wiki page for the last
>> year's GSoC page: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas
> 
> 
> I've created wiki page for GSoC 2016. It contains unimplemented ideas from
> 2015 page.
> Now, GSoC accepting proposals from organizations. Typically, we have call
> for mentors in hackers mailing list in this period.
> Thom, do we apply this year?

Apparently, the deadline is: February 20, 2016 at 04:00 (+0900 UTC)

https://summerofcode.withgoogle.com/

Thanks,
Amit





Re: about google summer of code 2016

From
Chapman Flack
Date:
On 02/18/16 19:35, Amit Langote wrote:

> Apparently, the deadline is: February 20, 2016 at 04:00 (+0900 UTC)
> 
> https://summerofcode.withgoogle.com/

For anybody finding that web site as anti-navigable as I did, here
are more direct links to the actual rules, and terms of agreement
for the various participants:

https://summerofcode.withgoogle.com/rules/
https://summerofcode.withgoogle.com/terms/org
https://summerofcode.withgoogle.com/terms/mentor
https://summerofcode.withgoogle.com/terms/student

Here is a question: does it ever happen that PostgreSQL acts as
the org for a project that is PostgreSQL-related but isn't
directly PGDG-led?

... there are definitely interesting and promising areas for further
development in PL/Java beyond what I would ever have time to tackle
solo, and I could easily enjoy mentoring someone through one or
another of them over a summer, which could also help reinvigorate
the project and get another developer familiar with it at a
non-superficial level.  While I could easily see myself mentoring,
I think it would feel like overkill to apply individually as a
one-trick 'organization'.

I see that there was a "based on PL/Java" GSoC'12 project, so maybe
there is some room for non-core ideas under the PostgreSQL ægis?

In any case, I am quite confident that I could *not* complete a
separate org application by tomorrow 2 pm EST. In reading the rules,
it looks possible that the Ideas List does not have to accompany
the org application, but would be needed shortly after acceptance?

If acceptance announcements are 29 February, I could have some
ideas drafted by then.

Is this a thinkable thought?

-Chap



Re: about google summer of code 2016

From
Atri Sharma
Date:
On 19 Feb 2016 8:30 am, "Chapman Flack" <chap@anastigmatix.net> wrote:
>
> On 02/18/16 19:35, Amit Langote wrote:
>
> > Apparently, the deadline is: February 20, 2016 at 04:00 (+0900 UTC)
> >
> > https://summerofcode.withgoogle.com/
>
> For anybody finding that web site as anti-navigable as I did, here
> are more direct links to the actual rules, and terms of agreement
> for the various participants:
>
> https://summerofcode.withgoogle.com/rules/
> https://summerofcode.withgoogle.com/terms/org
> https://summerofcode.withgoogle.com/terms/mentor
> https://summerofcode.withgoogle.com/terms/student
>
> Here is a question: does it ever happen that PostgreSQL acts as
> the org for a project that is PostgreSQL-related but isn't
> directly PGDG-led?
>
> ... there are definitely interesting and promising areas for further
> development in PL/Java beyond what I would ever have time to tackle
> solo, and I could easily enjoy mentoring someone through one or
> another of them over a summer, which could also help reinvigorate
> the project and get another developer familiar with it at a
> non-superficial level.  While I could easily see myself mentoring,
> I think it would feel like overkill to apply individually as a
> one-trick 'organization'.
>
> I see that there was a "based on PL/Java" GSoC'12 project, so maybe
> there is some room for non-core ideas under the PostgreSQL ægis?

FWIW it wasn't a PL/Java based project per se, it was a JDBC FDW.

I agree, there might be scope for non core projects and PL/Java sounds like
a good area.

Regards,

Atri

Re: about google summer of code 2016

From
Alvaro Herrera
Date:
Atri Sharma wrote:

> I agree, there might be scope for non core projects and PL/Java sounds like
> a good area.

We've hosted MADlib-based projects in the past, so why not.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: about google summer of code 2016

From
Álvaro Hernández Tortosa
Date:
    Hi.
    Oleg and I discussed recently that a really good addition to a GSoC 
item would be to study whether it's convenient to have a binary 
serialization format for jsonb over the wire. Some argue this should be 
benchmarked first. So the scope for this project would be to benchmark 
and analyze the potential improvements and then agree on which format 
jsonb could be serialized to (apart from the current on-disk format, 
there are many json or nested k-v formats that could be used for sending 
over the wire).
    I would like to mentor this project with Oleg.
    Thanks,
    Álvaro


-- 
Álvaro Hernández Tortosa


-----------
8Kdata




On 17/02/16 08:40, Amit Langote wrote:
> Hi Shubham,
>
> On 2016/02/17 16:27, Shubham Barai wrote:
>> Hello everyone,
>>
>> I am currently pursuing my bachelor of engineering in computer science
>> at Maharashtra
>> Institute of Technology, Pune ,India. I am very excited about contributing
>> to postgres through google summer of code program.
>>
>> Is postgres   applying for gsoc 2016 as mentoring organization ?
> I think it does.  Track this page for updates:
> http://www.postgresql.org/developer/summerofcode/
>
> You can contact one of the people listed on that page for the latest.
>
> I didn't find for 2016 but here is the PostgreSQL wiki page for the last
> year's GSoC page: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas
>
> Thanks,
> Amit
>
>
>
>




Re: about google summer of code 2016

From
Josh Berkus
Date:
On 02/19/2016 10:10 AM, Álvaro Hernández Tortosa wrote:
>
>      Hi.
>
>      Oleg and I discussed recently that a really good addition to a GSoC
> item would be to study whether it's convenient to have a binary
> serialization format for jsonb over the wire. Some argue this should be
> benchmarked first. So the scope for this project would be to benchmark
> and analyze the potential improvements and then agree on which format
> jsonb could be serialized to (apart from the current on-disk format,
> there are many json or nested k-v formats that could be used for sending
> over the wire).
>
>      I would like to mentor this project with Oleg.

+1


--
Josh Berkus
Red Hat OSAS
(any opinions are my own)



Re: about google summer of code 2016

From
Heikki Linnakangas
Date:
On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>       Oleg and I discussed recently that a really good addition to a GSoC
> item would be to study whether it's convenient to have a binary
> serialization format for jsonb over the wire. Some argue this should be
> benchmarked first. So the scope for this project would be to benchmark
> and analyze the potential improvements and then agree on which format
> jsonb could be serialized to (apart from the current on-disk format,
> there are many json or nested k-v formats that could be used for sending
> over the wire).

Seems a bit risky for a GSoC project. We don't know if a different 
serialization format will be a win, or whether we want to do it in the 
end, until the benchmarking is done. It's also not clear what we're 
trying to achieve with the serialization format: smaller on-the-wire 
size, faster serialization in the server, faster parsing in the client, 
or what?

- Heikki
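
The three goals listed above (on-the-wire size, serialization cost, parsing cost) can each be measured separately. What follows is only a minimal sketch in Python, assuming zlib-compressed JSON text as a stand-in candidate encoding. The `measure` helper and the sample document are hypothetical, and nothing here touches the PostgreSQL wire protocol or jsonb's actual on-disk format.

```python
import json
import time
import zlib


def encode_text(doc):
    """Baseline: compact JSON text as UTF-8 bytes."""
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


def decode_text(payload):
    return json.loads(payload.decode("utf-8"))


def encode_zlib(doc):
    """Candidate: the same JSON text, zlib-compressed (a stand-in only)."""
    return zlib.compress(encode_text(doc))


def decode_zlib(payload):
    return decode_text(zlib.decompress(payload))


def measure(encode, decode, doc, rounds=200):
    """Return (size in bytes, total encode seconds, total decode seconds)."""
    payload = encode(doc)
    assert decode(payload) == doc  # the round trip must be lossless

    start = time.perf_counter()
    for _ in range(rounds):
        encode(doc)
    encode_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(rounds):
        decode(payload)
    decode_seconds = time.perf_counter() - start

    return len(payload), encode_seconds, decode_seconds


# Hypothetical sample document with repetitive keys, as is common in JSON rows.
sample = {"trips": [{"id": i, "city": "Pune", "fare": i * 1.5} for i in range(500)]}

if __name__ == "__main__":
    candidates = [
        ("text", encode_text, decode_text),
        ("text+zlib", encode_zlib, decode_zlib),
    ]
    for name, enc, dec in candidates:
        size, enc_s, dec_s = measure(enc, dec, sample)
        print(f"{name:10s} size={size:7d} B  encode={enc_s:.4f}s  decode={dec_s:.4f}s")
```

Reporting the three numbers separately keeps the trade-off visible: a candidate can win on size while losing on encode or decode time, which is exactly the ambiguity the question points at.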




Re: about google summer of code 2016

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>> Oleg and I discussed recently that a really good addition to a GSoC
>> item would be to study whether it's convenient to have a binary
>> serialization format for jsonb over the wire.

> Seems a bit risky for a GSoC project. We don't know if a different 
> serialization format will be a win, or whether we want to do it in the 
> end, until the benchmarking is done. It's also not clear what we're 
> trying to achieve with the serialization format: smaller on-the-wire 
> size, faster serialization in the server, faster parsing in the client, 
> or what?

Another variable is that your answers might depend on what format you
assume the client is trying to convert from/to.  (It's presumably not
text JSON, but then what is it?)

Having said that, I'm not sure that risk is a blocking factor here.
History says that a large fraction of our GSoC projects don't result
in a commit to core PG.  As long as we're clear that "success" in this
project isn't measured by getting a feature committed, it doesn't seem
riskier than any other one.  Maybe it's even less risky, because there's
less of the success condition that's not under the GSoC student's control.
        regards, tom lane



Re: about google summer of code 2016

From
Chapman Flack
Date:
On 02/21/16 23:10, Tom Lane wrote:

> Another variable is that your answers might depend on what format you
> assume the client is trying to convert from/to.  (It's presumably not
> text JSON, but then what is it?)

This connects tangentially to a question I've been meaning to ask
for a while, since I was looking at the representation of XML.

As far as I can tell, XML is simply stored in its character serialized
representation (very likely compressed, if large enough to TOAST), and
the text in/out methods simply deal in that representation. The 'binary'
send/recv methods seem to differ only in possibly using a different
character encoding on the wire.

Now, also as I understand it, there's no requirement that a type even
/have/ binary send/recv methods. Text in/out it always needs, but send/recv
only if they are interesting enough to buy you something. I'm not sure
the XML send/recv really do buy anything. It is not as if they present the
XML in any more structured or tokenized form. If they buy anything at all,
it may be only an extra transcoding that the other end will probably
immediately do in reverse.

So, if that's the situation, is there some other, really simple, choice
for what XML send/recv might usefully do, that would buy more than what
they do now?

Well, PGLZ is in libpqcommon now, right? What if xml send wrote a flag
to indicate compressed or not, and then if the value is compressed TOAST,
streamed it right out as is, with no expansion on the server? I could see
that being a worthwhile win, /without even having to devise some
XML-specific encoding/. (XML has a big expansion ratio.)

And, since that idea is not inherently XML-specific ... does the JSONB
representation have the same properties?  How about even text or bytea?
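
The flag idea above can be sketched roughly as follows. This is a Python illustration only: zlib stands in for PGLZ, the one-byte flag values and the names `send_value` and `recv_value` are hypothetical, and PostgreSQL's real send/recv functions are written in C.

```python
import zlib

FLAG_RAW = 0         # hypothetical flag value: payload is uncompressed
FLAG_COMPRESSED = 1  # hypothetical flag value: payload is compressed


def send_value(stored: bytes, stored_is_compressed: bool) -> bytes:
    """Frame a stored value for the wire without expanding it first.

    If the value is already compressed in storage, its bytes are passed
    through untouched; only the leading flag tells the client what to do.
    """
    flag = FLAG_COMPRESSED if stored_is_compressed else FLAG_RAW
    return bytes([flag]) + stored


def recv_value(message: bytes) -> bytes:
    """Client side: read the flag, then decompress only if needed."""
    flag, payload = message[0], message[1:]
    if flag == FLAG_COMPRESSED:
        return zlib.decompress(payload)
    if flag == FLAG_RAW:
        return payload
    raise ValueError(f"unknown flag byte: {flag}")


# A repetitive XML document; zlib.compress stands in for a compressed TOAST value.
xml_text = ("<rows>" + "<row><name>example</name></row>" * 200 + "</rows>").encode("utf-8")
stored = zlib.compress(xml_text)

wire = send_value(stored, stored_is_compressed=True)
```

The sender does no compression or decompression work of its own in the compressed case, which is the point of the proposal: the cost of expansion moves to the receiving end, and fewer bytes cross the wire.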




The XML question has a related, JDBC-specific part. JDBC presents XML
via interfaces that can deal in Source and Result objects, and these
come in different flavors (DOMSource, an all-in-memory tree, SAXSource
and StAXSource, both streaming tokenized forms, or StreamSource, a
streaming, character-serialized form). Client code can ask for one of
those forms explicitly, or use null to say it doesn't care. In the
doesn't-care case, the driver is expected to choose the form closest
to what it's got under the hood; the client can convert if necessary,
and if it had any other preference, it would have said so. For PGJDBC,
that choice would naturally be the character StreamSource, because that
/is/ the form it's got under the hood, but for reasons mysterious to me,
pgjdbc actually chooses DOMSource in the don't-care case, and then
expends the full effort of turning the serialized stream it does have
into a full in-memory DOM that the client hasn't asked for and might
not even want. I know this is more a PGJDBC question, but I mention it
here just because it's so much like the what-should-send/recv-do question,
repeated at another level.
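
The cost difference between the forms described above can be illustrated with a rough Python analogue. This is not PGJDBC code and does not use the JDBC Source interfaces: `as_stream` plays the role of a character stream handed over as-is, `as_dom` plays the role of an in-memory tree, and all function names are hypothetical.

```python
import io
import xml.dom.minidom
import xml.sax

xml_bytes = ("<rows>" + "<row>x</row>" * 1000 + "</rows>").encode("utf-8")


def as_stream(data: bytes):
    """Cheapest form: hand back the serialized bytes as a stream, unparsed."""
    return io.BytesIO(data)


def as_dom(data: bytes):
    """Most expensive form: parse everything into an in-memory tree."""
    return xml.dom.minidom.parseString(data)


class RowCounter(xml.sax.ContentHandler):
    """A streaming consumer that never needs a tree."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def startElement(self, name, attrs):
        if name == "row":
            self.rows += 1


def count_rows_streaming(stream) -> int:
    handler = RowCounter()
    xml.sax.parse(stream, handler)
    return handler.rows


def count_rows_dom(document) -> int:
    return len(document.getElementsByTagName("row"))


if __name__ == "__main__":
    print(count_rows_streaming(as_stream(xml_bytes)))
    print(count_rows_dom(as_dom(xml_bytes)))
```

Both consumers get the same answer, but only the second pays for building the whole tree first. A client that wanted a tree could still build one from the stream, whereas a client handed a tree cannot get the parsing work back.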

-Chap



Re: about google summer of code 2016

From
Álvaro Hernández Tortosa
Date:

On 21/02/16 21:15, Heikki Linnakangas wrote:
> On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>>       Oleg and I discussed recently that a really good addition to a 
>> GSoC
>> item would be to study whether it's convenient to have a binary
>> serialization format for jsonb over the wire. Some argue this should be
>> benchmarked first. So the scope for this project would be to benchmark
>> and analyze the potential improvements and then agree on which format
>> jsonb could be serialized to (apart from the current on-disk format,
>> there are many json or nested k-v formats that could be used for sending
>> over the wire).
>
> Seems a bit risky for a GSoC project. We don't know if a different 
> serialization format will be a win, 
    Over the current serialization (text), it is hard to believe there
will be no wins.

> or whether we want to do it in the end, until the benchmarking is 
> done. It's also not clear what we're trying to achieve with the 
> serialization format: smaller on-the-wire size, faster serialization 
> in the server, faster parsing in the client, or what?
    Probably all of them (it would be ideal if it could be selectable).
Some may favor small on-the-wire size (the savings can be significant
with several serialization formats) or faster decoding (de-serialization
takes a significant execution time). Of course, all this should be
tested and benchmarked first, but we're not alone here.
    This is a significant request from many, at least from the Java
users, where it has been discussed many times. Especially if the wire
format adheres to one well-known (or even standard) format, so that the
receiving side and the drivers could expose an API based on that format
-- one of the other big pains today on this side.
    I think it fits very well for a GSoC! :)
    Álvaro


-- 
Álvaro Hernández Tortosa


-----------
8Kdata




Re: about google summer of code 2016

From
Álvaro Hernández Tortosa
Date:

On 22/02/16 05:10, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>>> Oleg and I discussed recently that a really good addition to a GSoC
>>> item would be to study whether it's convenient to have a binary
>>> serialization format for jsonb over the wire.
>> Seems a bit risky for a GSoC project. We don't know if a different
>> serialization format will be a win, or whether we want to do it in the
>> end, until the benchmarking is done. It's also not clear what we're
>> trying to achieve with the serialization format: smaller on-the-wire
>> size, faster serialization in the server, faster parsing in the client,
>> or what?
> Another variable is that your answers might depend on what format you
> assume the client is trying to convert from/to.  (It's presumably not
> text JSON, but then what is it?)
    As I mentioned before, there are many well-known JSON serialization 
formats, like:

- http://ubjson.org/
- http://cbor.io/
- http://msgpack.org/
- BSON (ok, let's skip that one hehehe)
- http://wiki.fasterxml.com/SmileFormatSpec
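
To make concrete what such formats buy, here is a minimal sketch in Python that hand-encodes a tiny document in a small subset of MessagePack and compares it with compact JSON text. The function name `encode_msgpack_subset` is hypothetical, and the encoder covers only nil, booleans, integers 0..127, short strings, and small maps; a real project would use an existing library and the full specification.

```python
import json


def encode_msgpack_subset(value) -> bytes:
    """Encode a small subset of MessagePack.

    Supported: None, bool, ints 0..127, UTF-8 strings of up to 31 bytes,
    and dicts with up to 15 entries. Anything else raises ValueError.
    """
    if value is None:
        return b"\xc0"                             # nil
    if value is True:
        return b"\xc3"                             # true
    if value is False:
        return b"\xc2"                             # false
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])                      # positive fixint
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr
    if isinstance(value, dict) and len(value) <= 15:
        out = bytes([0x80 | len(value)])           # fixmap
        for key, item in value.items():
            out += encode_msgpack_subset(key) + encode_msgpack_subset(item)
        return out
    raise ValueError(f"outside the supported subset: {value!r}")


doc = {"compact": True, "schema": 0}
as_text = json.dumps(doc, separators=(",", ":")).encode("utf-8")
as_binary = encode_msgpack_subset(doc)

if __name__ == "__main__":
    print(len(as_text), len(as_binary))
    print(as_binary.hex())
```

For this document the JSON text is 27 bytes and the binary form is 18 bytes: the quotes, colons, commas and braces disappear, and `true` shrinks from four bytes to one.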

>
> Having said that, I'm not sure that risk is a blocking factor here.
> History says that a large fraction of our GSoC projects don't result
> in a commit to core PG.  As long as we're clear that "success" in this
> project isn't measured by getting a feature committed, it doesn't seem
> riskier than any other one.  Maybe it's even less risky, because there's
> less of the success condition that's not under the GSoC student's control.
    Agreed :)
    Álvaro


-- 
Álvaro Hernández Tortosa


-----------
8Kdata




Re: about google summer of code 2016

From
Tom Lane
Date:
Álvaro Hernández Tortosa <aht@8kdata.com> writes:
> On 22/02/16 05:10, Tom Lane wrote:
>> Another variable is that your answers might depend on what format you
>> assume the client is trying to convert from/to.  (It's presumably not
>> text JSON, but then what is it?)

>      As I mentioned before, there are many well-known JSON serialization 
> formats, like:

> - http://ubjson.org/
> - http://cbor.io/
> - http://msgpack.org/
> - BSON (ok, let's skip that one hehehe)
> - http://wiki.fasterxml.com/SmileFormatSpec

Ah, the great thing about standards is there are so many to choose from :-(

So I guess part of the GSoC project would have to be figuring out which
one of these would make the most sense for us to adopt.
        regards, tom lane



Re: about google summer of code 2016

From
Álvaro Hernández Tortosa
Date:

On 22/02/16 23:34, Tom Lane wrote:
> Álvaro Hernández Tortosa <aht@8kdata.com> writes:
>> On 22/02/16 05:10, Tom Lane wrote:
>>> Another variable is that your answers might depend on what format you
>>> assume the client is trying to convert from/to.  (It's presumably not
>>> text JSON, but then what is it?)
>>       As I mentioned before, there are many well-known JSON serialization
>> formats, like:
>> - http://ubjson.org/
>> - http://cbor.io/
>> - http://msgpack.org/
>> - BSON (ok, let's skip that one hehehe)
>> - http://wiki.fasterxml.com/SmileFormatSpec
> Ah, the great thing about standards is there are so many to choose from :-(
>
> So I guess part of the GSoC project would have to be figuring out which
> one of these would make the most sense for us to adopt.
>
>             regards, tom lane
    Yes.
    And unless I'm mistaken, there's an int16 to identify the data
format. Apart from the chosen format, others may be provided as
alternatives using different data format codes. Or other alternatives
(like compressed text json). Of course, this may be better suited for a
later GSoC project.
    Álvaro
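
For reference, the int16 mentioned above is the per-value format code of the frontend/backend protocol, where 0 means text and 1 means binary. A minimal sketch in Python of how a count followed by such codes is laid out in network byte order, as in the format-code section of a Bind message. The helper names are hypothetical, and the value 2 is NOT a real format code; it only illustrates where an additional format would be signalled.

```python
import struct

FORMAT_TEXT = 0          # defined by the PostgreSQL protocol
FORMAT_BINARY = 1        # defined by the PostgreSQL protocol
FORMAT_HYPOTHETICAL = 2  # not a real code, for illustration only


def pack_format_codes(codes):
    """Pack a count followed by the codes, each as a big-endian int16."""
    return struct.pack(f"!h{len(codes)}h", len(codes), *codes)


def unpack_format_codes(data):
    """Inverse of pack_format_codes: read the count, then that many codes."""
    (count,) = struct.unpack_from("!h", data, 0)
    return list(struct.unpack_from(f"!{count}h", data, 2))


packed = pack_format_codes([FORMAT_TEXT, FORMAT_BINARY])

if __name__ == "__main__":
    print(packed.hex())
    print(unpack_format_codes(pack_format_codes([FORMAT_HYPOTHETICAL])))
```

Since each code is a full 16-bit integer and only two values are defined today, there is room in the field itself for more. Whether and how new values could be negotiated between server and client is a separate design question that this sketch does not address.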


-- 
Álvaro Hernández Tortosa


-----------
8Kdata




Re: about google summer of code 2016

From
Álvaro Hernández Tortosa
Date:

On 22/02/16 23:23, Álvaro Hernández Tortosa wrote:
>
>
> On 22/02/16 05:10, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>>> On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>>>> Oleg and I discussed recently that a really good addition to a GSoC
>>>> item would be to study whether it's convenient to have a binary
>>>> serialization format for jsonb over the wire.
>>> Seems a bit risky for a GSoC project. We don't know if a different
>>> serialization format will be a win, or whether we want to do it in the
>>> end, until the benchmarking is done. It's also not clear what we're
>>> trying to achieve with the serialization format: smaller on-the-wire
>>> size, faster serialization in the server, faster parsing in the client,
>>> or what?
>> Another variable is that your answers might depend on what format you
>> assume the client is trying to convert from/to.  (It's presumably not
>> text JSON, but then what is it?)
>
>     As I mentioned before, there are many well-known JSON 
> serialization formats, like:
>
> - http://ubjson.org/
> - http://cbor.io/
> - http://msgpack.org/
> - BSON (ok, let's skip that one hehehe)
> - http://wiki.fasterxml.com/SmileFormatSpec
>
>>
>> Having said that, I'm not sure that risk is a blocking factor here.
>> History says that a large fraction of our GSoC projects don't result
>> in a commit to core PG.  As long as we're clear that "success" in this
>> project isn't measured by getting a feature committed, it doesn't seem
>> riskier than any other one.  Maybe it's even less risky, because there's
>> less of the success condition that's not under the GSoC student's 
>> control.
>
    I wanted to bring an update here. It looks like someone did the 
expected benchmark "for us" :)

https://eng.uber.com/trip-data-squeeze/    (thanks Alam for the link)
    While this is Uber's own test, I think the conclusions are quite
significant: an encoding like MessagePack + zlib requires only 14% of
the size and encodes+decodes in 76% of the time of JSON. There are of
course other contenders that trade better encoding times for slightly
slower decoding and bigger size. But there are very interesting numbers
in this benchmark. MessagePack, CBOR and UJSON (all + zlib) look like
really good options.
    So now that we have this data I would like to ask these questions 
to the community:

- Is this enough, or do we need to perform our own, different benchmarks?

- If this is enough, and given that we weren't selected for GSoC, is
there interest in the community to work on this nonetheless?

- Regarding GSoC: it looks to me that we failed to submit in time. Is
this what happened, or were we not selected? If the former (and no
criticism here, just stating a fact) what can we do next year to avoid
this happening again? Is anyone "appointed" to take care of it?

    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata




Re: about google summer of code 2016

From
Amit Langote
Date:
On 2016/03/23 9:19, Álvaro Hernández Tortosa wrote:
> - Regarding GSoC: it looks to me that we failed to submit in time. Is this
> what happened, or we weren't selected? If the former (and no criticism
> here, just realizing a fact) what can we do next year to avoid this
> happening again? Is anyone "appointed" to take care of it?

See Thom's message here:

http://www.postgresql.org/message-id/CAA-aLv6i3jh1H-5UHb8jSB0gMwA9sg_cqw3=MwddVzr=pXAwug@mail.gmail.com

Thanks,
Amit





Re: about google summer of code 2016

From
Álvaro Hernández Tortosa
Date:

On 23/03/16 01:56, Amit Langote wrote:
> On 2016/03/23 9:19, Álvaro Hernández Tortosa wrote:
>> - Regarding GSoC: it looks to me that we failed to submit in time. Is this
>> what happened, or we weren't selected? If the former (and no criticism
>> here, just realizing a fact) what can we do next year to avoid this
>> happening again? Is anyone "appointed" to take care of it?
> See Thom's message here:
>
> http://www.postgresql.org/message-id/CAA-aLv6i3jh1H-5UHb8jSB0gMwA9sg_cqw3=MwddVzr=pXAwug@mail.gmail.com
>
> Thanks,
> Amit
>
>
    OK, read the thread, thanks for the info :)
    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata