Re: about google summer of code 2016

From: Álvaro Hernández Tortosa
Subject: Re: about google summer of code 2016
Msg-id: 56F1E106.6060900@8kdata.com
In reply to: Re: about google summer of code 2016  (Álvaro Hernández Tortosa <aht@8kdata.com>)
Replies: Re: about google summer of code 2016
List: pgsql-hackers

On 22/02/16 23:23, Álvaro Hernández Tortosa wrote:
>
>
> On 22/02/16 05:10, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>>> On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>>>> Oleg and I discussed recently that a really good addition to a GSoC
>>>> item would be to study whether it's convenient to have a binary
>>>> serialization format for jsonb over the wire.
>>> Seems a bit risky for a GSoC project. We don't know if a different
>>> serialization format will be a win, or whether we want to do it in the
>>> end, until the benchmarking is done. It's also not clear what we're
>>> trying to achieve with the serialization format: smaller on-the-wire
>>> size, faster serialization in the server, faster parsing in the client,
>>> or what?
>> Another variable is that your answers might depend on what format you
>> assume the client is trying to convert from/to.  (It's presumably not
>> text JSON, but then what is it?)
>
>     As I mentioned before, there are many well-known JSON 
> serialization formats, like:
>
> - http://ubjson.org/
> - http://cbor.io/
> - http://msgpack.org/
> - BSON (ok, let's skip that one hehehe)
> - http://wiki.fasterxml.com/SmileFormatSpec
>
>>
>> Having said that, I'm not sure that risk is a blocking factor here.
>> History says that a large fraction of our GSoC projects don't result
>> in a commit to core PG.  As long as we're clear that "success" in this
>> project isn't measured by getting a feature committed, it doesn't seem
>> riskier than any other one.  Maybe it's even less risky, because there's
>> less of the success condition that's not under the GSoC student's 
>> control.
>
    I wanted to bring an update here. It looks like someone did the 
expected benchmark "for us" :)

https://eng.uber.com/trip-data-squeeze/    (thanks Alam for the link)
    While this is Uber's own test, I think the conclusions are quite 
significant: an encoding like MessagePack + zlib requires only 14% of 
the size of JSON and encodes+decodes in 76% of the time. There are of 
course other contenders that trade faster encoding for slightly 
slower decoding and bigger size. But there are very interesting numbers 
in this benchmark. MessagePack, CBOR and UJSON (all + zlib) look like 
really good options.
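For readers unfamiliar with these formats, here is a minimal sketch of what a binary encoding of the JSON data model looks like: a hand-written encoder for a subset of CBOR (RFC 8949), covering only null, booleans, integers, doubles, strings, arrays and objects (no tags, no bignums, no decoder). It is not taken from Uber's benchmark or from PostgreSQL, the sample document is hypothetical, and it makes no claim about the numbers above; it only shows where the savings come from (length-prefixed strings and containers, integers stored in binary instead of as digits).

```python
import json
import struct
import zlib
from typing import Any


def _head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte (major type + additional info) plus argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24]) + struct.pack(">B", value)
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", value)
    raise ValueError("integer too large for this sketch")


def cbor_encode(obj: Any) -> bytes:
    """Encode a JSON-compatible value as CBOR (subset sketch, not a full codec)."""
    # None/False/True are checked first: bool is a subclass of int in Python.
    if obj is None:
        return b"\xf6"
    if obj is False:
        return b"\xf4"
    if obj is True:
        return b"\xf5"
    if isinstance(obj, int):
        # Major type 0 for n >= 0, major type 1 stores (-1 - n) for n < 0.
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, float):
        # Always a 64-bit double (0xfb); real encoders may pick shorter floats.
        return b"\xfb" + struct.pack(">d", obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        parts = [_head(5, len(obj))]
        for key, value in obj.items():
            parts.append(cbor_encode(key))
            parts.append(cbor_encode(value))
        return b"".join(parts)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Hypothetical sample document, loosely shaped like a trip record.
sample = {
    "trip_id": 123456789,
    "points": [{"lat": 37.7749, "lng": -122.4194, "t": 1458000000 + i} for i in range(100)],
    "fare": 17.25,
    "completed": True,
    "note": None,
}

text = json.dumps(sample, separators=(",", ":")).encode("utf-8")
binary = cbor_encode(sample)
for name, payload in [("json", text), ("cbor", binary),
                      ("json+zlib", zlib.compress(text)),
                      ("cbor+zlib", zlib.compress(binary))]:
    print(f"{name:10s} {len(payload):6d} bytes")
```

Note that doubles are not necessarily smaller in binary (9 bytes here vs. a 7-character decimal), which is one reason results vary with the shape of the data.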
    So now that we have this data, I would like to ask the community 
these questions:

- Is this enough, or do we need to perform our own, different benchmarks?

- If this is enough, and given that we weren't selected for GSoC, is 
there interest in the community in working on this nonetheless?

- Regarding GSoC: it looks to me like we failed to submit in time. Is 
this what happened, or were we not selected? If the former (and no 
criticism here, just stating a fact), what can we do next year to avoid 
this happening again? Is anyone "appointed" to take care of it?
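If we do decide to run our own benchmarks, one possible shape for the harness is sketched below. It is an assumption of how such a comparison could be framed, not Uber's methodology: every codec is an (encode, decode) pair, each one is checked to round-trip the document, and size and encode+decode time are reported as fractions of a plain JSON baseline. The names `measure`, `compare` and `codecs` and the sample document are hypothetical, and only standard-library codecs (JSON, JSON + zlib) are included so it runs anywhere.

```python
import json
import time
import zlib
from typing import Any, Callable, Dict, Tuple

# A codec is a pair of functions: value -> bytes, bytes -> value.
Codec = Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]


def measure(codec: Codec, doc: Any, rounds: int = 200) -> Tuple[int, float]:
    """Return (encoded size in bytes, seconds for `rounds` encode+decode cycles)."""
    encode, decode = codec
    payload = encode(doc)
    # A codec that does not round-trip the document is not a fair contender.
    assert decode(payload) == doc
    start = time.perf_counter()
    for _ in range(rounds):
        decode(encode(doc))
    return len(payload), time.perf_counter() - start


def compare(codecs: Dict[str, Codec], doc: Any,
            baseline: str = "json") -> Dict[str, Tuple[float, float]]:
    """Express each codec's size and time as a fraction of the baseline's."""
    raw = {name: measure(codec, doc) for name, codec in codecs.items()}
    base_size, base_time = raw[baseline]
    return {name: (size / base_size, elapsed / base_time)
            for name, (size, elapsed) in raw.items()}


codecs: Dict[str, Codec] = {
    "json": (lambda d: json.dumps(d).encode("utf-8"),
             lambda b: json.loads(b.decode("utf-8"))),
    "json+zlib": (lambda d: zlib.compress(json.dumps(d).encode("utf-8")),
                  lambda b: json.loads(zlib.decompress(b).decode("utf-8"))),
}

# Hypothetical, highly repetitive sample document.
doc = {"points": [{"lat": 37.7749, "lng": -122.4194, "t": 1458000000 + i}
                  for i in range(500)]}

for name, (size_ratio, time_ratio) in compare(codecs, doc).items():
    print(f"{name:10s} size {size_ratio:6.1%} of json, time {time_ratio:6.1%} of json")
```

Reporting ratios rather than absolute times matches the framing of the 14% / 76% figures quoted earlier; a MessagePack or CBOR implementation could be added to `codecs` as another (encode, decode) pair, and the document set would need to reflect real jsonb workloads for the numbers to mean anything.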

    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata



