Re: about google summer of code 2016

From	Álvaro Hernández Tortosa
Subject	Re: about google summer of code 2016
Date	23 March 2016, 00:19:31
Msg-id	56F1E106.6060900@8kdata.com
In reply to	Re: about google summer of code 2016 (Álvaro Hernández Tortosa <aht@8kdata.com>)
Responses	Re: about google summer of code 2016
List	pgsql-hackers


On 22/02/16 23:23, Álvaro Hernández Tortosa wrote:
>
>
> On 22/02/16 05:10, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>>> On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>>>> Oleg and I discussed recently that a really good addition to a GSoC
>>>> item would be to study whether it's convenient to have a binary
>>>> serialization format for jsonb over the wire.
>>> Seems a bit risky for a GSoC project. We don't know if a different
>>> serialization format will be a win, or whether we want to do it in the
>>> end, until the benchmarking is done. It's also not clear what we're
>>> trying to achieve with the serialization format: smaller on-the-wire
>>> size, faster serialization in the server, faster parsing in the client,
>>> or what?
>> Another variable is that your answers might depend on what format you
>> assume the client is trying to convert from/to.  (It's presumably not
>> text JSON, but then what is it?)
>
>     As I mentioned before, there are many well-known JSON 
> serialization formats, like:
>
> - http://ubjson.org/
> - http://cbor.io/
> - http://msgpack.org/
> - BSON (ok, let's skip that one hehehe)
> - http://wiki.fasterxml.com/SmileFormatSpec
>
>>
>> Having said that, I'm not sure that risk is a blocking factor here.
>> History says that a large fraction of our GSoC projects don't result
>> in a commit to core PG.  As long as we're clear that "success" in this
>> project isn't measured by getting a feature committed, it doesn't seem
>> riskier than any other one.  Maybe it's even less risky, because there's
>> less of the success condition that's not under the GSoC student's 
>> control.
>
    I wanted to bring an update here. It looks like someone did the 
expected benchmark "for us" :)

https://eng.uber.com/trip-data-squeeze/    (thanks Alam for the link)
    While this is Uber's own test, I think the conclusions are quite 
significant: an encoding like MessagePack + zlib requires only 14% of 
the size of JSON and encodes+decodes in 76% of the time. There are of 
course other contenders that trade faster encoding for slightly 
slower decoding and bigger size. But there are very interesting numbers 
in this benchmark. MessagePack, CBOR and UJSON (all + zlib) look like 
really good options.
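
    For anyone who wants to reproduce the size side of such a comparison 
locally, here is a minimal sketch. It uses only Python's standard 
library, so it covers plain JSON versus JSON + zlib and not 
MessagePack, CBOR or UJSON. The sample records and the name 
encoded_sizes are made up for illustration; they are not taken from 
the Uber benchmark.

```python
import json
import zlib


def encoded_sizes(records):
    """Return the byte size of records as compact JSON and as JSON + zlib."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    packed = zlib.compress(raw, 6)  # 6 is zlib's usual default level
    # Sanity check: the compressed form must decode back to the same data.
    assert json.loads(zlib.decompress(packed).decode("utf-8")) == records
    return {"json": len(raw), "json+zlib": len(packed)}


# Hypothetical, highly repetitive sample data (an assumption, not real trips).
sample = [
    {"trip_id": i, "lat": 37.0 + i * 0.001, "lng": -122.0 - i * 0.001,
     "status": "completed"}
    for i in range(1000)
]

sizes = encoded_sizes(sample)
ratio = sizes["json+zlib"] / sizes["json"]
print(f"json: {sizes['json']} bytes, json+zlib: {sizes['json+zlib']} bytes "
      f"({ratio:.0%} of plain JSON)")
```

    The ratio depends heavily on how repetitive the documents are, so 
numbers from synthetic data like this should not be read as a 
substitute for a benchmark on realistic jsonb values.
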
    So now that we have this data I would like to ask these questions 
to the community:

- Is this enough, or do we need to perform our own, different benchmarks?

- If this is enough, and given that we weren't selected for GSoC, is 
there interest in the community to work on this nonetheless?

- Regarding GSoC: it looks to me that we failed to submit in time. Is 
this what happened, or were we not selected? If the former (and no 
criticism here, just stating a fact), what can we do next year to avoid 
this happening again? Is anyone "appointed" to take care of it?

    Álvaro

-- 
Álvaro Hernández Tortosa


-----------
8Kdata
