Thread: GSOC 2018 ideas

GSOC 2018 ideas

From

Charles Cui

Date:

February 25, 2018, 06:25:36

Hi Aleksander,

This is Yan from Columbia University. I saw that PostgreSQL has been selected for GSOC 2018, and I am very interested in the Thrift data types support idea that you proposed, so I would like to prepare a proposal based on it. Could you give me more detailed information about which documents or code I need to understand? Also, if this idea has already been allocated to another student (or, in other words, you would prefer a particular student to work on it), please let me know so that I can pick another PostgreSQL project. Any comments or suggestions are welcome!

I look forward to your reply!

Thanks,
Charles

Re: GSOC 2018 ideas

From

Aleksander Alekseev

Date:

February 26, 2018, 18:21:11

Hello Charles,

> I saw that PostgreSQL has been selected for GSOC 2018, and I am very
> interested in the Thrift data types support idea that you proposed, so
> I would like to prepare a proposal based on it.

Glad you are interested in this project!

> Could you give me more detailed information about which documents or
> code I need to understand?

I would recommend the following documents and code:

* Source code of pg_protobuf
  https://github.com/afiskon/pg_protobuf
* "Writing Postgres Extensions" tutorial series by Manuel Kniep
  http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/
* "So you want to make an extension?" talk by Keith Fiske
  http://slides.keithf4.com/extension_dev/#/
* Apache Thrift official website
  https://thrift.apache.org/
* Also a great explanation of the Thrift format can be found in the
  book "Designing Data-Intensive Applications" by Martin Kleppmann
  http://dataintensive.net/

> Also, if this idea has already been allocated to another student (or,
> in other words, you would prefer a particular student to work on it),
> please let me know so that I can pick another PostgreSQL project. Any
> comments or suggestions are welcome!

To the best of my knowledge, there are currently no other students
interested in this particular work.

--
Best regards,
Aleksander Alekseev

Attachments

signature.asc

Re: GSOC 2018 ideas

From

Charles Cui

Date:

March 3, 2018, 11:11:00

Got it, Aleksander! Will study these documents carefully!

Re: GSOC 2018 ideas

From

Charles Cui

Date:

March 4, 2018, 11:00:22

Hi Aleksander,

I went through the documents you listed, and they are helpful! It seems the main purpose of the pg_protobuf extension is to parse a Protobuf struct and return the decoded field. May I ask how extensions of this kind are used in PostgreSQL (in other words, in which scenarios are these plugins used)?

Thanks,
Charles

Re: GSOC 2018 ideas

From

Aleksander Alekseev

Date:

March 5, 2018, 15:42:27

Hello Charles,

> I went through the documents you listed, and they are helpful!
> It seems the main purpose of the pg_protobuf extension is to parse
> a Protobuf struct and return the decoded field. May I ask how extensions
> of this kind are used in PostgreSQL (in other words, in which scenarios
> are these plugins used)?

There are a few ideas behind all of this.

1) Sometimes people are not quite happy with a strict relational schema,
for various reasons, and prefer something more agile, like XML or JSON.
These formats are indeed more convenient under certain circumstances, for
instance when it comes to changing and migrating the schema.

2) One drawback of JSON is redundancy. For instance, you have to store
the names of all fields in every document. These names don't carry much
information but consume disk space and RAM, thus affecting overall
performance. The ZSON extension [1] partially solves this issue. However,
I wouldn't call it particularly convenient, and the whole approach of
compressing JSON seems to me more like a dirty hack than a solution. The
problem appeared because the wrong data format was used in the first
place.

3) Unlike JSON, formats like Protobuf or Thrift are binary formats and,
most importantly, don't store any field names. Thus they don't create
the problem described above. However, PostgreSQL cannot access Protobuf
fields out of the box, for instance to index these fields. This is what
pg_protobuf is for.
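
As a rough illustration of points 2 and 3, the following Python sketch compares a compact JSON encoding of a small record with a simplified tag-based binary encoding. The record, the SCHEMA mapping, and the helper names are all hypothetical, and the byte layout is a deliberately simplified stand-in, not the actual Protobuf or Thrift wire format.

```python
import json
import struct

# Hypothetical record, used only for illustration.
record = {"user_id": 42, "is_active": True, "balance": 1234}

# Hypothetical schema shared by writer and reader:
# field name -> (numeric tag, struct format of the value).
SCHEMA = {
    "user_id": (1, "<i"),
    "is_active": (2, "<?"),
    "balance": (3, "<i"),
}


def encode_json(rec):
    """Compact JSON: every field name is stored inside the payload."""
    return json.dumps(rec, separators=(",", ":")).encode("utf-8")


def encode_tagged(rec):
    """One tag byte followed by a fixed-width value for each field.

    Simplified stand-in for tag-based formats; NOT the real
    Protobuf or Thrift wire format.
    """
    out = bytearray()
    for name, value in rec.items():
        tag, fmt = SCHEMA[name]
        out.append(tag)
        out += struct.pack(fmt, value)
    return bytes(out)


def decode_tagged(data):
    """Rebuild the record; field names come from the schema, not the payload."""
    by_tag = {tag: (name, fmt) for name, (tag, fmt) in SCHEMA.items()}
    rec = {}
    pos = 0
    while pos < len(data):
        name, fmt = by_tag[data[pos]]
        pos += 1
        (value,) = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        rec[name] = value
    return rec


json_bytes = encode_json(record)
tagged_bytes = encode_tagged(record)
print("JSON payload size:  ", len(json_bytes))
print("tagged payload size:", len(tagged_bytes))
```

In the tagged encoding the field names live only in the schema shared by the writer and the reader, so the payload carries just numeric tags and values, whereas the JSON payload repeats every field name in every document.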

Hopefully this answers your question. If you have other questions, please
don't hesitate to ask!

[1]: https://github.com/postgrespro/zson


--
Best regards,
Aleksander Alekseev

Attachments

signature.asc

Re: GSOC 2018 ideas

From

Charles Cui

Date:

March 7, 2018, 09:55:07

The idea of using a flexible schema and building indexes on top of it is awesome!

I will definitely submit a proposal and focus on this project if I get selected.

Thanks for answering my questions.
