Discussion: GSOC 2018 ideas

GSOC 2018 ideas

From:
Charles Cui
Date:
Hi Aleksander,

   This is Yan from Columbia University. I saw that PostgreSQL was selected for GSOC 2018, and I am quite interested in the idea of Thrift data type support that you proposed, so I would like to prepare a proposal based on it. Could you tell me in more detail which documents or code I need to understand? Also, if this idea is already allocated to another student (or, in other words, you prefer some other student to work on it), please let me know so that I can pick another project in PostgreSQL. Any comments or suggestions are welcome!

Looking forward to your reply!


Thanks,
Charles

Re: GSOC 2018 ideas

From:
Aleksander Alekseev
Date:
Hello Charles,

> I saw PostgreSQL is selected in GSOC 2018 and pretty interested in the
> ideas of thrift data types support that proposed by you. So, I want to
> prepare for a proposal based on this idea.

Glad you are interested in this project!

> Can I have more detailed information of what documents or code that I
> need to understand?

I would recommend the following documents and code:

* Source code of pg_protobuf
  https://github.com/afiskon/pg_protobuf
* "Writing Postgres Extensions" tutorial series by Manuel Kniep
  http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/
* "So you want to make an extension?" talk by Keith Fiske
  http://slides.keithf4.com/extension_dev/#/
* Apache Thrift official website
  https://thrift.apache.org/
* Also a great explanation of the Thrift format can be found in the
  book "Designing Data-Intensive Applications" by Martin Kleppmann
  http://dataintensive.net/

> Also, if this idea is allocated to other student (or in other words,
> you prefer some student to work on it), do let me know, so that I can
> pick some other project in PostgreSQL. Any comments or suggestions are
> welcomed!

To the best of my knowledge, there are currently no other students
interested in this particular project.

--
Best regards,
Aleksander Alekseev

Re: GSOC 2018 ideas

From:
Charles Cui
Date:
Got it, Aleksander! I will study these documents carefully!

2018-02-26 4:21 GMT-08:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:

Re: GSOC 2018 ideas

From:
Charles Cui
Date:
Hi Aleksander,

   I went through the documents you listed, and they are helpful!
It seems the main purpose of the pg_protobuf extension is to parse
a Protobuf message and return a decoded field. May I ask how these kinds
of extensions are used in PostgreSQL (or, in other words, in which
scenarios these plugins are used)?
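To make "parse a Protobuf message and return a decoded field" concrete, here is a minimal Python sketch of the Protobuf wire format. It is not pg_protobuf's implementation. It assumes only two wire types (0 = varint, 2 = length-delimited) and non-negative integers, and the field numbers and values in the example are hypothetical. Each field is stored as a key `(field_number << 3) | wire_type` followed by its payload, so a reader can look a field up by number without knowing its name.

```python
from typing import Optional, Tuple, Union

FieldValue = Union[int, bytes]


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value: Union[int, str, bytes]) -> bytes:
    """Encode one field: key = (field_number << 3) | wire_type, then the payload."""
    if isinstance(value, int):
        # Wire type 0: varint.
        return encode_varint(number << 3 | 0) + encode_varint(value)
    # Wire type 2: length-delimited (strings and raw bytes).
    data = value.encode("utf-8") if isinstance(value, str) else value
    return encode_varint(number << 3 | 2) + encode_varint(len(data)) + data


def get_field(buf: bytes, wanted: int) -> Optional[FieldValue]:
    """Scan a serialized message and return the value stored under field number `wanted`.

    As Protobuf parsers do for singular fields, the last occurrence wins.
    """
    result: Optional[FieldValue] = None
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
            if number == wanted:
                result = value
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            if number == wanted:
                result = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
    return result


# Hypothetical message: field 1 = 150 (varint), field 2 = "US" (length-delimited).
msg = encode_field(1, 150) + encode_field(2, "US")
print(msg.hex())          # 08960112025553
print(get_field(msg, 1))  # 150
print(get_field(msg, 2))  # b'US'
```

Note that the serialized message contains only field numbers, never names such as "user_id"; the mapping from names to numbers lives in the schema. Fixed-width wire types (32-bit and 64-bit) are rejected here purely for brevity.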


Thanks,
Charles

2018-03-02 21:11 GMT-08:00 Charles Cui <charles.cui1984@gmail.com>:


Re: GSOC 2018 ideas

From:
Aleksander Alekseev
Date:
Hello Charles,

>    Went through the documents listed by you, and they are helpful!
> It seems the main purpose of extension pg_protobuf is to parse
> a protobuf struct and return the decoded field. May I ask how these kinds
> of extensions are used in postgreSQL (or in other words, the scenarios to
> use these plugins)?

There are a few ideas behind all of this.

1) Sometimes people are not quite happy with a strict relational schema
for various reasons and prefer something more agile, like XML or JSON.
These formats are indeed more convenient in certain circumstances, for
instance because the schema is easier to change and migrate.

2) One drawback of JSON is redundancy. For instance, you have to store
the names of all fields in every document. These names don't carry much
information but consume disk space and RAM, thus affecting overall
performance. The ZSON extension [1] partially solves this issue. However, I
wouldn't call it particularly convenient, and the whole approach of
compressing JSON seems to me more like a dirty hack than a solution. The
problem appears because the wrong data format was used in the first
place.

3) Unlike JSON, formats like Protobuf or Thrift are binary formats and,
most importantly, don't store field names at all: each field is identified
by a small numeric tag defined in the schema. Thus they don't have the
problem described above. However, PostgreSQL cannot access Protobuf
fields out of the box, for instance to index them. This is what
pg_protobuf is for.
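Points 2 and 3 can be made concrete with a rough size comparison. The Python sketch below encodes the same hypothetical records once as compact JSON text and once in a Protobuf-style tag/value layout (Thrift's binary and compact protocols likewise identify fields by numeric field IDs). Field names appear only in a schema mapping, `TAGS`, which is a made-up stand-in for a `.proto` or `.thrift` definition. It measures raw payload bytes only and ignores PostgreSQL storage overheads such as tuple headers and TOAST compression, as well as `jsonb`, which has its own binary layout but still keeps the keys in every document.

```python
import json
from typing import Dict, Union

Scalar = Union[bool, int, str]


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def encode_record(record: Dict[str, Scalar], tags: Dict[str, int]) -> bytes:
    """Protobuf-style layout: each field is a numeric tag plus its value; names are not stored."""
    out = bytearray()
    for name, value in record.items():
        tag = tags[name]
        if isinstance(value, (bool, int)):
            # Wire type 0 (varint); booleans become 0 or 1.
            out += encode_varint(tag << 3 | 0) + encode_varint(int(value))
        else:
            # Wire type 2 (length-delimited) for strings.
            data = value.encode("utf-8")
            out += encode_varint(tag << 3 | 2) + encode_varint(len(data)) + data
    return bytes(out)


# Hypothetical schema: field names are recorded once here, not in every document.
TAGS = {"user_id": 1, "country": 2, "is_active": 3}

# Hypothetical documents that all repeat the same three keys.
records = [{"user_id": i, "country": "US", "is_active": True} for i in range(1000)]

json_bytes = sum(len(json.dumps(r, separators=(",", ":")).encode("utf-8")) for r in records)
binary_bytes = sum(len(encode_record(r, TAGS)) for r in records)
print(json_bytes, binary_bytes)  # 46890 8872
```

With these 1,000 records the JSON text takes 46890 bytes and the tagged encoding 8872, mostly because the three key strings are repeated in every JSON document. The trade-off is that the binary payload cannot be interpreted without the schema: a reader has to know which field number corresponds to which field before it can extract or index that field.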

Hopefully this answers your question. If you have other questions, please
don't hesitate to ask!

[1]: https://github.com/postgrespro/zson


--
Best regards,
Aleksander Alekseev

Re: GSOC 2018 ideas

From:
Charles Cui
Date:

2018-03-05 1:42 GMT-08:00 Aleksander Alekseev <a.alekseev@postgrespro.ru>:
> 3) Unlike JSON, formats like Protobuf or Thrift are binary formats and
> most importantly don't store any field names. Thus they don't create a
> problem described above. However, PostgreSQL is not capable to access
> Protobuf fields out-of-the-box, for instance to index these fields. This
> is what pg_protobuf is for.

The idea of using a flexible schema and building indexes on top of it is awesome!
I will definitely submit a proposal and focus on this if I get selected.
Thanks for answering my questions.