Discussion: export to parquet


export to parquet

From
Scott Ribe
Date:
I have no Hadoop, no HDFS. Just looking for the easiest way to export some PG tables into Parquet format for testing--need to determine what kind of space reduction we can get before deciding whether to look into it more.

Any suggestions on particular tools? (PG 12, Linux)


--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/






Re: export to parquet

From
Chris Travers
Date:


On Wed, Aug 26, 2020 at 9:00 PM Scott Ribe <scott_ribe@elevated-dev.com> wrote:
I have no Hadoop, no HDFS. Just looking for the easiest way to export some PG tables into Parquet format for testing--need to determine what kind of space reduction we can get before deciding whether to look into it more.

Any suggestions on particular tools? (PG 12, Linux)

For simple exporting, the simplest thing is a single-node instance of Spark.

You can read parquet files in Postgres using https://github.com/adjust/parquet_fdw if you so desire, but it does not support writing, as parquet files are basically immutable.
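A minimal sketch of that single-node approach, assuming pyspark and the PostgreSQL JDBC driver jar are installed (the connection details below are hypothetical):

```python
# Hedged sketch: read one Postgres table over JDBC with a local
# single-node Spark session and write it out as Parquet.
def export_table_to_parquet(jdbc_url, table, user, password, out_path):
    from pyspark.sql import SparkSession  # imported lazily; pyspark assumed

    spark = (SparkSession.builder
             .master("local[*]")          # single node, all local cores
             .appName("pg-to-parquet")
             .getOrCreate())
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", table)
          .option("user", user)
          .option("password", password)
          .load())
    df.write.mode("overwrite").parquet(out_path)
    spark.stop()

# Example call against a real database:
# export_table_to_parquet("jdbc:postgresql://localhost:5432/mydb",
#                         "public.big_table", "postgres", "secret",
#                         "/tmp/big_table.parquet")
```

Spark writes a directory of part files rather than a single file, which is normal for Parquet output.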







--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: export to parquet

From
Scott Ribe
Date:
> On Aug 26, 2020, at 1:11 PM, Chris Travers <chris.travers@gmail.com> wrote:
>
> For simple exporting, the simplest thing is a single-node instance of Spark.

Thanks.

> You can read parquet files in Postgres using https://github.com/adjust/parquet_fdw if you so desire but it does not support writing as parquet files are basically immutable.

Yep, that's the next step. Well, really it is what I am interested in testing, but first I need my data in parquet format (and confirmation that it gets decently compressed).


Re: export to parquet

From
George Woodring
Date:
I don't know how many hoops you want to jump through; we use AWS and Athena to create the parquet files:
  • Export table as JSON
  • Put on AWS S3
  • Create JSON table in Athena
  • Use the JSON table to create a parquet table
The parquet files will be in S3 as well after the parquet table is created.  If you are interested I can share the AWS CLI commands we use.
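The first step above can be sketched as newline-delimited JSON (one object per line), which is the layout Athena's JSON SerDe expects; the rows here are hypothetical stand-ins for a real query result:

```python
import json
import os
import tempfile

# Stand-in rows; with a real table you would fetch these from Postgres,
# e.g. via psycopg2.
rows = [
    {"id": 1, "status": "ok", "value": 0.5},
    {"id": 2, "status": "error", "value": 1.5},
]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "export.jsonl")
    with open(path, "w") as f:
        for row in rows:
            # One JSON object per line -- NOT a single top-level array,
            # which Athena's JSON SerDe will not split into rows.
            f.write(json.dumps(row) + "\n")
    lines = open(path).read().splitlines()

print(len(lines), "lines written")
```

The resulting file is what gets uploaded to S3 before defining the JSON table in Athena.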

George Woodring
iGLASS Networks
www.iglass.net


On Wed, Aug 26, 2020 at 3:00 PM Scott Ribe <scott_ribe@elevated-dev.com> wrote:
I have no Hadoop, no HDFS. Just looking for the easiest way to export some PG tables into Parquet format for testing--need to determine what kind of space reduction we can get before deciding whether to look into it more.

Any suggestions on particular tools? (PG 12, Linux)


--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/