Discussion: Parquet support
Does psycopg support parquet as an input format?
Thanks,
Christopher Bader
Staff Data Scientist
Zscaler
On Wed, 23 Nov 2022 at 19:49, Christopher Bader <cbader@zscaler.com> wrote:
>
> Does psycopg support parquet as an input format?
No, not yet.
I had some conversation in the past around parquet input/output: it is
a major project which I would like to either develop or see developed,
but at the moment I don't have the several months required to do the
former, and nobody has volunteered for the latter.
Cheers
-- Daniele
Just curious folks, what are your thoughts about the scope of that potential support? What is the use case? Is it loading data from Parquet to Postgres (and back)? Why is the combination with Python modules like pyarrow not enough?
Regards,
--VR
On Wed, 23 Nov 2022 at 20:56, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>
> Just curious folks, what are your thoughts about the scope of that potential support? What is the use case? Is it loading data from Parquet to Postgres (and back)? Why is the combination with Python modules like pyarrow not enough?
I am not an expert, but I understand that the Python-Postgres roundtrip
goes via generating and parsing CSV files, whereas there is some
performance gain to be had by creating native arrow data.
-- Daniele
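For readers wondering what "the combination with pyarrow" might look like in practice, here is a minimal sketch, not anything proposed in this thread. It assumes psycopg 3 and pyarrow are installed; the function names `rows_from_columns` and `copy_parquet_to_table` are made up for illustration. Each Parquet batch is converted to plain Python objects and sent through COPY in PostgreSQL's text format, so the data still makes the intermediate conversion discussed above rather than travelling as native Arrow buffers.

```python
from typing import Any, Iterator, Mapping, Sequence


def rows_from_columns(
    columns: Mapping[str, Sequence[Any]], names: Sequence[str]
) -> Iterator[tuple]:
    """Turn column-oriented data into row tuples, in the order given by names."""
    return zip(*(columns[name] for name in names))


def copy_parquet_to_table(
    conn, parquet_path: str, table: str, names: Sequence[str]
) -> None:
    """Stream the named columns of a Parquet file into a table via COPY.

    conn is assumed to be an open psycopg 3 connection; the caller commits.
    """
    # Imported here so the helper above works without the third-party packages.
    import pyarrow.parquet as pq
    from psycopg import sql

    # Compose the statement with quoted identifiers instead of string formatting.
    statement = sql.SQL("COPY {} ({}) FROM STDIN").format(
        sql.Identifier(table),
        sql.SQL(", ").join(sql.Identifier(name) for name in names),
    )
    parquet_file = pq.ParquetFile(parquet_path)
    with conn.cursor() as cur, cur.copy(statement) as copy:
        # Read the file batch by batch so it never has to fit in memory at once.
        for batch in parquet_file.iter_batches(columns=list(names)):
            for row in rows_from_columns(batch.to_pydict(), names):
                copy.write_row(row)
```

With a hypothetical file `events.parquet` and table `events`, this would be called as `copy_parquet_to_table(conn, "events.parquet", "events", ["id", "payload"])`, followed by `conn.commit()`. The per-row conversion to Python objects is exactly the overhead that native Parquet/Arrow support would aim to remove.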
Hi - desktop Linux user/maker here in California.
The engineering stakes are high in the clouds these days. There are some important efforts underway to create "cloud-native" approaches for Python, Python installation, Python data and Python communication tools. In my corners of the world (remote sensing, urban planning) that means Dask and xarray. As a desktop Linux distribution, we (OSGeoLive) ship both, and enthusiastically so; the "cloud-native" data storage formats Zarr and Parquet, not so much. My best understanding is that xarray is a happy medium between "what only runs on cloud" and "the powerful Linux I can run myself on standard equipment today".
I support a Python ecosystem that individual people can run entirely locally, and that can interoperate well with standard networking and data formats. Not every Python environment is doing that. Change happens.
Interested to see the common and useful Python discussion here, regarding PostgreSQL, PostGIS and cloudy interoperability.
--Brian M Hamlin / MAPLABS / OSGeoLive PSC