Re: On columnar storage (2)
From | Jim Nasby |
---|---|
Subject | Re: On columnar storage (2) |
Date | |
Msg-id | 56833C57.1090400@BlueTreble.com |
In response to | Re: On columnar storage (2) (Alvaro Herrera <alvherre@2ndquadrant.com>) |
List | pgsql-hackers |
On 12/28/15 1:15 PM, Alvaro Herrera wrote:
> Currently within the executor
> a tuple is a TupleTableSlot which contains one Datum array, which has
> all the values coming out of the HeapTuple; but for split storage
> tuples, we will need to have a TupleTableSlot that has multiple "Datum
> arrays" (in a way --- because, actually, once we get to vectorise as in
> the preceding paragraph, we no longer have a Datum array, but some more
> complex representation).
>
> I think that trying to make the FDW API address all these concerns,
> while at the same time *also* serving the needs of external data
> sources, insanity will ensue.

Are you familiar with DataFrames in Pandas[1]? They're a collection of Series[2], which are essentially vectors. (Technically, they're more complex than that because you can assign arbitrary indexes.) So instead of the normal collection of rows, a DataFrame is a collection of columns. Series are also sparse (like our tuples), but the sparse value can be anything, not just NULL (or NaN in pandas-speak). There are also data frames in R; I'm not sure how equivalent they are.

I mention this because there's a lot being done with dataframes, and they might be a good basis for a columnstore API, killing 2 birds with one stone.

BTW, the underlying Python type for a Series is the ndarray[3], which is specifically designed to interface to things like C arrays. So a column store could potentially be accessed directly.

Aside from potential API inspiration, it might be useful to prototype a columnstore using Series (or maybe ndarrays).

[1] http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
[2] http://pandas.pydata.org/pandas-docs/stable/api.html#series
[3] http://docs.scipy.org/doc/numpy-1.10.0/reference/internals.html

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
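To make the "collection of columns" idea in the message above a bit more concrete, here is a minimal sketch of a toy column store. It is only an illustration under stated assumptions: it deliberately uses Python's standard `array` module and `memoryview` as a stand-in for Series/ndarrays, and the `ColumnTable` class, its methods, and the validity-flag scheme for NULLs are all hypothetical names invented for this example, not anything from pandas, NumPy, or PostgreSQL.

```python
from array import array


class ColumnTable:
    """Hypothetical toy column store: one typed, contiguous buffer per column."""

    def __init__(self, schema):
        # schema maps column name -> array typecode, e.g. {"id": "q", "price": "d"}
        self.columns = {name: array(code) for name, code in schema.items()}
        # One validity flag per value; 0 marks a NULL (a stand-in for a null bitmap).
        self.valid = {name: array("b") for name in schema}

    def append(self, row):
        """Split an incoming row (a dict) across the per-column buffers."""
        for name, col in self.columns.items():
            value = row.get(name)
            self.valid[name].append(0 if value is None else 1)
            # NULLs still occupy a slot so that row positions line up across columns.
            col.append(0 if value is None else value)

    def column(self, name):
        """Expose a column's underlying C buffer without copying it."""
        return memoryview(self.columns[name])

    def row(self, i):
        """Reassemble row i from the columns (the expensive direction)."""
        return {
            name: (col[i] if self.valid[name][i] else None)
            for name, col in self.columns.items()
        }

    def sum(self, name):
        """Aggregate one column; no other column is touched."""
        col, valid = self.columns[name], self.valid[name]
        return sum(v for v, ok in zip(col, valid) if ok)


table = ColumnTable({"id": "q", "price": "d"})
table.append({"id": 1, "price": 9.5})
table.append({"id": 2, "price": None})
table.append({"id": 3, "price": 0.5})

total = table.sum("price")   # scans only the "price" buffer
second = table.row(1)        # gathers one value from every column
ids = table.column("id")     # zero-copy view over a C array of 64-bit ints
```

The design point is the same one made for Series and ndarrays: each column lives in a flat, typed buffer that C code could read directly, so a per-column aggregate never touches the other columns, while reassembling a full row has to visit every column.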