Re: Use Postgres as a column store by creating one table per column
От | Tomas Vondra |
---|---|
Тема | Re: Use Postgres as a column store by creating one table per column |
Дата | |
Msg-id | 20190524180629.42d3u4m7a3jgh5hq@development обсуждение исходный текст |
Ответ на | Re: Use Postgres as a column store by creating one table per column (George Neuner <gneuner2@comcast.net>) |
Список | pgsql-performance |
On Thu, May 23, 2019 at 01:08:42AM -0400, George Neuner wrote: >On Tue, 21 May 2019 21:28:07 -0700, Lev Kokotov ><lev.kokotov@gmail.com> wrote: > >>Is it efficient to use Postgres as a column store by creating one >table per >>column? >> >>I would query it with something like `[...] UNION SELECT value AS <table> >>FROM <table> WHERE value = <value> UNION [...]` to build a row. > >I think you mean JOIN. > >You'd need more than that: Postgresql uses MVCC for concurrency, so >whenever you update any row in a table, the ordering of the rows >within the table changes. And the JOIN operation inherently is >unordered - you need to sort the result deliberately to control >ordering. > >To emulate a column-store, at the very least you need a way to >associate values from different "columns" that belong to the same >"row" of the virtual table. IOW, every value in every "column" needs >an explicit "row" identifier. E.g., > > col1 = { rowid, value1 }, col2 = { rowid, value2 }, ... > >For performance you would need to have indexes on at least the rowid >in each of the "column" tables. > >This is a bare minimum and can only work if the columns of your >virtual table and the queries against it are application controlled or >statically known. If you want to do something more flexible that will >support ad hoc table modifications, elastically sized values (strings, >bytes, arrays, JSON, XML), etc. this example is not suffice and the >implementation can get very complicated very quickly > > >Justin Pryzby was not joking when he said the performance could be >awful ... at least as compared to a more normal row-oriented >structure. Performance of a query that involves more than a handful >of "columns", in general, will be horrible. It is up to you to decide >whether some (maybe little) increase in performance in processing >*single* columns will offset likely MASSIVE loss of performance in >processing multiple columns. > Maybe take a look at this paper: http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf which essentially compares this approach to a "real" column store. It certainly won't give you performance comparable to column store, it adds quite a bit of overhead (disk space because of row headers, CPU because of extra joins, etc.). And it can't give you the column-store benefits - compression and/or more efficient execution. > >>I'm thinking since Postgres stores tables in continuous blocks of >16MB each >>(I think that's the default page size?) > >Default page size is 8 KB. You'd have to recompile to change that, >and it might break something - a whole lot of code depends on the >knowing the size of storage pages. > > Right. And the largest page size is 64kB. But 8kB is a pretty good trade-off, in most cases. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-performance по дате отправления: