Re: [HACKERS] Custom compression methods

From: Tomas Vondra
Subject: Re: [HACKERS] Custom compression methods
Date:
Msg-id: e02b7f01-ad9a-8b4c-609e-092a37d75926@2ndquadrant.com
In reply to: Re: [HACKERS] Custom compression methods  (Chris Travers <chris.travers@adjust.com>)
List: pgsql-hackers

On 3/19/19 4:44 PM, Chris Travers wrote:
> 
> 
> On Tue, Mar 19, 2019 at 12:19 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
>     On 3/19/19 10:59 AM, Chris Travers wrote:
>     >
>     >
>     > Not discussing whether any particular committer should pick this
>     > up, but I want to discuss an important use case we have at Adjust
>     > for this sort of patch.
>     >
>     > The PostgreSQL compression strategy is something we find
>     > inadequate for at least one of our large deployments (a large
>     > debug log spanning 10PB+).  Our current solution is to set storage
>     > so that it does not compress and then run on ZFS to get
>     > compression speedups on spinning disks.
>     >
>     > But running PostgreSQL on ZFS has some annoying costs because we
>     > have copy-on-write on copy-on-write, and when you add file
>     > fragmentation... I would really like to be able to get away from
>     > having to do ZFS as an underlying filesystem.  While we have good
>     > write throughput, read throughput is not as good as I would like.
>     >
>     > An approach that would give us better row-level compression would
>     > allow us to ditch the COW-filesystem-under-PostgreSQL approach.
>     >
>     > So I think the benefits are actually quite high, particularly for
>     > those dealing with volume/variety problems where things like JSONB
>     > might be a go-to solution.  Similarly, I could totally see systems
>     > which handle large amounts of specialized text having extensions
>     > for dealing with these.
>     >
> 
>     Sure, I don't disagree - the proposed compression approach may be a big
>     win for some deployments further down the road, no doubt about it. But
>     as I said, it's unclear when we get there (or if the interesting stuff
>     will be in some sort of extension, which I don't oppose in principle).
> 
> 
> I would assume that if extensions are particularly stable and useful
> they could be moved into core.
> 
> But I would also assume that at first, this area would be sufficiently
> experimental that folks (like us) would write our own extensions for it.
>  
> 
> 
>     >
>     >     But hey, I think there are committers working for postgrespro,
>     >     who might have the motivation to get this over the line. Of
>     >     course, assuming that there are no serious objections to having
>     >     this functionality or how it's implemented ... But I don't
>     >     think that was the case.
>     >
>     >
>     > While I am not currently able to speak for questions of how it is
>     > implemented, I can say with very little doubt that we would almost
>     > certainly use this functionality if it were there, and I could see
>     > plenty of other cases where this would be a very appropriate
>     > direction for some other projects as well.
>     >
>     Well, I guess the best thing you can do to move this patch forward is to
>     actually try that on your real-world use case, and report your results
>     and possibly do a review of the patch.
> 
> 
> Yeah, I expect to do this within the next month or two.
>  
> 
> 
>     IIRC there was an extension [1] leveraging this custom compression
>     interface for better jsonb compression, so perhaps that would work
>     for you (not sure if it's up to date with the current patch, though).
> 
>     [1]
>     https://www.postgresql.org/message-id/20171130182009.1b492eb2%40wp.localdomain
> 
> Yeah I will be looking at a couple different approaches here and
> reporting back. I don't expect it will be a full production workload but
> I do expect to be able to report on benchmarks in both storage and
> performance.
>  

FWIW I was a bit curious how that jsonb compression would affect the
data set I'm using for testing jsonpath patches, so I spent a bit of
time getting it to work with master. The attached patch gets it to
compile, but unfortunately it then fails like this:

    ERROR:  jsonbd: worker has detached

It seems there's some bug in how shm_mq is used, but I don't have time
to investigate that further.
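(For context: in the backend's shm_mq API, a blocking receive reports the
other end going away as SHM_MQ_DETACHED, which is presumably where jsonbd
raises that error. A hypothetical sketch of the usual pattern, not taken
from the actual jsonbd source, requires a running backend and is not
standalone-runnable:)

```
/* Hypothetical sketch of the common shm_mq receive pattern in a
 * PostgreSQL extension.  shm_mq_receive() returns SHM_MQ_DETACHED once
 * the worker on the other end of the queue detaches (e.g. because it
 * exited or crashed), at which point the client can only error out. */
#include "postgres.h"
#include "storage/shm_mq.h"

static void
receive_reply(shm_mq_handle *mqh)
{
    shm_mq_result res;
    Size          len;
    void         *data;

    res = shm_mq_receive(mqh, &len, &data, /* nowait = */ false);
    if (res == SHM_MQ_DETACHED)
        elog(ERROR, "jsonbd: worker has detached");

    /* otherwise res == SHM_MQ_SUCCESS and data/len hold the message */
}
```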

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments
