Re: [HACKERS] Custom compression methods

From Tomas Vondra
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id 87f7b3e6-8d48-654e-6ccc-571c99dce805@2ndquadrant.com
In response to Re: [HACKERS] Custom compression methods  (Chris Travers <chris.travers@adjust.com>)
Responses Re: [HACKERS] Custom compression methods
List pgsql-hackers
On 3/19/19 10:59 AM, Chris Travers wrote:
> 
> 
> On Mon, Mar 18, 2019 at 11:09 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> 
> 
> 
>     On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote:
>     > On Fri, 15 Mar 2019 14:07:14 +0400
>     > David Steele <david@pgmasters.net> wrote:
>     >
>     >> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
>     >>> On Thu, Mar 7, 2019 at 10:43 AM David Steele
>     >>> <david@pgmasters.net> wrote:
>     >>>
>     >>>     On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
>     >>>
>     >>>      > there are another set of patches.
>     >>>      > Only rebased to current master.
>     >>>      >
>     >>>      > Also I will change status on commitfest to 'Needs review'.
>     >>>
>     >>>     This patch has seen periodic rebases but no code review that I
>     >>> can see since last January 2018.
>     >>>
>     >>>     As Andres noted in [1], I think that we need to decide if this
>     >>> is a feature that we want rather than just continuing to push it
>     >>> from CF to CF.
>     >>>
>     >>>
>     >>> Yes.  I took a look at code of this patch.  I think it's in pretty
>     >>> good shape.  But high level review/discussion is required.
>     >>
>     >> OK, but I think this patch can only be pushed one more time, maximum,
>     >> before it should be rejected.
>     >>
>     >> Regards,
>     >
>     > Hi,
>     > in my opinion this patch is usually skipped not because it is not
>     > needed, but because of its size. It is not hard to maintain it until
>     > committers have time for it or I get an actual response that
>     > nobody is going to commit it.
>     >
> 
>     That may be one of the reasons, yes. But there are other reasons, which
>     I think may be playing a bigger role.
> 
>     There's one practical issue with how the patch is structured - the docs
>     and tests are in separate patches towards the end of the patch series,
>     which makes it impossible to commit the preceding parts. This needs to
>     change. Otherwise the patch size kills the patch as a whole.
> 
>     But there's a more important cost/benefit issue, I think. When I look at
>     patches as a committer, I naturally have to weigh how much time I spend
>     on getting it in (and then dealing with fallout from bugs etc) vs. what
>     I get in return (measured in benefits for community, users). This patch
>     is pretty large and complex, so the "costs" are quite high, while the
>     benefit from the patch itself is the ability to pick between pg_lz and
>     zlib. Which is not great, and so people tend to pick other patches.
> 
>     Now, I understand there are a lot of potential benefits further down the
>     line, like column-level compression (which I think is the main goal
>     here). But that's not included in the patch, so the gains are somewhat
>     far in the future.
> 
> 
> Not discussing whether any particular committer should pick this up but
> I want to discuss an important use case we have at Adjust for this sort
> of patch.
> 
> The PostgreSQL compression strategy is something we find inadequate for
> at least one of our large deployments (a large debug log spanning
> 10PB+).  Our current solution is to set storage so that it does not
> compress and then run on ZFS to get compression speedups on spinning disks.
> 
> But running PostgreSQL on ZFS has some annoying costs because we have
> copy-on-write on copy-on-write, and when you add file fragmentation... I
> would really like to be able to get away from having to do ZFS as an
> underlying filesystem.  While we have good write throughput, read
> throughput is not as good as I would like.
> 
> An approach that would give us better row-level compression would allow
> us to ditch the COW-filesystem-under-PostgreSQL approach.
> 
> So I think the benefits are actually quite high particularly for those
> dealing with volume/variety problems where things like JSONB might be a
> go-to solution.  Similarly I could totally see having systems which
> handle large amounts of specialized text having extensions for dealing
> with these.
> 

Sure, I don't disagree - the proposed compression approach may be a big
win for some deployments further down the road, no doubt about it. But
as I said, it's unclear when we get there (or if the interesting stuff
will be in some sort of extension, which I don't oppose in principle).

> 
>     But hey, I think there are committers working for postgrespro, who might
>     have the motivation to get this over the line. Of course, assuming that
>     there are no serious objections to having this functionality or how it's
>     implemented ... But I don't think that was the case.
> 
> 
> While I am not currently able to speak for questions of how it is
> implemented, I can say with very little doubt that we would almost
> certainly use this functionality if it were there and I could see plenty
> of other cases where this would be a very appropriate direction for some
> other projects as well.
>
Well, I guess the best thing you can do to move this patch forward is to
actually try that on your real-world use case, and report your results
and possibly do a review of the patch.

IIRC there was an extension [1] leveraging this custom compression
interface for better jsonb compression, so perhaps that would work for
you (not sure if it's up to date with the current patch, though).

[1]
https://www.postgresql.org/message-id/20171130182009.1b492eb2%40wp.localdomain


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

