Re: Improve compression speeds in pg_lzcompress.c

From: Benedikt Grundmann
Subject: Re: Improve compression speeds in pg_lzcompress.c
Msg-id: CADbMkNPrKe2P7Oku=2sNGyLrd8+wQad_YBpvJtmJBtV17Tmf4A@mail.gmail.com
In reply to: Re: Improve compression speeds in pg_lzcompress.c (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

> Personally, my biggest gripe about the way we do compression is that
> it's easy to detoast the same object lots of times.  More generally,
> our in-memory representation of user data values is pretty much a
> mirror of our on-disk representation, even when that leads to excess
> conversions.  Beyond what we do for TOAST, there's stuff like numeric
> where we not only detoast but then post-process the results into yet
> another internal form before performing any calculations - and then of
> course we have to convert back before returning from the calculation
> functions.  And for things like XML, JSON, and hstore we have to
> repeatedly parse the string every time someone wants to do anything
> with it.  Of course, solving this is a very hard problem, and not
> solving it isn't a reason not to have more compression options - but
> more compression options will not solve the problems that I personally
> have in this area, by and large.

At the risk of saying something totally obvious and stupid (I haven't looked at the actual representation), this sounds like a memoisation problem. In OCaml terms:

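(* A value is either still in its on-disk (serialized) form or has
   already been converted to its in-memory form. *)
type 'a rep =
  | On_disk_rep   of bytes
  | In_memory_rep of 'a

(* A mutable cell, so the conversion result can be cached in place. *)
type 'a t = 'a rep ref

(* Return the in-memory form, running [converter] only on first access
   and overwriting the cell with the result. *)
let get_mem_rep (t : 'a t) (converter : bytes -> 'a) : 'a =
  match !t with
  | On_disk_rep seq ->
    let res = converter seq in
    t := In_memory_rep res;
    res
  | In_memory_rep x -> x
;;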

(If you need the other direction, that's straightforward too.)
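A rough sketch of that other direction, written in C (the language PostgreSQL itself uses): let a value hold the on-disk form, the in-memory form, or both, and produce each one on demand and cache it. All names here (`two_way_t`, `get_disk_rep`, `parse_long`, `format_long`) are hypothetical, and using NULL to mean "not computed yet" is an assumption of this sketch, not anything PostgreSQL does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical two-way cached value. NULL means "not computed yet", so
   at least one member must be non-NULL and converters must not return
   NULL (otherwise the conversion would simply be retried). */
typedef struct {
  char *on_disk;
  void *in_memory;
} two_way_t;

/* Deserialize on first use and cache the result. */
void *get_mem_rep(two_way_t *v, void *(*converter)(const char *)) {
  assert(v->on_disk != NULL || v->in_memory != NULL);
  if (v->in_memory == NULL)
    v->in_memory = converter(v->on_disk);
  return v->in_memory;
}

/* The other direction: serialize on first use and cache the result. */
char *get_disk_rep(two_way_t *v, char *(*serializer)(const void *)) {
  assert(v->on_disk != NULL || v->in_memory != NULL);
  if (v->on_disk == NULL)
    v->on_disk = serializer(v->in_memory);
  return v->on_disk;
}

/* Example converter: parse a decimal string into a heap-allocated long. */
void *parse_long(const char *s) {
  long *n = malloc(sizeof *n);
  if (n != NULL)
    *n = strtol(s, NULL, 10);
  return n;
}

/* Example serializer: format a long into a heap-allocated string. */
char *format_long(const void *p) {
  char *s = malloc(32);
  if (s != NULL)
    snprintf(s, 32, "%ld", *(const long *)p);
  return s;
}
```

One consequence of caching both forms: if the in-memory value is mutated after serialization, the cached on-disk bytes go stale, so such values would need to be treated as immutable or have their on-disk copy invalidated on write.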

Translating this into C is relatively straightforward if you have the luxury of a fresh start and don't have to be super efficient:

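#include <stddef.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

/* Tagged union: rep_kind says which member of rep is valid. */
typedef struct {
  rep_kind_t rep_kind;
  union {
    char *on_disk;    /* serialized bytes */
    void *in_memory;  /* converted form */
  } rep;
} value_t;

/* Return the in-memory form, converting and caching it on first use. */
void *get_mem_rep(value_t *v, void *(*converter)(char *)) {
  switch (v->rep_kind) {
    case ON_DISK_REP: {
      void *res = converter(v->rep.on_disk);
      v->rep.in_memory = res;   /* overwrites the on-disk pointer */
      v->rep_kind = IN_MEMORY_REP;
      return res;
    }
    case IN_MEMORY_REP:
      return v->rep.in_memory;
  }
  return NULL; /* not reached for a valid rep_kind */
}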

Now, of course, fitting this into the existing types while ensuring there is no premature freeing of memory, no memory leak, and no other bug is probably a nightmare, and presumably why you said this is a hard problem.
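To make the ownership question concrete, here is one possible policy, as a sketch only: the value owns whichever malloc'd buffer it currently holds, frees the on-disk bytes as soon as conversion succeeds, and exposes a single release function. The names `value_t`, `value_free` and `measure` are hypothetical, the types are redeclared so the block stands alone, and it ignores PostgreSQL's memory contexts entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
  rep_kind_t rep_kind;
  union {
    char *on_disk;
    void *in_memory;
  } rep;
} value_t;

/* Assumed policy: value_t owns whichever malloc'd buffer it holds. The
   on-disk buffer is freed once conversion succeeds, so the converter
   must not keep pointers into it. */
void *get_mem_rep(value_t *v, void *(*converter)(const char *)) {
  assert(v->rep_kind == ON_DISK_REP || v->rep_kind == IN_MEMORY_REP);
  if (v->rep_kind == ON_DISK_REP) {
    void *res = converter(v->rep.on_disk);
    if (res == NULL)
      return NULL; /* keep the on-disk form if conversion failed */
    free(v->rep.on_disk);
    v->rep.in_memory = res;
    v->rep_kind = IN_MEMORY_REP;
  }
  return v->rep.in_memory;
}

/* Release whichever representation is held. Any pointer previously
   returned by get_mem_rep dangles after this call. */
void value_free(value_t *v, void (*free_mem)(void *)) {
  if (v->rep_kind == ON_DISK_REP)
    free(v->rep.on_disk);
  else
    free_mem(v->rep.in_memory);
  v->rep_kind = ON_DISK_REP;
  v->rep.on_disk = NULL; /* makes a second value_free harmless */
}

/* Example converter: store the length of a string on the heap. */
void *measure(const char *s) {
  size_t *n = malloc(sizeof *n);
  if (n != NULL)
    *n = strlen(s);
  return n;
}
```

Even this toy policy shows where the hazards sit: a converter that keeps pointers into the on-disk buffer, or a caller that holds on to the result of `get_mem_rep` past `value_free`, breaks it.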

Cheers,

Bene
