Re: jsonb format is pessimal for toast compression

From: Heikki Linnakangas
Subject: Re: jsonb format is pessimal for toast compression
Date:
Msg-id: 54187DA1.7060106@vmware.com
In response to: Re: jsonb format is pessimal for toast compression  (Josh Berkus <josh@agliodbs.com>)
Responses: Re: jsonb format is pessimal for toast compression  (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers
On 09/16/2014 07:47 PM, Josh Berkus wrote:
> On 09/16/2014 06:31 AM, Robert Haas wrote:
>> On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> Actually, having the keys all at the same level *is* relevant for the
>>>> issue we're discussing.  If those 270 keys are organized in a tree, it's
>>>> not the same as having them all on one level (and not as problematic).
>>>
>>> I believe Robert meant that the 270 keys are not at the top level, but
>>> are at some level (in other words, some object has 270 pairs). That is
>>> equivalent to having them at the top level for the purposes of this
>>> discussion.
>>
>> Yes, that's exactly what I meant.
>>
>>> FWIW, I am slightly concerned about weighing use cases around very
>>> large JSON documents too heavily. Having enormous jsonb documents just
>>> isn't going to work out that well, but neither will equivalent designs
>>> in popular document database systems for similar reasons. For example,
>>> the maximum BSON document size supported by MongoDB is 16 megabytes,
>>> and that seems to be something that their users don't care too much
>>> about. Having 270 pairs in an object isn't unreasonable, but it isn't
>>> going to be all that common either.
>
> Well, I can only judge from the use cases I personally have, none of
> which involve more than 100 keys at any level for most rows.  So far
> I've seen some people argue hypothetical use cases involving hundreds of
> keys per level, but nobody who *actually* has such a use case.  Also,
> note that we currently don't know where the "last value" extraction
> becomes a performance problem at this stage, except that it's somewhere
> between 200 and 100,000.  Also, we don't have a test which shows the
> hybrid approach (Heikki's patch) performing better with 1000's of keys.
>
> Basically, if someone is going to make a serious case for Heikki's
> hybrid approach over the simpler lengths-only approach, then please post
> some test data showing the benefit ASAP, since I can't demonstrate it.
> Otherwise, let's get beta 3 out the door so we can get the 9.4 release
> train moving again.

Are you looking for someone with a real-life scenario, or just a
synthetic test case? The latter is easy to do.
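
In outline, all such a test needs to do is build one wide jsonb value
and time how long it takes to pull out the k'th element, for growing k.
A rough, self-contained sketch (the table layout and stepping here are
made up; the attached program is what actually produced the numbers
below):

-- Build a single ~10000-element jsonb array to poke at.
CREATE TEMP TABLE testdoc AS
SELECT ('[' || (SELECT string_agg(i::text, ',')
                FROM generate_series(0, 9999) AS i) || ']')::jsonb AS doc;

CREATE TEMP TABLE testtimes (elem int, duration_ms numeric);

-- Time extraction of element k, averaged over many iterations, for
-- roughly geometrically increasing k.
DO $$
DECLARE
  d     jsonb := (SELECT doc FROM testdoc);
  iters constant int := 1000;
  k     int := 11;
  t0    timestamptz;
BEGIN
  WHILE k < 10000 LOOP
    t0 := clock_timestamp();
    FOR i IN 1 .. iters LOOP
      PERFORM d -> k;                  -- fetch the k'th element
    END LOOP;
    INSERT INTO testtimes
      VALUES (k, extract(epoch FROM clock_timestamp() - t0) * 1000.0 / iters);
    k := k + greatest(1, k / 10);      -- ~10% steps
  END LOOP;
END
$$;

The loop overhead means the absolute numbers won't match the table below
exactly, but the shape of the curve should come out similar.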

See attached test program. It's basically the same one I posted earlier.
Here are the results from my laptop with Tom's jsonb-lengths-merged.patch:

postgres=# select * from testtimes ;
  elem | duration_ms
------+-------------
    11 |    0.289508
    12 |    0.288122
    13 |    0.290558
    14 |    0.287889
    15 |    0.286303
    17 |    0.290415
    19 |    0.289829
    21 |    0.289783
    23 |    0.287104
    25 |    0.289834
    28 |    0.290735
    31 |    0.291844
    34 |    0.293454
    37 |    0.293866
    41 |    0.291217
    45 |    0.289243
    50 |    0.290385
    55 |    0.292085
    61 |    0.290892
    67 |    0.292335
    74 |    0.292561
    81 |    0.291416
    89 |    0.295714
    98 |     0.29844
   108 |    0.297421
   119 |    0.299471
   131 |    0.299877
   144 |    0.301604
   158 |    0.303365
   174 |    0.304203
   191 |    0.303596
   210 |    0.306526
   231 |    0.304189
   254 |    0.307782
   279 |    0.307372
   307 |    0.306873
   338 |    0.310471
   372 |      0.3151
   409 |    0.320354
   450 |     0.32038
   495 |    0.322127
   545 |    0.323256
   600 |    0.330419
   660 |    0.334226
   726 |    0.336951
   799 |     0.34108
   879 |    0.347746
   967 |    0.354275
  1064 |    0.356696
  1170 |    0.366906
  1287 |    0.375352
  1416 |    0.392952
  1558 |    0.392907
  1714 |    0.402157
  1885 |    0.412384
  2074 |    0.425958
  2281 |    0.435415
  2509 |     0.45301
  2760 |    0.469983
  3036 |    0.487329
  3340 |    0.505505
  3674 |    0.530412
  4041 |    0.552585
  4445 |    0.581815
  4890 |    0.610509
  5379 |    0.642885
  5917 |    0.680395
  6509 |    0.713849
  7160 |    0.757561
  7876 |    0.805225
  8664 |    0.856142
  9530 |    0.913255
(72 rows)

That's up to 9530 elements; it's pretty easy to extrapolate from there
to higher counts, since the cost is O(n) in the element's position.
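
For a rough number (back-of-envelope from the first and last rows above,
not a new measurement):

   (0.913 - 0.290) ms / 9530 elements ~= 0.000065 ms per element, so
   fetching the last element of a 100,000-element value would take
   roughly 0.290 ms + 100000 * 0.000065 ms ~= 6.8 ms.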

With unpatched git master, the runtime is flat at about 0.29 ms,
regardless of which element is queried. With
jsonb-with-offsets-and-lengths-2.patch, there's no difference that I
could measure.

The difference starts to be meaningful at around 500 entries. In
practice, I doubt anyone's going to notice until you start talking about
tens of thousands of entries.

I'll leave it up to the jury to decide whether we care or not. It seems
like a fairly unusual use case; you'd have to push around pretty large
arrays or objects to notice. Then again, I'm sure *someone* will do it.
People do strange things, and they find ways to abuse features that the
original developers didn't think of.

- Heikki


Attachments
