Re: optimize lookups in snapshot [sub]xip arrays
From: Nathan Bossart
Subject: Re: optimize lookups in snapshot [sub]xip arrays
Date:
Msg-id: 20220717035957.GA3631071@nathanxps13
In reply to: Re: optimize lookups in snapshot [sub]xip arrays (Andres Freund <andres@anarazel.de>)
Responses: Re: optimize lookups in snapshot [sub]xip arrays
List: pgsql-hackers
Hi Andres,

Thanks for taking a look.

On Fri, Jul 15, 2022 at 01:08:57PM -0700, Andres Freund wrote:
> Hm. Are there any contexts where we'd not want the potential for failing
> due to OOM?

I'm not sure about this one.

> I wonder if we additionally / alternatively could use a faster method of
> searching the array linearly, e.g. using SIMD.

I looked into using SIMD. The patch is attached, but it is only intended
for benchmarking purposes and isn't anywhere close to being worth serious
review. There may be a simpler/better way to implement the linear search,
but this seemed to work well.

Overall, SIMD provided a decent improvement. I had to increase the number
of writers quite a bit in order to demonstrate where the hash tables began
winning. Here are the numbers:

    writers    head    simd    hash
        256     663     632     694
        512     530     618     637
        768     489     544     573
       1024     364     508     562
       2048     185     306     485
       4096     146     197     441

While it is unsurprising that the hash tables perform the best, there are
a couple of advantages to SIMD that might make that approach worth
considering. For one, there's really no overhead (i.e., you don't need to
sort the array or build a hash table), so we can avoid picking an
arbitrary threshold and just have one code path. Also, a SIMD
implementation for a linear search through an array of integers could
likely be easily reused elsewhere.

> Another thing that might be worth looking into is to sort the xip/subxip
> arrays into a binary-search optimized layout. That'd make the binary search
> faster, wouldn't require additional memory (a boolean indicating whether
> sorted somewhere, I guess), and would easily persist across copies of the
> snapshot.

I spent some time looking into this, but I haven't attempted to implement
it. IIUC the most difficult part of this is sorting the array in place
into the special layout.

>> These hash tables are regarded as ephemeral; they only live in
>> process-local memory and are never rewritten, copied, or
>> serialized.
> What does rewriting refer to here?

I mean that a hash table created for one snapshot will not be cleared and
reused for another.

> Not convinced that's the right idea in case of copying. I think we often
> end up copying snapshots frequently, and building & allocating the hashed
> xids separately every time seems not great.

Right. My concern with reusing the hash tables is that we'd need to
allocate much more space that would go largely unused in many cases.

>> + snapshot->xiph = NULL;
>> + snapshot->subxiph = NULL;

> Do we need separate hashes for these? ISTM that if we're overflowed then we
> don't need ->subxip[h], and if not, then the action for an xid being in ->xip
> and ->subxiph is the same?

Do you mean that we can combine these into one hash table? That might
work.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
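[Editor's note: the benchmarked patch itself is an attachment and not shown here. As a rough illustration of the kind of SIMD linear search discussed above, here is a minimal SSE2 sketch in C. The helper name `simd_lfind32` is hypothetical and this is not the attached patch; it compares four xids per iteration against a broadcast key and falls back to a scalar loop for the tail.]

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: return true if `key` appears in base[0..nelem).
 * Four 32-bit elements are compared per iteration using SSE2.
 */
static bool
simd_lfind32(uint32_t key, const uint32_t *base, uint32_t nelem)
{
    uint32_t    i = 0;
    const __m128i keys = _mm_set1_epi32((int) key);  /* key in all 4 lanes */

    /* vectorized loop: 4 elements at a time */
    for (; i + 4 <= nelem; i += 4)
    {
        __m128i     vals = _mm_loadu_si128((const __m128i *) &base[i]);
        __m128i     cmp = _mm_cmpeq_epi32(keys, vals);

        if (_mm_movemask_epi8(cmp) != 0)
            return true;        /* some lane matched */
    }

    /* scalar tail for the remaining 0-3 elements */
    for (; i < nelem; i++)
    {
        if (base[i] == key)
            return true;
    }
    return false;
}
```

Because the array is consumed as-is, this needs no sorting or auxiliary allocation, which is the "one code path, no threshold" property mentioned above.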
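[Editor's note: the "binary-search optimized layout" Andres suggests is commonly an Eytzinger (BFS) layout, where the array is stored in breadth-first order of a balanced search tree so that a search touches mostly nearby cache lines. A sketch follows; the function names are hypothetical, and unlike the in-place conversion identified above as the hard part, this version takes the easy way out and builds the layout from a sorted copy into a separate 1-indexed output buffer.]

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: fill out[1..n] with the Eytzinger layout of
 * sorted[0..n).  Node k has children 2k and 2k+1; an in-order walk of
 * that implicit tree visits the sorted elements in order.  Returns the
 * next unconsumed index of `sorted`.
 */
static uint32_t
eytzinger_build(const uint32_t *sorted, uint32_t *out, uint32_t n,
                uint32_t i, uint32_t k)
{
    if (k <= n)
    {
        i = eytzinger_build(sorted, out, n, i, 2 * k);      /* left subtree */
        out[k] = sorted[i++];
        i = eytzinger_build(sorted, out, n, i, 2 * k + 1);  /* right subtree */
    }
    return i;
}

/*
 * Search the Eytzinger-laid-out array a[1..n] for `key`; no extra
 * memory is needed beyond the array itself.
 */
static bool
eytzinger_search(uint32_t key, const uint32_t *a, uint32_t n)
{
    uint32_t    k = 1;

    while (k <= n)
    {
        if (a[k] == key)
            return true;
        k = 2 * k + (a[k] < key);   /* descend left or right */
    }
    return false;
}
```

As noted in the email, the layout survives snapshot copies for free (it is just a permutation of the array), so only a flag recording whether the array has been converted would need to travel with the snapshot.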
Attachments