Re: 回复: An implementation of multi-key sort

From: Tomas Vondra
Subject: Re: 回复: An implementation of multi-key sort
Msg-id: aefdfd09-df1b-4613-9157-fdd7ca38c734@enterprisedb.com
In reply to: Re: 回复: An implementation of multi-key sort  (Yao Wang <yao-yw.wang@broadcom.com>)
Replies: Re: 回复: An implementation of multi-key sort
List: pgsql-hackers

Hello,

Thanks for posting a new version of the patch, and for reporting a bunch
of issues in the bash scripts I used for testing. I decided to rerun the
fixed tests on both the old and new versions of the patch, and I finally
have the results from three machines (the i5 and xeon I usually use,
plus an rpi5 for fun).

The complete scripts, raw results (CSV), and various reports (ODS and
PDF) are available in my github:

  https://github.com/tvondra/mksort-tests

I'm not going to attach all of it to this message, because the raw CSV
results alone are ~3MB for each of the three machines.

You can do your own analysis on the raw CSV results, of course - see the
'csv' directory, which has data for the clean (unpatched) branch and for
both patch versions.

But I've also prepared PDF reports comparing how the patches perform on
each of the machines - see the 'pdf' directory. There are two types of
reports, depending on what's compared to what.

The general report structure is the same - columns with results for
different combinations of parameters, followed by a comparison of the
results and a heatmap (red - bad/regression, green - good/speedup).

The "patch comparison" reports compare v5/v4, so it's essentially

    (timing with v5) / (timing with v4)

with mksort enabled or disabled. The charts are mostly green, which
means v5 is much faster than v4 - so it seems like a step in the right
direction.
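For anyone redoing the analysis from the raw CSV files, here is a minimal sketch of the ratio-and-heatmap idea. It assumes a hypothetical CSV layout with `combo`, `branch` and `timing_ms` columns (check the real headers in the 'csv' directory), and the 2% "neutral" tolerance is an arbitrary choice for illustration, not the threshold used in the PDF reports.

```python
import csv
import io
from statistics import mean


def mean_timings(rows):
    """Average timing per (combo, branch) pair.

    Assumes hypothetical columns: combo (parameter combination),
    branch ("master", "v4", "v5") and timing_ms.
    """
    groups = {}
    for row in rows:
        key = (row["combo"], row["branch"])
        groups.setdefault(key, []).append(float(row["timing_ms"]))
    return {key: mean(vals) for key, vals in groups.items()}


def ratio(timings, combo, new, old):
    """(timing with new) / (timing with old); below 1.0 means `new` is faster."""
    return timings[(combo, new)] / timings[(combo, old)]


def heatmap_color(r, tolerance=0.02):
    """Green for a speedup, red for a regression, white within the tolerance."""
    if r < 1.0 - tolerance:
        return "green"
    if r > 1.0 + tolerance:
        return "red"
    return "white"


if __name__ == "__main__":
    # Made-up toy numbers, NOT the measured results.
    sample = "combo,branch,timing_ms\nexample,v4,100\nexample,v5,90\n"
    timings = mean_timings(csv.DictReader(io.StringIO(sample)))
    r = ratio(timings, "example", "v5", "v4")
    print(f"v5/v4 = {r:.3f} ({(r - 1) * 100:+.1f}%) -> {heatmap_color(r)}")
```

With this convention a ratio of 1.10 to 1.20 means the new build is 10-20% slower and shows up as red.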

The "patch impact" reports compare v4/master and v5/master, i.e. this is
what the users would see after an upgrade. Attached is a small example
from the i5 machine, but the other machines behave in almost exactly the
same way (including the tiny rpi5).

For v4, the results were not great - almost everything regressed (red
color), except for the "text" data type (green).

You can immediately see v5 does much better - it still regresses, but
the regressions are way smaller. And the speedup for "text" is actually
a bit more significant (there's more/darker green).

So as I said before, I think v5 is definitely moving in the right
direction, but the regressions still seem far too significant. If you're
sorting a lot of text data, then sure - this will help a lot. But if
you're sorting int data, and it happens to be random/correlated, you're
going to pay 10-20% more. That's not great.

I haven't analyzed the code very closely, and I don't have a good idea
of how to fix this. But I think this needs to be solved before the patch
can be considered committable.

Considering the benefits seem to be pretty specific to "text" (and
perhaps some other data types), maybe the best solution would be to
enable this only for those cases. Yes, there are some cases where this
helps for the other data types too, but that also comes with the
regressions.
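The "enable it only for those cases" idea could look roughly like the gate below. This is a hypothetical sketch, not PostgreSQL code or the patch's API: the function name, the allow-list (only "text", since that is the type where the benchmarks showed a speedup), and the rule that every sort key must be on the list are all assumptions chosen to be conservative.

```python
# Hypothetical allow-list: types where mksort showed a clear speedup.
# Only "text" is included; extending it would need benchmarks first.
MKSORT_FRIENDLY_TYPES = frozenset({"text"})


def should_use_mksort(key_types, enabled=True):
    """Decide whether to use multi-key sort for a given list of key types.

    Conservative rule (an assumption): mksort must be enabled and every
    sort key must be on the allow-list; otherwise fall back to the
    existing sort so int-like keys don't pay the regression.
    """
    if not enabled or not key_types:
        return False
    return all(t in MKSORT_FRIENDLY_TYPES for t in key_types)


if __name__ == "__main__":
    for keys in (["text"], ["int4"], ["text", "int4"]):
        print(keys, "->", should_use_mksort(keys))
```

A looser rule (e.g. at least one text key) would capture more of the speedups but also reintroduce some of the regressions, so where to draw the line is a judgment call.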


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
