Discussion: code contributions for 2024, WIP version

code contributions for 2024, WIP version

From
Robert Haas
Date:
Hi,

As many of you are probably aware, I have been doing an annual blog
post on who contributes to PostgreSQL development for some years now.
It includes information on lines of code committed to PostgreSQL, and
also emails sent to the list. This year, I got a jump on analyzing the
commit log, and a draft of the data covering January-November of 2024
has been uploaded in pg_dump format to here:

https://sites.google.com/site/robertmhaas/contributions

I'm sending this message to invite anyone who is interested to review
the data in the commits2024 table and send me corrections. For
example, it's possible that there are cases where I've failed to pick
out the correct primary author for a commit; or where somebody's name
is spelled in two different ways; or where somebody's name is not
spelled the way that they prefer.

You'll notice that the table has columns "lines" and "xlines". I have
set xlines=0 in cases where (a) I considered the commit to be a large,
mechanical commit such as a pgindent run or translation updates; or
(b) the commit was reverting some other commit that occurred earlier
in 2024; or (c) the commit was subsequently reverted. When I run the
final statistics, those commits will still count for the statistics
that count the number of commits, but the lines they inserted will not
be counted as lines of code contributed in 2024. Also for clarity,
please be aware that the "ncauthor" column is not used in the final
reporting; that is just there so that I can set
author=coalesce(ncauthor,committer) at a certain phase of the data
preparation. Corrections should be made to the author column, not
ncauthor.
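
For illustration, the counting rules above can be sketched in Python. This is a minimal sketch with made-up rows, not the actual scripts behind the blog post; only the column names (committer, ncauthor, lines, xlines) come from the description above, and the sample names and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows mimicking the commits2024 columns described above.
commits = [
    {"committer": "Committer A", "ncauthor": "Author X", "lines": 120, "xlines": 120},
    {"committer": "Committer A", "ncauthor": None, "lines": 40, "xlines": 40},
    # A pgindent-style mechanical commit: still a commit, but xlines = 0.
    {"committer": "Committer B", "ncauthor": None, "lines": 9000, "xlines": 0},
]


def summarize(rows):
    """Return {author: (commit_count, lines_credited)}."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        # Same idea as author = coalesce(ncauthor, committer).
        author = row["ncauthor"] if row["ncauthor"] is not None else row["committer"]
        totals[author][0] += 1              # every commit counts as a commit
        totals[author][1] += row["xlines"]  # only xlines count as lines of code
    return {name: tuple(values) for name, values in totals.items()}


summary = summarize(commits)
for name, (n_commits, n_lines) in sorted(summary.items()):
    print(name, n_commits, n_lines)
```

In this sketch the mechanical commit still adds one to Committer B's commit count, but none of its 9000 lines are credited.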

If you would like to correct the data, please send me your corrections
off-list, as a reply to this email, ideally in the form of one or more
UPDATE statements. If you would like to complain about the
methodology, I can't stop you, but please bear in mind that (1) this
is already a lot of work and (2) I've always been upfront in my blog
post about what the limitations of the methodology are and I do my
best not to suggest that this method is somehow perfect or
unimpeachable and (3) you're welcome to publish your own blog post
where you compute things differently. I'm open to reasonable
suggestions for improvement, but if your overall view is that this
sucks or that I suck for doing it, I'm sorry that you feel that way
but giving me that feedback probably will not induce me to do anything
differently.

Donning my asbestos underwear, I remain yours faithfully,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: code contributions for 2024, WIP version

From
Nathan Bossart
Date:
On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>> Donning my asbestos underwear, I remain yours faithfully,
> 
> Thanks for taking the time to compile all that.  That's really nice.

+1, I always look forward to the blog post.

-- 
nathan



Re: code contributions for 2024, WIP version

From
Robert Haas
Date:
On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> > On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
> >> Donning my asbestos underwear, I remain yours faithfully,
> >
> > Thanks for taking the time to compile all that.  That's really nice.
>
> +1, I always look forward to the blog post.

Thanks, glad it's appreciated.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: code contributions for 2024, WIP version

From
Joe Conway
Date:
On 12/3/24 10:44, Robert Haas wrote:
> On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
>> > On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>> >> Donning my asbestos underwear, I remain yours faithfully,
>> >
>> > Thanks for taking the time to compile all that.  That's really nice.
>>
>> +1, I always look forward to the blog post.
> 
> Thanks, glad it's appreciated.

It is definitely appreciated.

While I know you said "you will do you" when it comes to your annual 
blog, there are a number of similar efforts -- top of mind is the 
analysis done (as I understand it) by Daniel Gustafsson and Claire 
Giordano [1], as well as ongoing/recurring analysis done by the 
contributor committee. And there is the adjacent related discussion 
around commit messages/authors. It makes me wonder if there isn't a way 
to make all of our lives easier going forward.

[1] https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024
-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: code contributions for 2024, WIP version

From
Robert Haas
Date:
On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.

Yes, I'm game to try to figure out how to combine our efforts. I don't
think it's a bad thing that different people have different takes;
this is complicated and looking at it through just one lens is
limiting. But people duplicating work is, well, not so good.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: code contributions for 2024, WIP version

From
Daniel Gustafsson
Date:
> On 3 Dec 2024, at 17:41, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
>> While I know you said "you will do you" when it comes to your annual
>> blog, there are a number of similar efforts -- top of mind is the
>> analysis done (as I understand it) by Daniel Gustafsson and Claire
>> Giordano [1], as well as ongoing/recurring analysis done by the
>> contributor committee. And there is the adjacent related discussion
>> around commit messages/authors. It makes me wonder if there isn't a way
>> to make all of our lives easier going forward.
>
> Yes, I'm game to try to figure out how to combine our efforts. I don't
> think it's a bad thing that different people have different takes;
> this is complicated and looking at it through just one lens is
> limiting. But people duplicating work is, well, not so good.

If we settled on a metadata standard for identifying authors, reviewers,
backpatches, etc., I think that would go a very long way toward lowering the
complexity of getting to the data and would keep folks focused on doing
interesting analysis.

--
Daniel Gustafsson




Re: code contributions for 2024, WIP version

From
Alvaro Herrera
Date:
Hello Robert,

On 2024-Dec-02, Robert Haas wrote:

> As many of you are probably aware, I have been doing an annual blog
> post on who contributes to PostgreSQL development for some years now.
> It includes information on lines of code committed to PostgreSQL, and
> also emails sent to the list. This year, I got a jump on analyzing the
> commit log, and a draft of the data covering January-November of 2024
> has been uploaded in pg_dump format to here:
> 
> https://sites.google.com/site/robertmhaas/contributions
> 
> I'm sending this message to invite anyone who is interested to review
> the data in the commits2024 table and send me corrections.

No corrections here -- I noticed nothing wrong with the commits I am
involved with, in a quick read.  I did notice that for patches with
multiple authors, only the first one is listed.  For instance,
53c2a97a926's author ("Improve performance of subsystems on top of
SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out.  I realize
that addressing this would complicate the schema and queries, but maybe
it's worth thinking about for next time.  We have plenty of patches with
multiple authors, after all.

Hmm, maybe
UPDATE commits2024 SET xlines = 0 WHERE commitid in
  ('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
  '21ef4d4d897', '592a2283721');

How did you come up with the 'lines' number for each commit anyway?
Judging by 592a2283721 it's not just the number of lines added, since
that commit added 3 lines and you have lines=2.


An unrelated (and possibly useless) thing is that some committers seem
firmly in the camp of ending commit titles with a period, others are
firmly in the other camp; only two people seem not to have made up their
minds about that:

     committer      │ with end period │ without end period │ fraction with end period 
────────────────────┼─────────────────┼────────────────────┼──────────────────────────
 Etsuro Fujita      │               6 │                  0 │                   100.00
 Peter Geoghegan    │              39 │                  0 │                   100.00
 Tatsuo Ishii       │               8 │                  0 │                   100.00
 Amit Kapila        │              87 │                  0 │                   100.00
 Fujii Masao        │              35 │                  0 │                   100.00
 Tom Lane           │             296 │                  1 │                    99.66
 Nathan Bossart     │             131 │                  1 │                    99.24
 Jeff Davis         │              88 │                  1 │                    98.88
 Noah Misch         │              61 │                  1 │                    98.39
 Thomas Munro       │              59 │                  1 │                    98.33
 Masahiko Sawada    │              39 │                  1 │                    97.50
 Dean Rasheed       │              23 │                  1 │                    95.83
 Robert Haas        │              77 │                 10 │                    88.51
 Joe Conway         │               1 │                  2 │                    33.33
 Alexander Korotkov │               4 │                153 │                     2.55
 Andrew Dunstan     │               1 │                 40 │                     2.44
 Bruce Momjian      │               2 │                 82 │                     2.38
 Heikki Linnakangas │               4 │                174 │                     2.25
 Peter Eisentraut   │               6 │                309 │                     1.90
 Amit Langote       │               1 │                 54 │                     1.82
 Álvaro Herrera     │               1 │                118 │                     0.84
 Michael Paquier    │               1 │                275 │                     0.36
 Andres Freund      │               0 │                 26 │                     0.00
 Richard Guo        │               0 │                 27 │                     0.00
 Daniel Gustafsson  │               0 │                 99 │                     0.00
 Magnus Hagander    │               0 │                  4 │                     0.00
 John Naylor        │               0 │                 33 │                     0.00
 Melanie Plageman   │               0 │                  6 │                     0.00
 David Rowley       │               0 │                106 │                     0.00
 Tomas Vondra       │               0 │                 33 │                     0.00

Query was:
select committer,
  count(*) filter (where subject     like '%.') as "with end period",
  count(*) filter (where subject not like '%.') as "without end period",
  ((count(*) filter (where subject like '%.'))::numeric / count(*) * 100)::numeric(5,2) as "fraction with end period"
from commits2024
group by committer
order by 4 desc, split_part(committer, ' ', 2);


Thanks!

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
                  -- Paul Graham, http://www.paulgraham.com/opensource.html



Re: code contributions for 2024, WIP version

From
"Andrey M. Borodin"
Date:

> On 5 Dec 2024, at 17:46, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> We have plenty of patches with
> multiple authors, after all.

+1, thanks for raising this. A lot of stuff is actually joint work.
It’s much more fun to develop something in a group of co-authors.


Best regards, Andrey Borodin.


Re: code contributions for 2024, WIP version

From
Robert Haas
Date:
On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> No corrections here -- I noticed nothing wrong with the commits I am
> involved with, in a quick read.  I did notice that for patches with
> multiple authors, only the first one is listed.  For instance,
> 53c2a97a926's author ("Improve performance of subsystems on top of
> SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out.  I realize
> that addressing this would complicate the schema and queries, but maybe
> it's worth thinking about for next time.  We have plenty of patches with
> multiple authors, after all.

I agree, but I don't know how to apportion the work between the
authors. I think dividing credit equally between two or three authors
would often be very unfair to the first author. If we want to annotate
commit messages in a way that allows me to apportion credit more
fairly, I'm totally game to do that, but otherwise I think that giving
the credit to the first author is probably more fair on average.

> Hmm, maybe
> UPDATE commits2024 SET xlines = 0 WHERE commitid in
>   ('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
>   '21ef4d4d897', '592a2283721');

Thanks.

> How did you come up with the 'lines' number for each commit anyway?
> Judging by 592a2283721 it's not just the number of lines added, since
> that commit added 3 lines and you have lines=2.

git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M
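
As a rough illustration of how per-commit line counts could be pulled out of that output, here is a minimal Python sketch. It assumes that "lines" corresponds to the insertions figure on the `--shortstat` summary line; the sample log text, hashes, and names are hypothetical, and this is not the actual script. In git, `-w` ignores whitespace-only changes and `-M` detects renames, so the reported insertions can be lower than a plain diff would show.

```python
import re

# Hypothetical excerpt in the layout that `git log --shortstat` prints.
SAMPLE_LOG = """\
commit 1111111111111111111111111111111111111111
Author: Example Committer <committer@example.org>
Date:   Mon Jan 8 10:00:00 2024 +0000

    Fix a typo

 1 file changed, 2 insertions(+), 2 deletions(-)

commit 2222222222222222222222222222222222222222
Author: Example Committer <committer@example.org>
Date:   Tue Jan 9 10:00:00 2024 +0000

    Remove dead code

 3 files changed, 17 deletions(-)
"""

COMMIT_RE = re.compile(r"^commit ([0-9a-f]+)")
# The summary line is indented by one space; message lines by four.
STAT_RE = re.compile(r"^ \d+ files? changed")
INSERT_RE = re.compile(r"(\d+) insertions?\(\+\)")


def insertions_per_commit(log_text):
    """Map each commit hash to the number of inserted lines reported."""
    result = {}
    current = None
    for line in log_text.splitlines():
        commit = COMMIT_RE.match(line)
        if commit:
            current = commit.group(1)
            result[current] = 0  # stays 0 if the commit only deletes lines
        elif current is not None and STAT_RE.match(line):
            inserted = INSERT_RE.search(line)
            if inserted:
                result[current] = int(inserted.group(1))
    return result


counts = insertions_per_commit(SAMPLE_LOG)
```

The summary line omits the insertions part entirely when a commit only removes lines, which is why the sketch defaults each commit to zero.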

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: code contributions for 2024, WIP version

From
Tom Kincaid
Date:



> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.

Perhaps slightly off topic, so how does one provide input to the contributor committee?

> [1] https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com




--
Thomas John Kincaid

Re: code contributions for 2024, WIP version

From
Alvaro Herrera
Date:
On 2024-Dec-05, Robert Haas wrote:

> On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > No corrections here -- I noticed nothing wrong with the commits I am
> > involved with, in a quick read.  I did notice that for patches with
> > multiple authors, only the first one is listed.  For instance,
> > 53c2a97a926's author ("Improve performance of subsystems on top of
> > SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out.  I realize
> > that addressing this would complicate the schema and queries, but maybe
> > it's worth thinking about for next time.  We have plenty of patches with
> > multiple authors, after all.
> 
> I agree, but I don't know how to apportion the work between the
> authors. I think dividing credit equally between two or three authors
> would often be very unfair to the first author. If we want to annotate
> commit messages in a way that allows me to apportion credit more
> fairly, I'm totally game to do that, but otherwise I think that giving
> the credit to the first author is probably more fair on average.

Just give credit to all lines for all authors, would be my approach.  Is
that unfair?  Perhaps, but I'd rather err on the side of giving too much
credit, than on not giving enough.

> > How did you come up with the 'lines' number for each commit anyway?
> > Judging by 592a2283721 it's not just the number of lines added, since
> > that commit added 3 lines and you have lines=2.
> 
> git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M

Ah, it's -w that makes the difference, got it.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwise.
It's worth the effort. Recommended."  (Gerry Pourwelle)



Re: code contributions for 2024, WIP version

From
Bruce Momjian
Date:
On Thu, Dec  5, 2024 at 10:39:38AM -0500, Tom Kincaid wrote:
>     While I know you said "you will do you" when it comes to your annual
>     blog, there are a number of similar efforts -- top of mind is the
>     analysis done (as I understand it) by Daniel Gustafsson and Claire
>     Giordano [1], as well as ongoing/recurring analysis done by the
>     contributor committee. And there is the adjacent related discussion
>     around commit messages/authors. It makes me wonder if there isn't a way
>     to make all of our lives easier going forward.
>
> Perhaps slightly off topic, so how does one provide input to the contributor
> committee?

The committee is responsible for updating the contributors list web page:

    https://www.postgresql.org/community/contributors/

and does analysis of contributions to the Postgres community to help
update the list.

Their email address is at the bottom of that page:

    contributors@postgresql.org

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.





Re: code contributions for 2024, WIP version

From
Robert Haas
Date:
On Thu, Dec 5, 2024 at 11:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Just give credit to all lines for all authors, would be my approach.  Is
> that unfair?  Perhaps, but I'd rather err on the side of giving too much
> credit, than on not giving enough.

I'm not against somebody putting that together, but I don't think it
would be useful for me. I think it would inflate the numbers for
committers by quite a lot more than what is fair, because if I commit
a 1000 line patch and I add 50 lines of code, I'm going to get an
awful lot more credit than I deserve. It would probably also inflate
or distort the numbers for some other people as well. But what I would
say is -- if you think it's a useful thing, try doing it.
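
To make the trade-off concrete, here is a small Python sketch comparing the three crediting schemes discussed in this thread: credit only the first author, split equally, or give every listed author credit for all lines. The function name, scheme labels, and author names are hypothetical; only the 1000-line example comes from the message above.

```python
def apportion(lines, authors, scheme):
    """Return {author: credited_lines} for one commit under a given scheme."""
    if not authors:
        raise ValueError("a commit needs at least one author")
    if scheme == "first":
        # Credit only the first listed author.
        return {authors[0]: lines}
    if scheme == "equal":
        # Split the lines evenly among all listed authors.
        share = lines / len(authors)
        return {author: share for author in authors}
    if scheme == "all":
        # Every listed author gets credit for every line.
        return {author: lines for author in authors}
    raise ValueError(f"unknown scheme: {scheme}")


# A 1000-line patch where the committer, who added about 50 lines,
# is listed as the second author.
authors = ["Patch Author", "Committer"]
for scheme in ("first", "equal", "all"):
    print(scheme, apportion(1000, authors, scheme))
```

Under "all", the committer in this example is credited with 1000 lines for roughly 50 lines of work, which is the inflation described above. Under "equal", the first author loses half the credit.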

--
Robert Haas
EDB: http://www.enterprisedb.com



code contributions for 2025, WIP version

From
Robert Haas
Date:
Hi,

A draft of my analysis of code contributions for 2025 can be found at
https://sites.google.com/site/robertmhaas/contributions in
contributions2025-wip.dmp. In contrast to previous years, I was able
to do much more of this in an automated way this year: the principal
author of the commit was computed by grabbing the first Author or
Co-Authored-By tag from the commit message rather than by manual
inspection of all the commit messages. Yay!
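
The kind of extraction described here might look roughly like the following Python sketch. It is an assumption-laden illustration, not the actual script: the exact tag spellings matched, the fallback to the committer when no tag is present, and the sample commit message are all hypothetical.

```python
import re

# Match "Author:" or "Co-authored-by:" at the start of a line, any letter case.
TAG_RE = re.compile(
    r"^[ \t]*(?:Author|Co-authored-by):[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)
# An email address in angle brackets, with any whitespace before it.
EMAIL_RE = re.compile(r"\s*<[^>]*>")


def primary_author(message, committer):
    """Return the name from the first author tag, or the committer if none."""
    match = TAG_RE.search(message)
    if match is None:
        return committer  # assumed fallback when no tag is present
    # Strip the email address, leaving just the name.
    return EMAIL_RE.sub("", match.group(1)).strip()


# Hypothetical commit message.
SAMPLE_MESSAGE = """\
Improve frobnication of widgets

Some explanation of the change.

Author: Alice Example <alice@example.org>
Co-authored-by: Bob Example <bob@example.org>
Reviewed-by: Carol Example <carol@example.org>
"""

print(primary_author(SAMPLE_MESSAGE, "Example Committer"))
```

Because only the first matching tag is used, the order in which a committer lists authors determines who is treated as the principal author.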

Just like last year, I invite corrections from anyone who is
interested in providing them. The table of interest is commits2025,
which has columns lines and xlines. xlines is what will be used to
produce the final blog post. As usual, I've set xlines=0 if a commit
seemed to be a large, mechanical commit that shouldn't count toward
someone's lines contributed. Also, this year, for certain patches that
touched the Unicode translate tables, instead of setting xlines=0,
I've decremented it by the size of the changes to the Unicode tables,
to avoid overcounting the significance of those commits relative to
others while still giving credit for the net new code. I did not
bother to account for reverts as carefully this year, because,
thankfully, most of them touched only relatively small numbers of
lines of code, and so it didn't seem to me that they affected the
statistics very much. If I did account for reverts more carefully,
what I would do is set xlines=0 for both reverts and the reverted
commits, but only when both occurred in the same calendar year. I'm
open to feedback on whether that should be pursued further in the
interest of accuracy, but so far it didn't seem especially important
given the shape of this year's data.

My main reason for putting this out for possible corrections is to fix
author names. If the primary author of a commit is not as listed, if I
have multiple spellings for the same person's name, or if someone's
name is not spelled as they prefer, corrections are welcome.
Secondarily, if you think I should set xlines=0 for some mechanical
commit that was not identified as such in my initial analysis, you can
also tell me about that. As before, please send corrections off-list
as proposed UPDATE statements against the commits2025 table.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: code contributions for 2025, WIP version

From
Bertrand Drouvot
Date:
Hi,

On Tue, Jan 13, 2026 at 02:54:59PM -0500, Robert Haas wrote:
> Hi,
> 
> A draft of my analysis of code contributions for 2025 can be found at
> https://sites.google.com/site/robertmhaas/contributions in
> contributions2025-wip.dmp.

Thanks for taking the time to do that!

> My main reason for putting this out for possible corrections is to fix
> author names.

I did a quick scan and it looks like "Hou Zhijie" is listed twice: one as 
"Zhijie Hou" and one as "Hou Zhijie". So it looks like those related numbers
should be added and displayed as a single entry.

Looking at the commit log for 2025, they were all associated with the same
email "houzj.fnst@fujitsu.com".

So, generally speaking, maybe the counts should be based on the email
address instead and then pick up one of the "name surname"?
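
A minimal Python sketch of that idea follows. It assumes each record carries an email address alongside the name (nothing in this thread says the commits2025 table has such a column), and it picks the most frequent spelling as the display name, which is just one possible rule. The line counts and the second author are made up.

```python
from collections import Counter, defaultdict

# Hypothetical (name, email, lines) records; the line counts are invented.
records = [
    ("Hou Zhijie", "houzj.fnst@fujitsu.com", 10),
    ("Zhijie Hou", "houzj.fnst@fujitsu.com", 5),
    ("Hou Zhijie", "houzj.fnst@fujitsu.com", 7),
    ("Another Author", "another@example.org", 3),
]


def merge_by_email(rows):
    """Sum lines per email address and label each with its most common name."""
    spellings = defaultdict(Counter)
    lines = defaultdict(int)
    for name, email, count in rows:
        key = email.lower()  # treat addresses case-insensitively
        spellings[key][name] += 1
        lines[key] += count
    return {spellings[key].most_common(1)[0][0]: lines[key] for key in lines}


merged = merge_by_email(records)
print(merged)
```

This merges spellings that share one address, but it would split a person who posts from several addresses into several entries.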

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: code contributions for 2025, WIP version

From
Bertrand Drouvot
Date:
On Wed, Jan 14, 2026 at 06:01:56PM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Jan 13, 2026 at 02:54:59PM -0500, Robert Haas wrote:
> > Hi,
> > 
> > A draft of my analysis of code contributions for 2025 can be found at
> > https://sites.google.com/site/robertmhaas/contributions in
> > contributions2025-wip.dmp.
> 
> Thanks for taking the time to do that!
> 
> > My main reason for putting this out for possible corrections is to fix
> > author names.
> 
> I did a quick scan and it looks like "Hou Zhijie" is listed twice: one as 
> "Zhijie Hou" and one as "Hou Zhijie". So it looks like those related numbers
> should be added and displayed as a single entry.

Maybe all those ones could be double checked? (already done for "Hou Zhijie").

postgres=# SELECT a1.author, a2.author,
       similarity(a1.author, a2.author) as similarity_score
FROM top_authors2025 a1
JOIN top_authors2025 a2 ON a1.author < a2.author
WHERE similarity(a1.author, a2.author) > 0.6
ORDER BY similarity_score DESC;
        author        |        author         | similarity_score
----------------------+-----------------------+------------------
 Hou Zhijie [*]       | Zhijie Hou [*]        |                1
 Maksim Melnikov [*]  | Melnikov Maksim [*]   |                1
 Andrei Lepikhov [*]  | Andrey Lepikhov [*]   |        0.7777778
 Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] |             0.75
 Lukas Fitti [*]      | Lukas Fittl [*]       |       0.71428573
 Dmitry Koval [*]     | Dmitry Kovalenko [*]  |        0.6666667
(6 rows)
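
For readers without the pg_trgm extension at hand, the scores above can be approximated in Python. This sketch follows pg_trgm's documented behavior for plain ASCII names (lowercase the text, ignore non-alphanumeric characters, pad each word with two leading spaces and one trailing space, then divide shared trigrams by total distinct trigrams). It is an approximation for illustration, not the extension's implementation.

```python
import re


def trigrams(text):
    """Return the set of padded three-character sequences for each word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


pairs = [
    ("Hou Zhijie [*]", "Zhijie Hou [*]"),
    ("Andrei Lepikhov [*]", "Andrey Lepikhov [*]"),
    ("Dmitry Koval [*]", "Dmitry Kovalenko [*]"),
]
for left, right in pairs:
    print(left, "|", right, "|", round(similarity(left, right), 7))
```

Because trigrams are taken per word, swapping given and family names yields exactly the same set and a score of 1. A high score only flags a pair as a candidate for manual review; it does not show that two names belong to the same person.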

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: code contributions for 2025, WIP version

From
Kirill Reshke
Date:


On Wed, 14 Jan 2026, 23:52 Bertrand Drouvot, <bertranddrouvot.pg@gmail.com> wrote:
> On Wed, Jan 14, 2026 at 06:01:56PM +0000, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Tue, Jan 13, 2026 at 02:54:59PM -0500, Robert Haas wrote:
> > > Hi,
> > >
> > > A draft of my analysis of code contributions for 2025 can be found at
> > > https://sites.google.com/site/robertmhaas/contributions in
> > > contributions2025-wip.dmp.
> >
> > Thanks for taking the time to do that!
> >
> > > My main reason for putting this out for possible corrections is to fix
> > > author names.
> >
> > I did a quick scan and it looks like "Hou Zhijie" is listed twice: one as
> > "Zhijie Hou" and one as "Hou Zhijie". So it looks like those related numbers
> > should be added and displayed as a single entry.
>
> Maybe all those ones could be double checked? (already done for "Hou Zhijie").
>
> postgres=# SELECT a1.author, a2.author,
>        similarity(a1.author, a2.author) as similarity_score
> FROM top_authors2025 a1
> JOIN top_authors2025 a2 ON a1.author < a2.author
> WHERE similarity(a1.author, a2.author) > 0.6
> ORDER BY similarity_score DESC;
>         author        |        author         | similarity_score
> ----------------------+-----------------------+------------------
>  Hou Zhijie [*]       | Zhijie Hou [*]        |                1
>  Maksim Melnikov [*]  | Melnikov Maksim [*]   |                1
>  Andrei Lepikhov [*]  | Andrey Lepikhov [*]   |        0.7777778
>  Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] |             0.75
>  Lukas Fitti [*]      | Lukas Fittl [*]       |       0.71428573
>  Dmitry Koval [*]     | Dmitry Kovalenko [*]  |        0.6666667
> (6 rows)
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com


Hi

Dmitry Koval and Dmitry Kovalenko are not the same person.

In the other cases, yes, it is the same person.

Re: code contributions for 2025, WIP version

From
Robert Haas
Date:
On Wed, Jan 14, 2026 at 1:52 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> postgres=# SELECT a1.author, a2.author,
>        similarity(a1.author, a2.author) as similarity_score
> FROM top_authors2025 a1
> JOIN top_authors2025 a2 ON a1.author < a2.author
> WHERE similarity(a1.author, a2.author) > 0.6
> ORDER BY similarity_score DESC;
>         author        |        author         | similarity_score
> ----------------------+-----------------------+------------------
>  Hou Zhijie [*]       | Zhijie Hou [*]        |                1
>  Maksim Melnikov [*]  | Melnikov Maksim [*]   |                1
>  Andrei Lepikhov [*]  | Andrey Lepikhov [*]   |        0.7777778
>  Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] |             0.75
>  Lukas Fitti [*]      | Lukas Fittl [*]       |       0.71428573
>  Dmitry Koval [*]     | Dmitry Kovalenko [*]  |        0.6666667
> (6 rows)

I have made these corrections:

update commits2025 set author = 'Hou Zhijie' where author = 'Zhijie Hou';
update commits2025 set author = 'Maksim Melnikov' where author =
'Melnikov Maksim';
update commits2025 set author = 'Andrei Lepikhov' where author =
'Andrey Lepikhov';
update commits2025 set author = 'Lukas Fittl' where author = 'Lukas Fitti';

Please let me know if you see anything else.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: code contributions for 2025, WIP version

From
Mihail Nikalayeu
Date:


On Thu, 15 Jan 2026, 22:46 Robert Haas <robertmhaas@gmail.com> wrote:
> I have made these corrections:
>
> update commits2025 set author = 'Hou Zhijie' where author = 'Zhijie Hou';
> update commits2025 set author = 'Maksim Melnikov' where author =
> 'Melnikov Maksim';
> update commits2025 set author = 'Andrei Lepikhov' where author =
> 'Andrey Lepikhov';
> update commits2025 set author = 'Lukas Fittl' where author = 'Lukas Fitti';

Looks like you missed me,
Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] | 0.75

Thanks!

RE: code contributions for 2025, WIP version

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Robert,

Thanks for working on it, I really appreciate it. I found a small issue in the old data.

My name seemed to be registered in reverse order. My given name is "Hayato", and family name is "Kuroda".
I'm unfamiliar around here but should they be "Hayato Kuroda"?

```
postgres=# SELECT author, COUNT(*) FROM commits2019 WHERE author LIKE '%Hayato%' GROUP BY author;
    author     | count 
---------------+-------
 Kuroda Hayato |     1
(1 row)

postgres=# SELECT author, COUNT(*) FROM commits2024 WHERE author LIKE '%Hayato%' GROUP BY author;
    author     | count 
---------------+-------
 Hayato Kuroda |    12
 Kuroda Hayato |     5
(2 rows)

postgres=# SELECT author, COUNT(*) FROM commits2025 WHERE author LIKE '%Hayato%' GROUP BY author;
    author     | count 
---------------+-------
 Hayato Kuroda |    25
(1 row)
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: code contributions for 2025, WIP version

From
Robert Haas
Date:
On Thu, Jan 15, 2026 at 9:26 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> My name seemed to be registered in reverse order. My given name is "Hayato", and family name is "Kuroda".
> I'm unfamiliar around here but should they be "Hayato Kuroda"?

We can list people however they want to be listed. However, I'm not
entertaining historical corrections, as those blogs have already been
published. What I'm looking to get updated right now is making the
commits2025 table correct, and as your output shows, in your case it
is already consistent for the current year. If you wanted it to say
Kuroda Hayato rather than Hayato Kuroda, you could send me an UPDATE
statement which I would apply to the database. If you like it the way
it is, then there is no need to do anything.

In general, a big reason why people got listed inconsistently is that
their email name wasn't consistent. In your case, I think the
inconsistency is actually a difference in practice between committers.
Amit Kapila seems to routinely list you as Kuroda Hayato, while other
people are listing you as Hayato Kuroda <kuroda.hayato@fujitsu.com>
(and my scripts then strip out the email address, leaving just the
name). What I would encourage all committers to do going forward is
make the headers in the commit message match the way that the name is
shown in the email, and what I would encourage people submitting
patches to do is make sure that their email name matches how they want
to be listed. There are a number of people who either post from
multiple email accounts with slightly different names, or who actually
change the email name from time to time throughout the year, as by
adding or removing a middle initial. If you do this, it's not entirely
surprising if the result isn't entirely consistent.

A new trend that I find somewhat alarming is people posting with an
email name that is completely and totally different from the name that
they put in the email. This seems to happen mostly with people from
Russia and China. The email name might be something like, you know,
Fred Smith, and then the signature in the email will be like, Alena
Rostova. I feel this is quite bad because it makes the identity of the
person contributing to PostgreSQL completely unclear: is it Fred Smith
or is it Alena Rostova? But at the very least, it's not surprising if
it messes up the contributions statistics or the release note credits.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: code contributions for 2025, WIP version

From
Robert Haas
Date:
On Thu, Jan 15, 2026 at 4:42 PM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:
> Looks like you missed me,
> Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] | 0.75

I have applied this correction:

update commits2025 set author = 'Mihail Nikalayeu' where commitid =
'8b18ed6dfbb';

As I mentioned in the original email, the best way to send corrections
is an off-list email that contains an UPDATE statement. Of course,
on-list discussion is fine if there is something we need to discuss as
a group, but not every individual correction is necessarily of general
interest.

--
Robert Haas
EDB: http://www.enterprisedb.com