Обсуждение: Feature Request: Extending PostgreSQL's Identifier Length Limit

Поиск
Список
Период
Сортировка

Feature Request: Extending PostgreSQL's Identifier Length Limit

От
David HJ
Дата:
The Current Situation
As a long-time PostgreSQL user, I've increasingly run into issues with the 63-byte limit for identifiers, particularly table names. This limit, while historically sufficient, is becoming a significant pain point in modern database design and usage.

Real-World Examples
The problem is particularly evident in English table names, which make up a large portion of PostgreSQL's user base:
  • "Gross Domestic Product, Billions of Dollars, Not Seasonally Adjusted (GDPA)~Percent Change from Year Ago"
  • "Inflation, consumer prices for the United States, Percent, Not Seasonally Adjusted (FPCPITOTLZGUSA)"
  • "Annual Average Consumer Price Index for All Urban Consumers (CPI-U): U.S. City Average, All Items"
These names, while descriptive and useful, exceed our current limit. The issue extends to multi-byte character sets as well, such as this Chinese table name:
  • "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" (Energy consumption statistics table)
Why This Matters
In my experience, the complexity of data and the need for self-documenting schemas have grown significantly. We're dealing with:
  • Increasingly descriptive table names for clarity and self-documentation
  • Automated systems generating tables with detailed, often lengthy names
  • A growing international user base requiring support for multi-byte characters
While workarounds exist, such as using abbreviations or moving descriptions to comments, these solutions often lead to less intuitive database designs and reduced readability.

Potential Solutions
Based on community discussions, I believe we should consider:
  1. Increasing NAMEDATALEN to 256 bytes or more
  2. Making NAMEDATALEN configurable at compile-time
  3. Changing NAMEDATALEN to represent character count instead of byte count
  4. Implementing a variable-length "name" data type
Challenges and Considerations
I'm aware that this change presents significant technical challenges:
  1. Maintaining backward compatibility
  2. Potential impacts on storage and performance
  3. Complexities in implementation, especially if moving to a variable-length system
Despite these challenges, I believe addressing this limitation is crucial for maintaining PostgreSQL's position as a versatile, user-friendly database system.

Call to Action
I respectfully request the PostgreSQL development team to consider this feature request. While I understand the technical complexities involved, I believe the benefits to the user experience would be substantial.
I'm eager to contribute to discussions and potentially assist in testing any proposed solutions. Let's work together to enhance PostgreSQL's capability to handle the evolving needs of modern database users.
Thank you for your consideration and your ongoing efforts in improving PostgreSQL.

Re: Feature Request: Extending PostgreSQL's Identifier Length Limit

От
Aleksander Alekseev
Дата:
Hi David,

> As a long-time PostgreSQL user, I've increasingly run into issues with the 63-byte limit for identifiers,
particularlytable names. This limit, while historically sufficient, is becoming a significant pain point in modern
databasedesign and usage.
 

I can understand your pain. Unfortunately there are a number of
complications involved.

Take pg_class catalog table [1] as an example and its column `relname`
of type `name` [2]. On disk it is stored as an fixed-sized array of
chars:

```
typedef struct nameData
{
    char        data[NAMEDATALEN];
} NameData;
typedef NameData *Name;
```

Why not use TEXT? Mostly for performance reasons. In general case TEXT
data can be TOASTed. When using TEXT one should do an additional call
of heap_deform_tuple().

Using NAME allows the code to interpret tuple data as is, e.g.:
```
typedef struct FormData_phonebook
{
    int32 id;
    NameData name;
    int32 phone;
} FormData_phonebook;

typedef FormData_phonebook* Form_phonebook;

/* ... */

    while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        Form_phonebook record = (Form_phonebook) GETSTRUCT(tup);

        if(strcmp(record->name.data, name->data) == 0)
        {
            found_phone = record->phone;
            break;
        }
    }
```

So if you change NAME definition several things will happen:

1. You have to rebuild catalog tables. This can't be done in-place
because larger tuples may not fit into pages. Note that page size may
also vary depending on how PostgreSQL was compiled. Note that the
indexes will also be affected.

2. You will break all the extensions that use NAME and the
corresponding heap_* and index_* APIs.

3. The performance will generally decrease - many existing
applications will just waste memory or do unnecessary work due to
extra calls to heap_deform_tuple().

If (1) is doable in theory, I don't think (2) and (3) are something we
do in this project.

On top of that there is are relatively simple workarounds for the situation:

1. An application may have a function like shorten_name(x) = substr(x,
1, 50) || '_' || substr(md5(x), 1, 8). So instead of `SELECT * FROM x`
you just do `SELECT * FROM shorten_name(x)`.

2. You may fork the code and enlarge NAMEDATALEN. This is not
recommended and not guaranteed to work but worth a try.

This makes me think that solving the named limitation isn't worth the effort.

Personally I'm not opposed to the idea in general but you will need to
come up with a specific RFC that explains how exactly you propose to
solve named problems.

[1]: https://www.postgresql.org/docs/current/catalog-pg-class.html
[2]: https://www.postgresql.org/docs/current/datatype-character.html

-- 
Best regards,
Aleksander Alekseev



Re: Feature Request: Extending PostgreSQL's Identifier Length Limit

От
Alvaro Herrera
Дата:
On 2024-Jul-18, David HJ wrote:

> As a long-time PostgreSQL user, I've increasingly run into issues with the
> 63-byte limit for identifiers, particularly table names. This limit, while
> historically sufficient, is becoming a significant pain point in modern
> database design and usage.

This has been discussed before.  I think the latest discussion, and some
preliminary proof-of-concept patches, were around here:

https://postgr.es/m/CAFBsxsF2V8n9w0SGK56bre3Mk9fzZS=9aaA8Gfs_n+woa3Dr-Q@mail.gmail.com

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)