Обсуждение: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

Поиск
Список
Период
Сортировка

PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rajin Raj
Дата:
Is there any impact of using the character varying without providing the length while creating tables? 
I have created two tables and inserted 1M records. But I don't see any difference in pg_class. (size, relpage)

create table test_1(name varchar);
create table test_2(name varchar(50));

insert into test_1 ... 10M records
insert into test_2 ... 10M records

vacuum (full,analyze) db_size_test_1;
vacuum (full,analyze) db_size_test_2;

Which option is recommended? 

Regards,
Rajin 

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Holger Jakobs
Дата:

No, there is no impact.

Am 28.04.20 um 11:22 schrieb Rajin Raj:
Is there any impact of using the character varying without providing the length while creating tables? 
I have created two tables and inserted 1M records. But I don't see any difference in pg_class. (size, relpage)

create table test_1(name varchar);
create table test_2(name varchar(50));

insert into test_1 ... 10M records
insert into test_2 ... 10M records

vacuum (full,analyze) db_size_test_1;
vacuum (full,analyze) db_size_test_2;

Which option is recommended? 

Regards,
Rajin 
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Ashutosh Bapat
Дата:
On Tue, Apr 28, 2020 at 2:53 PM Rajin Raj <rajin.raj@opsveda.com> wrote:
>
> Is there any impact of using the character varying without providing the length while creating tables?
> I have created two tables and inserted 1M records. But I don't see any difference in pg_class. (size, relpage)
>
> create table test_1(name varchar);
> create table test_2(name varchar(50));

I don't think there's a difference in the way these two are stored
on-disk. But if you know that your strings will be at most 50
characters long, better set that limit so that server takes
appropriate action (i.e. truncates the strings to 50).

-- 
Best Wishes,
Ashutosh Bapat



Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Ashutosh Bapat
Дата:
On Tue, Apr 28, 2020 at 2:53 PM Rajin Raj <rajin.raj@opsveda.com> wrote:
>
> Is there any impact of using the character varying without providing the length while creating tables?
> I have created two tables and inserted 1M records. But I don't see any difference in pg_class. (size, relpage)
>
> create table test_1(name varchar);
> create table test_2(name varchar(50));

I don't think there's a difference in the way these two are stored
on-disk. But if you know that your strings will be at most 50
characters long, better set that limit so that server takes
appropriate action (i.e. truncates the strings to 50).

-- 
Best Wishes,
Ashutosh Bapat



Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Holger Jakobs
Дата:
Truncation will NEVER happen. PostgreSQL throws an ERROR on any attempt 
of saving more characters (not bytes!) into a VARCHAR(50) column.

There is some other well-known system which silently truncates, but we 
all know why we would never use that.

Am 28.04.20 um 13:46 schrieb Ashutosh Bapat:
> On Tue, Apr 28, 2020 at 2:53 PM Rajin Raj <rajin.raj@opsveda.com> wrote:
>> Is there any impact of using the character varying without providing the length while creating tables?
>> I have created two tables and inserted 1M records. But I don't see any difference in pg_class. (size, relpage)
>>
>> create table test_1(name varchar);
>> create table test_2(name varchar(50));
> I don't think there's a difference in the way these two are stored
> on-disk. But if you know that your strings will be at most 50
> characters long, better set that limit so that server takes
> appropriate action (i.e. truncates the strings to 50).
>
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012




Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Paul Carlucci
Дата:
PG the text, character varying, character varying(length), character column types are all the same thing with each column type inheriting the properties from the parent type.  With each successive type further properties are added but they're all basically just "text" with some additional metadata.  If you're coming from other database engines or just general programming languages where text and fixed length string fields are handled differently then the above can seem a bit different form what you're used to.  Heck, I can think of one engine where if you have a text column you have to query the table for the blob identifier and then issue a separate call to retrieve it.  Here in PG it's literally all the same, handled the same, performs the same.  Use what limiters make sense for your application.

On Tue, Apr 28, 2020 at 5:22 AM Rajin Raj <rajin.raj@opsveda.com> wrote:
Is there any impact of using the character varying without providing the length while creating tables? 
I have created two tables and inserted 1M records. But I don't see any difference in pg_class. (size, relpage)

create table test_1(name varchar);
create table test_2(name varchar(50));

insert into test_1 ... 10M records
insert into test_2 ... 10M records

vacuum (full,analyze) db_size_test_1;
vacuum (full,analyze) db_size_test_2;

Which option is recommended? 

Regards,
Rajin 

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Paul Carlucci
Дата:
PG the text, character varying, character varying(length), character column types are all the same thing with each column type inheriting the properties from the parent type.  With each successive type further properties are added but they're all basically just "text" with some additional metadata.  If you're coming from other database engines or just general programming languages where text and fixed length string fields are handled differently then the above can seem a bit different form what you're used to.  Heck, I can think of one engine where if you have a text column you have to query the table for the blob identifier and then issue a separate call to retrieve it.  Here in PG it's literally all the same, handled the same, performs the same.  Use what limiters make sense for your application.

On Tue, Apr 28, 2020 at 5:22 AM Rajin Raj <rajin.raj@opsveda.com> wrote:
Is there any impact of using the character varying without providing the length while creating tables? 
I have created two tables and inserted 1M records. But I don't see any difference in pg_class. (size, relpage)

create table test_1(name varchar);
create table test_2(name varchar(50));

insert into test_1 ... 10M records
insert into test_2 ... 10M records

vacuum (full,analyze) db_size_test_1;
vacuum (full,analyze) db_size_test_2;

Which option is recommended? 

Regards,
Rajin 

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Ron
Дата:
An example:

test=# create table bargle (f1 varchar(10));
CREATE TABLE
test=# insert into bargle values ('01234567890123');
ERROR:  value too long for type character varying(10)


On 4/28/20 10:10 AM, Holger Jakobs wrote:
> Truncation will NEVER happen. PostgreSQL throws an ERROR on any attempt of 
> saving more characters (not bytes!) into a VARCHAR(50) column.
>
> There is some other well-known system which silently truncates, but we all 
> know why we would never use that.
>
> Am 28.04.20 um 13:46 schrieb Ashutosh Bapat:
>> On Tue, Apr 28, 2020 at 2:53 PM Rajin Raj <rajin.raj@opsveda.com> wrote:
>>> Is there any impact of using the character varying without providing the 
>>> length while creating tables?
>>> I have created two tables and inserted 1M records. But I don't see any 
>>> difference in pg_class. (size, relpage)
>>>
>>> create table test_1(name varchar);
>>> create table test_2(name varchar(50));
>> I don't think there's a difference in the way these two are stored
>> on-disk. But if you know that your strings will be at most 50
>> characters long, better set that limit so that server takes
>> appropriate action (i.e. truncates the strings to 50).
>>

-- 
Angular momentum makes the world go 'round.



Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
raf
Дата:
Paul Carlucci wrote:

> On Tue, Apr 28, 2020 at 5:22 AM Rajin Raj <rajin.raj@opsveda.com> wrote:
> 
> > Is there any impact of using the character varying without providing the
> > length while creating tables?
> > I have created two tables and inserted 1M records. But I don't see any
> > difference in pg_class. (size, relpage)
> >
> > create table test_1(name varchar);
> > create table test_2(name varchar(50));
> >
> > insert into test_1 ... 10M records
> > insert into test_2 ... 10M records
> >
> > vacuum (full,analyze) db_size_test_1;
> > vacuum (full,analyze) db_size_test_2;
> >
> > Which option is recommended?
> >
> > *Regards,*
> > *Rajin *
> >
> PG the text, character varying, character varying(length), character column
> types are all the same thing with each column type inheriting the
> properties from the parent type.  With each successive type further
> properties are added but they're all basically just "text" with some
> additional metadata.  If you're coming from other database engines or just
> general programming languages where text and fixed length string fields are
> handled differently then the above can seem a bit different form what
> you're used to.  Heck, I can think of one engine where if you have a text
> column you have to query the table for the blob identifier and then issue a
> separate call to retrieve it.  Here in PG it's literally all the same,
> handled the same, performs the same.  Use what limiters make sense for your
> application.

My advice is to never impose arbitrary limits on text.
You will probably regret the choice of limit at some
point. I recently encountered people complaining that
they (thought they) needed to store 21 characters in
a field that they had limited to 10 characters (even
though they were originally told that the recipient
of the data would accept up to 40 characters).

I just use "text" for everything. It's less typing. :-)

The only good reason I can think of for limiting the
length would be to mitigate the risk of some kind of
denial of service, so a limit of 1KiB or 1MiB maybe.
But even that sounds silly. I've never done it (except
to limit CPU usage for slow password hashing but even
then, the 1KiB limit was imposed by input validation,
not by the database schema).

cheers,
raf

P.S. My aversion to arbitrary length limits applies to
postgres identifier names as well. I wish they weren't
limited to 63 characters.




Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
raf
Дата:
raf wrote:

> Paul Carlucci wrote:
> 
> > On Tue, Apr 28, 2020 at 5:22 AM Rajin Raj <rajin.raj@opsveda.com> wrote:
> > 
> > > Is there any impact of using the character varying without providing the
> > > length while creating tables?
> > > I have created two tables and inserted 1M records. But I don't see any
> > > difference in pg_class. (size, relpage)
> > >
> > > create table test_1(name varchar);
> > > create table test_2(name varchar(50));
> > >
> > > insert into test_1 ... 10M records
> > > insert into test_2 ... 10M records
> > >
> > > vacuum (full,analyze) db_size_test_1;
> > > vacuum (full,analyze) db_size_test_2;
> > >
> > > Which option is recommended?
> > >
> > > *Regards,*
> > > *Rajin *
> > >
> > PG the text, character varying, character varying(length), character column
> > types are all the same thing with each column type inheriting the
> > properties from the parent type.  With each successive type further
> > properties are added but they're all basically just "text" with some
> > additional metadata.  If you're coming from other database engines or just
> > general programming languages where text and fixed length string fields are
> > handled differently then the above can seem a bit different form what
> > you're used to.  Heck, I can think of one engine where if you have a text
> > column you have to query the table for the blob identifier and then issue a
> > separate call to retrieve it.  Here in PG it's literally all the same,
> > handled the same, performs the same.  Use what limiters make sense for your
> > application.
> 
> My advice is to never impose arbitrary limits on text.
> You will probably regret the choice of limit at some
> point. I recently encountered people complaining that
> they (thought they) needed to store 21 characters in
> a field that they had limited to 10 characters (even
> though they were originally told that the recipient
> of the data would accept up to 40 characters).
> 
> I just use "text" for everything. It's less typing. :-)
> 
> The only good reason I can think of for limiting the
> length would be to mitigate the risk of some kind of
> denial of service, so a limit of 1KiB or 1MiB maybe.
> But even that sounds silly. I've never done it (except
> to limit CPU usage for slow password hashing but even
> then, the 1KiB limit was imposed by input validation,
> not by the database schema).

Sorry, the above is misleading/a bad example. The hash
stored in the database is a fixed reasonable length. It
only varies according to the hashing scheme used. It's
only the unhashed password (that isn't stored anywhere)
that was limited by input validation to limit CPU
usage.

> cheers,
> raf
> 
> P.S. My aversion to arbitrary length limits applies to
> postgres identifier names as well. I wish they weren't
> limited to 63 characters.
> 



Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rui DeSousa
Дата:


On Apr 28, 2020, at 7:43 PM, raf <raf@raf.org> wrote:

I just use "text" for everything. It's less typing. :-)


Ugh, I see it as sign that the designers of the schema didn’t fully think about the actual requirements or care about them and it usually shows.    

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
"David G. Johnston"
Дата:
On Tue, Apr 28, 2020 at 5:21 PM Rui DeSousa <rui@crazybean.net> wrote:
I just use "text" for everything. It's less typing. :-)
Ugh, I see it as sign that the designers of the schema didn’t fully think about the actual requirements or care about them and it usually shows.    

There are very few situations where a non-arbitrary free-form text field is going to have a non-arbitrary length constraint - that is also immutable.  Generally, spending time to figure out those rare exceptions is wasted effort better spent elsewhere.  They are also mostly insufficient when used for their typical "protection" purpose.  If you really want protection add well thought out constraints.

Its less problematic now that increasing the generally arbitrary length doesn't require a table rewrite but you still need to rebuild dependent objects.

David J.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rui DeSousa
Дата:


On Apr 28, 2020, at 8:34 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

Its less problematic now that increasing the generally arbitrary length doesn't require a table rewrite but you still need to rebuild dependent objects.


To increase a column length does not require a table rewrite or table scan; however, reducing its size will require a full table scan.  So cleaning up a schema like the one proposed sucks. 

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
"David G. Johnston"
Дата:
On Tue, Apr 28, 2020 at 5:40 PM Rui DeSousa <rui@crazybean.net> wrote:
To increase a column length does not require a table rewrite or table scan; however, reducing its size will require a full table scan.  So cleaning up a schema like the one proposed sucks. 

I estimate the probability of ever desiring to reduce the length of a varchar field to be indistinguishable from zero.

David J.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
raf
Дата:
Rui DeSousa wrote:

> > On Apr 28, 2020, at 7:43 PM, raf <raf@raf.org> wrote:
> > 
> > I just use "text" for everything. It's less typing. :-)
> 
> Ugh, I see it as sign that the designers of the schema didn’t fully
> think about the actual requirements or care about them and it usually
> shows.

You are mistaken. I care a lot. That's why I
future-proof designs whenever possible by
not imposing arbitrarily chosen limits that
appear to suit current requirements.

In other words, I know I'm not smart enough
to predict the future so I don't let that
fact ruin my software. :-)

cheers,
raf




Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rui DeSousa
Дата:


On Apr 28, 2020, at 10:29 PM, raf <raf@raf.org> wrote:

Rui DeSousa wrote:

On Apr 28, 2020, at 7:43 PM, raf <raf@raf.org> wrote:

I just use "text" for everything. It's less typing. :-)

Ugh, I see it as sign that the designers of the schema didn’t fully
think about the actual requirements or care about them and it usually
shows.

You are mistaken. I care a lot. That's why I
future-proof designs whenever possible by
not imposing arbitrarily chosen limits that
appear to suit current requirements.

In other words, I know I'm not smart enough
to predict the future so I don't let that
fact ruin my software. :-)

cheers,
raf


Arbitrarily? What’s a cusip, vin, ssn?  Why would you put a btree index on a text field? Because it’s not.

What you’re advocating is a NoSQL design — defer your schema design.  Letting the application code littered in multiple places elsewhere define what a cusip, etc. is. 



Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
"David G. Johnston"
Дата:
On Tuesday, April 28, 2020, Rui DeSousa <rui@crazybean.net> wrote:

Arbitrarily? What’s a cusip, vin, ssn?  Why would you put a btree index on a text field? Because it’s not.

What you’re advocating is a NoSQL design — defer your schema design.  Letting the application code littered in multiple places elsewhere define what a cusip, etc. is. 


All of those would be defined as PKs somewhere with a constraint that limits not only their length but also allowable characters so you don’t get something like !@#$%^&*( as a valid ssn of length 9.  A domain is probably even better though has implementation trade-offs.

A length constraint by itself is insufficient in those examples, which are still arbitrary though the decision is outside the control of the modeler.  If the supplied values are external, which they likely are, the system under design should probably just define the values loosely and accept whatever the source system provides as-is.

David J.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rui DeSousa
Дата:


On Apr 29, 2020, at 12:34 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tuesday, April 28, 2020, Rui DeSousa <rui@crazybean.net> wrote:

Arbitrarily? What’s a cusip, vin, ssn?  Why would you put a btree index on a text field? Because it’s not.

What you’re advocating is a NoSQL design — defer your schema design.  Letting the application code littered in multiple places elsewhere define what a cusip, etc. is. 


All of those would be defined as PKs somewhere with a constraint that limits not only their length but also allowable characters so you don’t get something like !@#$%^&*( as a valid ssn of length 9.  A domain is probably even better though has implementation trade-offs.

A length constraint by itself is insufficient in those examples, which are still arbitrary though the decision is outside the control of the modeler.  If the supplied values are external, which they likely are, the system under design should probably just define the values loosely and accept whatever the source system provides as-is.

David J.

That is the worst; seeing a text field being used in a primary key; seriously?  Trying to understand how wide a table is when it’s 40 columns wide and 35 of them are text fields, ugh.  When someone asks for btree index on a column and it is a text field; why?

Don’t fool yourself, you are not future proofing your application; what really is happening is a slow creeping data quality issue which later needs a special project just clean up.

I think we can both agree that you need to model your data correctly or at least to your best knowledge and ability.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
"David G. Johnston"
Дата:
On Tuesday, April 28, 2020, Rui DeSousa <rui@crazybean.net> wrote:
Don’t fool yourself, you are not future proofing your application; what really is happening is a slow creeping data quality issue which later needs a special project just clean up.

I don’t use text instead of varchar(n) for future proofing and use it quite well within well defined relational schemas.  Using varchar(n) in a table always has a better solution, use text and a constraint.

David J.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rui DeSousa
Дата:


On Apr 29, 2020, at 1:09 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tuesday, April 28, 2020, Rui DeSousa <rui@crazybean.net> wrote:
Don’t fool yourself, you are not future proofing your application; what really is happening is a slow creeping data quality issue which later needs a special project just clean up.

I don’t use text instead of varchar(n) for future proofing and use it quite well within well defined relational schemas.  Using varchar(n) in a table always has a better solution, use text and a constraint.

David J.

I would agree with you that "text and a constraint" is a lot better than just text; and would be functionally equivalent to varchar(n).

It does requires the reader to look into each constraint to know what’s going on.

Also, when porting the schema to a different database engine and the create table statement fails because it’s too wide and doesn’t fit on a page; the end result is having to go back and redefine the text fields as varchar(n)/char(n) anyway.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Tim Cross
Дата:
Rui DeSousa <rui@crazybean.net> writes:

>> On Apr 28, 2020, at 10:29 PM, raf <raf@raf.org> wrote:
>>
>> Rui DeSousa wrote:
>>
>>>> On Apr 28, 2020, at 7:43 PM, raf <raf@raf.org> wrote:
>>>>
>>>> I just use "text" for everything. It's less typing. :-)
>>>
>>> Ugh, I see it as sign that the designers of the schema didn’t fully
>>> think about the actual requirements or care about them and it usually
>>> shows.
>>
>> You are mistaken. I care a lot. That's why I
>> future-proof designs whenever possible by
>> not imposing arbitrarily chosen limits that
>> appear to suit current requirements.
>>
>> In other words, I know I'm not smart enough
>> to predict the future so I don't let that
>> fact ruin my software. :-)
>>
>> cheers,
>> raf
>>
>
> Arbitrarily? What’s a cusip, vin, ssn?  Why would you put a btree index on a text field? Because it’s not.
>
> What you’re advocating is a NoSQL design — defer your schema design.  Letting the application code littered in
multipleplaces elsewhere define what a cusip, etc. is.  

I think the key term in this thread is 'arbitrary'. When implementing a
schema design, it should reflect the known constraints inherent in the
model, but it should avoid imposing arbitrary constraints if none exist
or cannot be determined.

So, if you know that a customer ID field has a current limitation of 50
characters, then use a definition which reflects that. It may be that at
some point in the future, this will be increased, but then again, it may
not and that bit of information provides useful information for
application developers and helps with consistency across APIs. Without
some guideline, different developers will impose different values,
leading to maintenance issues and bugs down the track.

On the other hand, imposing an arbitrary limitation, based on little
more than a guess by the designer, can cause enormous problems. As an
example, I was working on an identity management system where there was
a constraint of 8 characters on the username and password. This was an
arbitrary limit based on what was common practice, but was not a
limitation imposed by any of the systems the IAM system interacted with.
It was recognised that both fields were too small and needed to be
increased. The easy solution would have been to make these fields text.
However, that would cause a problem with some of the systems we needed
to integrate with because either they had a limit on username size or
they had a limit on password size. There were also multiple different
APIs which needed to work with this system and when we performed
analysis, they had varying limits on both fields.

What we did was look at all the systems we had to integrate with and
found the maximum supported username and password lengths for each
system and set the fields to have the maximum length supported by the
systems with the shortest lengths. Having that information in the
database schema also informed those developing other interfaces what the
maximums were. It is quite likely these limits would be increased in the
future and the database definition would need to be increased - in fact,
some years after going into production, exactly this occurred with the
password field when a different encryption algorithm was adopted which
did not have the previous character limitation and the client wanted to
encourage users to use pass phrases rather than a word.

The point is, just using text for all character fields loses information
and results in your model and schema being less expressive. Providing
this information is sometimes critical in ensuring limits are maintained
and provides useful documentation about the model that developers can
use. However, imposing limits based on little more than a guess is
usually a bad idea and if you cannot find any reason to impose a limit,
then don't. I disagree with approaches which claim using text everywhere
is easier and future proofing. In reality, it is just pushing the
problem out for someone else to deal with. The best way to future proof
your application is to have a clear well defined data model that fits
the domain and is well documented and reflected in your database schema.


--
Tim Cross



Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
"David G. Johnston"
Дата:
On Tuesday, April 28, 2020, Rui DeSousa <rui@crazybean.net> wrote:
I would agree with you that "text and a constraint" is a lot better than just text; and would be functionally equivalent to varchar(n).
 
Close enough...

It does requires the reader to look into each constraint to know what’s going on.

 And “n” is so informative...please.  The name of the field tells me most of what I care about, the “n” and/or constraint are fluff.


Also, when porting the schema to a different database engine and the create table statement fails because it’s too wide and doesn’t fit on a page; the end result is having to go back and redefine the text fields as varchar(n)/char(n) anyway.

Not something I’m concerned about and if that other db doesn’t have something like TOAST it seems like an undesirable target.

David J.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rui DeSousa
Дата:


On Apr 29, 2020, at 1:32 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:


 And “n” is so informative...please.  The name of the field tells me most of what I care about, the “n” and/or constraint are fluff.


That was your recommendation; so I’m confused as to why it’s no longer valid.


Also, when porting the schema to a different database engine and the create table statement fails because it’s too wide and doesn’t fit on a page; the end result is having to go back and redefine the text fields as varchar(n)/char(n) anyway.

Not something I’m concerned about and if that other db doesn’t have something like TOAST it seems like an undesirable target.


Fine, I assume you will be employed by your employer in perpetuity and the system will remain on PostgreSQL.

Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

От
Rui DeSousa
Дата:

> On Apr 29, 2020, at 1:30 AM, Tim Cross <theophilusx@gmail.com> wrote:
>
> I think the key term in this thread is 'arbitrary'. When implementing a
> schema design, it should reflect the known constraints inherent in the
> model, but it should avoid imposing arbitrary constraints if none exist
> or cannot be determined.
>
> So, if you know that a customer ID field has a current limitation of 50
> characters, then use a definition which reflects that. It may be that at
> some point in the future, this will be increased, but then again, it may
> not and that bit of information provides useful information for
> application developers and helps with consistency across APIs. Without
> some guideline, different developers will impose different values,
> leading to maintenance issues and bugs down the track.
>
> On the other hand, imposing an arbitrary limitation, based on little
> more than a guess by the designer, can cause enormous problems. As an
> example, I was working on an identity management system where there was
> a constraint of 8 characters on the username and password. This was an
> arbitrary limit based on what was common practice, but was not a
> limitation imposed by any of the systems the IAM system interacted with.
> It was recognised that both fields were too small and needed to be
> increased. The easy solution would have been to make these fields text.
> However, that would cause a problem with some of the systems we needed
> to integrate with because either they had a limit on username size or
> they had a limit on password size. There were also multiple different
> APIs which needed to work with this system and when we performed
> analysis, they had varying limits on both fields.
>
> What we did was look at all the systems we had to integrate with and
> found the maximum supported username and password lengths for each
> system and set the fields to have the maximum length supported by the
> systems with the shortest lengths. Having that information in the
> database schema also informed those developing other interfaces what the
> maximums were. It is quite likely these limits would be increased in the
> future and the database definition would need to be increased - in fact,
> some years after going into production, exactly this occurred with the
> password field when a different encryption algorithm was adopted which
> did not have the previous character limitation and the client wanted to
> encourage users to use pass phrases rather than a word.
>
> The point is, just using text for all character fields loses information
> and results in your model and schema being less expressive. Providing
> this information is sometimes critical in ensuring limits are maintained
> and provides useful documentation about the model that developers can
> use. However, imposing limits based on little more than a guess is
> usually a bad idea and if you cannot find any reason to impose a limit,
> then don't. I disagree with approaches which claim using text everywhere
> is easier and future proofing. In reality, it is just pushing the
> problem out for someone else to deal with. The best way to future proof
> your application is to have a clear well defined data model that fits
> the domain and is well documented and reflected in your database schema.
>
>
> --
> Tim Cross
>

I can’t agree more… Thanks Tim.