Обсуждение: ALTER TEXT field to VARCHAR(1024)

Поиск
Список
Период
Сортировка

ALTER TEXT field to VARCHAR(1024)

От
Marius Grama
Дата:
Hello,


i am using Postgres 9.2 and I'd like to perform the following ALTER statement on a database table with about 30M entries :

ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR(1024);


The mask_descriptors field is currently having the type TEXT.
I want to perform the ALTER due to the fact that it seems that copying the contents of the table to a BI SQL Server is done in row by row (instead of batch) when handling CLOBs.

From the Postgres documentation I got the following :

http://www.postgresql.org/docs/8.3/static/datatype-character.html

Tip: There are no performance differences between these three types, apart from increased storage size when using the blank-padded type, and a few extra cycles to check the length when storing into a length-constrained column. While character has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead.



Can anybody explain me what happens in the background when the alter statement is executed? I've tried it out on a small copy of the table (70K) and the operation completed in 0.2 seconds.
Will the table be completely locked during the execution of the ALTER statement?



Re: ALTER TEXT field to VARCHAR(1024)

От
Gavin Flower
Дата:
On 19/09/14 19:32, Marius Grama wrote:
Hello,


i am using Postgres 9.2 and I'd like to perform the following ALTER statement on a database table with about 30M entries :

ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR(1024);

The mask_descriptors field is currently having the type TEXT.
I want to perform the ALTER due to the fact that it seems that copying the contents of the table to a BI SQL Server is done in row by row (instead of batch) when handling CLOBs.

From the Postgres documentation I got the following :

http://www.postgresql.org/docs/8.3/static/datatype-character.html

Tip: There are no performance differences between these three types, apart from increased storage size when using the blank-padded type, and a few extra cycles to check the length when storing into a length-constrained column. While character has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead.



Can anybody explain me what happens in the background when the alter statement is executed? I've tried it out on a small copy of the table (70K) and the operation completed in 0.2 seconds.
Will the table be completely locked during the execution of the ALTER statement?



I am curious as to why you want to change text to VARCHAR(1024), especially as I think that the middleware should be controlling how long a string is saved in the database rather than end user client code (for several reasons., including security concerns).  However, I do not know your use cases, nor your overall situation - so my concerns may not apply to you.


Cheers,
Gavin


Re: ALTER TEXT field to VARCHAR(1024)

От
Bill Moran
Дата:
On Fri, 19 Sep 2014 09:32:09 +0200
Marius Grama <mariusneo@gmail.com> wrote:

> Hello,
>
>
> i am using Postgres 9.2 and I'd like to perform the following ALTER
> statement on a database table with about 30M entries :
>
> ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR(1024);
>
>
>
> The mask_descriptors field is currently having the type TEXT.
> I want to perform the ALTER due to the fact that it seems that copying the
> contents of the table to a BI SQL Server is done in row by row (instead of
> batch) when handling CLOBs.
>
> From the Postgres documentation I got the following :
>
> http://www.postgresql.org/docs/8.3/static/datatype-character.html
>
> Tip: There are no performance differences between these three types, apart
> from increased storage size when using the blank-padded type, and a few
> extra cycles to check the length when storing into a length-constrained
> column. While character has performance advantages in some other database
> systems, it has no such advantages in PostgreSQL. In most situations text
> or character varying should be used instead.
>
>
>
> Can anybody explain me what happens in the background when the alter
> statement is executed? I've tried it out on a small copy of the table (70K)
> and the operation completed in 0.2 seconds.
> Will the table be completely locked during the execution of the ALTER
> statement?

I share Gavin's concern that you're fixing this in the wrong place.  I expect
that you'll be better served by configuring the middleware to do the right thing.

However, a more direct answer to your question: VARCHAR and TEXT are _the_same_
internally.  Thus:

ALTER TABLE images ALTER COLUMN mask_descriptors TYPE VARCHAR;

would do nothing more than change the table definition.  There is no need for
that statement to touch any data.

However, adding the length constraint of (1024) will force Postgres to check
every single value to ensure it complies with the constraint.  I believe if
any row is longer than 1024 it will throw an error and abort the entire ATLER.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com


Re: ALTER TEXT field to VARCHAR(1024)

От
Merlin Moncure
Дата:
On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> On Fri, 19 Sep 2014 09:32:09 +0200
> Marius Grama <mariusneo@gmail.com> wrote:
>> Can anybody explain me what happens in the background when the alter
>> statement is executed? I've tried it out on a small copy of the table (70K)
>> and the operation completed in 0.2 seconds.
>> Will the table be completely locked during the execution of the ALTER
>> statement?
>
> I share Gavin's concern that you're fixing this in the wrong place.  I expect
> that you'll be better served by configuring the middleware to do the right thing.

I'll pile on here: in almost 20 years of professional database
development I've never had an actual problem that was solved by
introducing or shortening a length constraint to text columns except
in cases where overlong strings violate the data model (like a two
character state code for example).  It's a database equivalent of "C
programmer's disease".  Input checks from untrusted actors should
happen in the application.

merlin


Re: ALTER TEXT field to VARCHAR(1024)

От
John McKown
Дата:
On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
>> On Fri, 19 Sep 2014 09:32:09 +0200
>> Marius Grama <mariusneo@gmail.com> wrote:
>>> Can anybody explain me what happens in the background when the alter
>>> statement is executed? I've tried it out on a small copy of the table (70K)
>>> and the operation completed in 0.2 seconds.
>>> Will the table be completely locked during the execution of the ALTER
>>> statement?
>>
>> I share Gavin's concern that you're fixing this in the wrong place.  I expect
>> that you'll be better served by configuring the middleware to do the right thing.
>
> I'll pile on here: in almost 20 years of professional database
> development I've never had an actual problem that was solved by
> introducing or shortening a length constraint to text columns except
> in cases where overlong strings violate the data model (like a two
> character state code for example).  It's a database equivalent of "C
> programmer's disease".  Input checks from untrusted actors should
> happen in the application.
>
> merlin
>

I do not have your experience level with data bases, but if I may, I
will make an addition. Input checks should also happen in the RDBMS
server. I have learned you cannot trust end users _or_ programmers.
Most are good and conscientious. But there are a few who just aren't.
And those few seem to be very prolific in making _subtle_ errors. Had
one person who was really good at replacing every p with a [ and P
with {


--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown


Re: ALTER TEXT field to VARCHAR(1024)

От
Rob Sargent
Дата:
On 09/22/2014 09:40 AM, John McKown wrote:
On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
On Fri, 19 Sep 2014 09:32:09 +0200
Marius Grama <mariusneo@gmail.com> wrote:
Can anybody explain me what happens in the background when the alter
statement is executed? I've tried it out on a small copy of the table (70K)
and the operation completed in 0.2 seconds.
Will the table be completely locked during the execution of the ALTER
statement?
I share Gavin's concern that you're fixing this in the wrong place.  I expect
that you'll be better served by configuring the middleware to do the right thing.
I'll pile on here: in almost 20 years of professional database
development I've never had an actual problem that was solved by
introducing or shortening a length constraint to text columns except
in cases where overlong strings violate the data model (like a two
character state code for example).  It's a database equivalent of "C
programmer's disease".  Input checks from untrusted actors should
happen in the application.

merlin

I do not have your experience level with data bases, but if I may, I
will make an addition. Input checks should also happen in the RDBMS
server. I have learned you cannot trust end users _or_ programmers.
Most are good and conscientious. But there are a few who just aren't.
And those few seem to be very prolific in making _subtle_ errors. Had
one person who was really good at replacing every p with a [ and P
with {


You don't want that string to get all the way to the server and fail, blow out a transaction and carry that joyous news back to the user who now has to start over completely.  Further no mear length constraint is going to fix p<=>[.  Not say the db cannot have the constraint (no [ allowed?) but a good app checks input on the fly.



Re: ALTER TEXT field to VARCHAR(1024)

От
Tim Clarke
Дата:
On 22/09/14 17:18, Rob Sargent wrote:
> On 09/22/2014 09:40 AM, John McKown wrote:
>> On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
>>>> On Fri, 19 Sep 2014 09:32:09 +0200
>>>> Marius Grama <mariusneo@gmail.com> wrote:
>>>>> Can anybody explain me what happens in the background when the alter
>>>>> statement is executed? I've tried it out on a small copy of the table (70K)
>>>>> and the operation completed in 0.2 seconds.
>>>>> Will the table be completely locked during the execution of the ALTER
>>>>> statement?
>>>> I share Gavin's concern that you're fixing this in the wrong place.  I expect
>>>> that you'll be better served by configuring the middleware to do the right thing.
>>> I'll pile on here: in almost 20 years of professional database
>>> development I've never had an actual problem that was solved by
>>> introducing or shortening a length constraint to text columns except
>>> in cases where overlong strings violate the data model (like a two
>>> character state code for example).  It's a database equivalent of "C
>>> programmer's disease".  Input checks from untrusted actors should
>>> happen in the application.
>>>
>>> merlin
>>>
>> I do not have your experience level with data bases, but if I may, I
>> will make an addition. Input checks should also happen in the RDBMS
>> server. I have learned you cannot trust end users _or_ programmers.
>> Most are good and conscientious. But there are a few who just aren't.
>> And those few seem to be very prolific in making _subtle_ errors. Had
>> one person who was really good at replacing every p with a [ and P
>> with {
>>
>>
> You don't want that string to get all the way to the server and fail,
> blow out a transaction and carry that joyous news back to the user who
> now has to start over completely.  Further no mear length constraint
> is going to fix p<=>[.  Not say the db cannot have the constraint (no
> [ allowed?) but a good app checks input on the fly.
>
>
>

Indeed - both is the answer; back-end (trigger) checks for safety,
front-end application polite messages for clarity and ease of use.

--
Tim Clarke



Re: ALTER TEXT field to VARCHAR(1024)

От
John McKown
Дата:
Sorry guess I wasn't being as clear as I thought. To be a bit more
precise, I really think that validation should occur _first_ at the
point of entry (for a web browser, I put in Javascript code to verify
it there as well as in the web service doing the same validation
because some people disable Javascript as a possible security breach
vector), then also do the same, or even more, validation in the back
end server. I.e. don't trust any step of the process which is not
under your immediate control. As the "owner" of the data base, I want
to validate the data "myself" according to the proper business rules.
The application developer should also validate the input. What I don't
believe in is a "trusted application" from which I would accept data
and not validate it before updating the data base. If such an
application were to exist, due to management dictum, I would audit
everything that I could to prove any corruption to the data base was
caused by this "can't ever be wrong" application. Yes, I am a
paranoid.

On Mon, Sep 22, 2014 at 11:18 AM, Rob Sargent <robjsargent@gmail.com> wrote:
> On 09/22/2014 09:40 AM, John McKown wrote:
>
> On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
> On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com>
> wrote:
>
> On Fri, 19 Sep 2014 09:32:09 +0200
> Marius Grama <mariusneo@gmail.com> wrote:
>
> Can anybody explain me what happens in the background when the alter
> statement is executed? I've tried it out on a small copy of the table (70K)
> and the operation completed in 0.2 seconds.
> Will the table be completely locked during the execution of the ALTER
> statement?
>
> I share Gavin's concern that you're fixing this in the wrong place.  I
> expect
> that you'll be better served by configuring the middleware to do the right
> thing.
>
> I'll pile on here: in almost 20 years of professional database
> development I've never had an actual problem that was solved by
> introducing or shortening a length constraint to text columns except
> in cases where overlong strings violate the data model (like a two
> character state code for example).  It's a database equivalent of "C
> programmer's disease".  Input checks from untrusted actors should
> happen in the application.
>
> merlin
>
> I do not have your experience level with data bases, but if I may, I
> will make an addition. Input checks should also happen in the RDBMS
> server. I have learned you cannot trust end users _or_ programmers.
> Most are good and conscientious. But there are a few who just aren't.
> And those few seem to be very prolific in making _subtle_ errors. Had
> one person who was really good at replacing every p with a [ and P
> with {
>
>
> You don't want that string to get all the way to the server and fail, blow
> out a transaction and carry that joyous news back to the user who now has to
> start over completely.  Further no mear length constraint is going to fix
> p<=>[.  Not say the db cannot have the constraint (no [ allowed?) but a good
> app checks input on the fly.
>
>
>



--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown


Re: ALTER TEXT field to VARCHAR(1024)

От
Merlin Moncure
Дата:
On Mon, Sep 22, 2014 at 10:40 AM, John McKown
<john.archie.mckown@gmail.com> wrote:
> On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> I'll pile on here: in almost 20 years of professional database
>> development I've never had an actual problem that was solved by
>> introducing or shortening a length constraint to text columns except
>> in cases where overlong strings violate the data model (like a two
>> character state code for example).  It's a database equivalent of "C
>> programmer's disease".  Input checks from untrusted actors should
>> happen in the application.
>>
>> merlin
>>
>
> I do not have your experience level with data bases, but if I may, I
> will make an addition. Input checks should also happen in the RDBMS
> server. I have learned you cannot trust end users _or_ programmers.
> Most are good and conscientious. But there are a few who just aren't.
> And those few seem to be very prolific in making _subtle_ errors. Had
> one person who was really good at replacing every p with a [ and P
> with {

Sure.  The point is distinguishing things which are *demonstrably*
false (like a US VIN must be exactly 17 chars) from those that are
based assumption (such as a cityname must be <= 50 characters).  The
former should be validated in the schema and the latter should not be.
If you're paranoid about the user submitting 100mb strings for
"username" and don't trust the application to deal with that, I'd
maybe consider making a domain 'safetext' which checks length on the
order of a few thousand bytes and using that instead of 'text' and use
it everywhere.  This will prevent the dba from outsmarting the
datamodel which is a *much* bigger problem in practice than the one
length checks attempt to solve.

Domains have certain disadvantages (like no array type) -- be advised.

merlin


Re: ALTER TEXT field to VARCHAR(1024)

От
David G Johnston
Дата:
Merlin Moncure-2 wrote
> On Mon, Sep 22, 2014 at 10:40 AM, John McKown
> <

> john.archie.mckown@

> > wrote:
>> On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <

> mmoncure@

> > wrote:
>>> I'll pile on here: in almost 20 years of professional database
>>> development I've never had an actual problem that was solved by
>>> introducing or shortening a length constraint to text columns except
>>> in cases where overlong strings violate the data model (like a two
>>> character state code for example).  It's a database equivalent of "C
>>> programmer's disease".  Input checks from untrusted actors should
>>> happen in the application.
>>>
>>> merlin
>>>
>>
>> I do not have your experience level with data bases, but if I may, I
>> will make an addition. Input checks should also happen in the RDBMS
>> server. I have learned you cannot trust end users _or_ programmers.
>> Most are good and conscientious. But there are a few who just aren't.
>> And those few seem to be very prolific in making _subtle_ errors. Had
>> one person who was really good at replacing every p with a [ and P
>> with {
>
> Sure.  The point is distinguishing things which are *demonstrably*
> false (like a US VIN must be exactly 17 chars) from those that are
> based assumption (such as a cityname must be <= 50 characters).  The
> former should be validated in the schema and the latter should not be.
> If you're paranoid about the user submitting 100mb strings for
> "username" and don't trust the application to deal with that, I'd
> maybe consider making a domain 'safetext' which checks length on the
> order of a few thousand bytes and using that instead of 'text' and use
> it everywhere.  This will prevent the dba from outsmarting the
> datamodel which is a *much* bigger problem in practice than the one
> length checks attempt to solve.
>
> Domains have certain disadvantages (like no array type) -- be advised.
>
> merlin

These responses all seem beside the point.  The OP isn't concerned that
too-long data is making it into the database but rather that an unadorned
text type is functionally a CLOB which the application he is using is
treating like a document instead of a smallish text field that would be
treated like any other value.  It's like the difference between choosing
input/text or textarea in HTML.  Now, some tools distinguish between "text"
and "varchar" only and the length piece is irrelevant; but whether that
applies here I have no idea.

It might be easier to simply create a view over the table, using the desired
type (truncating the actual value if needed), and feed that view to the
reporting engine.

In the end the two questions are:
1) does adding a length restriction cause a table rewrite?
2) what level of locking occurs while the length check is resolving?

I don't confidently know the answers to those two questions.

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/ALTER-TEXT-field-to-VARCHAR-1024-tp5819608p5819939.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: ALTER TEXT field to VARCHAR(1024)

От
Tom Lane
Дата:
David G Johnston <david.g.johnston@gmail.com> writes:
> In the end the two questions are:
> 1) does adding a length restriction cause a table rewrite?

Yes.  In principle the restriction could be checked with just a scan,
not a rewrite, but ALTER TABLE doesn't currently understand that --- and
in any case a scan would still be potentially a long time.

> 2) what level of locking occurs while the length check is resolving?

AccessExclusiveLock.  This would be necessary in any case for a data type
change.

            regards, tom lane


Re: ALTER TEXT field to VARCHAR(1024)

От
Jeff Janes
Дата:
On Mon, Sep 22, 2014 at 8:40 AM, John McKown <john.archie.mckown@gmail.com> wrote:
On Mon, Sep 22, 2014 at 10:31 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Sep 19, 2014 at 7:16 AM, Bill Moran <wmoran@potentialtech.com> wrote:
>> On Fri, 19 Sep 2014 09:32:09 +0200
>> Marius Grama <mariusneo@gmail.com> wrote:
>>> Can anybody explain me what happens in the background when the alter
>>> statement is executed? I've tried it out on a small copy of the table (70K)
>>> and the operation completed in 0.2 seconds.
>>> Will the table be completely locked during the execution of the ALTER
>>> statement?
>>
>> I share Gavin's concern that you're fixing this in the wrong place.  I expect
>> that you'll be better served by configuring the middleware to do the right thing.
>
> I'll pile on here: in almost 20 years of professional database
> development I've never had an actual problem that was solved by
> introducing or shortening a length constraint to text columns except
> in cases where overlong strings violate the data model (like a two
> character state code for example).  It's a database equivalent of "C
> programmer's disease".  Input checks from untrusted actors should
> happen in the application.
>
> merlin
>

I do not have your experience level with data bases, but if I may, I
will make an addition. Input checks should also happen in the RDBMS
server. I have learned you cannot trust end users _or_ programmers.
Most are good and conscientious. But there are a few who just aren't.

So fire them.
 
And those few seem to be very prolific in making _subtle_ errors. Had
one person who was really good at replacing every p with a [ and P
with {

Your solution is what, arbitrarily forbidding the use of '[' when that is not a logically forbidden character, just because someone might make a mistake?  What do you do when they wish that someone have "a lot of gun on your vacation"?

Nothing ticks me off more than some DBA deciding that it is unreasonable for my street address to be more than 25 characters long, when obviously neither I nor the USPS agrees with that arbitrary limitation.  Unless 25 is the maximum number of characters that physically fit on the mailing label (and you are sure you will never change label printers), it is not your job to decide how long my street name can be.  Get over yourself.

If you need to verify that the data is accurate, then implement methods to verify that.  Verifying that the data is "reasonable", according to some ignorant standard of reasonableness, is not the same thing as verifying that it is accurate.

More than one company has lost business by refusing to acknowledge that I might know how to spell my own address.

Cheers,

Jeff, whose street address has 27 characters, whether you like it or not.