Re: text fields and performance for ETL

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: text fields and performance for ETL
Дата
Msg-id CAKFQuwbaVh+-gzTCnD654mqbdkGmjDKETV2M3oBtCO-f2mKzsA@mail.gmail.com
обсуждение исходный текст
Ответ на text fields and performance for ETL  (PG Doc comments form <noreply@postgresql.org>)
Список pgsql-docs
On Wed, Nov 3, 2021 at 8:35 AM Grega Jesih <Grega.Jesih@actual-it.si> wrote:

Dear David,

> The goal in our docs is to point out that using an arbitrary length specification is not required in PostgreSQL.

Then it should say so. Because unexperienced reader then uses this limited focus and generalizes.


It does (but maybe it could be improved, see the FAQ entry linked below for more detail).

 

It is for the very same reason that progreammers don't perceive the need to limit the string size to its realistic size that ETL processes

are slowed down.


Given the number of people I find agreeing with "just use text" and the general lack of people making good arguments for using varchar(n) I'm inclined to believe the status quo best reflects the majority of usage in the wild.

For example a currency where 3-char encoding is used, the field should be char(3) and not text.


The char data type has its own problems (see the other FAQ entry linked below)

> The documentation assumes that the sizes involved here are reasonable for such behavior

On the contrary. When you say "performance is the same.." then you make a wrong impression it is an unversal case. But, if fact it depends.


As far as PostgreSQL itself is concerned there is no "it depends".  That is all we are claiming here.  And, frankly, generalization is correct in probably 90% of situations.  Maybe there is room for improvement, in documentation that is usually the case, do you have a concrete suggestion to offer?


When you include such table into some dataflow from server A to server B, this process will get slowed down. I explained why.
It is important to teach people that update the model to use logical values that make sense for such cases.


Teaching data modelling techniques isn't really a goal for our documentation.  We aim to inform how the PostgreSQL works.
 

IF you teach instead "ah it is not important, you may use text", then you actually ignore a part of informatic team that provides the usage of this data.


In most cases (how wide should a name field be) there is no good length to choose.  For the currency abbreviation example I would add a "check length(currency_name) = 3" constraint alongside a unique constraint on the lookup table - but the data type would still just be text.  I can also enforce all uppercase and letters only for the symbol in the formal constraint while the char(3) will happily allow a value of "u6D".  In either case the actual performance of processing that text field (input and output) would be the same in PostgreSQL.  So if servers A and B are both PostgreSQL you are simply incorrect.  If they are not then the example is largely out-of-scope for our documentation.

 (I refer to SSIS in this context) that provide a very fast dataflow in case there is a known record size.


varchar(n) says nothing about the minimum size allowed which means it does nothing to help for "known record size".  For that you need, and have, actual constraints.

If anything the documentation lacks in making these points clear by the very evidence of two FAQ entries covering the topic.


David J.


В списке pgsql-docs по дате отправления:

Предыдущее
От: Grega Jesih
Дата:
Сообщение: RE: text fields and performance for ETL
Следующее
От: Tom Lane
Дата:
Сообщение: Re: text fields and performance for ETL