RE: text fields and performance for ETL

Поиск
Список
Период
Сортировка
От Grega Jesih
Тема RE: text fields and performance for ETL
Дата
Msg-id 686716cb5c4242c28e5967d9b6fcfc75@actual-it.si
обсуждение исходный текст
Ответ на Re: text fields and performance for ETL  ("David G. Johnston" <david.g.johnston@gmail.com>)
Ответы Re: text fields and performance for ETL
Список pgsql-docs

Dear David,

> The goal in our docs is to point out that using an arbitrary length specification is not required in PostgreSQL.

Then it should say so. Because unexperienced reader then uses this limited focus and generalizes.

 

It is for the very same reason that progreammers don't perceive the need to limit the string size to its realistic size that ETL processes

are slowed down.

For example a currency where 3-char encoding is used, the field should be char(3) and not text.

>
The documentation assumes that the sizes involved here are reasonable for such behavior

On the contrary. When you say "performance is the same.." then you make a wrong impression it is an unversal case. But, if fact it depends.


When you include such table into some dataflow from server A to server B, this process will get slowed down. I explained why.
It is important to teach people that update the model to use logical values that make sense for such cases.

IF you teach instead "ah it is not important, you may use text", then you actually ignore a part of informatic team that provides the usage of this data.

With this you don't contribute. You create a problem.

 

> All of my ETL simply reads

Well perhaps yours. But there are pro-tools (I refer to SSIS in this context) that provide a very fast dataflow in case there is a known record size.

In this case you go 100x faster as a rule of thumb. When dealing with billions of records, it makes a biiig difference.

Best regards
Grega

PS
I work in actual-it.si and gmail.com mail is fwded to me. So I took a shorcut here..



 

 

From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Wednesday, November 3, 2021 3:38 PM
To: grega.jesih@gmail.com; Pg Docs <pgsql-docs@lists.postgresql.org>
Subject: Re: text fields and performance for ETL

 

On Wed, Nov 3, 2021 at 6:55 AM PG Doc comments form <noreply@postgresql.org> wrote:

But performance in ETL processes related to such data type is decreased
dramatically, because any process that takes this kind of data needs to
calculate its size on a row level and cannot take bigger chunks of data
based on max size.

 

All of my ETL simply reads in the entire contents of a text field.  There is no chunking.  The documentation assumes that the sizes involved here are reasonable for such behavior.  If you have a situation where you've chosen to use varchar(n) and can defend that choice more power to you.  Those special circumstances are not of particular interest here.  For the vast majority of users they use varchar(n) because they (or more likely their teachers) come from systems where it is required.  The goal in our docs is to point out that using an arbitrary length specification is not required in PostgreSQL.

 

David J.

 


NOTICE - NOT TO BE REMOVED.
This e-mail and any attachments are confidential and may contain legally privileged information and/or copyright material of Actual I.T. or third parties. If you are not an authorised recipient of this e-mail, please contact Actual I.T. immediately by return email or by telephone or facsimile on the above numbers.
You should not read, print, re-transmit, store or act in reliance on this email or any attachments and you should destroy all copies of them.

В списке pgsql-docs по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: text fields and performance for ETL
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: text fields and performance for ETL