Обсуждение: Arrays vs separate tables

Поиск
Список
Период
Сортировка

Arrays vs separate tables

От
Rich Shepard
Дата:
I started developing business and science databases in the mid-1980s using
dBASE III, then Paradox on DOS. I defenestrated in 1997 and have used only
linux for both business and personal needs ever since and using only
PostgreSQL for my databaes (primarily for my own use.) I read Joe Celko's
monthly column in 'Database Advisor' and all his SQL books (in multiple
editions.)

Now I'm slowly cleaning up my business tracking database using features not
available way back when I developed it. That's why I ask questions that must
seem obvious to all of you who work with postgres everyday and have for
years. I'm reading 'SQL Queries for Mere Mortals' and just ordered the 4th
edition of 'Database Design for Mere Mortals' to update my SQL knowledge.

In the former book I read that sometimes it's better to have multiple values
for an atribute in a single row by creating a separate table for that
attribute rather than using the postgres array capability. The people table
in my database (1706 rows) has two attributes using the array type
(direct_phone and email), and only a few indivuals have multiple landline
phone numbers (cell_phone is a separate column) or email addresses (office
and personal?). Would it make sense for me to create new landline and email
address tables and replace the array contents?

Other than remembering to use curly braces {} when entering data into an
array column I don't think there'd be any performance benefit for making the
change. I'm curious to learn about arrays vs separate tables.

TIA,

Rich



Re: Arrays vs separate tables

От
Adrian Klaver
Дата:
On 10/19/25 13:43, Rich Shepard wrote:

> In the former book I read that sometimes it's better to have multiple 
> values
> for an atribute in a single row by creating a separate table for that
> attribute rather than using the postgres array capability. The people table
> in my database (1706 rows) has two attributes using the array type
> (direct_phone and email), and only a few indivuals have multiple landline
> phone numbers (cell_phone is a separate column) or email addresses (office
> and personal?). Would it make sense for me to create new landline and email
> address tables and replace the array contents?

For direct_phone and email entries that have more then one value, how do 
you know what the values are pointing at e.g home vs office vs second 
office location, etc?

> 
> Other than remembering to use curly braces {} when entering data into an
> array column I don't think there'd be any performance benefit for making 
> the
> change. I'm curious to learn about arrays vs separate tables.
> 
> TIA,
> 
> Rich
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Arrays vs separate tables

От
Ray O'Donnell
Дата:
On 19/10/2025 21:43, Rich Shepard wrote:

> In the former book I read that sometimes it's better to have multiple 
> values
> for an atribute in a single row by creating a separate table for that
> attribute rather than using the postgres array capability. The people 
> table
> in my database (1706 rows) has two attributes using the array type
> (direct_phone and email), and only a few indivuals have multiple landline
> phone numbers (cell_phone is a separate column) or email addresses 
> (office
> and personal?). Would it make sense for me to create new landline and 
> email
> address tables and replace the array contents?
>
My experience of doing something similar was that arrays work very well 
for the use-case you describe, as long as you don't have to search 
inside the arrays... I found that, if you have to search for a specific 
value inside an array, then performance really goes out the window. Mind 
you, clever use of an index would probably help here.

Ray.

-- 
Ray O'Donnell // Galway // Ireland
ray@rodonnell.ie




Re: Arrays vs separate tables

От
Rich Shepard
Дата:
On Sun, 19 Oct 2025, Adrian Klaver wrote:

> For direct_phone and email entries that have more then one value, how do you 
> know what the values are pointing at e.g home vs office vs second office 
> location, etc?

Adrian,

At this point I don't know. If there's no answer on one number I try
another. If there were separate tables the three colums would be in interger
PK, a description, and the number/address.

Rich



Re: Arrays vs separate tables

От
Rich Shepard
Дата:
On Sun, 19 Oct 2025, Ray O'Donnell wrote:

> My experience of doing something similar was that arrays work very well
> for the use-case you describe, as long as you don't have to search inside
> the arrays... I found that, if you have to search for a specific value
> inside an array, then performance really goes out the window. Mind you,
> clever use of an index would probably help here.

Ray,

So far searching has not been an issue so I'll keep the status quo.

Thanks,

Rich



Re: Arrays vs separate tables

От
"Peter J. Holzer"
Дата:
On 2025-10-19 13:43:09 -0700, Rich Shepard wrote:
> Now I'm slowly cleaning up my business tracking database using features not
> available way back when I developed it. That's why I ask questions that must
> seem obvious to all of you who work with postgres everyday and have for
> years. I'm reading 'SQL Queries for Mere Mortals' and just ordered the 4th
> edition of 'Database Design for Mere Mortals' to update my SQL knowledge.
>
> In the former book I read that sometimes it's better to have multiple values
> for an atribute in a single row by creating a separate table for that
> attribute rather than using the postgres array capability.

Replace "sometimes" with "usually", or "conventionally".

"No composite values" is the core of the first normal form. And of
course the other normal forms build on the first normal form, so your
dataase will never be normalized if you have arrays or other composite
values (e.g. json) in your database.

That said, sometimes it is advantageous to eschew normalization (if that
wasn't the case, PostgreSQL wouldn't have arrays, json, etc.).

The main reasons (IMHO) are:

* Performance: It's just a lot faster to get a row from a single table
  than to join multiple tables.
* Simplicity: If a set of values is always fetched together and updated
  together, you might as well treat it as a unit and not split it over
  multiple tables

I think the first reason is moot in your case. But it's the reason why
datawarehouses often contain huge tables with hundreds of columns and
lots and lots of redundant data.

The second may be relevant for you. If you always display and edit the
phone numbers of a contact together and your frontend makes it easier to
edit an array than a subset of rows from a table, you might just stuff
them into a table and ignore "purity".

        hjp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Arrays vs separate tables

От
Rich Shepard
Дата:
On Mon, 20 Oct 2025, Peter J. Holzer wrote:

> * Simplicity: If a set of values is always fetched together and updated
>  together, you might as well treat it as a unit and not split it over
>  multiple tables

> The second may be relevant for you. If you always display and edit the
> phone numbers of a contact together and your frontend makes it easier to
> edit an array than a subset of rows from a table, you might just stuff
> them into a table and ignore "purity".

Peter,

Thank you. That's my use case.

Regards,

Rich



Re: Arrays vs separate tables

От
Michał Kłeczek
Дата:
> On 20 Oct 2025, at 14:55, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> On Mon, 20 Oct 2025, Peter J. Holzer wrote:
>
>> * Simplicity: If a set of values is always fetched together and updated
>> together, you might as well treat it as a unit and not split it over
>> multiple tables
>
>> The second may be relevant for you. If you always display and edit the
>> phone numbers of a contact together and your frontend makes it easier to
>> edit an array than a subset of rows from a table, you might just stuff
>> them into a table and ignore "purity".
>
> Peter,
>
> Thank you. That's my use case.

There is also another concern - do you want to make sure phone numbers are not shared?

—
Michal


Re: Arrays vs separate tables

От
Rich Shepard
Дата:
On Mon, 20 Oct 2025, Michał Kłeczek wrote:

> There is also another concern - do you want to make sure phone numbers are
> not shared?

Michal,

Shared with whom? I run a solo professional services consultancy so there's
only me here the database.

Rich



Re: Arrays vs separate tables

От
Michał Kłeczek
Дата:

> On 20 Oct 2025, at 16:07, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> On Mon, 20 Oct 2025, Michał Kłeczek wrote:
>
>> There is also another concern - do you want to make sure phone numbers are
>> not shared?
>
> Michal,
>
> Shared with whom?

Sorry, I wasn’t clear.

How do you make sure a phone number is not shared by two (or more) customers?

—
Michal




Re: Arrays vs separate tables

От
Ron Johnson
Дата:
On Mon, Oct 20, 2025 at 10:07 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Mon, 20 Oct 2025, Michał Kłeczek wrote:

> There is also another concern - do you want to make sure phone numbers are
> not shared?

Michal,

Shared with whom? I run a solo professional services consultancy so there's
only me here the database.

Multiple contacts at the same client company.  If, for whatever reason, the phone number changes you've got to update X rows in your contacts table, whereas only one row needs to be updated when the schema meets 1NF.  This eliminates update anomalies.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Arrays vs separate tables

От
Rich Shepard
Дата:
On Mon, 20 Oct 2025, Michał Kłeczek wrote:

> How do you make sure a phone number is not shared by two (or more)
> customers?

Michal,

If it's the company's main telephone number then it's shared by all
employees with phones on their desks. If it's a direct number then it
connects to only a single phone.

Either way it doesn't matter to me. When the person I call is at their desk
my call will either be transferred by the receptionist or directly ring their
phone.

Rewgards,

Rich



Re: Arrays vs separate tables

От
Rich Shepard
Дата:
On Mon, 20 Oct 2025, Ron Johnson wrote:

> Multiple contacts at the same client company. If, for whatever reason, the
> phone number changes you've got to update X rows in your contacts table,
> whereas only one row needs to be updated when the schema meets 1NF. This
> eliminates update anomalies.

Ron,

Not an issue: the companies table has a 'phone' column while the people
table has both 'direct_phone' and 'cellphone' columns. And, the locations
table has a 'loc_phone' column because many industrial companies have
multiple facilities.

Regards,

Rich



Re: Arrays vs separate tables

От
Adrian Klaver
Дата:
On 10/20/25 07:07, Rich Shepard wrote:
> On Mon, 20 Oct 2025, Michał Kłeczek wrote:
> 
>> There is also another concern - do you want to make sure phone numbers 
>> are
>> not shared?
> 
> Michal,
> 
> Shared with whom? I run a solo professional services consultancy so there's
> only me here the database.

That is the heart of the matter, you should do what makes sense to you. 
If you can get the information you need in the manner you want then 
don't change things. Given the size of your datasets I don't see that 
changes will materially affect the performance of your queries. I am 
pretty sure most of the time the Postgres planner is resorting to a 
sequence scan anyway.

> 
> Rich
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Arrays vs separate tables

От
Rob Sargent
Дата:

> On Oct 20, 2025, at 8:42 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 10/20/25 07:07, Rich Shepard wrote:
>>> On Mon, 20 Oct 2025, Michał Kłeczek wrote:
>>> There is also another concern - do you want to make sure phone numbers are
>>> not shared?
>> Michal,
>> Shared with whom? I run a solo professional services consultancy so there's
>> only me here the database.
>
> That is the heart of the matter, you should do what makes sense to you. If you can get the information you need in
themanner you want then don't change things. Given the size of your datasets I don't see that changes will materially
affectthe performance of your queries. I am pretty sure most of the time the Postgres planner is resorting to a
sequencescan anyway. 

And all the hot tables have been paged in if not the entire db.


Re: Arrays vs separate tables

От
Michał Kłeczek
Дата:

> On 20 Oct 2025, at 16:26, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> On Mon, 20 Oct 2025, Michał Kłeczek wrote:
>
>> How do you make sure a phone number is not shared by two (or more)
>> customers?
>
> Michal,
>
> If it's the company's main telephone number then it's shared by all
> employees with phones on their desks. If it's a direct number then it
> connects to only a single phone.
>
> Either way it doesn't matter to me. When the person I call is at their desk
> my call will either be transferred by the receptionist or directly ring their
> phone.
>

Do you have any means to make sure you didn’t enter the same number for two different companies then?

—
Michal


Re: Arrays vs separate tables

От
Rich Shepard
Дата:
On Mon, 20 Oct 2025, Michał Kłeczek wrote:

> Do you have any means to make sure you didn’t enter the same number for
> two different companies then?

Michal,

Yes. New companies are added one-at-a-time. In the 28 years I've used
postgres I've not had any such issues.

Regards,

Rich



Re: Arrays vs separate tables

От
Michał Kłeczek
Дата:
> On 20 Oct 2025, at 17:06, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> On Mon, 20 Oct 2025, Michał Kłeczek wrote:
>
>> Do you have any means to make sure you didn’t enter the same number for
>> two different companies then?
>
> Michal,
>
> Yes. New companies are added one-at-a-time. In the 28 years I've used
> postgres I've not had any such issues.

Ok, if manual integrity validation works for you then arrays are fine.


—
Michal