Обсуждение: Number of Columns and Update

Поиск
Список
Период
Сортировка

Number of Columns and Update

От
Robert DiFalco
Дата:
This may fall into the category of over-optimization but I've become curious.

I have a user table with about 14 columns that are all 1:1 data - so they can't be normalized.

When I insert a row all columns need to be set. But when I update, I sometimes only update 1-2 columns at a time. Does the number of columns impact update speed?

For example:
     UPDATE users SET email = ? WHERE id = ?;

I can easily break this up into logical tables like user_profile, user_credential, user_contact_info, user_summary, etc with each table only having 1-4 columns. But with the multiple tables I would often be joining them to bring back a collection of columns. 

I know I'm over thinking this but I'm curious of what the performance trade offs are for breaking up a table into smaller logically grouped tables.

Thanks.

Re: Number of Columns and Update

От
Heikki Linnakangas
Дата:
On 12/22/2014 10:53 PM, Robert DiFalco wrote:
> This may fall into the category of over-optimization but I've become
> curious.
>
> I have a user table with about 14 columns that are all 1:1 data - so they
> can't be normalized.
>
> When I insert a row all columns need to be set. But when I update, I
> sometimes only update 1-2 columns at a time. Does the number of columns
> impact update speed?
>
> For example:
>       UPDATE users SET email = ? WHERE id = ?;

Yes, the number of columns in the table matters. The update is just as
expensive regardless of how many of the columns you update.

When a row is updated, PostgreSQL creates a new version of the whole
row. The new row version takes more space when the table has more
columns, leading to more bloating of the table, which generally slows
things down. In most applications the difference isn't big enough to
matter, but it can be significant if you have very wide rows, and you
update a lot.

PostgreSQL 9.4 made an improvement on this. In earlier versions, the new
row version was also included completely in the WAL record, which added
overhead. In 9.4, any columns at the beginning or end of the row that
are not modified are left out of the WAL record, as long as the new row
version is stored on the same page as the old one (which is common). For
updating a single column, or a few columns that are next to each other,
that's the same as saying that only the modified part of the row is
WAL-logged.

> I can easily break this up into logical tables like user_profile,
> user_credential, user_contact_info, user_summary, etc with each table only
> having 1-4 columns. But with the multiple tables I would often be joining
> them to bring back a collection of columns.

That would help with the above-mentioned issues, but dealing with
multiple tables certainly adds a lot of overhead too. Most likely you're
better off just having the single table, after all.

- Heikki



Re: Number of Columns and Update

От
Andrew Dunstan
Дата:
On 12/22/2014 03:53 PM, Robert DiFalco wrote:
> This may fall into the category of over-optimization but I've become
> curious.
>
> I have a user table with about 14 columns that are all 1:1 data - so
> they can't be normalized.
>
> When I insert a row all columns need to be set. But when I update, I
> sometimes only update 1-2 columns at a time. Does the number of
> columns impact update speed?
>
> For example:
>      UPDATE users SET email = ? WHERE id = ?;
>
> I can easily break this up into logical tables like user_profile,
> user_credential, user_contact_info, user_summary, etc with each table
> only having 1-4 columns. But with the multiple tables I would often be
> joining them to bring back a collection of columns.
>
> I know I'm over thinking this but I'm curious of what the performance
> trade offs are for breaking up a table into smaller logically grouped
> tables.
>
>

An update rewrites the whole row, not just the updated columns.

I think you are overthinking it.

cheers

andrew