Обсуждение: Please suggest me on my table design (indexes!)

Поиск
Список
Период
Сортировка

Please suggest me on my table design (indexes!)

От
DaNieL
Дата:
Hi guys, im tryin to optimize a simple table, suited for contain
users.
So, my table at the moment is:

-----
CREATE TABLE contacts(
 id BIGSERIAL PRIMARY KEY NOT NULL UNIQUE,
 company_id BIGINT,
 code varchar(10),
 company_name varchar(120),
 name varchar(120),
 surname varchar(120),
 phone varchar(80),
 email varchar(80),
 kind varchar(8)
);
-----

I use this layout in order to store 3 kind of users: Private, Company
and Company's Employee.. the col 'kind' infact will contain just
'private', 'company' or 'employee', but is unnecessary, i can
understand what kind a user is by those rules:
Private are stand-alone users, company_name and company_id are always
NULL;
Company have the company_name;
Employees have the company name and the company_id (with the id of the
company's row);

Example:

id|company_id|code| company_name|name|surname|phone|
email               |   kind
1 |NULL         |C001| Sunday Inc.     | John | Doe      |88888 |
j@sunday.com | company
2 | 1              |E001| Sunday Inc.     |Paul  | Smith   | 77777|
smith@sunday.com| employee
3 | NULL        |P001| NULL              | Rose | Mary   | 66666|
rose@mary.com |  private

So, first of all, does this layout looks good?
Before i used to keep employees in a different table, becose usually
employees have just few data (name, surname, direct email and direct
phone.. all the addresses, bank data, etc.. belongs tot he company),
but noe i preferred this way to avoid constant inner joins.

Now, i aspect that my users will search the contact table just for the
fields company_name, name, surname, email, code.
That kind of query cant be as
WHERE company_name = '$x'
but will be much like
WHERE company_name LIKE '$x%',
both becose i use an autocomplete field for the quick search, both
becose.. well, that's how users search data's (in my experience).

So i created those index, to let the query planner use the indexes in
the LIKE query:
-----
CREATE INDEX contact_companyname_i ON contact USING btree (lower
(company_name::text) varchar_pattern_ops)
#this kind of index is the same for name, surname
#the email must be unique:
CREATE UNIQUE INDEX contact_email_i ON contact USING btree (lower
(email::text) varchar_pattern_ops);
-----

Therefore the phone column will be searched with the LIKE condition,
but dont need the case-sensitive-loser-trick:
-----
CREATE INDEX contact_n_phone_i ON contact USING btree (phone
varchar_pattern_ops)
-----

However for the email, code, id and company_id columns i want to
permit an index search even with the exact pattern, so i added those
indexes too:
-----
CREATE INDEX contact_n_email_i ON contact USING btree (email);
CREATE INDEX contact_n_code_i ON contact USING btree (code);
CREATE UNIQUE INDEX contact_pkey ON contact USING btree (id); #PRIMARY
KEY
CREATE INDEX contact_n_idcompany_i ON contact USING btree (id_company)
-----

Again, feel free to suggest/insult me if this behavior looks bad (im
here to learn ;)
Well, the table will have other fields that dont need to be searched
directly (like the addresses) and so dont need indexes.

My doubt is: am i using too many indexes?
Will my insert/delete/update queryes be too slow, and does the select
optimization worth the price? (and, does this way really optimize the
selects queryes?)

Consider that my application wont do many insert-delete-update
sequentially.

Re: Please suggest me on my table design (indexes!)

От
"DaNieL..!"
Дата:
Almost forgot: one fo the heavier select query can be:
-----
SELECT
    contact.id,
    contact.company_id,
    contact.name AS nome,
    contact.surname AS cognome,
    contact.email AS email,
    contact.company_name AS azienda
FROM
    contact
WHERE
    (
     lower(contact.company_name) LIKE 'smit%' OR
        lower(contact.name) LIKE 'smit%' OR
        lower(contact.surname) LIKE 'smit%' OR
        contact.phone LIKE 'smit%' OR
        lower(contact.code) LIKE 'smit%' OR
        lower(contact.email) LIKE 'smit%'
    )
ORDER BY
    contact.company_name ASC, contact.company_id DESC
-----
And this is the queyr plain: http://explain.depesz.com/s/Il

Re: Please suggest me on my table design (indexes!)

От
Grzegorz Jaśkiewicz
Дата:
it looks ok on explain, that is - the cost isn't too high.
So what's the problem ?

Re: Please suggest me on my table design (indexes!)

От
"DaNieL..!"
Дата:
The `problem` is that i dont know if having so many indexes will raise
problems as the data dimension grown.

And i am not even sure that this design is truly reliable;
For example, if i would to know how many employees have every company,
i'll have to run that query:
-----
EXPLAIN ANALYZE
SELECT
    c.id,
    c.company_name,
    count(e.id) AS num_employee
FROM
    contact AS c
    LEFT JOIN contact AS e ON (c.id = e.id_company)
WHERE c.tipo = 'company'
GROUP BY c.id, c.company_name
-----

That seem to be not very efficient: http://explain.depesz.com/s/Q0m
(I loaded some test data from http://www.generatedata.com/#generator,
so now i have 23893 rows in the table.. oh, i added a index on the
'kind' (tipo) column too)



On 23 Giu, 13:47, gryz...@gmail.com (Grzegorz Jaśkiewicz) wrote:
> it looks ok on explain, that is - the cost isn't too high.
> So what's the problem ?
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general


Re: Please suggest me on my table design (indexes!)

От
Grzegorz Jaśkiewicz
Дата:
2009/6/23 DaNieL..! <daniele.pignedoli@gmail.com>:
> The `problem` is that i dont know if having so many indexes will raise
> problems as the data dimension grown.

> That seem to be not very efficient: http://explain.depesz.com/s/Q0m

Well, this is slow, because for some reason postgres decided to use
seq scan on contact e.
As for speed with many indices. Btree for varchar is going to be
slower, than - say for integer, or any other fixed length type.
This is due to nature of index. Having said that, if you expect a lot
of repetition, split it/normalize it.
Index performance also hurts, when you get a lot of variants of data
(worse case, all varchar rows are different, and don't share too many
leafs on index).
So it is always beneficial to have separate table, if data is
redundant - especially when it is text/varchar/bytea.

Also, index size grows pretty badly when you modify table's content a
lot in postgresql.
Rule of thumb, bigger the index in size, comparable to data size - the worse.


--
GJ

Re: Please suggest me on my table design (indexes!)

От
"DaNieL..!"
Дата:
Yes, after my post i've tryed the versione with 2 separate table (a
copy of the contact table) with inside just the employees, and, with
my surprise, the query planner looks identical, both with 1 big table
and with 2 splitted table.
This sound a bit strange for me, becose in my test the 'employees
rows' are about 13K, the company about 3k, the private around 7K.
So i thougt that moving the employees to another table would decrese
much the indexes weight on the original table, and so raise the speed
for the remaining original contact table (with inside just private and
company users)

On 23 Giu, 16:29, gryz...@gmail.com (Grzegorz Jaśkiewicz) wrote:
> 2009/6/23 DaNieL..! <daniele.pigned...@gmail.com>:
>
> > The `problem` is that i dont know if having so many indexes will raise
> > problems as the data dimension grown.
> > That seem to be not very efficient:http://explain.depesz.com/s/Q0m
>
> Well, this is slow, because for some reason postgres decided to use
> seq scan on contact e.
> As for speed with many indices. Btree for varchar is going to be
> slower, than - say for integer, or any other fixed length type.
> This is due to nature of index. Having said that, if you expect a lot
> of repetition, split it/normalize it.
> Index performance also hurts, when you get a lot of variants of data
> (worse case, all varchar rows are different, and don't share too many
> leafs on index).
> So it is always beneficial to have separate table, if data is
> redundant - especially when it is text/varchar/bytea.
>
> Also, index size grows pretty badly when you modify table's content a
> lot in postgresql.
> Rule of thumb, bigger the index in size, comparable to data size - the worse.
>
> --
> GJ
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general


Re: Please suggest me on my table design (indexes!)

От
"Chris Spotts"
Дата:
IMHO running queries on 23k'ish worth of rows isn't liable to stress any
reasonably modern server, likely several times over that shouldn't either
for simple "LIKE" searches.
What kind of growth are you expecting?

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of DaNieL..!
> Sent: Tuesday, June 23, 2009 9:44 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Please suggest me on my table design (indexes!)
>
> Yes, after my post i've tryed the versione with 2 separate table (a
> copy of the contact table) with inside just the employees, and, with
> my surprise, the query planner looks identical, both with 1 big table
> and with 2 splitted table.
> This sound a bit strange for me, becose in my test the 'employees
> rows' are about 13K, the company about 3k, the private around 7K.
> So i thougt that moving the employees to another table would decrese
> much the indexes weight on the original table, and so raise the speed
> for the remaining original contact table (with inside just private and
> company users)
>
> On 23 Giu, 16:29, gryz...@gmail.com (Grzegorz Jaśkiewicz) wrote:
> > 2009/6/23 DaNieL..! <daniele.pigned...@gmail.com>:
> >
> > > The `problem` is that i dont know if having so many indexes will
> raise
> > > problems as the data dimension grown.
> > > That seem to be not very efficient:http://explain.depesz.com/s/Q0m
> >
> > Well, this is slow, because for some reason postgres decided to use
> > seq scan on contact e.
> > As for speed with many indices. Btree for varchar is going to be
> > slower, than - say for integer, or any other fixed length type.
> > This is due to nature of index. Having said that, if you expect a lot
> > of repetition, split it/normalize it.
> > Index performance also hurts, when you get a lot of variants of data
> > (worse case, all varchar rows are different, and don't share too many
> > leafs on index).
> > So it is always beneficial to have separate table, if data is
> > redundant - especially when it is text/varchar/bytea.
> >
> > Also, index size grows pretty badly when you modify table's content a
> > lot in postgresql.
> > Rule of thumb, bigger the index in size, comparable to data size -
> the worse.
> >
> > --
> > GJ
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> > To make changes to your
> subscription:http://www.postgresql.org/mailpref/pgsql-general
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: Please suggest me on my table design (indexes!)

От
justin
Дата:
DaNieL wrote:
> Hi guys, im tryin to optimize a simple table, suited for contain
> users.
> So, my table at the moment is:
>
> -----
> CREATE TABLE contacts(
>  id BIGSERIAL PRIMARY KEY NOT NULL UNIQUE,
>  company_id BIGINT,
>  code varchar(10),
>  company_name varchar(120),
>  name varchar(120),
>  surname varchar(120),
>  phone varchar(80),
>  email varchar(80),
>  kind varchar(8)
> );
> -----
>
> I use this layout in order to store 3 kind of users: Private, Company
> and Company's Employee.. the col 'kind' infact will contain just
> 'private', 'company' or 'employee', but is unnecessary, i can
> understand what kind a user is by those rules:
> Private are stand-alone users, company_name and company_id are always
> NULL;
> Company have the company_name;
> Employees have the company name and the company_id (with the id of the
> company's row);
>

The layout looks find  although i would not use Bigserial unless you
expect to exceed 2.1 billion records

> Example:
>
> id|company_id|code| company_name|name|surname|phone|
> email               |   kind
> 1 |NULL         |C001| Sunday Inc.     | John | Doe      |88888 |
> j@sunday.com | company
> 2 | 1              |E001| Sunday Inc.     |Paul  | Smith   | 77777|
> smith@sunday.com| employee
> 3 | NULL        |P001| NULL              | Rose | Mary   | 66666|
> rose@mary.com |  private
>
> So, first of all, does this layout looks good?
> Before i used to keep employees in a different table, becose usually
> employees have just few data (name, surname, direct email and direct
> phone.. all the addresses, bank data, etc.. belongs tot he company),
> but noe i preferred this way to avoid constant inner joins.
>
> Now, i aspect that my users will search the contact table just for the
> fields company_name, name, surname, email, code.
> That kind of query cant be as
> WHERE company_name = '$x'
> but will be much like
> WHERE company_name LIKE '$x%',
> both becose i use an autocomplete field for the quick search, both
> becose.. well, that's how users search data's (in my experience).
>

What i have done with searches on small strings where the user is unsure
what they are looking for or the spelling, I do something like this

where substr(company_name,1,length($searchtext$ UserSearchString
$searchtext$)) ilike  $searchtext$UserSearchString $searchtext$

and mix it with soundex.  This way the user get a list of possible
matches with only handful to type characters

 Draw back is this type of search is it can't be indexed.

> So i created those index, to let the query planner use the indexes in
> the LIKE query:
>
> My doubt is: am i using too many indexes?
> Will my insert/delete/update queryes be too slow, and does the select
> optimization worth the price? (and, does this way really optimize the
> selects queryes?)
>

The more indexes you have the slower updates will be.  Yet not a
horrible amount.   The answer to this is it depends on the work load can
the system suffer the overhead of the indexes and still give adequate
results on queries.
> Consider that my application wont do many insert-delete-update
> sequentially.
>


Re: Please suggest me on my table design (indexes!)

От
"DaNieL..!"
Дата:
I thought to analyze the input chars to avoid useless searches, for
example, if the digit is EXXXXX, where X is number, it is the
user_code, and i'll search just that field; otherwise if the digit is
an email, i'll look only at the email column.

But, the things get little deeper, with the custom fields.
Like in wordpress, if for example someone need 2 email fields for the
user table, it can set how many custome field he want, this with 2
tables:

-----
CREATE TABLE contact_custom_field (
    id serial NOT NULL UNIQUE;
    kind varchar(20);
    input varchar(10);
);

CREATE TABLE contact_custom_values (
    id serial NOT NULL UNIQUE;
    id_contact integer NOT NULL REFERENCES contact ON (id);
    kind varchar(20) REFERENCES contact_custom_field ON (kind);
    value varchar(250)
);
-----

That layout is just  a draft;
So, if the user need 2 email fields, i'll  have those situation:

table contact_custom_value (i'll need a shorter name ;):
1 |email | manual

table contact_custom_values:
1 | 1 | 20 | email | something@example.com |

And so on for every contact who have 2 emaila address.
And obviusly the 'master search' query must look even in those custom
field (just in some kind of them).

So, the situation is a bit complicated -.-


> What i have done with searches on small strings where the user is unsure
> what they are looking for or the spelling, I do something like this
>
> where substr(company_name,1,length($searchtext$ UserSearchString
> $searchtext$)) ilike  $searchtext$UserSearchString $searchtext$
>
> and mix it with soundex.  This way the user get a list of possible
> matches with only handful to type characters
>
>  Draw back is this type of search is it can't be indexed.

Re: Please suggest me on my table design (indexes!)

От
"DaNieL..!"
Дата:
Yes, surely bigserial of overdimensioned.

Anyway, i need to keep the postgresql cpu and ram usage as lower as
possible, same for the disk usage.
I'm starded an application that would be similar to shopify.com, but
free and opensource (it will be opensource just when i'll finish at
least the first beta), so everyone can register, everyone will have it
own database, with the same structure for all.. and even the 'engine',
php and python scripts, will be the same for everyone.
And all that must run on one server, at the moment..
Till now i tested with 10 beta-tester (the old version of the project,
using mysql database) and everythings works fine, but i'll need to
have at least 1000 users using it 24/7, keeping the performance
acceptable.

On 23 Giu, 17:35, rfu...@gmail.com ("Chris Spotts") wrote:
> IMHO running queries on 23k'ish worth of rows isn't liable to stress any
> reasonably modern server, likely several times over that shouldn't either
> for simple "LIKE" searches.
> What kind of growth are you expecting?
>
>
>
> > -----Original Message-----
> > From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> > ow...@postgresql.org] On Behalf Of DaNieL..!
> > Sent: Tuesday, June 23, 2009 9:44 AM
> > To: pgsql-gene...@postgresql.org
> > Subject: Re: [GENERAL] Please suggest me on my table design (indexes!)
>
> > Yes, after my post i've tryed the versione with 2 separate table (a
> > copy of the contact table) with inside just the employees, and, with
> > my surprise, the query planner looks identical, both with 1 big table
> > and with 2 splitted table.
> > This sound a bit strange for me, becose in my test the 'employees
> > rows' are about 13K, the company about 3k, the private around 7K.
> > So i thougt that moving the employees to another table would decrese
> > much the indexes weight on the original table, and so raise the speed
> > for the remaining original contact table (with inside just private and
> > company users)
>
> > On 23 Giu, 16:29, gryz...@gmail.com (Grzegorz Ja¶kiewicz) wrote:
> > > 2009/6/23 DaNieL..! <daniele.pigned...@gmail.com>:
>
> > > > The `problem` is that i dont know if having so many indexes will
> > raise
> > > > problems as the data dimension grown.
> > > > That seem to be not very efficient:http://explain.depesz.com/s/Q0m
>
> > > Well, this is slow, because for some reason postgres decided to use
> > > seq scan on contact e.
> > > As for speed with many indices. Btree for varchar is going to be
> > > slower, than - say for integer, or any other fixed length type.
> > > This is due to nature of index. Having said that, if you expect a lot
> > > of repetition, split it/normalize it.
> > > Index performance also hurts, when you get a lot of variants of data
> > > (worse case, all varchar rows are different, and don't share too many
> > > leafs on index).
> > > So it is always beneficial to have separate table, if data is
> > > redundant - especially when it is text/varchar/bytea.
>
> > > Also, index size grows pretty badly when you modify table's content a
> > > lot in postgresql.
> > > Rule of thumb, bigger the index in size, comparable to data size -
> > the worse.
>
> > > --
> > > GJ
>
> > > --
> > > Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> > > To make changes to your
> > subscription:http://www.postgresql.org/mailpref/pgsql-general
>
> > --
> > Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> > To make changes to your subscription:
> >http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general