Обсуждение: Question on COUNT performance

Поиск
Список
Период
Сортировка

Question on COUNT performance

От
Anders Østergaard Jensen
Дата:
Hello mailing list, 

I have a performance problem with my postgres 8.4.4 database. The query is the following:

SELECT count(*) AS count_all FROM "plan_events" WHERE (f_plan_event_acl(17, plan_events.id))   

                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2859.77..2859.78 rows=1 width=0) (actual time=4641.720..4641.720 rows=1 loops=1)
   ->  Seq Scan on plan_events  (cost=0.00..2851.44 rows=3331 width=0) (actual time=32.821..4640.116 rows=2669 loops=1)
         Filter: f_plan_event_acl(17, id)
 Total runtime: 4641.753 ms
(4 rows)



What can I do to improve the performance? The table has around 3000+ rows, so the data size is really limited.

The table has the following definition: 


-----

metabase=# \d plan_events; 
                                           Table "public.plan_events"
      Column      |           Type           |                             Modifiers                             
------------------+--------------------------+-------------------------------------------------------------------
 id               | integer                  | not null default nextval(('"plan_event_id_seq"'::text)::regclass)
 description      | text                     | not null
 status           | text                     | not null
 pct_completed    | integer                  | default 0
 due              | timestamp with time zone | not null
 due_to           | timestamp with time zone | not null
 priority         | integer                  | not null default 1
 created          | timestamp with time zone | not null
 user_id_created  | integer                  | not null
 plan_type_id     | integer                  | not null
 finished         | boolean                  | not null default false
 duration         | double precision         | not null default 0.0
 search_idx       | tsvector                 | 
 org_id           | integer                  | not null default 1
 personal_user_id | integer                  | 
 place            | text                     | 
 contact_log_id   | integer                  | 
Indexes:
    "plan_events_pkey" PRIMARY KEY, btree (id)
    "plan_event_contact_log_idx" btree (contact_log_id)
    "plan_event_search_idx" gin (search_idx)
    "plan_events_created_idx" btree (created)
    "plan_events_due_idx" btree (due)
    "plan_events_org_idx" btree (org_id)
    "plan_events_personal_user_idx" btree (personal_user_id)
    "plan_events_plan_type_id_idx" btree (plan_type_id)
    "plan_events_user_id_created_idx" btree (user_id_created)
Foreign-key constraints:
    "plan_events_contact_log_id_fkey" FOREIGN KEY (contact_log_id) REFERENCES contact_logs(id)
    "plan_events_org_id_fkey" FOREIGN KEY (org_id) REFERENCES orgs(id)
    "plan_events_personal_user_id_fkey" FOREIGN KEY (personal_user_id) REFERENCES users(id)
    "plan_events_plan_type_id_fkey" FOREIGN KEY (plan_type_id) REFERENCES plan_types(id)
    "plan_events_user_id_created_fkey" FOREIGN KEY (user_id_created) REFERENCES users(id)
Referenced by:
    TABLE "contact_plan_events" CONSTRAINT "contact_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "custom_values" CONSTRAINT "custom_values_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "customer_plan_events" CONSTRAINT "customer_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "generic_comments" CONSTRAINT "generic_comments_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "mail_queue_items" CONSTRAINT "mail_queue_items_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "plan_event_notifications" CONSTRAINT "plan_event_notifications_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "project_plan_events" CONSTRAINT "project_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "to_do_list_events" CONSTRAINT "to_do_list_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
    TABLE "user_plan_events" CONSTRAINT "user_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
Triggers:
    plan_events_update BEFORE INSERT OR UPDATE ON plan_events FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('search_idx', 'pg_catalog.danish', 'description', 'status')
------

And the f_plan_event function has the following definition: 

metabase=# \df+ f_plan_event_acl
                                                                                                                                                  List of functions
 Schema |       Name       | Result data type |     Argument data types      |  Type  | Volatility |  Owner   | Language |                                                                                 Source code                                                                                 | Description 
--------+------------------+------------------+------------------------------+--------+------------+----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 public | f_plan_event_acl | boolean          | uid integer, plan_id integer | normal | volatile   | postgres | plpgsql  |                                                                                                                                                                             | 
                                                                                                                         : declare                                                                                                                                                                       
                                                                                                                         :         user record;                                                                                                                                                          
                                                                                                                         :         customer record;                                                                                                                                                      
                                                                                                                         :         project record;                                                                                                                                                       
                                                                                                                         :         pcount_rel record;                                                                                                                                                    
                                                                                                                         :         ccount_rel record;                                                                                                                                                    
                                                                                                                         : begin                                                                                                                                                                         
                                                                                                                         :         select into user * from users where id = uid;                                                                                                                         
                                                                                                                         :         select into pcount_rel COUNT(*) as acl_count FROM project_plan_events ppe WHERE ppe.plan_event_id = plan_id;                                                          
                                                                                                                         :         if (pcount_rel.acl_count > 0) then                                                                                                                                    
                                                                                                                         :                 SELECT INTO project * FROM projects WHERE id IN (SELECT project_id FROM project_plan_events ppe2 WHERE ppe2.plan_event_id = plan_id) OFFSET 0 LIMIT 1;        
                                                                                                                         :                 return f_project_acl(uid, project.id);                                                                                                                        
                                                                                                                         :         end if;                                                                                                                                                               
                                                                                                                         :         select into ccount_rel COUNT(*) as acl_count FROM customer_plan_events cpe WHERE cpe.plan_event_id = plan_id;                                                         
                                                                                                                         :         if (ccount_rel.acl_count > 0) then                                                                                                                                    
                                                                                                                         :                 select into customer * from customers where id in (select customer_id from customer_plan_events cpe2 where cpe2.plan_event_id = plan_id) offset 0 limit 1;    
                                                                                                                         :                 return f_customer_acl(uid, customer.id);                                                                                                                      
                                                                                                                         :         end if;                                                                                                                                                               
                                                                                                                         :         return true;                                                                                                                                                          
                                                                                                                         : end;                                                                                                                                                                          
                                                                                                                         :     



--
Best Regards,

Anders Østergaard Jensen, B.Sc.
Technical Director & Managing Partner
Meeho! ApS

Meeho! Australia
5/28 Onslow Ave
Elizabeth Bay
2011 NSW

E-mail: aj@itersys.dk / aj@meeho.dk
Phone: +61 406 880 313
Web (Intl): http://www.meeho.net/
Web (DK): http:/www.meeho.dk/

Re: Question on COUNT performance

От
Reinoud van Leeuwen
Дата:
On Wed, Jul 14, 2010 at 09:58:10PM +1000, Anders ??stergaard Jensen wrote:
> Hello mailing list,
> 
> I have a performance problem with my postgres 8.4.4 database. The query is
> the following:
> 
> SELECT count(*) AS count_all FROM "plan_events" WHERE (f_plan_event_acl(17,
> plan_events.id))
> 
>                                                        QUERY PLAN
> 
>
------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=2859.77..2859.78 rows=1 width=0) (actual
> time=4641.720..4641.720 rows=1 loops=1)
>    ->  Seq Scan on plan_events  (cost=0.00..2851.44 rows=3331 width=0)
> (actual time=32.821..4640.116 rows=2669 loops=1)
>          Filter: f_plan_event_acl(17, id)
>  Total runtime: 4641.753 ms
> (4 rows)
> 
> 
> 
> What can I do to improve the performance? 

Have you tried 'select count (1)..."?

Reinoud


-- 
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen    reinoud.v@n.leeuwen.net
http://reinoud.van.leeuwen.net kvk 27320762
__________________________________________________


Re: Question on COUNT performance

От
Joshua Tolley
Дата:
On Wed, Jul 14, 2010 at 02:30:29PM +0200, Reinoud van Leeuwen wrote:
> On Wed, Jul 14, 2010 at 09:58:10PM +1000, Anders ??stergaard Jensen wrote:
> > SELECT count(*) AS count_all FROM "plan_events" WHERE (f_plan_event_acl(17,
> > plan_events.id))
> >
> >                                                        QUERY PLAN
> >
> >
------------------------------------------------------------------------------------------------------------------------
> >  Aggregate  (cost=2859.77..2859.78 rows=1 width=0) (actual
> > time=4641.720..4641.720 rows=1 loops=1)
> >    ->  Seq Scan on plan_events  (cost=0.00..2851.44 rows=3331 width=0)
> > (actual time=32.821..4640.116 rows=2669 loops=1)
> >          Filter: f_plan_event_acl(17, id)
> >  Total runtime: 4641.753 ms
> > (4 rows)
> >
> >
> >
> > What can I do to improve the performance?
>
> Have you tried 'select count (1)..."?

If this helps at all, it's unlikely to help much. I remember having seen
discussion somewhere that there's an optimization such that count(*) and
count(1) do the same thing anyway, but I can't find it in the code
immediately. In any case, if your WHERE clause frequently includes this
function with 17 and id as arguments, and if f_plan_event_acl is immutable,
you can create an index:
   CREATE INDEX foo_ix ON plan_events (f_plan_event_acl(17, id));

If PostgreSQL thinks that function will be true for a sufficiently small
proportion of the rows in the table, it will use the index instead of a
sequential scan, which might end up being faster. But the index won't help you
when you want to check values other than 17, e.g.
   SELECT count(*) FROM plan_events WHERE f_plan_event_acl(42, id)

Another option might be to precalculate these data, if you have a relatively
small set of values you pass to f_plan_event_acl(). A table somewhere could
store the f_plan_events_acl() argument (17, as well as any other values you
want to precalculate), and a count of plan_events rows where
f_plan_events_acl() returns true with that argument. A set of triggers would
ensure that whenever someone INSERTs, UPDATEs, or DELETEs a plan_events row,
these counts are updated accordingly. Then you can refer to that table when
you need a count. If values in other tables can change the results of
f_plan_events_acl(), you'd need triggers there, too (and this method would
probably start to become unworkably complicated).

As an alternative to the precalculation option, you could also cache the
results of this query somewhere, and presumably invalidate that cache using a
trigger on the plan_events table.

Finally, you can try to improve performance of your function itself. If it's
taking 4.6 sec. to read and process 2669 rows, either you're reading awfully
slowly (VACUUM problems, perhaps?) or your f_plan_events_acl() function takes
a long time to run.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: Question on COUNT performance

От
Reinoud van Leeuwen
Дата:
On Wed, Jul 14, 2010 at 07:30:39AM -0600, Joshua Tolley wrote:

> > Have you tried 'select count (1)..."?
> 
> If this helps at all, it's unlikely to help much. I remember having seen
> discussion somewhere that there's an optimization such that count(*) and
> count(1) do the same thing anyway, but I can't find it in the code
> immediately. 

oops, I was thinking too much about Sybase (where is makes a huge 
difference, since the optimiser sees that everything it needs can be found 
in the index so the table does not have to be read at all).

Reinoud

-- 
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen    reinoud.v@n.leeuwen.net
http://reinoud.van.leeuwen.net kvk 27320762
__________________________________________________


Re: Question on COUNT performance

От
Lee Hachadoorian
Дата:
The first statement of the function

>                                             :         select into user *
> from users where id = uid;

appears to be a useless drag, as I don't see the user record referred
to anywhere else in the function. There appears to be other
unnecessary statements. For example :

>                                             :         select into pcount_rel
> COUNT(*) as acl_count FROM project_plan_events ppe WHERE ppe.plan_event_id =
> plan_id;
>
>                                             :         if
> (pcount_rel.acl_count > 0) then
>
>
>
>                                             :                 SELECT INTO
> project * FROM projects WHERE id IN (SELECT project_id FROM
> project_plan_events ppe2 WHERE ppe2.plan_event_id = plan_id) OFFSET 0 LIMIT
> 1;
>
>                                             :                 return
> f_project_acl(uid, project.id);
>
>
>                                             :         end if;
>

This appears to check whether the plan_id exists in a link table, find
an associated project_id, and run some function on project_id.

This could instead be done as:

FOR project_record IN SELECT project_id FROM project_plan_events WHERE
plan_event_id = plan_id LIMIT 1 LOOP   return f_project_acl(uid, project_record.project_id)
END LOOP;

If plan_id doesn't exist in project_plan_events, the LOOP just gets skipped.

The same could be done for the next IF block in they query which
checks to see whether plan_id has a matching customer_id in a link
table.

Note that your LIMIT 1 (which I have retained) strongly implies a
1-to-1 relationship between project_id and plan_id. If not, this
function gets applied to an arbitrary project_id from among all
matching project_ids. (Same goes for customer_id.)

Assuming f_project_acl and f_customer_acl return TRUE if successful,
the whole thing (from the original SELECT COUNT(*) looks like it can
be summarized as:

Call a function with a plan_id
If a matching project_id exists   Do some function on the project_id   count +1
Else If a matching customer_id exists   Do some function on the customer_id   count +1
Else   count +1

Return count, which, since the function gets called once for each row
in plan_events, count should always equal the number of rows in plan
events.

I would be inclined to replace the whole thing with something like this:

SELECT newfunc(uid);

CREATE FUNCTION newfunc(uid int) RETURNS int AS $$
DECLAREplan_record record;i int := 0;
BEGINFOR plan_record IN SELECT DISTINCT plan_id, project_id, customer_id
FROM plan_events LEFT JOIN project_plan_events USING (plan_id) LEFT
JOIN customer_plan_events USING (plan_id)  LOOP    IF plan_record.project_id IS NOT NULL THEN        PERFORM
f_project_acl(uid,plan_record.project_id);    ELSEIF plan_record.customer_id IS NOT NULL THEN        PERFORM
f_customer_acl(uid,plan_record.customer_id);    END IF;    i := i + 1;    END LOOP ;RETURN i;
 
END;
$$ LANGUAGE plpgsql;

If I understand what's going on in your function, I *think* this would
reduce 9000-12,000 SELECT statements to 1 SELECT statement.

Obviously, not tested. Hope this is helpful.

--Lee

-- 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center


Re: Question on COUNT performance

От
Lee Hachadoorian
Дата:
In retrospect, it's a big assumption whether f_project_acl() or
f_customer_acl() always return TRUE. If they can return FALSE, you
probably want to replace the statements inside the FOR..LOOP with


>                IF plan_record.project_id IS NOT NULL THEN
>                        IF f_project_acl(uid, plan_record.project_id) THEN i := i + 1; END IF;
>                ELSEIF plan_record.customer_id IS NOT NULL THEN
>                        IF f_customer_acl(uid, plan_record.customer_id) THEN i := i + 1; END IF;
>                ELSE
>                        i := i + 1;
>                END IF;

This would mimic the results of your original query, although I must
confess I don't understand the usefulness of the count results, as a
number less that the number of rows in plan_events has an ambiguous
meaning. Either

(1) there is a matching event but f_project_acl returned FALSE

OR

(2) there is no matching event, there IS a matching customer, but
f_customer_acl returned FALSE

And of course you don't know which plan_ids these might be true of.

--Lee

2010/7/14 Lee Hachadoorian <lee.hachadoorian@gmail.com>:
> SELECT newfunc(uid);
>
> CREATE FUNCTION newfunc(uid int) RETURNS int AS $$
> DECLARE
>        plan_record record;
>        i int := 0;
> BEGIN
>        FOR plan_record IN SELECT DISTINCT plan_id, project_id, customer_id
> FROM plan_events LEFT JOIN project_plan_events USING (plan_id) LEFT
> JOIN customer_plan_events USING (plan_id)  LOOP
>                IF plan_record.project_id IS NOT NULL THEN
>                        PERFORM f_project_acl(uid, plan_record.project_id);
>                ELSEIF plan_record.customer_id IS NOT NULL THEN
>                        PERFORM f_customer_acl(uid, plan_record.customer_id);
>                END IF;
>                i := i + 1;
>        END LOOP ;
>        RETURN i;
> END;
> $$ LANGUAGE plpgsql;


Re: Question on COUNT performance

От
Anders Østergaard Jensen
Дата:
Hi all, 

Thank you so much for your kind replies. It has all been a great help. 

I tried the SELECT COUNT(1) but that didn't yield any improvement, sorry. 

Doing the index on f_plan_event_acl( ... ) wont work, as the parameters are frequently shifted (the second parameter denotes the id of a user in another table). 

As Mr. Leeuwen rightfully points out, there might be some performance problems in my acl functions (these are basic functions that determine wether or not a user has got access to a certain row in a table or not---fx f_customer_acl(customer_id, user_id) will return true if the user has access to the customer with ID customer_id etc, the same for f_project_acl on projects etc).. I  am not great at optimising PL/pgSQL, though I have the assumption that the speed of the procedural language might have a great impact here. 

Before I start changing the content of the function that Mr. Leeuwen kindly provided above, can I pleas ask for help on how to optimise the other acl functions first? 

CREATE OR REPLACE FUNCTION f_contact_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user record;
  contact record;
  customer record;
begin
  SELECT INTO customer cust.* FROM contacts
    JOIN customer_contacts cc ON cc.contact_id = contacts.id
    JOIN customers cust ON cust.id = cc.customer_id
    WHERE contacts.id = cid;

    SELECT INTO user * FROM users WHERE id=uid;

    if (customer.org_id != user.org_id) then
      return false;
    end if;

    return true;
end
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION f_customer_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user_id integer;
  customer_id integer;
  user record;
  customer record;
begin
  user_id = $1;
  customer_id = $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO customer * FROM customers WHERE id=customer_id;

  -- Assert that org_id matches: 
  if (customer.org_id != user.org_id) then
    return false;
  end if;

  -- Nothing more to check for:
  return true;
end;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION f_doc_acl(uid integer, did integer)
RETURNS BOOL AS $$
DECLARE
  user_id   integer;
  doc_id    integer;
  user      record;
  doc       record;
  proj_rel  record;
BEGIN
  user_id := $1;
  doc_id  := $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO doc  * FROM documents WHERE id=doc_id;

  -- Check that org_id matches
  if (doc.org_id != user.org_id) then
    return false;
  end if;

  -- If document was created by user, accept it
  if (doc.user_id_created = user_id) then
    return true;
  end if;  -- if document is public, accept it
  if (doc.is_public) then
    return true;
  end if;

  -- else, check the project-document relations -- is the 
  -- user member of a project that allows access to the document? 
  SELECT INTO proj_rel COUNT(*) AS acl_count FROM project_users
    JOIN projects ON project_users.project_id = projects.id
    JOIN project_documents ON projects.id = project_documents.project_id
    JOIN documents ON project_documents.document_id = documents.id
    WHERE documents.id = doc_id
          AND project_users.user_id = $1;

  -- acl_count returns the number of allowed relationships to exactly 
  -- this document  
  return proj_rel.acl_count > 0;
END;
$$ LANGUAGE 'plpgsql';

Would it be more beneficial to drop the functions and rewrite my basic queries first? However, it is a nice feature having all security checks wrapped into a three-four basic functions. 
If my design is completely flawed, I am also open to other design suggestions on how to do proper row-based access control.

I am not asking for the complete solution but a few pointers on how to speed this up would be really great. Thanks! 

Re: Question on COUNT performance

От
Anders Østergaard Jensen
Дата:
Hi all, 

Thank you so much for your kind replies. It has all been a great help. 

I tried the SELECT COUNT(1) but that didn't yield any improvement, sorry. 

Doing the index on f_plan_event_acl( ... ) wont work, as the parameters are frequently shifted (the second parameter denotes the id of a user in another table). 

As Mr. Leeuwen rightfully points out, there might be some performance problems in my acl functions (these are basic functions that determine wether or not a user has got access to a certain row in a table or not---fx f_customer_acl(customer_id, user_id) will return true if the user has access to the customer with ID customer_id etc, the same for f_project_acl on projects etc).. I  am not great at optimising PL/pgSQL, though I have the assumption that the speed of the procedural language might have a great impact here. 

Before I start changing the content of the function that Mr. Leeuwen kindly provided above, can I pleas ask for help on how to optimise the other acl functions first? 

CREATE OR REPLACE FUNCTION f_contact_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user record;
  contact record;
  customer record;
begin
  SELECT INTO customer cust.* FROM contacts
    JOIN customer_contacts cc ON cc.contact_id = contacts.id
    JOIN customers cust ON cust.id = cc.customer_id
    WHERE contacts.id = cid;

    SELECT INTO user * FROM users WHERE id=uid;

    if (customer.org_id != user.org_id) then
      return false;
    end if;

    return true;
end
$$ LANGUAGE 'plpgsql';
- Hide quoted text -

CREATE OR REPLACE FUNCTION f_customer_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user_id integer;
  customer_id integer;
  user record;
  customer record;
begin
  user_id = $1;
  customer_id = $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO customer * FROM customers WHERE id=customer_id;

  -- Assert that org_id matches: 
  if (customer.org_id != user.org_id) then
    return false;
  end if;

  -- Nothing more to check for:
  return true;
end;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION f_doc_acl(uid integer, did integer)
RETURNS BOOL AS $$
DECLARE
  user_id   integer;
  doc_id    integer;
  user      record;
  doc       record;
  proj_rel  record;
BEGIN
  user_id := $1;
  doc_id  := $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO doc  * FROM documents WHERE id=doc_id;

  -- Check that org_id matches
  if (doc.org_id != user.org_id) then
    return false;
  end if;

  -- If document was created by user, accept it
  if (doc.user_id_created = user_id) then
    return true;
  end if;  -- if document is public, accept it
  if (doc.is_public) then
    return true;
  end if;

  -- else, check the project-document relations -- is the 
  -- user member of a project that allows access to the document? 
  SELECT INTO proj_rel COUNT(*) AS acl_count FROM project_users
    JOIN projects ON project_users.project_id = projects.id
    JOIN project_documents ON projects.id = project_documents.project_id
    JOIN documents ON project_documents.document_id = documents.id
    WHERE documents.id = doc_id
          AND project_users.user_id = $1;

  -- acl_count returns the number of allowed relationships to exactly 
  -- this document  
  return proj_rel.acl_count > 0;
END;
$$ LANGUAGE 'plpgsql';

Would it be more beneficial to drop the functions and rewrite my basic queries first? However, it is a nice feature having all security checks wrapped into a three-four basic functions. 
If my design is completely flawed, I am also open to other design suggestions on how to do proper row-based access control.

I am not asking for the complete solution but a few pointers on how to speed this up would be really great. Thanks! 

Re: Question on COUNT performance

От
Lee Hachadoorian
Дата:
It appears that the acl functions use more SELECTs than necessary. For f_customer_acl(<span class="Apple-style-span"
style="font-family:arial,sans-serif; font-size: 13px; border-collapse: collapse;">uid integer, cid integer</span>), I
mightuse:<br /><br /> PERFORM 1 FROM customers JOIN users USING (org_id) WHERE customer_id = cid and user_id = uid;<br
/>RETURN FOUND;<br /><br /> This still requires one call to f_customer_acl() (and therefore one SELECT) for each row in
plan_events(since that's the way the calling function is written). If the goal is to count the number of plan_events a
specificuser has access rights to, I'm sure you can write a query that would accomplish that directly. It's beyond my
knowledgewhether it makes more sense to do this via these function calls.<br /><br /> --Lee<br /><br /><br /><pre
class="moz-signature"cols="72">-- 
 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center
</pre>

Re: Question on COUNT performance

От
REISS Thomas DSIC BIP
Дата:
Hello,

You can add another JOIN in your function to avoid the test to return either true or false.
Here's an example:
CREATE OR REPLACE FUNCTION f_contact_acl(integer, integer)
RETURNS BOOL AS $$
    WITH contact (id_contact) AS (VALUES (1),(2),(3),(7),(8)),
     customer (id_customer,id_org) AS (VALUES (4,1),(5,2),(6,3)),
     customer_contact (id_contact,id_customer) AS (VALUES (1,4),(2,5),(3,6)),
     util (id_user,id_org) AS (VALUES (1,1),(2,2))
    SELECT COUNT(*)!=0 FROM contact JOIN customer_contact ON contact.id_contact=customer_contact.id_contact
                                JOIN customer ON customer.id_customer=customer_contact.id_customer
                                JOIN util ON customer.id_org=util.id_org
    WHERE contact.id_contact=$2 AND util.id_user=$1;
$$ LANGUAGE SQL;   

The WITH clause and VALUES was supposed to give a test case. You simply have to remove them and keep the query. So the function becomes a simple SQL function.

Hope this helps :-)

Regards
Jean-Michel Souchard and Thomas Reiss



-------- Message original --------
Sujet : Re: [SQL] Question on COUNT performance
De : Anders Østergaard Jensen <aj@itersys.dk>
Pour : pgsql-sql@postgresql.org
Date : 15/07/2010 02:14
Hi all, 

Thank you so much for your kind replies. It has all been a great help. 

I tried the SELECT COUNT(1) but that didn't yield any improvement, sorry. 

Doing the index on f_plan_event_acl( ... ) wont work, as the parameters are frequently shifted (the second parameter denotes the id of a user in another table). 

As Mr. Leeuwen rightfully points out, there might be some performance problems in my acl functions (these are basic functions that determine wether or not a user has got access to a certain row in a table or not---fx f_customer_acl(customer_id, user_id) will return true if the user has access to the customer with ID customer_id etc, the same for f_project_acl on projects etc).. I  am not great at optimising PL/pgSQL, though I have the assumption that the speed of the procedural language might have a great impact here. 

Before I start changing the content of the function that Mr. Leeuwen kindly provided above, can I pleas ask for help on how to optimise the other acl functions first? 

CREATE OR REPLACE FUNCTION f_contact_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user record;
  contact record;
  customer record;
begin
  SELECT INTO customer cust.* FROM contacts
    JOIN customer_contacts cc ON cc.contact_id = contacts.id
    JOIN customers cust ON cust.id = cc.customer_id
    WHERE contacts.id = cid;

    SELECT INTO user * FROM users WHERE id=uid;

    if (customer.org_id != user.org_id) then
      return false;
    end if;

    return true;
end
$$ LANGUAGE 'plpgsql';
- Hide quoted text -

CREATE OR REPLACE FUNCTION f_customer_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user_id integer;
  customer_id integer;
  user record;
  customer record;
begin
  user_id = $1;
  customer_id = $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO customer * FROM customers WHERE id=customer_id;

  -- Assert that org_id matches: 
  if (customer.org_id != user.org_id) then
    return false;
  end if;

  -- Nothing more to check for:
  return true;
end;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION f_doc_acl(uid integer, did integer)
RETURNS BOOL AS $$
DECLARE
  user_id   integer;
  doc_id    integer;
  user      record;
  doc       record;
  proj_rel  record;
BEGIN
  user_id := $1;
  doc_id  := $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO doc  * FROM documents WHERE id=doc_id;

  -- Check that org_id matches
  if (doc.org_id != user.org_id) then
    return false;
  end if;

  -- If document was created by user, accept it
  if (doc.user_id_created = user_id) then
    return true;
  end if;  -- if document is public, accept it
  if (doc.is_public) then
    return true;
  end if;

  -- else, check the project-document relations -- is the 
  -- user member of a project that allows access to the document? 
  SELECT INTO proj_rel COUNT(*) AS acl_count FROM project_users
    JOIN projects ON project_users.project_id = projects.id
    JOIN project_documents ON projects.id = project_documents.project_id
    JOIN documents ON project_documents.document_id = documents.id
    WHERE documents.id = doc_id
          AND project_users.user_id = $1;

  -- acl_count returns the number of allowed relationships to exactly 
  -- this document  
  return proj_rel.acl_count > 0;
END;
$$ LANGUAGE 'plpgsql';

Would it be more beneficial to drop the functions and rewrite my basic queries first? However, it is a nice feature having all security checks wrapped into a three-four basic functions. 
If my design is completely flawed, I am also open to other design suggestions on how to do proper row-based access control.

I am not asking for the complete solution but a few pointers on how to speed this up would be really great. Thanks!